We've been using Quattor since the early DataGrid days.
The landscape is changing: grid services see less innovation, while new configuration management systems have emerged alongside growing cloud deployments.
If there ever was a moment to do it, this was it!
Credits to Andrew Pickford!
Looked at a Quattor upgrade:
(But some were rejected outright based on personal prejudice.)
Two candidates came very close, with no obvious winner: Saltstack and Ansible.
Saltstack came out ahead by a nose on technicalities.
(Ansible would have served us just fine.)
(Based on previous experiences)
Test mode shows what would change.
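In Salt, test mode is requested with e.g. `state.apply test=True`. As a rough illustration of the idea only (not Salt's implementation), here is a minimal Python sketch. It assumes the desired and current state of a node are plain dicts of hypothetical setting names: in test mode the differences are reported but nothing is changed.

```python
from typing import Any, Dict, List, Tuple

# (setting, current value, desired value)
Change = Tuple[str, Any, Any]


def plan_changes(current: Dict[str, Any], desired: Dict[str, Any]) -> List[Change]:
    """Return the changes needed to turn `current` into `desired`, sorted by key."""
    changes: List[Change] = []
    for key in sorted(desired):
        have = current.get(key)  # None if the setting is absent
        want = desired[key]
        if have != want:
            changes.append((key, have, want))
    return changes


def apply_state(current: Dict[str, Any], desired: Dict[str, Any],
                test: bool = True) -> List[Change]:
    """Report the pending changes; only modify `current` when test is False."""
    changes = plan_changes(current, desired)
    if not test:
        for key, _, want in changes:
            current[key] = want
    return changes


if __name__ == "__main__":
    # Hypothetical node settings, for illustration only.
    current = {"pkg:httpd": "absent", "service:ntpd": "running"}
    desired = {"pkg:httpd": "installed", "service:ntpd": "running"}
    for key, have, want in apply_state(current, desired, test=True):
        print(f"would change {key}: {have} -> {want}")
    print(current["pkg:httpd"])  # still "absent": test mode changed nothing
```

Running the same state again with `test=False` applies exactly the changes that test mode reported.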
Discussed (a bit) at HEPiX before.
2016, Sandy Philpott, site report
2017, Owen Synge, technical talk
Widely used in various open source communities.
(But anyway…)
| Data source | Kind of data | Typical examples |
|---|---|---|
| pillar | static, per node | server name, IP address |
| formula | states related to a single aspect | mysql, iptables |
| state | elementary settings | installed packages, running services |
We separated the
The pillar is provided by Reclass.
A recursive classifier that collects static, hierarchical information about nodes and provides it as pillar data.
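"Recursive" here means that classes can include other classes. A minimal Python sketch of the resulting merge order, roughly following how reclass walks the hierarchy: each class's own ancestors come before it, every class is visited at most once, and the node itself comes last. The dict-based `classes_of` table and the `"node"` placeholder are assumptions for illustration.

```python
from typing import Dict, List, Optional, Set


def resolve_order(entity: str, classes_of: Dict[str, List[str]],
                  seen: Optional[Set[str]] = None) -> List[str]:
    """Return classes in the order they would be merged: ancestors first,
    each class at most once (which also breaks cycles), `entity` last."""
    if seen is None:
        seen = set()
    seen.add(entity)
    order: List[str] = []
    for cls in classes_of.get(entity, []):
        if cls not in seen:
            order.extend(resolve_order(cls, classes_of, seen))
    order.append(entity)
    return order


if __name__ == "__main__":
    # Class lists taken from the testbed example; "node" is a placeholder name.
    classes_of = {
        "node": ["cluster.ndpf.testbed.dcache", "hardware.vm.xen.standard",
                 "os.linux.redhat.centos.7", "role.server.dcache.plain.master"],
        "cluster.ndpf.testbed.dcache": ["cluster.ndpf.testbed"],
        "cluster.ndpf.testbed": ["cluster.ndpf"],
    }
    for name in resolve_order("node", classes_of):
        print(name)
```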
Originally http://reclass.pantsfullofunix.net/, but the most active fork at the moment is https://github.com/salt-formulas/reclass/. Our version currently is https://github.com/AndrewPickford/reclass/.
(Remember, not a technical talk!)
Example, slightly simplified. This is a dCache master node in our testbed.
classes:
  - cluster.ndpf.testbed.dcache
  - hardware.vm.xen.standard
  - os.linux.redhat.centos.7
  - role.server.dcache.plain.master
environment: pre-prod
parameters:
  _hardware_: (here be the VM provisioning parameters)
Here is cluster/ndpf/testbed/dcache/init.yml:
classes:
  - cluster.ndpf.testbed
parameters:
  _cluster_:
    name: dcache testbed
    dcache_version: 3.1
    dcache_carbon_server: ${_cluster_:monitoring_satellite}
    dcache_nfs_allowed_ipv4:
      - ${_site_:networks:ipv4:stbcnet}
      - ${_site_:networks:ipv4:wnnet}
cluster/ndpf/testbed/init.yml:
classes:
  - cluster.ndpf
parameters:
  _cluster_:
    name: testbed
    monitoring_satellite: vaars-03.nikhef.nl
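Note that _cluster_:name is given here, but the class cluster.ndpf.testbed.dcache overrides it: a class's own parameters are merged after those of the classes it inherits from, so its values win.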
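A minimal Python sketch of such a deep merge, assuming parameters are plain nested dicts applied in class order. Nested dicts merge key by key, lists are extended, and any other value from a later layer replaces the earlier one. This roughly mirrors reclass's default behaviour; its `~` override prefix and other options are left out.

```python
import copy
from typing import Any, Dict, List


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, lists are extended,
    and any other value from `override` replaces the one in `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        elif isinstance(value, list) and isinstance(result.get(key), list):
            result[key] = result[key] + copy.deepcopy(value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def merge_in_order(layers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge parameter layers from least to most specific."""
    merged: Dict[str, Any] = {}
    for layer in layers:
        merged = deep_merge(merged, layer)
    return merged


if __name__ == "__main__":
    testbed = {"_cluster_": {"name": "testbed",
                             "monitoring_satellite": "vaars-03.nikhef.nl"}}
    dcache = {"_cluster_": {"name": "dcache testbed", "dcache_version": 3.1}}
    print(merge_in_order([testbed, dcache])["_cluster_"]["name"])
```

Because cluster.ndpf.testbed is merged before cluster.ndpf.testbed.dcache, the more specific `name` wins, while `monitoring_satellite` is inherited unchanged.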
Reclass is not without its shortcomings. It needed work to make it do what we wanted, and was (therefore) almost rejected.
We still went ahead and fixed it.
It is written in Python, which is nice and forgiving to programmers.
Our patches are available on GitHub, and we're looking to integrate them with the versions maintained by the salt-formulas people.
- Failed to load ext_pillar reclass: ext_pillar.reclass:
  -> cc2.cloud.ipmi.nikhef.nl
     Cannot resolve ${_cluster_:some:value}, at _cluster_:monitoring_satellite,
     in yaml_fs:///srv/salt/env/dennisvd/classes/cluster/ndpf/cloud/init.yml
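The error message above comes from reference interpolation. A value such as `${_cluster_:monitoring_satellite}` is looked up as a colon-separated path in the merged parameters. A minimal Python sketch of that lookup follows. It assumes a single pass, so references inside referenced values are not followed, and it does not support nested or escaped references. The `UnresolvedReference` name is hypothetical. It does report the failing reference and where it was used, in the spirit of the message above.

```python
import re
from typing import Any, Dict

# Matches ${path:to:value}; nested or escaped references are not supported.
REF = re.compile(r"\$\{([^}]+)\}")


class UnresolvedReference(Exception):
    """Hypothetical error for a ${...} that points at a missing parameter."""


def lookup(params: Dict[str, Any], path: str) -> Any:
    """Follow a colon-separated path such as '_cluster_:name' through nested dicts."""
    node: Any = params
    for part in path.split(":"):
        if not isinstance(node, dict) or part not in node:
            raise UnresolvedReference(f"Cannot resolve ${{{path}}}")
        node = node[part]
    return node


def interpolate(value: Any, params: Dict[str, Any], where: str = "") -> Any:
    """Return a copy of `value` with references replaced (single pass only)."""
    if isinstance(value, dict):
        return {k: interpolate(v, params, f"{where}:{k}" if where else k)
                for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, params, where) for v in value]
    if not isinstance(value, str):
        return value
    try:
        whole = REF.fullmatch(value)
        if whole:
            # A bare reference keeps the type of the referenced value.
            return lookup(params, whole.group(1))
        # References embedded in a longer string are substituted as text.
        return REF.sub(lambda m: str(lookup(params, m.group(1))), value)
    except UnresolvedReference as err:
        raise UnresolvedReference(f"{err}, at {where}") from None
```

With `_cluster_:monitoring_satellite` set to `${_cluster_:some:value}` and no such parameter defined, the sketch raises `Cannot resolve ${_cluster_:some:value}, at _cluster_:monitoring_satellite`.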
All the moving parts are grouped into formulas.
apache, authconfig, autofs, backupninja, bind, certificates, cinder, cobbler, contrailctl, cups, cvmfs, dcache, dell_mdsm, docker, elasticsearch, eos, galera, git, glance, grafana, graphite, grid, haproxy, hardware, horizon, icinga, iptables, keepalived, kerberos, keystone, kibana, linux, logrotate, logstash, maui, memcached, munge, mysql, neutron, nfs, nikhef, nova, ntp, pacemaker, pakiti, php, postfix, postgresql, prometheus, python, rabbitmq, reclass, repo-mirrors, rsync, rsyslog, salt, sanity-check, secure, tftpd_hpa, torque, zookeeper
Pros:
Cons:
Choice:
High-level pepper scripts to replace low-level salt commands.
Pepper-deploy staggers updates to prevent overloading the master.
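A minimal Python sketch of staggering. The target nodes are split into fixed-size batches with a pause between batches, so only a limited number of minions hit the master at once. Salt's own `--batch-size` option works along similar lines. The batch size, the delay and the `run_batch` callback are hypothetical, and this is not Pepper-deploy's code.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(nodes: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` nodes."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield list(nodes[start:start + size])


def staggered_update(nodes: Sequence[str], size: int, delay: float,
                     run_batch: Callable[[List[str]], None]) -> int:
    """Call `run_batch` on each group in turn, sleeping `delay` seconds
    between groups. Returns the number of batches run."""
    count = 0
    for group in batches(nodes, size):
        if count:
            time.sleep(delay)  # give the master time to catch up
        run_batch(group)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical node names; print stands in for the real update step.
    nodes = [f"wn-{i:02d}" for i in range(1, 8)]
    staggered_update(nodes, size=3, delay=1.0, run_batch=print)
```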
Environments correspond to branches in git.
The monitoring system defines how the actual monitoring is done for all of those things. It gets the list of nodes and services from the inventory.
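A minimal Python sketch of deriving monitoring targets from the inventory. It assumes each node in the inventory lists its reclass classes, and that a hypothetical table maps class prefixes to checks. The check names are illustrative only and do not describe our actual monitoring setup.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from class prefixes to monitoring checks.
CHECKS_BY_CLASS_PREFIX: Dict[str, List[str]] = {
    "role.server.dcache": ["dcache-web", "dcache-pool-space"],
    "os.linux": ["ssh", "disk"],
}


def checks_for(classes: List[str]) -> Set[str]:
    """Collect the checks for every class matching a known prefix."""
    found: Set[str] = set()
    for cls in classes:
        for prefix, checks in CHECKS_BY_CLASS_PREFIX.items():
            if cls == prefix or cls.startswith(prefix + "."):
                found.update(checks)
    return found


def monitoring_targets(inventory: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return sorted (node, check) pairs for all nodes in the inventory."""
    return sorted((node, check)
                  for node, classes in inventory.items()
                  for check in checks_for(classes))
```

Keeping the mapping in one place means a new node is monitored as soon as it appears in the inventory with the right classes.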
The cobbler node has to manage both production and pre-production, and is the 'odd one out' as it has no pre-production counterpart.
The cobbler server also collects mirrors of various repositories for software installation.