Softdrive.nl, CVMFS for the masses

Dennis van Dok

Generic Components of the eScience Infrastructure Ecosystem — 14th IEEE eScience Conference Amsterdam, Monday 2018-10-29

For large organisations the overhead of running a CVMFS repository is acceptable and the effort is well worth it. But small groups, who could still benefit from the ease of software distribution, cannot spare the time and effort to run a Stratum-0 or to push for their repositories to be mounted on the sites' worker nodes. For site administrators, the prospect of having to mount dozens of small repositories is unattractive.

The simplest solution is to have a single repository to serve 'the rest of us', the users and small research groups who wish to deploy their software to the grid in an easy way.

This repository is named softdrive.nl and is set up jointly by SURFSara and Nikhef, who together operate as part of the Dutch National Infrastructure and as a WLCG Tier-1.

Every e-science user in the Netherlands is entitled to request a directory under this repository. The user can log in to the user interface machine that is the source of the repository, and can edit their software tree as they see fit. To commit the edits, the user touches a signalling file in the tree.

On the Nikhef end, we run the Stratum-0 of softdrive.nl. A script periodically rsyncs with the user interface to retrieve the tree structure, copying only the signalling files. Any signalling file that is newer than its counterpart in the current repository signals that the subtree where it resides is ready to be synchronised. In the second phase, the script opens a new transaction and rsyncs all such directories into its own tree.

At the end of the rsync, the transaction is finished by publishing the new repository.
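The selection step between the two phases can be sketched in Python. This is a minimal illustration, not the actual Nikhef script: it assumes the first phase has copied the signalling files into a local staging tree that mirrors the user directories, and that "fresher" means a newer modification time than the signalling file already in the published tree. The function name `ready_subtrees` and the directory layout are hypothetical.

```python
from pathlib import Path

SIGNAL = "softdrive.modified"  # signalling file touched by the publish command


def ready_subtrees(staging: Path, published: Path) -> list[Path]:
    """Return the subtrees (relative paths) whose signalling file in the staging
    copy is newer than the one in the published tree, or missing there."""
    ready = []
    for signal in sorted(staging.rglob(SIGNAL)):
        subtree = signal.parent.relative_to(staging)
        current = published / subtree / SIGNAL
        if not current.exists() or signal.stat().st_mtime > current.stat().st_mtime:
            ready.append(subtree)
    return ready
```

Comparing modification times only works if the first rsync preserves them, which `rsync -a` does.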

Users with large software trees should be advised to place a .cvmfscatalog file in their subdirectories, so that CVMFS splits them into nested catalogs.

Grid computing a.k.a. PaaS

Large-scale shared science infrastructure for high-throughput batch computing.

the only guaranteed environment is the base OS and some middleware
no persistent local storage between jobs
bring your own software

For the past two decades we've been building an infrastructure for large-scale computing called the Grid, and it has been successfully exploited by many science communities over the years. Even as the grid moniker falls out of favour, this is still the largest deployment of computing power and storage worldwide.

Back then virtualisation of computers was in its infancy, and cloud computing was but a shimmer on the horizon. Who could blame the architects for implementing what is now called, sometimes disparagingly, a platform as a service?

Let's take a few steps back and look at how this evolved. Universities and labs with a considerable demand for computing power started deploying batch systems such as PBS, Torque, Sun Grid Engine or LSF, to name a few. These allowed local users to log in and submit work to run in batches, sometimes on multiple computers at the same time, for hours or days depending on local policy.

With the development of the Grid, touting world-wide access to multiple resource centres, these batch systems didn't go away; they were simply front-ended with a layer of software to abstract away the local log-in and policy aspects.

A consequence of this approach was that the runtime environment for a Grid job could vary from one place to the next. Even though much effort was spent to ensure a minimally compatible platform, differences were inevitable and required careful programming on the user's part.

One of the hurdles users need to overcome is software installation, or rather software distribution: how to make sure that the required software stack is available at run time?

In true PaaS spirit, the only solution is to bring your own distribution with every job. Sometimes this is referred to as 'paratrooper mode' but it might as well have been called 'caravan mode' for a true idea of the sheer bulk that is dragged along.

Challenges of software distribution

Bringing software with every job incurs much overhead
Projects to develop common software distributions have a slow upgrade cycle
Negotiating a locally writable software area for each site takes time, effort and coordination

Software distribution with CVMFS

CVMFS spun off from the CERN Virtual Machine (CernVM) project
content delivery based on HTTP
data is distributed as objects referenced by hashes
read-only, so trivial to replicate massively
transactionally consistent indices
garbage collected
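To make the object model in the list above concrete, here is a toy content-addressed store in Python. It is only loosely modelled on CVMFS: objects are named by the hash of their content, each published revision has an index mapping paths to hashes, and unreferenced objects can be garbage collected. Real CVMFS additionally compresses objects, stores catalogs as SQLite databases and signs them, none of which is shown. The class and method names are hypothetical, and SHA-1 is used for illustration.

```python
import hashlib


class ToyRepository:
    """Toy content-addressed store: immutable objects keyed by their SHA-1,
    plus one catalog (path -> hash) per published revision."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}          # hash -> content, never modified
        self.revisions: list[dict[str, str]] = [{}]  # revision n is revisions[n]

    @property
    def revision(self) -> int:
        return len(self.revisions) - 1

    def publish(self, files: dict[str, bytes]) -> int:
        """Store a complete new tree as the next revision; identical
        content is stored only once."""
        catalog = {}
        for path, data in files.items():
            digest = hashlib.sha1(data).hexdigest()
            self.objects.setdefault(digest, data)
            catalog[path] = digest
        self.revisions.append(catalog)  # appending the finished index is the commit point
        return self.revision

    def read(self, path: str, revision: int = -1) -> bytes:
        """Read a file from the given revision (default: the latest)."""
        return self.objects[self.revisions[revision][path]]

    def garbage_collect(self, keep: int = 1) -> int:
        """Delete objects not referenced by the newest `keep` revisions and
        return how many were removed."""
        live = {h for catalog in self.revisions[-keep:] for h in catalog.values()}
        dead = [h for h in self.objects if h not in live]
        for h in dead:
            del self.objects[h]
        return len(dead)
```

Since objects never change once written, a replica only ever has to add objects; older revisions stay readable until garbage collection removes the objects that only they reference.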

Architecture

The architecture diagram can be read from the bottom up to see how data propagates. The file system is initially set up on a stratum-0 machine, which mounts the object data on the /cvmfs path and uses an overlay file system to provide a writable layer. When files are updated and the transaction is committed, the changes are recorded as new objects, the index is updated and the revision number is incremented.

Periodically (every few hours) the stratum-1 systems contact the stratum-0 to replicate all the data. The stratum-1s don't have a file system view of the data, just the objects. Access to the stratum-0 is usually restricted by IP address to the known stratum-1 servers.
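Because objects are immutable and named by their hash, a replica only needs to fetch the objects it does not yet have, and it can verify each one on arrival. The sketch below shows that idea with plain dictionaries standing in for the stratum-0 and stratum-1 object stores; in a real deployment the stratum-1 fetches objects over HTTP (for example with `cvmfs_server snapshot`). The function name and signature are hypothetical.

```python
import hashlib


def replicate(source: dict[str, bytes], source_revision: int,
              replica: dict[str, bytes], replica_revision: int) -> tuple[int, int]:
    """Bring a replica object store up to date with the source.

    Returns (new replica revision, number of objects copied). If a corrupt
    object is found, the revision is not advanced."""
    if replica_revision >= source_revision:
        return replica_revision, 0  # already current: nothing to transfer
    copied = 0
    for digest, data in source.items():
        if digest in replica:
            continue  # same hash means same content, so it never needs re-fetching
        if hashlib.sha1(data).hexdigest() != digest:
            raise ValueError(f"object {digest} is corrupt")  # hashes make transfers verifiable
        replica[digest] = data
        copied += 1
    return source_revision, copied
```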

Sites with many worker nodes running CVMFS usually set up one or more squid proxies to cache CVMFS data, which reduces the load on the stratum-1 servers.
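On the worker nodes, such proxies are configured through the CVMFS client parameter `CVMFS_HTTP_PROXY`: proxies in the same group are separated by `|` and load-balanced, while groups are separated by `;` and tried in order as fail-over. The parser below only illustrates that syntax. The proxy host names are made up, `DIRECT` is treated as just another entry, and the real client's selection and retry logic is more involved than this ordering.

```python
import random


def parse_proxy_groups(spec: str) -> list[list[str]]:
    """Split a CVMFS_HTTP_PROXY-style string into fail-over groups of proxies."""
    return [[p.strip() for p in group.split("|") if p.strip()]
            for group in spec.split(";") if group.strip()]


def proxy_order(spec: str, rng: random.Random) -> list[str]:
    """An order in which a client might try proxies: groups in the given
    order, proxies shuffled within each group to spread the load."""
    order = []
    for group in parse_proxy_groups(spec):
        shuffled = group[:]
        rng.shuffle(shuffled)
        order.extend(shuffled)
    return order
```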

Drawbacks

CVMFS is great for large organisations. But for small teams it can be a real challenge:

set up and maintain a repository
take care of a Stratum-0 server
negotiate the replication at Stratum-1 sites
negotiate with sites to include the repository in their CVMFS configuration

I imagined dozens of small e-science groups knocking on my door to get their repositories mounted.

Our solution: softdrive.nl

Nikhef and SURFSara have jointly set up /cvmfs/softdrive.nl to offer a single CVMFS repository for all e-science users in the Netherlands.

Architecture

The system consists of

a user interface system, where users can log on (with ssh) and upload their software
a Stratum-0 server which copies the user's files at regular intervals
Stratum-1 servers at Nikhef and RAL
mounted by default on all grid resources in the Netherlands

/cvmfs/softdrive.nl/zwamborn/
/cvmfs/softdrive.nl/ceitan/
/cvmfs/softdrive.nl/phop/
/cvmfs/softdrive.nl/svdaele/
/cvmfs/softdrive.nl/tseker/
/cvmfs/softdrive.nl/ajones/
/cvmfs/softdrive.nl/rbyrne/
/cvmfs/softdrive.nl/fsweijen/
/cvmfs/softdrive.nl/kooyman/

Rules

User requests account at SURFSara
Standard quota of 2 GB (can be extended)
Manage software on softdrive.grid.sara.nl
Copy software to /cvmfs/softdrive.nl/$USER
Run the publish command, which touches the softdrive.modified file
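From the user's side, publishing amounts to updating the modification time of one file. A minimal Python equivalent, assuming the user's tree lives at /cvmfs/softdrive.nl/$USER on the user interface machine as the rules state; the function name is hypothetical, and the actual publish command on softdrive.grid.sara.nl may do more than this.

```python
import getpass
from pathlib import Path


def publish(root: Path = Path("/cvmfs/softdrive.nl"), user: str = "") -> Path:
    """Mark the user's subtree as ready for the next sync by touching
    softdrive.modified (creating it if needed)."""
    tree = root / (user or getpass.getuser())
    if not tree.is_dir():
        raise FileNotFoundError(f"no softdrive directory at {tree}")
    marker = tree / "softdrive.modified"
    marker.touch()  # like `touch`: creates the file or sets its mtime to now
    return marker
```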

Mechanism

Automated rsync from Stratum-0 server at Nikhef
Two stage process:
1. rsync the softdrive.modified files
2. rsync those directories with updated softdrive.modified files
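The two stages could be driven by commands like the ones generated below. This is a dry-run sketch, not the production script. The rsync filter rules are a standard way to copy only the matching files while keeping the directory tree, and `cvmfs_server transaction` and `cvmfs_server publish` are the standard CVMFS server commands, but the rsync source, the staging directory and the function names are assumptions.

```python
REPO = "softdrive.nl"
SOURCE = "softdrive.grid.sara.nl:/cvmfs/softdrive.nl/"  # assumed rsync source on the UI
STAGING = "/var/spool/softdrive/signals/"               # hypothetical staging directory
TARGET = f"/cvmfs/{REPO}/"                              # writable on the Stratum-0 during a transaction


def phase1_command() -> list[str]:
    """Copy only the softdrive.modified files, keeping the directory tree.
    -a preserves modification times, which the freshness check needs."""
    return ["rsync", "-a", "--prune-empty-dirs",
            "--include=*/", "--include=softdrive.modified", "--exclude=*",
            SOURCE, STAGING]


def phase2_commands(ready: list[str]) -> list[list[str]]:
    """Commands that sync the ready subtrees inside one CVMFS transaction."""
    if not ready:
        return []  # nothing changed: do not open a transaction at all
    cmds = [["cvmfs_server", "transaction", REPO]]
    for subtree in ready:
        # --delete propagates files the user has removed
        cmds.append(["rsync", "-a", "--delete", f"{SOURCE}{subtree}/", f"{TARGET}{subtree}/"])
    cmds.append(["cvmfs_server", "publish", REPO])
    return cmds
```

A real script would run each command with `subprocess.run(cmd, check=True)` and, if an rsync fails inside the transaction, call `cvmfs_server abort -f softdrive.nl` so that the published repository is left unchanged.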

Quirks

Catalog size exploded once monitoring was put in place: the monitoring triggered an update every five minutes, and every update produced a completely new, full catalog of all files.

This was eventually understood and remedied by creating nested catalogs (subcatalogs) per user.

To monitor the correct functioning of the mechanism, one of the directories was modified automatically so that every sync cycle triggered an update. By checking the timestamps of the files on a client node, one could determine whether the entire chain was still working.

This worked wonderfully, but it had an unintended consequence. At the time, all of softdrive.nl was covered by a single catalog, so every 15 minutes an entirely new catalog (of all the files) had to be created. Since all earlier catalogs are kept as well, this caused a kind of data explosion on the stratum-0.
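A back-of-the-envelope calculation shows the scale of the problem. The figures are assumptions for illustration only: the 393k files quoted in the numbers of this talk stand in for the size of the single catalog (directory entries are ignored), the publish interval is 15 minutes, and the sizes of the root and monitoring catalogs after the split are hypothetical.

```python
INTERVAL_MINUTES = 15     # publish interval of the monitoring updates
TOTAL_ENTRIES = 393_000   # assumption: whole repository in one catalog
ROOT_ENTRIES = 50         # hypothetical root catalog size after the split
MONITOR_ENTRIES = 10      # hypothetical size of the monitoring subtree

PUBLISHES_PER_DAY = 24 * 60 // INTERVAL_MINUTES  # 96


def entries_rewritten_per_day(entries_per_publish: int) -> int:
    """Catalog entries written per day if every publish rewrites them all."""
    return PUBLISHES_PER_DAY * entries_per_publish


single = entries_rewritten_per_day(TOTAL_ENTRIES)
# With nested catalogs the monitoring catalog changes, and so does the root
# catalog, because it records the hash of the new nested catalog.
nested = entries_rewritten_per_day(MONITOR_ENTRIES + ROOT_ENTRIES)
print(f"single catalog: {single:,} entries/day; nested: {nested:,} entries/day")
# prints: single catalog: 37,728,000 entries/day; nested: 5,760 entries/day
```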

This problem was fixed by adding a special file to the monitoring subdirectory (and in fact to all user directories) that tells CVMFS to start a nested catalog in that directory. Now only a very small catalog is updated each time.
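In CVMFS that special file is `.cvmfscatalog`: a directory containing it becomes the root of a nested catalog on the next publish. Below is a sketch of adding the marker to every top-level user directory. The function name is hypothetical, and on a real Stratum-0 it would have to run inside an open transaction, since /cvmfs is read-only otherwise.

```python
from pathlib import Path


def add_nested_catalog_markers(repo_root: Path) -> list[str]:
    """Create an empty .cvmfscatalog in every top-level directory of the
    repository that lacks one; return the names of the directories changed."""
    changed = []
    for entry in sorted(repo_root.iterdir()):
        marker = entry / ".cvmfscatalog"
        if entry.is_dir() and not marker.exists():
            marker.touch()
            changed.append(entry.name)
    return changed
```

CVMFS also supports a `.cvmfsdirtab` file in the repository root that lists path patterns for which nested catalogs are created automatically, which would avoid having to remember the marker for every new user directory.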

User experience

To complement the technical implementation, the overall user experience was addressed through documentation, monitoring and guidance.

Documentation

The user documentation is right there when users log on to the system: the message of the day, printed for login shells, summarises how the system works and how to publish data.

More extensive documentation was written and published online.

Monitoring

End-to-end monitoring of the system is done by automatically triggering a change every hour and measuring how long it takes for the data to reach a client machine. An alert is raised if the delay exceeds a certain threshold, prompting the technicians to investigate what went wrong.
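The end-to-end check can be sketched as two halves: a probe on the user interface that writes the current time into a file in the monitored directory (after which the usual publish step follows), and a check on a client that reads the file back through /cvmfs and compares it with the clock. The file name, the six-hour threshold and the function names are hypothetical; the text only speaks of "a certain threshold".

```python
import time
from pathlib import Path
from typing import Optional

PROBE_NAME = "probe.timestamp"  # hypothetical file in the monitoring directory
MAX_DELAY = 6 * 3600            # hypothetical alert threshold in seconds


def write_probe(directory: Path, now: Optional[float] = None) -> None:
    """UI side: record the current time; the publish step must follow."""
    stamp = time.time() if now is None else now
    (directory / PROBE_NAME).write_text(f"{stamp:.0f}\n")


def check_probe(directory: Path, now: Optional[float] = None) -> tuple[float, bool]:
    """Client side: return (age of the newest visible probe in seconds, alert?).
    With an hourly probe this age exceeds the true delay by at most an hour."""
    stamp = float((directory / PROBE_NAME).read_text())
    current = time.time() if now is None else now
    delay = current - stamp
    return delay, delay > MAX_DELAY
```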

Summary

The softdrive model has proven to be successful: it is easy for users to maintain their own software, the software is lightweight, and the maintenance burden on the administrators is small.

There is no plan at this point to add more bells and whistles to the system.

Even as PaaS infrastructure dwindles in favour of IaaS (infrastructure as a service), CVMFS could remain a viable component for delivering software.

Some numbers

25 active users in the last 6 months
393k files, 178 GB

Interested?

Some other national grid infrastructures offer something similar to softdrive, but I have not heard of anyone interested in cloning our setup. If you plan to provide CVMFS to your users and would like to reuse (parts of) the softdrive system, don't hesitate to contact me.

Acknowledgements

Coen Schrijvers and colleagues at SURFSara for user documentation and monitoring.
Catalin Condurache (RAL) for the fail-over Stratum-1.
Ronald Starink for the initial setup of the CVMFS system at Nikhef.
http://doc.grid.surfsara.nl/en/latest/Pages/Advanced/grid_software.html