Nikhef Containers at Scale
Dennis van Dok
Scientists have a lot of data to process
- carve up data into bite-size chunks \(O(GB)\)
- process individual chunks in batch jobs
- scale up the number of batch jobs to reduce the overall
processing time, sometimes from years to weeks
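The carve-up-and-scatter pattern above can be sketched as follows. The chunk size and the function name are illustrative assumptions; real workflows use experiment-specific tooling to split and submit the work.

```python
# Sketch of the carve-up-and-batch pattern: split a large dataset into
# O(GB) chunks, each of which becomes one independent batch job.
# Chunk size is an illustrative assumption.

CHUNK_SIZE = 1 * 1024**3  # aim for chunks of about 1 GB

def make_chunks(total_bytes, chunk_size=CHUNK_SIZE):
    """Split a dataset of total_bytes into (offset, length) chunks."""
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(chunk_size, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks

# A 10 TB dataset becomes ~10000 independent batch jobs:
jobs = make_chunks(10 * 1024**4)
print(len(jobs))  # 10240
```

Because the chunks are independent, the jobs can run on any number of worker nodes in parallel, which is what turns years of serial processing into weeks.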
Users rely on container images for their software distribution
- Makes the software environment uniform across all resource centers
- Reduces dependency on local site administration
But it also introduces security risks and scalability issues from the
site's point of view.
Security Risks with Containers
Singularity (https://sylabs.io/docs/) is a popular choice of container runtime.
- Current coding practices of Singularity are not up to the highest security standards.
- From the user's point of view it is simple (just point to the container image).
- Security complications arise because the privileged mode relies on the loop
device mechanism in the Linux kernel; this requires root privileges, so the
Singularity binaries are installed setuid root.
- Alternatively, Singularity's non-privileged mode relies on Linux user
namespaces to confine users to a subdirectory holding the unpacked image.
- In the context of running \(O(1000)\) jobs, this could incur high costs.
- Especially in non-privileged mode, unpacking a container image can take
tens of minutes.
- Modern-day worker nodes have 32 or more job slots, meaning
potentially unpacking and running 32 containers simultaneously.
- This set-up time can be a high fraction of the overall processing time for
short jobs (the impact varies per use case).
Goals
- Reduce the set-up time for a container (tens of minutes currently eat into
the wall time of a job slot)
- Scale up to running 100 containers on a single machine in under 5 minutes
- Verify the security of the solution
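A back-of-envelope calculation makes the unpacking overhead above concrete. The specific numbers (a 20-minute unpack, a 2-hour job) are illustrative assumptions picked from the ranges mentioned in the slides, not measurements.

```python
# Back-of-envelope: what fraction of a job slot's wall time is lost to
# container unpacking? The numbers below are illustrative assumptions.

unpack_minutes = 20   # "tens of minutes" to unpack an image
job_minutes = 120     # a fairly short 2-hour job

overhead = unpack_minutes / (unpack_minutes + job_minutes)
print(f"{overhead:.0%} of wall time spent unpacking")  # 14%
```

With 32 or more job slots per node all unpacking simultaneously, I/O contention is likely to push the effective unpack time, and hence this fraction, even higher.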
One approach is to make the unpacked images available via CVMFS.
- It would have to be transparent to users, remaining as simple as their
current workflow.
- CVMFS is a proven system for massively scaling software distribution.
- Details about typical container sizes and machine sizes are to be discussed
with the Nikhef team.
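One way to keep the CVMFS approach transparent is for the batch system to rewrite the user's image reference to the pre-unpacked copy on CVMFS before launching the container. A minimal sketch of such a rewrite, assuming a hypothetical `/cvmfs/unpacked.example.org` repository layout and using only the standard `singularity exec` invocation:

```python
# Sketch: map a user's image reference to a pre-unpacked sandbox
# directory on CVMFS. The repository name and layout are hypothetical.

CVMFS_ROOT = "/cvmfs/unpacked.example.org"

def singularity_command(image, payload):
    """Build a singularity exec command for an unpacked image on CVMFS."""
    # e.g. "docker://user/tool:v1" -> "/cvmfs/unpacked.example.org/user/tool:v1"
    sandbox = f"{CVMFS_ROOT}/{image.removeprefix('docker://')}"
    return ["singularity", "exec", sandbox] + payload

cmd = singularity_command("docker://user/tool:v1", ["./run_analysis.sh"])
print(" ".join(cmd))
```

Since CVMFS fetches and caches files lazily, only the parts of the image a job actually touches are downloaded, which avoids the per-job unpacking step entirely.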