# Nikhef Containers at Scale

2019-04-09

## Science Grid

Scientists have a lot of data to process

• carve up the data into bite-size chunks, $$O(\mathrm{GB})$$ each
• process individual chunks in batch jobs
• scale up number of batch jobs to reduce overall processing time—sometimes from years to weeks
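
The scaling argument can be made concrete with a little arithmetic. The sketch below assumes equally sized chunks, one batch job per chunk, and ideal scheduling with no queueing delay. The function names and the example numbers are illustrative only, not figures from Nikhef.

```python
import math


def number_of_chunks(total_bytes: int, chunk_bytes: int) -> int:
    """Number of batch jobs needed when each job processes one chunk."""
    # Integer ceiling division, so a partial last chunk still gets a job.
    return (total_bytes + chunk_bytes - 1) // chunk_bytes


def wall_time_hours(n_jobs: int, hours_per_job: float, job_slots: int) -> float:
    """Ideal wall time: jobs run in waves of `job_slots` concurrent jobs."""
    waves = math.ceil(n_jobs / job_slots)
    return waves * hours_per_job


# Illustrative numbers only: 100 TB of data in 2 GB chunks, 1 hour per chunk.
jobs = number_of_chunks(100 * 10**12, 2 * 10**9)
serial = wall_time_hours(jobs, 1.0, job_slots=1)
parallel = wall_time_hours(jobs, 1.0, job_slots=1000)
print(f"{jobs} jobs: {serial / 24 / 365:.1f} years serially, "
      f"{parallel / 24:.1f} days on 1000 slots")
```

Any fixed per-job cost, such as container setup, is paid once per chunk, which is why it matters at this scale.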

## Issue

Users rely on container images for their software distribution

• Makes it uniform across all resource centers
• Reduces dependency on local site administration

But it also introduces security risks and scalability issues from the site's point of view.

## Security Risks with Containers

Singularity (https://sylabs.io/docs/) is a popular choice for container use.

• Singularity's current coding practices are not up to the highest security standards.
• From the user's point of view it is simple: just point to the container image or URL.
• Privileged mode complicates security: it relies on the loop device mechanism in the Linux kernel, which requires root privileges, and these are obtained by making the Singularity binaries setuid root.
• Alternatively, Singularity's non-privileged mode relies on Linux user namespaces to restrict users to a subdirectory.
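
Whether a given installation uses the setuid-root mechanism can be checked from the file's mode bits. The following is a minimal sketch: the helper names are hypothetical, and the path of the binary to inspect is left to the caller because it differs per installation.

```python
import os
import stat


def is_setuid_root(mode: int, uid: int) -> bool:
    """True if the setuid bit is set and the file is owned by root (uid 0)."""
    return bool(mode & stat.S_ISUID) and uid == 0


def check_binary(path: str) -> bool:
    """Inspect a file on disk; `path` is supplied by the caller."""
    info = os.stat(path)
    return is_setuid_root(info.st_mode, info.st_uid)
```

A file that passes this check runs with root privileges regardless of who starts it, which is what makes the code quality of such a binary a security concern.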

## Scalability Issues

• In the context of running $$O(1000)$$ jobs, container setup could incur high costs.
• Especially in non-privileged mode, unpacking a container image can take tens of minutes.
• Modern worker nodes have 32 or more job slots, so a single node may be unpacking and running 32 containers simultaneously.
• For short jobs this can be a large fraction of the overall processing time per job (it varies by use case).
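
To see how much the setup cost weighs on short jobs, the fraction of wall time spent on setup can be computed directly. This is a sketch with assumed numbers: the 10-minute setup time and the payload durations are illustrative, not measurements.

```python
def setup_fraction(setup_minutes: float, payload_minutes: float) -> float:
    """Fraction of a job's wall time spent setting up the container."""
    total = setup_minutes + payload_minutes
    if total <= 0:
        raise ValueError("total job time must be positive")
    return setup_minutes / total


# Assumed values: a 10-minute setup and three illustrative payload lengths.
for payload in (15, 60, 600):
    share = setup_fraction(10, payload)
    print(f"payload {payload:>3} min: setup is {share:.0%} of wall time")
```

The same setup time that is negligible for a ten-hour job dominates a job that only computes for a quarter of an hour.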

## Objectives

The challenge:

• Reduce the setup time for a container (tens of minutes eating into the wall time of a job slot)
• Scale up to running 100 containers on a single machine in under 5 minutes
• Verify the security of the solution
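
Progress against the startup target could be measured with a small harness that launches many copies of a command at once and records how long they take. The sketch below is one possible approach, not a prescribed benchmark. It uses a trivial Python process as a stand-in payload so that it runs anywhere, and the function names are hypothetical.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(command: list[str]) -> float:
    """Run one command to completion and return its duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def launch_concurrently(command: list[str], count: int) -> tuple[float, list[float]]:
    """Start `count` copies of `command` at once.

    Returns the total elapsed time and the per-run durations.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=count) as pool:
        durations = list(pool.map(timed_run, [command] * count))
    return time.perf_counter() - start, durations


if __name__ == "__main__":
    # Stand-in payload; on a worker node this would be replaced by the
    # site's actual container start command.
    command = [sys.executable, "-c", "pass"]
    elapsed, durations = launch_concurrently(command, count=8)
    print(f"{len(durations)} runs in {elapsed:.2f} s, "
          f"slowest {max(durations):.2f} s")
```

For a real measurement the command might be something like `["singularity", "exec", image, "true"]`, with `image` pointing at the image under test and `count=100`. The elapsed time would then be compared against the 5-minute target.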

### Further notes

One approach is to make the unpacked images available via CVMFS.

• It would have to be transparent for the users, as simple as their current use case.
• CVMFS is a proven system for distributing software at massive scale.
• Details about typical container sizes and machine sizes to be discussed with the Nikhef team.
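
One way to keep the CVMFS approach transparent is to translate the image reference a user already supplies into a pre-unpacked directory when one is available, and to fall back to the original reference otherwise. The sketch below illustrates the idea only. The directory layout `<root>/<name>/<tag>`, the function name, and the simplified reference parsing (no registry ports, no digests) are all assumptions, not an existing CVMFS or Singularity convention.

```python
from pathlib import Path


def resolve_image(image_ref: str, unpacked_root: Path) -> str:
    """Return the unpacked directory for `image_ref` if it exists under
    `unpacked_root`; otherwise return the reference unchanged.

    Hypothetical layout: <unpacked_root>/<name>/<tag>.
    """
    # Drop a URL scheme such as "docker://" if present.
    _, sep, rest = image_ref.partition("://")
    path_part = rest if sep else image_ref

    # Split off the tag; default to "latest" when none is given.
    name, _, tag = path_part.rpartition(":")
    if not name:
        name, tag = path_part, "latest"

    candidate = unpacked_root.joinpath(*name.split("/"), tag)
    return str(candidate) if candidate.is_dir() else image_ref
```

With such a lookup in front of the container launcher, users would keep pointing at the same image name or URL, and only the site would need to know which images have been published in unpacked form.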