Condors in Containers

Dennis van Dok

Nikhef Jamboree 2024-05-13

A farewell to Torque

For over 20 years, we have been running stoomboot with a batch system called Torque.

It's been a trusty companion through these many years.

We've come to know its quirks; its limitations; and its moods.

The end of the road has been reached. It is time to say goodbye. You could say the system is on life support, but in truth, there is no support.

But without Torque, what is next?

Flocking to HTCondor

  • HTCondor is developed and supported by a team at the University of Wisconsin-Madison
  • It's free, open source software
  • Nikhef has a history with it going back 30 years!

Transition in progress

  • We are adding more capacity to the new cluster at this very moment
  • Expect an invitation to join the new cluster in your inbox soon
  • If you want to try it out early, drop a note to stbc-admin@nikhef.nl.

What happens to the old cluster?

  • We will keep the Torque cluster going for a while to smooth the transition.
  • We expect everybody to switch before July

What happens to CentOS 7?

  • Support for CentOS 7 runs out 30 June 2024
  • We won't be able to offer any CentOS 7 capacity after that date
  • You could use a CentOS 7 container, but we cannot guarantee that container images for CentOS 7 will remain available.

Submit scripts

Old:

$ cat job.sh
#!/bin/bash
#
#PBS -q long

./run.sh

$ 

New:

$ cat runjob.sub
executable   = run.sh
log          = run.log
output       = outfile.txt
error        = errors.txt
+UseOS       = "el9"
+JobCategory = "long"
queue
$ 

Running jobs

Old:

$ qsub job.sh
17953664.burrell.nikhef.nl
$ 

New:

$ condor_submit runjob.sub
Submitting job(s).
1 job(s) submitted to cluster 556.
$ 
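
When submitting from a script, it can help to capture the cluster number so you can refer to the job later. A minimal sketch, assuming the output format shown above; the variable names are made up for illustration:

```shell
#!/bin/sh
# Sample output copied from the condor_submit run above.
# In real use you would capture it with:
#   submit_output=$(condor_submit runjob.sub)
submit_output='Submitting job(s).
1 job(s) submitted to cluster 556.'

# Pull out the number that follows "submitted to cluster".
cluster_id=$(printf '%s\n' "$submit_output" |
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p')

echo "cluster id: $cluster_id"
```

The captured number can then be passed to condor_q or condor_rm to inspect or remove the jobs in that cluster.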

Other tools

Old:

  • qsub
  • qstat
  • qdel

New:

  • condor_submit
  • condor_q
  • condor_rm

Condor in Containers

  • All jobs will be run in a container
  • This is done transparently
  • shares (/home, /project, etc.) are available as normal
  • You may select either the base OS or bring your own container image

Selecting a basic OS

  • Put

    +UseOS = "el9"
    

    in your submission script.

  • This selects a base container image
  • currently allowed values are:

    {"el7", "el8", "el9"}
    

    These select a default image compatible with Red Hat Enterprise Linux versions 7, 8, and 9 respectively.
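
A typo in the value is easy to make, so a wrapper script could check it before submitting. A small sketch; the helper is_allowed_os is hypothetical and simply hard-codes the three values listed above:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for the UseOS values listed above.
is_allowed_os() {
    case "$1" in
        el7|el8|el9) return 0 ;;
        *)           return 1 ;;
    esac
}

# Try one listed value and one value that is not in the list.
for os in el9 el10; do
    if is_allowed_os "$os"; then
        echo "$os: ok"
    else
        echo "$os: not an allowed UseOS value"
    fi
done
```

The list would need updating by hand whenever the allowed values change.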

Selecting your own container image

Alternatively, you may choose any other container image, or create your own.

+SingularityImage = "/project/myproject/ourimages/myfavouriteimage.sif"
or
+SingularityImage = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/..."
or
+SingularityImage = "/cvmfs/singularity.opensciencegrid.org/..."
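
Put together, a submit file using your own image might be generated as sketched below. This is an illustration, not a tested recipe: it assumes that +UseOS can simply be left out when +SingularityImage is given, and the file name runjob-image.sub is made up. The image path is the example one from above.

```shell
#!/bin/sh
# Example image path taken from the slide above.
image="/project/myproject/ourimages/myfavouriteimage.sif"

# Warn early if the image cannot be read from the submit host.
if [ ! -r "$image" ]; then
    echo "warning: cannot read $image" >&2
fi

# Write the submit file (hypothetical name runjob-image.sub).
# Assumption: no +UseOS line is needed next to +SingularityImage.
cat > runjob-image.sub <<EOF
executable        = run.sh
log               = run.log
output            = outfile.txt
error             = errors.txt
+SingularityImage = "$image"
+JobCategory      = "long"
queue
EOF

echo "wrote runjob-image.sub"
```

The file can then be submitted with condor_submit as usual.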

Selecting a category

  • We like to tune priorities of jobs based on their characteristics
  • short jobs should be scheduled ahead of long jobs
  • jobs from users who run many jobs are lowered in priority

Setting the category

  • Specify

    +JobCategory = "short"
    

    in your submit script

  • Allowed values are "short", "medium", "long" with different default and maximum run times.
  • Specify the desired maximum wall clock time with

    +MaxWallTime = 96 * 3600
    

    for the maximum 96 hours.
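
The value is in seconds, so it is easy to compute from hours. A sketch that works out the number and writes a submit file with both settings; it assumes the 96 hour maximum applies to the "long" category, and the file name runjob-long.sub is made up:

```shell
#!/bin/sh
# Convert the desired wall clock time from hours to seconds.
hours=96
max_wall_time=$((hours * 3600))
echo "MaxWallTime: $max_wall_time seconds"   # 345600 for 96 hours

# Write a submit file (hypothetical name) with category and wall time set.
# Assumption: 96 hours is within the limit of the "long" category.
cat > runjob-long.sub <<EOF
executable   = run.sh
log          = run.log
output       = outfile.txt
error        = errors.txt
+UseOS       = "el9"
+JobCategory = "long"
+MaxWallTime = $max_wall_time
queue
EOF
```

Writing the literal expression 96 * 3600 in the submit file, as shown above, works just as well; computing it in the shell is only useful when the value comes from a variable.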

Containers, containers everywhere…

What are these containers anyway?

Similar to an operating system…

A container image presents the file system as it would on a particular operating system:

  • files
  • libraries
  • configuration

…But different

  • The kernel is that of the host OS
  • containers provide isolation from one another
  • Same host can run multiple containers simultaneously
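
You can see the difference for yourself with a job that prints both the kernel release and the distribution name. A minimal sketch, assuming the image provides /etc/os-release (most current Linux images do):

```shell
#!/bin/sh
# Kernel release: reported by the running kernel, which is the host's.
kernel=$(uname -r)

# Distribution name: read from the file system the container image presents.
if [ -r /etc/os-release ]; then
    distro=$(. /etc/os-release && echo "$PRETTY_NAME")
else
    distro="unknown (no /etc/os-release)"
fi

echo "kernel: $kernel"
echo "distro: $distro"
```

Run with different +UseOS values on the same node, the distro line should change while the kernel line stays the same.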

Building your own container images

Several recipes and tools are available.

See https://kb.nikhef.nl/ct/Containers.html

Where to get help

Questions?