HEPiX spring 2024 Workshop, Laboratoire Astroparticule et Cosmologie (APC) de l'Université Paris-Cité, Monday 2024-04-15
As we all know, Red Hat dismantled CentOS as a binary-compatible rebuild of their Enterprise Linux OS.
While others have stepped in to fill that void (Rocky, Alma, and do we even mention Oracle?), Red Hat seems to actively discourage these efforts by limiting access to the (packaging) sources and by imposing restrictive licence terms.
This has made some of us a bit queasy about sticking with the Red Hat compatible platform that served us for so long.
Many of the services that are not user-facing are probably served just as well on any other platform. And what other platform would offer better long-term stability and guaranteed openness than Debian?
The plan was to port all CentOS7 work to either:
One colleague came up with the following metaphor:
Trying to renovate an old building from the ground up while all the residents remain living and working in it.
(Which is not very different from our recent experience with renovating our building and keeping the lights on in the data centre.)
(Nikhef IT is basically split between general services, called CT-beheer, and the High-Throughput Computing side, or NDPF. They face many challenges, some similar, some different.)
In the end, the issue is not so much the number of systems as the number of different services that have to be ported.
We would have loved a unified, automated system for setup and configuration, but in the past individual admins did things differently from one another, and such a system never materialised. This has built up a large technical debt over time that now needs to be paid off.
During a crucial phase of the project, effective management and supervision were lacking. There were various reasons; the renovation mentioned earlier certainly contributed. The result was a lack of overarching planning and guidance.
To quote the admin:
I would have loved to come and tell you all of this in person, but I can literally not afford a single day of not working on pushing the project forward.
On the NDPF side (that is, the Nikhef Data Processing Facility), things look slightly different:
What complicates things further is the plan to (finally) say goodbye to our Torque batch systems and move everything to HTCondor. This migration is under considerable time pressure.
One may fairly ask why we decided to do two things (or three, if you count HTCondor) instead of one: why not just upgrade to Alma Linux 9 rather than also switching platforms?
In response: many of the issues we face would have been a problem on Alma 9 as well, simply because the legacy software has to be ported to newer software such as Python 3 either way.
On the NDPF side, the task of setting up a Debian system reproducibly and with full automation is only now being wrapped up. This has delayed a good chunk of the work.
To port more software to Debian, we are setting up a build system that produces native packages.
It currently consists of an old worker node running Debian 12, with a GitLab runner that runs sbuild.
For building RPM packages we will try podman+mock (it should work; we are still working out some kinks).
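The dispatch such a build job has to do can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Nikhef setup: the Debian distribution name, the mock chroot config name `almalinux-9-x86_64` and the container image `localhost/mock-builder` are all hypothetical placeholders. Debian source packages (`.dsc`) go to sbuild; source RPMs go to mock, run inside a podman container.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical defaults for illustration only; not the real Nikhef configuration.
DEBIAN_DIST = "bookworm"               # Debian 12
MOCK_CONFIG = "almalinux-9-x86_64"     # assumed mock chroot config name
MOCK_IMAGE = "localhost/mock-builder"  # hypothetical podman image with mock installed


def build_command(source: Path) -> list[str]:
    """Return the command line that would build `source` as a native package."""
    if source.suffix == ".dsc":
        # Debian source package: sbuild builds it in a clean chroot.
        return ["sbuild", f"--dist={DEBIAN_DIST}", str(source)]
    if source.name.endswith(".src.rpm"):
        # Source RPM: run mock inside a podman container. mock needs extra
        # privileges to set up its own build chroot, hence --privileged.
        workdir = source.resolve().parent
        return [
            "podman", "run", "--rm", "--privileged",
            "-v", f"{workdir}:/build:Z",
            MOCK_IMAGE,
            "mock", "-r", MOCK_CONFIG, "--rebuild", f"/build/{source.name}",
        ]
    raise ValueError(f"don't know how to build {source}")


def run_build(source: Path, dry_run: bool = True) -> int:
    """Print the build command; execute it only when dry_run is False."""
    cmd = build_command(source)
    print(shlex.join(cmd))
    if dry_run:
        return 0
    return subprocess.run(cmd, check=False).returncode
```

In a GitLab CI job, the runner would call `run_build(path, dry_run=False)` for each source package and fail the job on a non-zero return code. Keeping the command construction separate from its execution makes the dispatch logic easy to test without a build chroot.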
To help get this software out to the community, I am striving to become a full Debian Developer.
Would anybody care to sign my GPG key?
pub   rsa4096/0xDFFAD8197617EF19 2012-10-02 [SC]
      Key fingerprint = 5869 B8BB 7794 13AE 2BBC 11E3 DFFA D819 7617 EF19
uid   [ultimate] Dennis van Dok <dennisvd@nikhef.nl>
uid   [ultimate] Dennis van Dok <dvdok@xs4all.nl>
uid   [ultimate] Dennis van Dok <dvandok@gmail.com>
sub   rsa4096/0x18B1C8A6C3660E25 2012-10-02 [E]
      Key fingerprint = 69C5 408A 4932 982B 78AB 0DB8 18B1 C8A6 C366 0E25
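For anyone willing to sign: the printed key IDs can be cross-checked against the fingerprints, since for OpenPGP v4 keys the long key ID is the low-order 64 bits (the last 16 hex digits) of the fingerprint. The helper names below are hypothetical, and this is only a consistency check on the listing above. It does not replace verifying the full fingerprint and the owner's identity in person.

```python
def long_key_id(fingerprint: str) -> str:
    """Return the 16-hex-digit long key ID of a v4 OpenPGP fingerprint."""
    hexdigits = "".join(fingerprint.split()).upper()
    if len(hexdigits) != 40 or any(c not in "0123456789ABCDEF" for c in hexdigits):
        raise ValueError(f"not a v4 fingerprint: {fingerprint!r}")
    # For v4 keys the key ID is the low-order 64 bits of the fingerprint.
    return hexdigits[-16:]


def key_id_matches(key_id: str, fingerprint: str) -> bool:
    """Compare a long key ID such as '0xDFFA...' with a printed fingerprint."""
    wanted = key_id.upper().removeprefix("0X")
    return long_key_id(fingerprint) == wanted


print(key_id_matches("0xDFFAD8197617EF19",
                     "5869 B8BB 7794 13AE 2BBC 11E3 DFFA D819 7617 EF19"))  # True
```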