Experiences with upgrading from CentOS 7 to Debian

Dennis van Dok

HEPiX spring 2024 Workshop, Laboratoire Astroparticule et Cosmologie (APC) de l'Université Paris-Cité, Monday 2024-04-15

The plan

As we all know, Red Hat dismantled CentOS as a binary-compatible rebuild of their Enterprise Linux OS.

While others have stepped in to fill that void (Rocky, Alma, do we even mention Oracle?), Red Hat seems to actively discourage these efforts by limiting access to the (packaging) sources and by imposing restrictive licence terms.

This has made some of us a bit queasy about sticking with the Red Hat compatible platform that served us for so long.

Many of the services that are not user-facing are probably served just as well by any other platform. And what other platform would offer better long-term stability and guaranteed openness than Debian?

The plan was to port all CentOS 7 work to either:

Alma Linux 9
for the user-facing systems, such as user interfaces, worker nodes, login nodes, and other systems that have to be RHEL compatible.
Debian 12
for all other systems

The problems begin

One colleague came up with the following metaphor:

Trying to renovate an old building from the ground up while all the residents remain living and working in it.

(Which is not very different from our recent experience with renovating our building and keeping the lights on in the data centre.)

On the CT-b side

(Nikhef IT is basically split between general services, called CT-beheer, and the high-throughput computing side, the NDPF. Both face many challenges, some shared, some different.)

In the end, the issue is not so much the number of systems as the number of different services that have to be ported.

Not everything is well

We would have loved to have a unified, automated system setup and configuration management system, but in the past various admins did things differently from one another and such a system never materialised. This has built up a large technical debt over time that now needs to be paid off.

During a crucial phase of the project, effective management and supervision were lacking. There were various reasons; the renovation mentioned earlier certainly contributed. This led to a lack of overarching planning and guidance.

  • Many services had accumulated many layers of complexity over the years, making them difficult to reverse-engineer.
  • Porting legacy software with outdated Python and PHP code was hard, often because the original developers were no longer available.
  • The overall amount of work was underestimated by management. Only recently was external capacity hired to help out.

To quote the admin:

I would have loved to come and tell you all of this in person, but I can literally not afford a single day of not working on pushing the project forward.

Meanwhile, on the NDPF side

(That is, the Nikhef Data Processing Facility.) Here things look slightly different:

  • Standardised configuration management for many years
  • Fewer legacy systems
  • Many nodes, but fewer different types of services overall

Moving away from Torque (to HTCondor)

What complicates things a little more is the plan to (finally) say goodbye to our Torque batch systems and move everything to HTCondor. This effort is under considerable time pressure.

Arguments against moving to Debian

One could fairly ask why we decided to do two things (or three, if you count HTCondor) instead of one: why not just upgrade to Alma Linux 9 without also switching platforms?

In response: many of the issues we face would have been a problem on Alma 9 as well, simply because the legacy software has to be ported to newer software such as Python 3 in any case.

Investment in Debian

On the NDPF side, the task of setting up a Debian system reproducibly and with full automation is only now being wrapped up. This has delayed a good chunk of the work.

In order to port more software to Debian, we are setting up a build system to produce native packages.

For now, this is an old worker node running Debian 12, with a GitLab runner that runs sbuild.

For building RPM packages, we will try podman + mock (it should work; we are still working out some kinks).

To help get software to the community, I am striving to become a full Debian Developer.

Would anybody care to sign my GPG key?

pub   rsa4096/0xDFFAD8197617EF19 2012-10-02 [SC]
      Key fingerprint = 5869 B8BB 7794 13AE 2BBC  11E3 DFFA D819 7617 EF19
uid                   [ultimate] Dennis van Dok <dennisvd@nikhef.nl>
uid                   [ultimate] Dennis van Dok <dvdok@xs4all.nl>
uid                   [ultimate] Dennis van Dok <dvandok@gmail.com>
sub   rsa4096/0x18B1C8A6C3660E25 2012-10-02 [E]
      Key fingerprint = 69C5 408A 4932 982B 78AB  0DB8 18B1 C8A6 C366 0E25