Nikhef Site Report

Dennis van Dok

2019-03-26

Nikhef mission

The mission of the National Institute for Subatomic Physics Nikhef is to study the interactions and structure of all elementary particles and fields at the smallest distance scale and the highest attainable energy.

  • Accelerator-based particle physics
  • Astroparticle physics

Organisation

Personnel

Projects

LHC Experiments

Cosmic phenomena

Other Programs

  • Theory programs
  • Enabling programs
    • Detector Research and Development
    • Physics Data Processing

Computer technology

  • Office automation
  • Project support (detector software engineering)
  • Scientific computing (a.k.a. Physics Data Processing or PDP)

People

New sysadmin in the PDP group: Mary Hester joined in February.

mary.jpg

The Computer Technology group hired a new webmaster: Roel Roomeijer, who joined the system administration team in March.

roel.jpg

Physics Data Processing Group

WLCG Tier-1

Nikhef-PDP is half of the Netherlands Tier-1.

The other half is SURFsara.

Together we also participate in the DNI, the Dutch National Infrastructure for data/compute intensive sciences.

  • LOFAR (radio astronomy)
  • Project MinE (genetic causes of motor neuron disease ALS)
  • LIGO/VIRGO

Computing last year

running-year-pdp.png

SURFsara

(thanks to Onno Zweers)

  • first non-HEP pilots on the high-throughput OpenStack platform;
  • LOFAR download servers to be replaced with a solution based on dCache macaroons;
  • SKA: started running network tests with Australia & South Africa;
  • new storage dashboards for VOs;
  • Ceph as a backend underneath dCache (v4.2): so far, performance is not yet production-ready.

New Hardware

AMD EPYC

81 Dell R6415 nodes (3 racks), each with one AMD EPYC 7551P 32-core processor.

  • Two racks for NIKHEF-ELPROD compute cluster.
  • One rack replaces the local stoomboot cluster (more about that later).

Price/performance is good.

CPU                          HEPSPEC06/core   €/core
Intel(R) Xeon(R) Gold 6148            19.57      315
AMD EPYC 7551P                        14.94      247

We tried running with and without hyperthreading (SMT); it seemed to provide little benefit, so it is now turned off.

A single-socket system doesn't suffer a cross-socket cache-coherence penalty.
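
As a back-of-envelope check, the cost per HEPSPEC06 can be derived from the per-core figures in the table above; the short sketch below (illustrative only, using just those quoted numbers) shows the two CPUs ending up roughly comparable per unit of compute.

  # Back-of-envelope cost per HEPSPEC06, derived from the per-core
  # figures quoted in the table above.
  cpus = {
      "Intel Xeon Gold 6148": {"hs06_per_core": 19.57, "eur_per_core": 315},
      "AMD EPYC 7551P": {"hs06_per_core": 14.94, "eur_per_core": 247},
  }

  for name, c in cpus.items():
      print(f"{name}: {c['eur_per_core'] / c['hs06_per_core']:.1f} EUR per HS06")

  # Prints roughly 16.1 EUR/HS06 (Intel) vs 16.5 EUR/HS06 (AMD); the
  # single-socket EPYC packs 32 such cores into one node.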

Fast 3.2 TB NVMe SSDs for local storage

Grid storage

heu-storage.jpg

dothill.jpg

/data

The old Hitachi storage system has been replaced by NetApp.

  • Initially tried a Fujitsu Eternus DX200, but this did not work because it could not handle LDAP+TLS.
  • Currently a dual NetApp FAS8200 with 2× 300 TB of disk and 2× 9 TB of SSD.
  • Redundancy but no back-up.

netapp.jpg

/project: EMC VNX5400

This was bought for the /project storage: important data that has an external backup. The data must be available on both Unix and Windows systems, so it requires a mixed mode of Unix permissions and Windows ACLs.

Not too many vendors offer this combination. In fact, the EMC is the last of its kind to offer it.

emc.jpg

Off-site backup

A partnership with the University of Groningen led to a geographically separated off-site backup solution.

Hardware and software managed and maintained by Groningen; component replacements done by local team.

Much more economical than the previous commercial offering.

rug-backup.jpg

Developments

Private cloud

Setting up a high-throughput private compute cloud with OpenStack.

The project was initially not getting much traction. Several avenues were explored; a lack of long-term stability in the software development played a role.

It has currently been given a higher priority due to internal demand for alternatives to classic cluster computing. Nikhef as a lab is trying agile project management, and the cloud project is likewise approached in an agile fashion.

Salt configuration management

Progress has been made in bringing more systems under Salt's control: LDAP, DNS, and dCache.

Integrated generation of Icinga checks.

We are not migrating the legacy grid services, which may not be with us much longer.
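
For illustration, a state of the kind we keep in Salt might look like the minimal sketch below, written with Salt's Python (#!py) renderer; the "dcache" package and service names are placeholders rather than our actual state tree.

  #!py
  # Minimal sketch of a Salt state using the Python (#!py) renderer.
  # The package/service name "dcache" is a placeholder.

  def run():
      return {
          "dcache": {
              # ensure the package is installed...
              "pkg.installed": [],
              # ...and the service is enabled and running
              "service.running": [
                  {"enable": True},
                  {"require": [{"pkg": "dcache"}]},
              ],
          },
      }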

dCache with distributed NFS on local cluster

High-throughput clustered local file access.

Much appreciated by users

But…see below.

Icinga for IT Services

  • Migration from Nagios to Icinga2
    • The old Nagios setup was managed by a single person, making it difficult to get new hosts and probes deployed.
    • In the new system everyone should be able to add/edit probes for their hosts.

Standard Icinga2 install; the configuration lives on the GitLab server; the server installs and tests new configurations automatically.
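
As an example of what adding a probe amounts to, the minimal sketch below follows the standard Nagios/Icinga plugin exit-code convention; the path and thresholds are illustrative, not taken from our configuration.

  #!/usr/bin/env python3
  # Minimal probe following the Nagios/Icinga plugin convention:
  # exit 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
  # Path and thresholds are illustrative only.
  import shutil
  import sys

  PATH, WARN_PCT, CRIT_PCT = "/data", 20, 10  # warn/crit on % free space

  try:
      usage = shutil.disk_usage(PATH)
      free_pct = 100 * usage.free / usage.total
  except OSError as exc:
      print(f"DISK UNKNOWN - {PATH}: {exc}")
      sys.exit(3)

  if free_pct < CRIT_PCT:
      print(f"DISK CRITICAL - {free_pct:.1f}% free on {PATH}")
      sys.exit(2)
  if free_pct < WARN_PCT:
      print(f"DISK WARNING - {free_pct:.1f}% free on {PATH}")
      sys.exit(1)
  print(f"DISK OK - {free_pct:.1f}% free on {PATH}")
  sys.exit(0)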

icinga2.png

Interesting times

EMC ACL issues

Transferring ACLs from our older systems overloaded the write buffer, which locked up the entire volume.

A workaround was eventually found, but it took months for Dell EMC to get the right engineer on the case and to come up with a good diagnosis and a workable solution.

The problem was reported in July, a resolution came in October.

Memory issues

A second issue involved the memory consumption of NFS-over-TCP connections, which turned out to be significantly higher than over UDP, leading to a system crash and downtime.

This issue took even longer to resolve: the initial problem was found in spring 2018, and the conclusion was that the machine had been sold with the wrong specs. A solution was proposed in December, which means a planned downtime in April for a system upgrade.

dCache NFS

  • the original cluster had 8 cores/node and 1 Gbps networking
  • the new nodes have 32 cores and 25 Gbps
  • Possible bugs in NFS client implementation in Linux kernel
  • Possible bugs in distributed NFS in dCache

The result was too much traffic through the NFS door and hanging clients.

Resolution:

  • temporarily reinstall part of the cluster with SLC6
  • disable dCache NFS access on other nodes
  • stress-test an installation of dCache 5
  • planned upgrade in April

Future plans

Roll out a high-throughput cloud

The vision for this is still blurry; it is hard to get a good sense from the users of what they would want. Finding a representative pilot community is important.

If possible, we will virtualize the current legacy grid capacity and explore its elasticity.

CREAM on the way out

Considering alternatives:

  • ARC
  • HTCondor (adopting HTCondor for more than just batch jobs seems to have benefits; see the sketch below)
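
One of those perceived benefits is the Python bindings; the sketch below shows a minimal job submission with the 8.8-era HTCondor bindings (the job itself is a placeholder).

  # Minimal sketch: submitting a placeholder job through the HTCondor
  # Python bindings (8.8-era API, using a schedd transaction).
  import htcondor

  schedd = htcondor.Schedd()          # talk to the local schedd
  job = htcondor.Submit({
      "executable": "/bin/sleep",
      "arguments": "60",
      "output": "sleep.out",
      "error": "sleep.err",
      "log": "sleep.log",
  })

  with schedd.transaction() as txn:   # queue one job in a transaction
      cluster_id = job.queue(txn)
  print(f"submitted cluster {cluster_id}")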

400 Gbps network tests

Talk to Tristan.

DPM/dCache

We are trying to minimize legacy knowledge maintenance (technical debt); since dCache is now in use at Nikhef, having both dCache and DPM in one lab makes no sense. We are going to phase out DPM and set up dCache for grid storage.

See you in Amsterdam

HEPiX Amsterdam, 14–18 October 2019