Experiments in high-energy physics often produce extremely large volumes of data. The LHC experiments produce around ten petabytes of data per year (1 petabyte is one million gigabytes). For comparison, this is about the amount of information contained in the entire collection (printed plus digital) of the U.S. Library of Congress (the de facto national library of the U.S.).
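To give a feel for these numbers, the short sketch below converts the yearly volume into gigabytes and into an average sustained data rate. It is only an illustration: it assumes decimal (SI) units, a 365-day year, and data spread evenly over the year, and the function names are made up for this example.

```python
# Decimal (SI) units, matching "1 petabyte is one million gigabytes" in the text.
GB_PER_PB = 1_000_000
BYTES_PER_GB = 10**9
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def petabytes_to_gigabytes(petabytes: float) -> float:
    """Convert petabytes to gigabytes (hypothetical helper for illustration)."""
    return petabytes * GB_PER_PB


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in megabytes per second, assuming data arrives evenly all year."""
    total_bytes = petabytes_to_gigabytes(petabytes_per_year) * BYTES_PER_GB
    return total_bytes / SECONDS_PER_YEAR / 10**6


yearly_pb = 10  # "around ten petabytes" per year, from the text
print(petabytes_to_gigabytes(yearly_pb))        # 10000000 gigabytes
print(round(average_rate_mb_per_s(yearly_pb)))  # roughly 317 MB/s on average
```

In practice the experiments do not take data evenly throughout the year, so peak rates are higher than this average.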
These data are accessed, processed, and analyzed by tens of thousands of researchers located all over the world. This processing and analysis requires close to one hundred thousand computer processors, running around the clock. The size of this computing system, and the spread of its users around the world, led to the decision to build a distributed computing system. Such a system, distributed across many different institutes, countries, and even continents, and shared by many users belonging to different scientific experiments, is called a "computing grid".
Nikhef was one of the original five institutes in the LHC computing grid (just as it was for the original WWW!). Nikhef is also a founding partner of the Dutch national grid project "BiG Grid", which aims to serve not only high-energy physics but ideally all Dutch sciences that have large-scale (or distributed) computing needs.
The Nikhef PDP (Physics Data Processing) group carries out Nikhef's grid computing program. The group has specific expertise in several areas:
1. Facility operations. Nikhef operates a large grid-computing site comprising, as of January 2011, one petabyte of storage and 2500 processors. This facility is funded by the BiG Grid project. A large share of it forms part of the Dutch Tier-1 site for the LHC experiments; one fifth is dedicated to (and used by) other groups such as the eNMR collaboration, who use the computers at Nikhef to compute structures of protein molecules based on NMR data. Nikhef has expertise in facility management through a longstanding involvement in the Quattor collaboration.
2. Application of grid computing in scientific research. In addition to our "own" high-energy physics users, the PDP group works with users in other sciences in connecting their computing systems to the BiG Grid computing infrastructure.
3. Security in distributed computing. Such a large-scale, high-performance infrastructure must be well protected, both to keep the bad guys out and to ensure that the right users get the right high-performance systems. Nikhef staff are involved both in designing the security policies and in implementing these policies and methods in software.
4. Petascale distributed data management. Since the LHC started taking data, it has become clear just how difficult it is to distribute such large quantities of data in a way that minimizes the required storage space while maximizing the users' ability to access the data rapidly. Researchers at Nikhef are working to solve this problem, both for our "own" high-energy physicists and for other groups using the grid.
The grid computing website can be found here.