Data storage at Nikhef comes with several options, and deciding where to store which files requires a bit of thought and care. For example, your home directory should not be used for bulk data, and files in /localstore are not backed up. Read on to find out which types of files should go where.

Home directory

Your home directory is for personal files and configuration data used only by yourself. It is shared between the Linux systems (login.nikhef.nl, the stoomboot interactive nodes) and the Windows systems.

  • intended use: your “dot” files, personal analysis results, draft versions of your thesis, simple scripts that ease your daily work, hobby projects, personal mails, communication with your supervisor or students, private images, etc. In short: personal things, which will not be preserved after you leave Nikhef.

  • examples of data types that are better put elsewhere: your ntuples (put those in dCache or /data), scripts and frameworks used by a group of colleagues (put those in /project), log output you want to look at later (best put this alongside the results in /data), intermediate files (use $TMPDIR or /localstore), your final thesis (submit it to the library, and package the plots, publications, tabular data, and histograms for submission to Zenodo).

  • What are the limitations? This file system is backed up, and the data is kept in three copies (on disk, on a local backup, and on a backup in Groningen). The size is limited by quota, typically 2 GByte for starting users, and a bit more for permanent staff. It is hosted on a resilient ‘NFS’ storage server, which is also used for other critical Nikhef services. The transaction rate is limited, and abusing it will quickly affect everyone. If you need more space, ask the helpdesk!

  • Did you know … that your home directory is not readable by other users, and that you can use NFS4 permissions to control access to, for example, sensitive data, appraisal reports, private photos, or letters? However, your public_html/ directory can be seen from the whole web, unless you put .htaccess controls in place.

Data in /data

This is a regular, shared file system for storing moderately sized (1-100 GByte) data files that could in principle be reproduced (or downloaded again from the Grid or your data lake). It is ‘NFS’ mounted from a dedicated server in the data center, but the number of transactions per second (read or write) it can handle does not quite scale to the full size of stoomboot. Use it for private ntuples and results for small analyses, where you need to re-write files or modify existing ones. Storage in /data is limited to several terabytes per user.

  • intended use: modestly sized analysis results, software and containers that can be re-created or downloaded again without much effort, Singularity (Apptainer) and Docker images that are used as containers, log files that you need to collate and study later. As an exception, /data/tunnel is designated for sharing data between the Nikhef local and public Grid environments.

  • examples of data types that are better put elsewhere: your software and scripts (these should be in /project or your home directory), intermediate log files (these should be in $TMPDIR or /localstore)

  • What are the limitations? This file system is not backed up, so if the system fails catastrophically: too bad. You will have to re-create or re-download the data. The NFS server can handle quite a few transactions, but abuse will affect desktop and stoomboot users alike (so don’t hammer it). It is a fully ‘POSIX compliant’ file system, with permissions, so (unlike dCache storage) you can modify existing files. Out of space? Ask your group leader to discuss it with the CT-PDP team.

  • Did you know … that copying data between the /data and dCache storage classes is generally useless? While their intended uses differ, once the data is in either place, moving it around only wastes resources. The only exception is when you want to share your results outside Nikhef (stoomboot dCache can be accessed with the xrootd or GridFTP protocols, and /data cannot), and even then only when the data actually has to be re-written during its creation (otherwise, just write it to dCache immediately).

dCache

The dCache store can hold your largest data files. It’s a distributed storage system, designed to cope with the load of Stoomboot running on all cylinders (or nodes, in this case). Access looks like a normal file system (/dcache/...), but once written, a file cannot be modified, only re-written in full. It’s the perfect place for analysis results, your ntuples, and MC samples.

  • intended use: large-scale data files, files accessed from many (stoomboot) systems at once

  • examples of data types that are better put elsewhere: your software and scripts (these should be in /project or your home directory), intermediate log files (these should be in $TMPDIR or /localstore)

  • What are the limitations? This file system is not backed up, so if the system fails catastrophically: too bad. You will have to re-create or re-download the data. Also, dCache files cannot be modified (or appended to) and must be written ‘in one go’, even though dCache still looks like a regular file system. Out of space? Ask your group leader to discuss it with the CT-PDP team.

  • Did you know … that files in dCache can also be read remotely through the GridFTP, WebDAV, and xrootd protocols? To get access, register the distinguished name (DN) of your certificate.
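Because dCache files cannot be modified or appended to, a common pattern is to build the complete file somewhere writable and only then copy it into dCache. The Python sketch below illustrates this under a few assumptions: the job writes its output in local scratch ($TMPDIR, falling back to /tmp), and `write_then_publish` and `dest_path` are hypothetical names, with `dest_path` standing for a location in your own /dcache/... area. It is not a Nikhef-provided tool.

```python
import os
import shutil
import tempfile
from collections.abc import Iterable


def write_then_publish(chunks: Iterable[bytes], dest_path: str) -> None:
    """Build a file completely in local scratch, then copy it to dest_path in one go.

    dest_path is a hypothetical example location, e.g. in your own dCache area.
    """
    scratch = os.environ.get("TMPDIR") or "/tmp"  # same fallback as ${TMPDIR:-/tmp}
    fd, tmp_path = tempfile.mkstemp(dir=scratch)
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)  # appending is fine on local scratch, not in dCache
        # An existing dCache file cannot be modified in place, so refuse to touch it.
        if os.path.exists(dest_path):
            raise FileExistsError(dest_path)
        shutil.copyfile(tmp_path, dest_path)  # one sequential write to the destination
    finally:
        os.remove(tmp_path)  # always clean up the scratch copy
```

A job would call `write_then_publish(results, dest_path)` once its output is complete. Calling it again for an existing destination raises `FileExistsError` instead of overwriting, so remove the old file first if you really mean to replace it in full.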

Project

The project storage (/project, a.k.a. /global) is intended for unique software, configuration, settings, conditions, and plots that are precious and should outlive your personal stay at Nikhef. The histograms and tabular data that form the input to the plots in your thesis also find a good place here while you are working with them. The scripts and Jupyter notebooks used to create them are welcome here too.

  • intended use: unique software, your final thesis, settings and conditions data, precious plots, and the histograms and tabular data behind them (remember to also deposit notebooks, tabular data, and results in a FAIR data repository like HEPData, CERN’s Open Data, or Zenodo).

  • examples of data types that are better put elsewhere: personal data, like your photos or web home page (these must go into your home directory, if only to preserve your own privacy!), analysis results (these can be reproduced, hence should go to dCache or /data), things you want to share with people outside Nikhef (these should go to SURFdrive, specifically a SURFdrive Group folder), intermediate results (these should go to $TMPDIR or /localstore), merger data or files needed to share things between running processes (these should go to /data), large files (these should likely be in /data, and also be replicated to the Grid or your experiment data lake to have a spare copy).

  • What are the limitations? Like the home directory, this file system is backed up, and the data is kept in three copies. That’s expensive, and space is limited. The quotas here are ‘group quotas’, so if you fill it up, your direct colleagues will suffer. It is hosted on a resilient ‘NFS’ storage server, which is also used for other critical Nikhef services.

Local cache storage

All systems have a local ‘scratch’ directory. You may think this is /tmp/, but it may not be where you expect, so always use $TMPDIR to find it. On stoomboot nodes, for example, each job gets its own scratch directory, and a fresh one for every job. Using $TMPDIR as a pointer to it ensures you’re always in the right spot. Want more resilience, or want to use the same script on your laptop and on stoomboot? Use ${TMPDIR:-/tmp}, which falls back to /tmp when $TMPDIR is unset or empty. On desktop systems, there’s also /localstore that you can use.
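For Python scripts, the same fallback as the shell expansion ${TMPDIR:-/tmp} can be written as a one-line helper. The name `scratch_dir` is hypothetical and only for illustration. The standard library’s `tempfile.gettempdir()` also honours TMPDIR, but it silently falls back through other variables and directories, so the explicit form below makes the behaviour easier to see.

```python
import os


def scratch_dir() -> str:
    """Return the scratch directory, mirroring the shell expansion ${TMPDIR:-/tmp}.

    ${TMPDIR:-/tmp} falls back to /tmp when TMPDIR is unset *or* empty,
    so an empty string is treated the same as a missing variable.
    """
    return os.environ.get("TMPDIR") or "/tmp"


if __name__ == "__main__":
    print(scratch_dir())
```

On a stoomboot node this prints the job-specific scratch directory; on a laptop without TMPDIR set it prints /tmp.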

  • intended use: temporary and intermediate results, such as output from MC jobs that will be merged into larger files before being written to dCache (or /data).

  • examples of data types that are better put elsewhere: resulting data files and ntuples that should be used in subsequent analysis (put those in dCache or /data, since localstore is neither backed up nor resilient, and TMPDIR is cleaned up once the compute job completes), private data (put that in your home directory), your personal web pages (in your home directory or web space), scripts, code, and software (put those in /project if they are generally useful, or in your home directory otherwise).

  • What are the limitations? This storage is local to a single system and not shared at all. It’s blazingly fast, large, and can be re-written at will, but … it is ephemeral and fragile.

SURFdrive

The SURFdrive ‘sync and share’ solution offers cloud-based file storage for personal and group files. Every employee of Nikhef (including PhD students) gets a personal quota of 500 GByte, and you can sync it to your laptop, phone, desktop, or home system anywhere in the world. In consultation with your group or project team, you can also create ‘group folders’ (also 500 GByte) that are not linked to your personal account.

  • intended use: sync-and-share cloud storage, personal files that you sync between multiple devices (phones, laptop, desktop) or would like to share with colleagues or the public. Group folders can be used to share documentation (drawings, documents) that should outlive your stay at Nikhef.

  • examples of data types that are better put elsewhere: large (>1 GByte) files (the WebDAV sync protocol re-writes the whole file whenever it has changed), lots of small files (such as Git repositories, since the transaction rate is limited), typical Unix web files (it has trouble with files whose names start with .ht*).

  • What are the limitations? The 500 GByte quota cannot be extended, and there is no guaranteed backup for personal files. It’s sync-and-share, so the assumption is that a copy will still exist locally somewhere.
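Before pointing the SURFdrive client at an existing directory, you could scan it for the kinds of content listed above. The Python sketch below is a hypothetical helper, not part of SURFdrive or its client. It assumes that "1 GByte" means 2**30 bytes and treats any directory containing a .git subdirectory as a "lots of small files" case.

```python
import os

# Assumed threshold: the guidance warns about files larger than about 1 GByte.
LARGE_FILE_BYTES = 1 << 30


def surfdrive_warnings(root: str) -> list[str]:
    """List paths under root that are better kept out of a SURFdrive sync folder."""
    warnings = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            warnings.append(f"git repository (many small files): {dirpath}")
            dirnames.remove(".git")  # do not descend into the repository internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.startswith(".ht"):
                warnings.append(f".ht* file: {path}")
            # isfile() skips broken symlinks, for which getsize() would fail
            elif os.path.isfile(path) and os.path.getsize(path) > LARGE_FILE_BYTES:
                warnings.append(f"large file (>1 GiB): {path}")
    return warnings
```

For example, `for w in surfdrive_warnings(os.path.expanduser("~/Documents")): print(w)` lists candidates to move elsewhere before syncing.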

FileSender

SURF FileSender can be used to exchange both large and small files with anyone in the world through email. This is truly ephemeral storage, with a lifetime of up to 14 days. It can also be used to receive files, by generating vouchers that you share with your correspondents.

  • intended use: sharing small and large files via email, replacing the use of attachments, or receiving large files through vouchers. Sharing of sensitive and personal data through encrypted transfers.

  • examples of data types that are better put elsewhere: files that you want to keep and preserve (this is not permanent storage, just transfer). Also, if you anticipate changing the file after you send the link, use SURFdrive and share it through a (password-protected) sharing URL.

  • What are the limitations? Files will remain stored only for a limited time, with a maximum of 14 days. Then the file is irrecoverably deleted.

  • Did you know … that you can send really large files, tens of Gigabytes in size?