System Utilities from NIKHEF

All files can be downloaded from this directory.
VO-ENV - generating environment variable definitions for VOs

Description:

lcg-vo-env is a program that generates the environment variable definitions needed to locate VO-specific resources. The idea is that lcg-vo-env is evaluated in-line by the wrapper script that is submitted to the compute resource. It sets the relevant environment variables defined for the specified VO, within a namespace defined by the user.

Download(s):

MoniFarm - graphing farm occupancy for a PBS/Torque LRMS

Description:

MoniFarm is a script that collects data from a local batch system and from a local CE GRIS (Glue Schema v1.2) and stores the results in a Round Robin Database (RRD) using the rrdtool programme from Tobi Oetiker. Examples of what MoniFarm does can be seen at the NDPF Statistics Pages (http://www.nikhef.nl/grid/stats/ndpf-prd/), which are based on MoniFarm.

PBS Caching Utils - output caching of qstat results

Notice: these utilities are usually only needed for Torque v1 or OpenPBS

Description:

Some software, such as the LCG M/W, can put severe load on a pbs server because of the number of times it calls qstat (we have seen as many as 25 qstat calls per second). This load can eventually bring the entire system to a halt. To reduce the impact of this problem (a full solution requires rewriting part of the grid job manager), Davide Salomoni wrote a set of utilities that wrap the original pbs commands and provide caching.
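The general idea of output caching can be illustrated with a minimal Python sketch. This is not the code of the utilities themselves: the function name cached_output, the cache file location, the max_age parameter and the 30-second default are all assumptions made for illustration. The command is only executed when the cached copy is missing or older than max_age seconds; otherwise the stored output is returned without contacting the server.

```python
import os
import subprocess
import tempfile
import time


def cached_output(cmd, cache_path, max_age=30.0, runner=None):
    """Return the output of cmd, reusing a cached copy younger than max_age seconds.

    All names and defaults here are hypothetical; they only illustrate
    time-based caching of a command's output.
    """
    if runner is None:
        # Default: actually run the command and capture its standard output.
        def runner(c):
            return subprocess.run(
                c, capture_output=True, text=True, check=True
            ).stdout

    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path) as fh:
                return fh.read()  # cache hit: the command is not executed
    except OSError:
        pass  # no usable cache file yet, fall through and run the command

    output = runner(cmd)

    # Write to a temporary file and rename it, so that concurrent readers
    # never see a half-written cache file.
    directory = os.path.dirname(cache_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        fh.write(output)
    os.replace(tmp_path, cache_path)
    return output
```

With such a wrapper, many callers asking for the same qstat output within the cache lifetime cause only one real query to the pbs server.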

Prune Userproc - remove stray processes after a batch job completes

Description:

The intent of the script is to remove stray user processes that are left around on a worker node after a batch job completes, whether inadvertently or deliberately to escape accounting. The script also applies to multi-job systems, where multiple jobs from the same user may run concurrently.

The script looks up the session ID (sid) of the initial process started by the pbs mom (using momctl -d 0), and then traces the process tree of that process using pid-ppid matching. User processes with uid>99 that were not started by ssh* and are not in the process tree of any current batch job are then killed. This means that all processes that are (grand)children of init, instead of being (grand)children of a valid pbs_mom job, are killed.
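The selection logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script itself: the Proc record, the function names and the sample process table are hypothetical, each job's session ID is assumed to equal the pid of its initial process (the session leader), and "started by ssh*" is approximated by checking the command name. The sketch only selects candidate pids; it does not kill anything.

```python
from collections import defaultdict, namedtuple

# Hypothetical snapshot record of one entry in the process table.
Proc = namedtuple("Proc", "pid ppid uid comm")


def descendants(procs, roots):
    """Return the set of pids reachable from roots via pid-ppid matching."""
    children = defaultdict(list)
    for p in procs:
        children[p.ppid].append(p.pid)
    seen = set()
    stack = list(roots)
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        stack.extend(children[pid])
    return seen


def stray_pids(procs, job_sids, min_uid=100):
    """Select user processes that belong to no current batch job.

    job_sids holds the session IDs of the running jobs; each one is
    assumed to be the pid of the job's initial process.
    """
    protected = descendants(procs, job_sids)
    return sorted(
        p.pid
        for p in procs
        if p.uid >= min_uid                  # uid>99: skip system accounts
        and not p.comm.startswith("ssh")     # approximation of "started by ssh*"
        and p.pid not in protected           # not in any job's process tree
    )


# Hypothetical process table: job 100 is valid, 200 and 201 are strays.
table = [
    Proc(1, 0, 0, "init"),
    Proc(50, 1, 0, "pbs_mom"),
    Proc(100, 50, 500, "job.sh"),
    Proc(101, 100, 500, "worker"),
    Proc(200, 1, 500, "stray"),
    Proc(201, 200, 500, "stray-child"),
    Proc(300, 1, 500, "sshd"),
]
print(stray_pids(table, {100}))
```

Because the process table is read as a snapshot, processes forked between the snapshot and the kill are missed, which is one way a race like the one mentioned in the notes can arise.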

Notes:
- Please note that the script does not reliably protect the machine from fork bombs -- a race condition in the current version may allow them to survive the pruning script.
- If the farm is running MPI tasks, then the script should be invoked from the epilogue.parallel instead of the plain epilogue script.
- Normally invoked as: prune_userprocs -a -k 9.

More system utilities can be found on the NDPF Wiki pages.