Prune User Processes on PBS/Torque Batch Nodes

All files can be downloaded from this directory.
prune_userprocs - remove stray processes after a batch job completes

Description:

The script is intended to remove stray user processes that are left behind on a worker node after a batch job completes, whether inadvertently or deliberately (to escape accounting). It also works on multi-job systems, where several jobs from the same user may run concurrently.

For each job, the script looks up the session ID (sid) of the initial process started by pbs_mom (using momctl -d 0), and then traces the process tree of that process by pid-ppid matching. User processes with uid > 99 that were not started by ssh* and are not in the process tree of any current batch job are then killed. In effect, any such process that hangs off init instead of descending from a valid pbs_mom job is killed.
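The selection logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the prune_userprocs source: it assumes a Linux-style /proc, takes the PIDs of the jobs' initial processes as an input (the real script derives them from momctl -d 0), and reads "not started by ssh*" as "not descended from a process whose command name begins with ssh". The names Proc, read_proc_table, descendants and stray_processes are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    uid: int   # real uid of the process
    comm: str  # short command name, e.g. "sshd"


def read_proc_table(proc_root: str = "/proc") -> List[Proc]:
    """Snapshot the process table from a Linux-style /proc (assumption)."""
    procs = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
            with open(os.path.join(proc_root, entry, "status")) as f:
                uid_line = next(line for line in f if line.startswith("Uid:"))
        except (OSError, StopIteration):
            continue  # the process exited while we were reading it
        # comm is enclosed in parentheses and may itself contain spaces or ')'
        close = stat.rindex(")")
        comm = stat[stat.index("(") + 1 : close]
        ppid = int(stat[close + 2 :].split()[1])  # fields after ')': state, ppid, ...
        uid = int(uid_line.split()[1])            # first Uid: field is the real uid
        procs.append(Proc(int(entry), ppid, uid, comm))
    return procs


def descendants(procs: Iterable[Proc], roots: Iterable[int]) -> Set[int]:
    """Return the roots plus every process below them, found by pid-ppid matching."""
    children: Dict[int, List[int]] = {}
    for p in procs:
        children.setdefault(p.ppid, []).append(p.pid)
    seen: Set[int] = set()
    stack = list(roots)
    while stack:
        pid = stack.pop()
        if pid not in seen:
            seen.add(pid)
            stack.extend(children.get(pid, []))
    return seen


def stray_processes(procs: List[Proc], job_roots: Iterable[int]) -> List[int]:
    """PIDs of user processes (uid > 99) outside every job tree and every ssh* tree.

    job_roots: PIDs of the jobs' initial processes (hypothetical input; the
    real script obtains the job sessions from momctl -d 0).
    """
    # Assumption: anything below a process named ssh* is an interactive login.
    ssh_roots = [p.pid for p in procs if p.comm.startswith("ssh")]
    protected = descendants(procs, list(job_roots) + ssh_roots)
    return sorted(p.pid for p in procs if p.uid > 99 and p.pid not in protected)
```

On a worker node one might call stray_processes(read_proc_table(), job_roots) and then signal each returned PID with os.kill(pid, signal.SIGKILL). Because the table is a single snapshot, a process forked after it was taken is never seen, which is one way a race of the kind mentioned in the notes can arise.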

Notes:
- Please note that the script does not reliably protect the machine from fork bombs: a race condition in the current version may allow them to survive the pruning.
- If the farm runs MPI tasks, the script should be invoked from epilogue.parallel instead of the plain epilogue script.
- Normally invoked as: prune_userprocs -a -k 9

More system utilities can be found on the NDPF Wiki pages and one level up.
Comments to David Groep.