The intent of the script is to remove stray user processes that are
left behind on a worker node after a batch job completes, whether
inadvertently or deliberately to escape accounting. The script also
applies to multi-job systems, where several jobs from the same user
may run concurrently.
The script looks up the session ID (sid) of the initial process that
pbs_mom started for each job (using momctl -d 0), and then traces the
process tree of that process by matching each process's ppid against
the pids already in the tree. User processes with uid > 99 that were
not started by ssh* and are not in the process tree of any current
batch job are then killed. In other words, user processes that have
become (grand)children of init, instead of remaining (grand)children
of a valid pbs_mom job, are killed.
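The selection logic described above can be sketched in Python as follows. This is a minimal illustration, not the script's actual implementation, and every function and class name in it is hypothetical. It assumes that the job session IDs have already been extracted from the momctl -d 0 output (that parsing is not shown, since the output format is not described here), that the process table comes from a procps-style `ps -eo pid=,ppid=,sid=,uid=,comm=`, and that "not started by ssh*" means "not in the process tree of a process whose command name matches ssh*".

```python
import os
import signal
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    sid: int
    uid: int
    comm: str


def parse_ps(text):
    """Parse `ps -eo pid=,ppid=,sid=,uid=,comm=` output into Proc records."""
    procs = []
    for line in text.splitlines():
        # maxsplit=4 keeps command names that contain spaces intact
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        pid, ppid, sid, uid = (int(f) for f in fields[:4])
        procs.append(Proc(pid, ppid, sid, uid, fields[4].strip()))
    return procs


def protected_pids(procs, job_sids):
    """Return every pid in the tree of a job session leader or an ssh* process."""
    children = {}
    for p in procs:
        children.setdefault(p.ppid, []).append(p.pid)
    # Roots: the job session leaders (their pid equals the job's sid) and,
    # as an ASSUMPTION about "started by ssh*", every process named ssh*.
    stack = list(job_sids) + [p.pid for p in procs if fnmatch(p.comm, "ssh*")]
    seen = set()
    while stack:
        pid = stack.pop()
        if pid not in seen:
            seen.add(pid)
            # pid-ppid matching: descend into this process's children
            stack.extend(children.get(pid, []))
    return seen


def stray_pids(procs, job_sids, min_uid=100):
    """User processes (uid > 99) outside every protected tree, sorted by pid."""
    keep = protected_pids(procs, job_sids)
    return sorted(p.pid for p in procs if p.uid >= min_uid and p.pid not in keep)


def kill_strays(pids, sig=signal.SIGKILL):
    """Send sig (SIGKILL, signal 9, by default) to each stray pid."""
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # the process exited between the scan and the kill
```

The tree is walked downward from the roots, so a process that called setsid() but is still a descendant of the job stays protected, while one that double-forked and was reparented to init is not. Because the decision is made on a single snapshot of the process table, anything forked after ps ran is never seen, which is one way a race like the fork-bomb case in the notes below can arise.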
Notes:
- The script does not reliably protect the machine from fork bombs:
  a race condition in the current version may let them survive the
  pruning script.
- If the farm runs MPI tasks, the script should be invoked from
  epilogue.parallel instead of the plain epilogue script.
- Normally invoked as: prune_userprocs -a -k 9.