gLExec Operating System Interoperability

Batch systems in the presence of gLExec

gLExec tries hard to be neutral to its OS environment. In particular, it does not break the process tree, and the CPU and system usage times of the child processes it spawns are accumulated as usual. We recognise that this is particularly important in the gLExec-on-WN scenario, where the entire process group (pilot job and target user processes) should be managed as a whole by the node-local batch system daemon.

We have verified that, on the Torque batch system, forking a process under a different uid does not impair the ability of pbs_mom to kill any and all of its child processes. We tested this with Torque version 2.1.6; to verify this with your own batch system, use the steps below.
The simple test program performs the exact same uid change that gLExec does, but does not require installing any grid middleware at your site. It is a completely stand-alone program that changes uid, so you can see how your batch system reacts.

  • Download the code for the sUTest program and compile it on your system: sutest.c. We deliberately do not provide a pre-compiled binary, as you need to configure three pre-defined constants in the source that are site-specific:
        #define UNOBODY 99
        #define GNOBODY 99
        #define SRCUID 502
    These specify the uid and gid to switch to, as well as the (numeric) uid of the user account you will use for testing (i.e. the account that will do the batch-system qsub). This must be a trusted uid, as that user will have effective super-user privileges at any time.
  • Compile this program:
        cc -o sutest sutest.c
  • Copy this program to a worker node, and make it setuid root:
        cp sutest /usr/local/bin/
        chown root:root /usr/local/bin/sutest
        chmod u+s /usr/local/bin/sutest
  • Submit a batch job that takes a while:
        echo "/usr/local/bin/sutest sleep 600" | qsub 
    (make sure the job lands on the worker node that has the setuid sutest program; refer to your batch-system manual for details).
    Check on the worker node whether the sleep job is running, and under which uid (usually the nobody user).
  • As soon as the job starts, kill it (in PBS/Torque: qdel) and watch the worker node to see whether the sleep process goes away. It should.
  • Remove the setuid bit from the sutest executable on the WN:
        chmod 0755 /usr/local/bin/sutest
If you notice any anomalies during testing, i.e. the job does not die, please notify the developers at grid dash mw dash security at nikhef dot nl. Since we have verified Torque 2.1.6 ourselves, if you see issues with that version please repeat your tests before reporting.
Comments to grid-mw-security@nikhef.nl