gLExec - a grid enabled account mapping utility
gLExec is a program that makes the required mapping between the grid world and the Unix notion of users and groups, and it can enforce that mapping by changing the uid and gids of running processes. For a service running under a "generic" uid, such as a web services container, it provides a way to escape from this container uid. It may be used similarly by externally managed services run on a site's edge. Lastly, in a late-binding scenario, the identity of the workload owner can be set at the instant the job starts executing. The description, design and caveats are described in the paper.
Local services, in particular computing services offered on Unix [5] and Unix-like platforms, use a different native representation of the user and group concepts. In the Unix domain these are expressed as numeric identifiers: each user is assigned a user identifier (uid) and one or more group identifiers (gids). At any one time, a single gid is the "primary" gid (pgid) of a particular process; this pgid is initially used for group-level process (and batch system) accounting. The uid and gid representation is local to each administrative domain.
Batch systems and gLExec in a gLExec-on-the-Worker-Node scenario
gLExec tries hard to be neutral to its OS environment. In particular, gLExec will not break the process tree, and it accumulates CPU and system usage times from the child processes it spawns. We recognise that this is particularly important in the gLExec-on-WN scenario, where the entire process (pilot job and target user processes) should be managed as a whole by the node-local batch system daemon. You are encouraged to verify OS and batch system interoperability. To do that, you have two options:
- Comprehensive testing:
Ulrich Schwickerath has defined a series of (partially CERN-specific) tests
to verify that glExec does not break the batch system setup of a site. He
has extensively documented his efforts on the Wiki at
https://twiki.cern.ch/twiki/bin/view/FIOgroup/FsLSFGridglExec.
Note that the Local Tools section is CERN-specific. If you use other tools to clean up the user's work area (such as the $tmpdir facility of PBSPro and Torque), or use the PruneUserproc utility to remove stray processes, you are not affected by this.
- Basic OS and batch-system testing can be done even without installing gLExec, by compiling a simple C program with one hard-coded uid for testing.
This is the fastest solution for testing, but it only verifies that your batch system reacts correctly, not that your other grid-aware system scripts will work as you expect. The following batch systems are known to be compatible:
- Torque, all versions
- OpenPBS, all versions
- Platform LSF, all versions
- BQS, all versions
- Condor, all versions
Deploying gLExec on the Worker Node
The preferred way to deploy gLExec on the worker node is to use (VO-agnostic) generic pool accounts that are local to each worker node. This way you can be sure that a gLExec'ed job does not "escape" from the node, and it limits the number of pool accounts needed. For this configuration, you:
- create at least as many pool accounts as you have job slots on a WN
- assign a worker node local gridmapdir (suggestion: /var/local/gridmapdir)
- create local pool accounts with a local home directory (suggestion: account names poolwn00 etc., with home directories in a local file system that has enough space, e.g., /var/local/home/poolwn00, etc.)
- configure the lcmaps.db configuration used by gLExec to refer to this gridmapdir
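As a sketch, an lcmaps.db fragment for such a node-local setup might look as follows; the plugin path, module options, and policy name are illustrative, and the exact syntax depends on your LCMAPS version, so adapt it to your installation:

```
# where the LCMAPS plugin modules live (illustrative path)
path = /usr/lib64/lcmaps

# map to a node-local pool account via the node-local gridmapdir
localpool = "lcmaps_poolaccount.mod
             -gridmapfile /etc/grid-security/grid-mapfile
             -gridmapdir  /var/local/gridmapdir"

# policy evaluated by gLExec
glexec:
localpool
```

The essential point is only that the poolaccount module is pointed at the worker-node-local gridmapdir rather than a shared one.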
If you prefer shared pool accounts, you can use a shared atomic state database (implemented as an NFS directory) to host the gridmapdir. All operations on the gridmapdir are atomic, even over NFS, and it scales well (remember that NFS is the file-sharing mechanism of choice for many large installations of 100,000 nodes and more... just ask any corporation selling storage appliances :-)