NDPFDirectoryImplementation

From Gridwiki

The NDPF Directory

The NDPF Directory serves unix logins on the farm, access to the Subversion repository, and can act as the identity management system for NDPF users. It supports a wide range of attributes, of which the pam/nss ones are only a subset.

The full implementation of advanced access control (based on directory groups and authorizedService) in the NDPF is still ongoing. Until the migration has been completed, make sure that no 'untrusted' users are entered into the directory.

Editing the directory

The best tool I've found for managing an LDAP directory is Jarek Gawor's 'LDAP Browser\Editor', written when we worked at ANL and the University of Chicago. The last version I have is 2.8.2b. It's a pure-Java implementation and runs on Linux, Windows, and MacOS. Nikhef users can find a local copy at http://www.nikhef.nl/grid/ndpf/files/LDAPBrowser/Browser282b2.zip

The main page is at http://www.mcs.anl.gov/~gawor/ldap/index.html, and useful FAQs are answered at http://www.openchannelfoundation.org/project/view_faq.php?group_id=9

In the https://www.nikhef.nl/grid/ndpf/files/local/packages/ directory you can also find the ndpfuseradd package, containing user management and integration tools.

NSS and PAM Implementation details

All machines that are connected to the LDAP directory must use the authorizedService and directoryGroups mechanisms to control and limit access to authorized users.

Setting user attributes

Each 'uid' entry (that is, each user) also has a few attributes that define his or her capabilities.

authorizedService 
a multi-valued attribute stating which services a user is allowed to use. For example, "sshd" must be asserted if the user is to be able to log in to any system, anywhere. Without "sshd", ssh login will work nowhere. Similarly, "login" must be asserted for those people who need to log in on a console.
host (not used yet) 
a multi-valued attribute stating which hosts a user is allowed to access. This is an exhaustive list that does not support substring wildcards or regexps, which severely restricts its usefulness. It is therefore not used for now.

Common combinations

standard user of ikohefnet 
authorizedService=sshd, login, smtp, sieve, pop, imap
a user allowed to run cron jobs
add "crond"
remote user that uses only our Subversion service
authorizedService=svn (and he/she must be in the nDPFSubversionUsers directory group to browse the web repository). For read/write access to specific SVN repos, you must also edit the SVN authorization files.
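
The combinations above can be applied with any LDAP editor. As a minimal sketch, the shell function below prints an LDIF "modify" record that adds authorizedService values to one entry. The function name service_ldif, the uid "jdoe", and the "ou=People" branch are assumptions made for illustration; they are not taken from the actual directory layout.

```shell
#!/bin/sh
# Sketch: print an LDIF "modify" record adding authorizedService values.
# Usage: service_ldif DN SERVICE [SERVICE...]
service_ldif() {
    dn=$1
    shift
    printf 'dn: %s\n' "$dn"
    printf 'changetype: modify\n'
    printf 'add: authorizedService\n'
    for svc in "$@"; do
        printf 'authorizedService: %s\n' "$svc"
    done
    # A line holding a single dash closes the modification.
    printf '%s\n' '-'
}

# Example: the "standard user" combination for a hypothetical uid=jdoe.
# The ou=People branch is an assumption, not the real DIT layout.
service_ldif "uid=jdoe,ou=People,dc=farmnet,dc=nikhef,dc=nl" \
    sshd login smtp sieve pop imap
```

The output could be piped into ldapmodify (for example "ldapmodify -x -H ldaps://teugel.nikhef.nl/ -D <admin DN> -W") on the master. Note that an "add" of a value the entry already holds is rejected by the server.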

Also, have a look at the CT Wiki (http://www.nikhef.nl/nikhef/departments/ct/wiki/index.php/Main_Page) for more information.

Limiting access

Contrary to the CERN setup, we have for the time being decided to NOT modify /etc/pam.d/system-auth, and thus we are stuck with three controls in ldap.conf:

pam_check_service_attr
honour the authorizedService attribute of a user
pam_check_host_attr 
honour the host attribute of a user (pretty useless)
pam_groupdn 
restrict access to the specific directory group (where the LDAP directory group contains DNs of the people that will be allowed to login, NOT the posixAccount "group" attributes!).

The pam_groupdn restriction is the most useful, but unfortunately there can only be one (1) group definition, and groups cannot be nested. So, you will have to create several groups with overlapping members. Of course, the selection is AND-ed with the authorizedService attribute when pam_check_service_attr is set to "yes".
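
To see in advance whether a given user would pass both checks, the AND-ed decision can be approximated with two base-scope searches. This is only a sketch: it assumes the group lists its members in the uniqueMember attribute, that the account running it is allowed by the ACLs to read these attributes, and that the DNs contain no characters that need escaping in a search filter. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch: approximate "member of the pam_groupdn group AND service asserted".
URI="ldaps://teugel.nikhef.nl/"

# Succeed when a base-scope search on entry $1 matches filter $2.
entry_matches() {
    ldapsearch -x -LLL -H "$URI" -b "$1" -s base "$2" dn | grep -q '^dn:'
}

# Usage: may_login USER_DN GROUP_DN SERVICE
# Assumes group membership is stored as uniqueMember values holding user DNs.
may_login() {
    entry_matches "$2" "(uniqueMember=$1)" &&
        entry_matches "$1" "(authorizedService=$3)"
}
```

For example, may_login "<user DN>" "cn=gridSrvInteractiveUsers,ou=DirectoryGroups,dc=farmnet,dc=nikhef,dc=nl" sshd returns success only when both conditions hold. The authoritative decision is still the one pam_ldap makes on the target host.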

These three configuration directives can be set via ncm-authconfig as of version 1.1.14.

A couple of directoryGroups have been defined:

nDPFInteractiveUsers 
all users that need interactive login on farm systems
SystemAdministrators 
Us
nDPFSubversionUsers 
users that can use svn via the https interface
gridSrvAdministrators 
system administrators of the protected grid service machines
gridSrvInteractiveUsers 
users that can login to general protected grid service systems

more will follow (likely nDPFServiceAdministrators, iGTFTrustedCommitters, etc.)


Pool Accounts

Pool accounts are of the account structural object class, as they don't have personal information (and thus making them an inetOrgPerson or person does not make sense). These are named according to their uid, which is also part of the account objectClass.

Note that "account" and any "person" derived classes are incompatible (this thus also holds for organizationalPerson and inetOrgPerson)!


Example configurations

First of all, enable LDAP through authconfig (via NCM or authconfig-tui), and make sure that "Local authorization is sufficient" is set to OFF (that's the default anyway), or anybody can get in.

Then, install the DutchGrid CA certificate from the RPM:

 http://www.nikhef.nl/grid/ndpf/files/packages/local-cas/DutchGridCA-1.0-2.noarch.rpm

Then, for a grid service node, put in /etc/ldap.conf:

base dc=farmnet,dc=nikhef,dc=nl
timelimit 120
bind_timelimit 5
idle_timelimit 3600
nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman
uri ldaps://teugel.nikhef.nl/ ldaps://hooimijt.nikhef.nl/ ldaps://vlaai.nikhef.nl/
ssl on
tls_cacertdir /etc/openldap/cacerts
pam_password md5
pam_check_service_attr yes
pam_groupdn cn=gridSrvInteractiveUsers,ou=DirectoryGroups,dc=farmnet,dc=nikhef,dc=nl

Managing the Directory

WARNING: the DutchGridCA, nikhef-directory-schema, and ndpf-slapd-acls RPMs on stalkaars-01 and stalkaars-03 are installed via Quattor. Copy the RPMs to stal:/project/quattor/www/html/mirror/nikhef/, and update the corresponding file grid/common/openldap/rpms/el5_x86_64/server.tpl. Then:

( cd $L ; ant update.rep.templates )
pushxprof -f generic stalkaars-01 stalkaars-03

Masters and Slaves

The directory system itself consists of a single master (teugel), with any number of slaves managed via syncrepl. Currently, there are five slaves:

hooimijt.nikhef.nl
vlaai.nikhef.nl
stalkaars-01.farm.nikhef.nl
stalkaars-03.farm.nikhef.nl
gierput.nikhef.nl

These slaves can read all of the directory at teugel (needed for replication) by virtue of being in the "ou=Managers" branch of the DIT. All entries that have a CN starting with "syncrepl-" can read all of the entries in the directory, but cannot write to any of them! This way, marginal integrity of the directory is preserved in case one of the slaves fails.

Only OpenLDAP 2.3.x or later can be used for this scheme (so RHEL5 and above). Do not introduce RHEL4/OpenLDAP 2.2 systems in the fabric, as they appear to work initially but will die on the first update of any content in the directory.
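
A common way to check that a slave has caught up with the master is to compare the contextCSN operational attribute of the suffix entry on both servers. The sketch below assumes that both servers expose contextCSN on the suffix entry and that the ACLs allow the caller to read it; neither is confirmed by this page, and the function names are made up for this example.

```shell
#!/bin/sh
# Sketch: compare contextCSN on master and slave to judge replication state.
BASE="dc=farmnet,dc=nikhef,dc=nl"

# Print the contextCSN value(s) of the suffix entry on one server.
# contextCSN is operational, so it has to be requested by name.
context_csn() {
    ldapsearch -x -LLL -H "ldaps://$1/" -b "$BASE" -s base contextCSN |
        sed -n 's/^contextCSN: //p' | sort
}

# Usage: in_sync MASTER_HOST SLAVE_HOST
# Succeeds when the master reports a contextCSN and the slave reports the same.
in_sync() {
    master_csn=$(context_csn "$1")
    slave_csn=$(context_csn "$2")
    [ -n "$master_csn" ] && [ "$master_csn" = "$slave_csn" ]
}
```

For instance, in_sync teugel.nikhef.nl hooimijt.nikhef.nl would succeed once hooimijt has replicated the latest change. A short mismatch right after an update is normal.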

Required packages

All servers should have the full schemata installed. These are partially available by default after you install openldap-servers (using Quattor or Yum), but you also need to install the Nikhef-specific schemata (actually, just eduPerson, SCHAC, nss/pam and the ldapPublicKey one for ssh keys). These schemata are installed as an RPM from

http://www.nikhef.nl/grid/ndpf/files/packages/directory-schemata/

and are generated from sources in SVN (/pdpsoft/trunk/nl.nikhef.ndpf.tools/nikhef-directory-schema). You must install this schema RPM on all slaves as well, and include the definitions in the /etc/openldap/slapd.conf file, in addition to the schemata already there:

include         /etc/openldap/schema/ldapns.schema
include         /etc/openldap/schema/eduperson.schema
include         /etc/openldap/schema/schac.schema
include         /etc/openldap/schema/openssh-lpk.schema

Enabling ldaps and tls-ldap

First, install the DutchGrid CA RPM (from the NDPF packages repository using yum if you want):

DutchGridCA-1.0-2.noarch.rpm

and request a host certificate. Then edit the slapd.conf file and add:

TLSCACertificateFile /etc/openldap/cacerts/16da7552.0
TLSCertificateFile /etc/pki/tls/certs/usercert.pem
TLSCertificateKeyFile /etc/pki/tls/certs/userkey.pem
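
After restarting slapd, the TLS setup can be verified from any client. A minimal sketch, assuming the DutchGrid CA certificate is installed on the client at the same path as on the server and that ldaps listens on port 636; the function name check_ldaps is made up for this example:

```shell
#!/bin/sh
# Sketch: verify the ldaps server certificate against the DutchGrid CA.
# The CA path is assumed to be the same on the client as on the server.
CAFILE=/etc/openldap/cacerts/16da7552.0

# Usage: check_ldaps HOST
# Succeeds when the certificate presented on HOST:636 verifies against CAFILE.
check_ldaps() {
    openssl s_client -connect "$1:636" -CAfile "$CAFILE" </dev/null 2>/dev/null |
        grep -q 'Verify return code: 0 (ok)'
}
```

For example: check_ldaps teugel.nikhef.nl && echo "TLS ok".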

Access control and slapd.conf

All servers (masters and slaves alike) share the same set of ACLs. In order to make these manageable, they are stored in a separate file (/etc/openldap/slapd-acls.conf) that is then included in the main /etc/openldap/slapd.conf file. Edit slapd.conf again, and add:

include /etc/openldap/slapd-acls.conf

You will have to install this file using an RPM. This RPM is created (with "make rpm", obviously) from the sources in SVN. Install it like this, and please use the latest version:

rpm -Uvh /home/davidg/src/ndpfsvn/ndpf/nl.nikhef.ndpf.tools/ndpf-slapd-acls/ndpf-slapd-acls-1.0-1.noarch.rpm

but it's best to check out and build it again if needed. Make sure you update the 'config.mk' release number if you change the ACL.

For the LDAP slave serving the (semi-public) phonebook, the additional ACL file slapd-acls-public.conf may be added to the slapd.conf file before the other ACL include directive, but it must never be used as the only ACL!

Configuring the database and server parameters

To ensure that you get a reasonable (if not astounding) performance out of the server, and you see all entries, add the following database and server definition to the slapd.conf file:

sizelimit       100000
timelimit       3600
threads         32
idletimeout     600
loglevel        0

and a proper database definition, part of which may already be there:

database        bdb

suffix          "dc=farmnet,dc=nikhef,dc=nl"
#rootdn          "DONOTUSE"
#rootpw          "DONOTUSE"

directory       /var/lib/ldap

cachesize 100000
dbcachesize 1024000

# Indices to maintain for this database
index objectClass                       eq,pres
index ou,cn,mail,surname,givenname      eq,pres,sub
index uidNumber,gidNumber,loginShell    eq,pres
index uid,memberUid                     eq,pres,sub
index nisMapName,nisMapEntry            eq,pres,sub
index uniqueMember                      eq,pres
index schacUserStatus                   eq,pres
index schacExpiryDate                   eq,pres
index authorizedService,host            eq,pres
index mailacceptinggeneralid            eq,pres,sub
index maildrop                          eq,pres,sub
index mailacceptinguser                 eq,pres,sub
index macAddress                        eq,pres

index entryCSN,entryUUID eq

and put a file DB_CONFIG in /var/lib/ldap containing:

set_cachesize 0 200000000 10
set_lg_regionmax 262144
set_lg_bsize 2097152

and then (re)start the ldap server with

service ldap restart
chkconfig ldap on

Creating the master database is beyond the scope of this document; for recovery, see below.

Additional master configuration

A master also has a syncrepl provider stanza:

overlay syncprov
syncprov-checkpoint 100 1
syncprov-sessionlog 100

Additional slave configuration

A slave also has a syncrepl stanza:

syncrepl rid=<UNIQUENUMBER>
       provider=ldaps://teugel.nikhef.nl:636
       type=refreshAndPersist
       interval=00:00:01:00
       retry="20 3 60 5 120 10 300 10 600 +"
       searchbase="dc=farmnet,dc=nikhef,dc=nl"
       scope=sub
       schemachecking=off
       bindmethod=simple
       binddn="cn=syncrepl-<MYNAME>,ou=Managers,dc=farmnet,dc=nikhef,dc=nl"
       credentials=<MYOPENSSLGENERATEDPASSWORD>

updateref ldaps://teugel.nikhef.nl

The last line will refer clients that want to change the directory (such as the "passwd" command!) to the proper master.
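
Since each slave needs its own rid, bind DN, and password, the stanza can be generated rather than typed. The function below is a sketch that fills in the template above. The function name is made up for this example, and the check on rid reflects the slapd.conf(5) rule that a rid has at most three decimal digits.

```shell
#!/bin/sh
# Sketch: print the syncrepl consumer stanza for one slave.
# Usage: syncrepl_stanza RID SHORTNAME PASSWORD
syncrepl_stanza() {
    # slapd accepts a rid of at most three decimal digits.
    case $1 in
        ''|*[!0-9]*|????*)
            echo "rid must be a number of 1 to 3 digits" >&2
            return 1 ;;
    esac
    cat <<EOF
syncrepl rid=$1
       provider=ldaps://teugel.nikhef.nl:636
       type=refreshAndPersist
       interval=00:00:01:00
       retry="20 3 60 5 120 10 300 10 600 +"
       searchbase="dc=farmnet,dc=nikhef,dc=nl"
       scope=sub
       schemachecking=off
       bindmethod=simple
       binddn="cn=syncrepl-$2,ou=Managers,dc=farmnet,dc=nikhef,dc=nl"
       credentials=$3

updateref ldaps://teugel.nikhef.nl
EOF
}
```

A possible invocation is syncrepl_stanza 101 hooimijt "$(openssl rand -base64 24)", where the rid 101 is an arbitrary example. The same password must then be set on the matching "cn=syncrepl-hooimijt" entry on the master.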

For a slave, the DB_CONFIG file usually holds:

set_cachesize 0 300000000 10
set_lg_regionmax 262144
set_lg_bsize 2097152
set_flags DB_TXN_NOSYNC
set_flags DB_LOG_AUTOREMOVE

since database recovery can be done from the master (wipe all contents of /var/lib/ldap except for DB_CONFIG and start ldap again).

In case the slaves serve a lot of clients (and do a lot of tcpd control), you may want to raise the max open files limit in /etc/sysconfig/ldap by adding

ULIMIT_SETTINGS="-n 8192"

Backup

A daily backup of the directory contents to file is made on the master server. In /etc/cron.daily there is this script:

#! /bin/sh
#
# @(#)$Id$
#
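# Dump the directory to a timestamped, gzipped LDIF file readable by root only.
DATE=$(date '+%Y%m%d.%H%M%S')
DIR=/project/userdb-backup/data
CONFIG=/etc/openldap/slapd.conf

file="$DIR/farmnet.nikhef.nl-$DATE.ldif.gz"

# Create the file with restrictive permissions before any data is written.
touch "$file"
chmod 0600 "$file"
slapcat -f "$CONFIG" | gzip -c > "$file"
chmod 0600 "$file"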

which will work because updates to the directory are so infrequent that taking a backup of a running database is safe.

A daily NDPF_rsync_backup is made from teugel to beerput, and from there via ADSM to tape.

Audit trail

With the following directives in slapd.conf auditing can be enabled. This must be done on teugel (and is useful on teugel only, actually). To enable it, install

openldap-servers-overlays-2.3.43-3.el5

and add to slapd.conf

overlay auditlog
auditlog /project/userdb-backup/audit/auditlog

Restore

Take your pick from the following scenarios:

Directory data on teugel is corrupt

The server itself and the disks are still ok, but the data are wrong or the daemon does not start

  • Stop the slapd daemons (service ldap stop)
  • Remove all actual data from /var/lib/ldap/ on teugel, leaving the DB_CONFIG file intact
  • Make sure the directory is owned by user ldap:ldap and mode 0700
  • Make sure the /etc/openldap/slapd.conf file is correct
  • Restore the data from the backups in /project/userdb-backup/data/ using the commands
gunzip /project/userdb-backup/data/farmnet.nikhef.nl-YYYYMMDD.hhmmss.ldif.gz
slapadd -f /etc/openldap/slapd.conf -l /project/userdb-backup/data/farmnet.nikhef.nl-YYYYMMDD.hhmmss.ldif
  • Restore the permissions on all files in /var/lib/ldap. All files must be owned by ldap:ldap and be writable by it, except for the DB_CONFIG file
  • Restart the slapd daemons (service ldap start)
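
The steps above can be sketched as a script. This is an illustration, not a tested recovery tool: the function names are made up, the paths are the ones used on this page, and unlike the manual commands it decompresses with "gunzip -c" so that the backup file itself is left untouched. Nothing is executed until restore_directory is called explicitly.

```shell
#!/bin/sh
# Sketch of the restore procedure; run as root on teugel.
BACKUP_DIR=/project/userdb-backup/data
DATA_DIR=/var/lib/ldap
CONFIG=/etc/openldap/slapd.conf

# Print the newest backup in directory $1.
# The timestamp in the file name makes lexical order equal to time order.
latest_backup() {
    ls "$1"/farmnet.nikhef.nl-*.ldif.gz 2>/dev/null | sort | tail -n 1
}

# Usage: restore_directory BACKUP_FILE
restore_directory() {
    [ -r "$1" ] || { echo "cannot read backup: $1" >&2; return 1; }
    service ldap stop
    # Remove all database files but keep DB_CONFIG.
    find "$DATA_DIR" -mindepth 1 -maxdepth 1 ! -name DB_CONFIG -exec rm -rf {} +
    # Without -l, slapadd reads the LDIF from standard input.
    gunzip -c "$1" | slapadd -f "$CONFIG" || return 1
    # slapadd ran as root, so hand the files back to the ldap user.
    chown -R ldap:ldap "$DATA_DIR"
    chmod 0700 "$DATA_DIR"
    service ldap start
}
```

A typical call would be restore_directory "$(latest_backup "$BACKUP_DIR")", after checking by hand that the selected file is a known good one and that slapd.conf is correct.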

Teugel is dead and disk lost

Resurrect a brand new LDAP directory service on a machine called 'teugel' (this name is used by the slaves to synch from), using the recipe and the RPMs listed above in the installation section. Then, restore the data:

  • Copy the latest known good file from the backup server beerput, located in /export/data/backups/teugel.nikhef.nl/project/userdb-backup/data/farmnet.nikhef.nl-YYYYMMDD.hhmmss.ldif.gz
    • If the server beerput is also dead, reinstantiate the backup server, install ADSM with the proper password, and restore the files using the client /opt/tivoli/tsm/client/ba/bin/dsmc.
    • Backup files on tape may be up to two days older than current

Then, continue with the steps in Directory data on teugel is corrupt.