VoCacheDir

From PDP/Grid Wiki

The VO Cache Directory (VOCD) is a site service to allow users in a VO to cache data on a system (worker node or site) in a location where it may remain after the user's job has finished. It is intended to improve the performance and reduce network load where many jobs of the same user, or the same VO, intensively use the same worker node or site.

The VOCD is intended solely for incoming data. It is NOT meant for storing output data, nor as a means to share data between jobs. The size of the VOCD WILL be limited, and MAY be as small as 0 (zero) bytes. Users of the VOCD MUST be able to deal with situations where the cache is full, or where the remaining free space is smaller than the file to be cached.
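A minimal sketch of a free-space check before caching a file, assuming a POSIX shell with df and awk. The helper name cache_has_room is hypothetical. The check is advisory only: another job may fill the cache between the check and the write, and a site-imposed limit need not be visible to df, so the write itself must still be checked for errors.

```shell
# Hypothetical helper: succeed if the file system holding directory "$1"
# reports at least "$2" KiB available to the caller.
cache_has_room() {
    # df -P prints one header line and one data line; column 4 is the
    # available space in KiB when -k is given.
    avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$2" ]
}
```

A job could call it as `cache_has_room "$VO_ATLAS_CH_CACHE_DIR" 51200 || echo "not caching" >&2` and fall back to its own scratch space when the call fails.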

Site Installation

Please see the dedicated VoCacheDirUtility page.

Locating the VO Cache Directory

The VO cache directory may be located anywhere on the file system. An environment variable "VO_voname_CACHE_DIR" will be set by the site in the job's environment to point to this directory. voname is the registered name of the VO in the VOID card or site-local equivalent thereof. If VO_voname_CACHE_DIR is NOT defined, but VO_CACHE_DIR IS defined, then the directory pointed to by VO_CACHE_DIR MAY be used to cache VO data.
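The lookup order above can be sketched as a small shell function. The helper name vo_cache_dir is hypothetical. The mapping from VO name to variable name (upper case, every other character replaced by an underscore) is an assumption based on the variable VO_ATLAS_CH_CACHE_DIR used in the example further down this page.

```shell
# Hypothetical helper: print the cache directory for VO "$1", or return 1
# if neither the VO-specific nor the generic variable points to a directory.
# Assumption: atlas.ch maps to VO_ATLAS_CH_CACHE_DIR.
vo_cache_dir() {
    vo_tag=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr -c 'A-Z0-9' '_')
    # Read the VO-specific variable by name; empty if it is not set.
    eval "dir=\${VO_${vo_tag}_CACHE_DIR:-}"
    # Fall back to the generic variable.
    [ -n "$dir" ] || dir=${VO_CACHE_DIR:-}
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    printf '%s\n' "$dir"
}
```

For example, `cachedir=$(vo_cache_dir atlas.ch) || cachedir=""` leaves cachedir empty when the site offers no cache, so the job can continue without caching.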

The VO Cache Directory MAY be local to the worker node or MAY be located on a shared file system.

Persistency semantics

The site DOES NOT guarantee any particular life time of the files stored in the VO Cache Directory. Files in the VOCD are treated as independent units and MAY be removed at any time.

The VO Cache Directory is a file-oriented cache. This means that each file in the cache is assessed individually, and files may be retained or removed on a file-by-file basis. It MUST NOT be assumed that files put in the cache at the same time, or accessed at the same time, will have the same life time.

Storing data in the cache

The Cache Directory SHOULD be writable by all users in the VO, and MAY be writable by anybody in the world. The VOCD will be 'sticky', meaning that files written by user "A" CANNOT be removed by user "B" (like /tmp). Since the VOCD is writable by more than one user, processes loading data into the cache MUST protect against common race conditions.

  1. do not write to an existing file; use mktemp(1) to generate a unique file name inside the cache directory
  2. write the incoming data to this unique file name
  3. use mv(1) to move the new data to its final name

For example:

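targetname=atlas.ch-DBRelease-0.9a.sql
target="$VO_ATLAS_CH_CACHE_DIR/$targetname"
# file should not already exist in the cache, or downloading is pointless
[ -f "$target" ] && exit 1
# create the temporary file inside the cache directory, so that the final
# mv is a rename within one file system; this fails if the cache is full
tmpnam=$(mktemp -p "$VO_ATLAS_CH_CACHE_DIR" mydownload.XXXXXX) || exit 1
# test the exit status of wget: mktemp has already created the file, so
# its mere existence says nothing about the download
if wget -O "$tmpnam" https://dbdump.atlas.ch/DBRelease-0.9a.sql
then
 mv "$tmpnam" "$target"
 rc=$?
 if [ $rc -ne 0 ]
 then
   echo "Move of $tmpnam to $target failed: $rc" >&2
   rm -f "$tmpnam"
   exit 1
 fi
else
 echo "Download failed" >&2
 # do not leave a partial download behind in the cache
 rm -f "$tmpnam"
 exit 1
fi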

Using data in the cache

Since the cache MAY be used by more than one user, the user's application SHOULD make sure that a file in the cache with a particular name actually corresponds to the desired file. Uniqueness of names MUST NOT be assumed to be guaranteed by the site.
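One way to perform this check is to compare a checksum of the cached file with a digest obtained from a trusted source. The sketch below assumes that sha256sum from GNU coreutils is available on the worker node. The helper name cache_file_matches is hypothetical.

```shell
# Hypothetical helper: succeed only if file "$1" exists and its SHA-256
# digest equals the expected hexadecimal digest "$2".
cache_file_matches() {
    [ -f "$1" ] || return 1
    # Reading from stdin makes sha256sum print the digest followed by " -".
    actual=$(sha256sum < "$1" | cut -d' ' -f1) || return 1
    [ "$actual" = "$2" ]
}
```

If the check fails, the application should treat the cached file as absent and fetch its own copy rather than use the cached one.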

Once opened, files in the VOCD obey the usual Unix semantics for open files. As long as a process keeps a file open, that process can read the contents of the file, even if the directory reference to the file has been removed in the meantime. Once the process closes such a file, the content is removed as well. The directory reference to the file MAY be removed at any time, so closing the file and re-opening it by name is not guaranteed to work.

The user MAY assume that a file successfully stored in the cache will persist for a few minutes, but this is NOT guaranteed.

The user MUST NOT assume that the job's $TMPDIR directory, the VO Cache Directory or the user's home directory share the same file system.

It MAY be that a file is removed from the cache after starting your application, but before you open it. It is RECOMMENDED that each file be opened as soon as possible after starting the application and kept open for as long as access to any part of the file is needed.
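In a shell script, this recommendation can be followed by opening the cached file on a file descriptor before the real work starts. The sketch below is one possible approach, not a site-provided tool. The helper name with_cached_file and the choice of descriptor 3 are assumptions made for illustration.

```shell
# Hypothetical helper: run a command with the cached file "$1" held open
# for reading on file descriptor 3. The remaining arguments are the command.
with_cached_file() {
    file=$1
    shift
    # The redirection opens the file before the command starts. If the file
    # is already gone, the redirection fails, the command is not run, and a
    # non-zero status is returned. Once opened, descriptor 3 stays readable
    # even if the file name is removed from the cache afterwards.
    { "$@"; } 3< "$file"
}
```

The command then reads from descriptor 3 instead of re-opening the file by name, for example `with_cached_file "$VO_ATLAS_CH_CACHE_DIR/atlas.ch-DBRelease-0.9a.sql" sh -c 'wc -l <&3'`.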