Next: Standalone Test Runs Up: ZEUS CALDAQ Transputer System Previous: BOR Files   Contents

Online Log File

The CALDAQ transputer network logfile (named iserver.log, found in directory ~calec_rc/log) consists mostly of messages of the format:

$<$sender-name$>$: $<$message$>$

in which sender-name is the name of the process, routine or transputer sending the message.



A complete description of the messages present in the logfile is not available; why and where the messages are generated can be found in the source code and is mostly a matter of concern to the code expert. However, a few message types are explained here in more detail.



Messages starting with '###' signify that something serious or possibly fatal occurred.
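As an illustration of the message format described above, the following minimal Python sketch splits a logfile line into sender and message and flags the '###' lines. The function name parse_log_line, the regular expression, and the assumption that a sender name contains no whitespace or colon are illustrative only; they are not taken from the CALDAQ code.

```python
import re

# Assumed layout: "<sender-name>: <message>"; the sender is taken to
# contain neither whitespace nor a colon (an assumption, not a documented rule).
LINE_RE = re.compile(r"^(?P<sender>[^:\s]+):\s*(?P<message>.*)$")


def parse_log_line(line):
    """Return (sender, message, serious) for one logfile line.

    sender is None when the line does not follow the common format.
    serious is True when the line, or its message part, starts with '###'.
    """
    text = line.strip()
    match = LINE_RE.match(text)
    if match:
        sender, message = match.group("sender"), match.group("message")
    else:
        sender, message = None, text
    serious = text.startswith("###") or message.startswith("###")
    return sender, message, serious


if __name__ == "__main__":
    print(parse_log_line("CSBREPORT: CSB.MESS.FAIL dest=3"))
    print(parse_log_line("### hypothetical fatal message"))
```

Whether '###' appears at the very start of the line or after the sender name is not specified here, so the sketch accepts both.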



Error messages originating from the CSBs (name of the message sender in this case: CSBREPORT) are worth noting; some of these messages are accompanied by a more explanatory message on the RunControl screen. Here is a list of these messages:

$\diamond$
ARE1.ERROR mask=$<$n$>$
ARE2.ERROR mask=$<$n$>$
ARE3.ERROR mask=$<$n$>$
CSB received an interrupt from ARE1-, ARE2- or ARE3-board respectively, with the source(s) of the interrupt in bitmask n, meaning that the connected transputer(s) set the error flag (but did not necessarily halt, if the code was compiled in undefined mode, e.g. the CAL-SLT code); which transputer is connected to which ARE-connection can be found in chapter 'CSB Connections',
$\diamond$
could not send NEXT TRIGGER
serious problem with process that controls NEVIS frontend control electronics (on HOST-transputer, via Serial Cards); it does not accept the command to generate the next trigger (in standalone runs),
$\diamond$
CSB.MESS.FAIL dest=$<$n$>$
the CSB failed to pass on a message to the next CSB (destination id = n),
$\diamond$
CSB.UNKNOWN.COMMAND cmd=$<$n$>$
the CSB received an unknown command n from HOST,
$\diamond$
CSB.UNKNOWN.DESTINATION dest=$<$n$>$
the CSB received an unknown destination transputer number n from HOST,
$\diamond$
EVT1.EVENT mask=$<$n$>$
EVT2.EVENT mask=$<$n$>$
EVT3.EVENT mask=$<$n$>$
CSB received an (unexpected) interrupt from EVT1-, EVT2- or EVT3-board respectively, with the source(s) of the interrupt in bitmask n; a transputer in panic can draw attention this way; which transputer is connected to which EVT-connection can be found in chapter 'CSB Connections',
$\diamond$
EVT1.TIMEOUT tp.id=$<$n$>$
EVT2.TIMEOUT tp.id=$<$n$>$
EVT3.TIMEOUT tp.id=$<$n$>$
CSB timed out on an expected event (interrupt) from the EVT1-, EVT2- or EVT3-board respectively, from transputer with identifier n,
$\diamond$
LKC1.OINT.TIMEOUT mask=$<$n$>$ bytes.sent=$<$m$>$
LKC2.OINT.TIMEOUT mask=$<$n$>$ bytes.sent=$<$m$>$
one or more transputers connected to the LKC1- or LKC2-board respectively did not accept a byte sent via the LKC-board; the bits in bitmask n that are NOT 1 identify the connected transputer(s) that did not accept the byte; m is the number of bytes of the message sent via the LKC before the failure occurred,
$\diamond$
LKC.UNKNOWN.COMMAND cmd=$<$n$>$
CSB received an unknown message byte n via one of its LKCs,
$\diamond$
LKS1.CONFIG.FAIL
LKS2.CONFIG.FAIL
LKS3.CONFIG.FAIL
CSB failed to configure its LKS1-, LKS2- or LKS3-board because of a configuration link communication problem,
$\diamond$
LKS.REQUEST.DISABLED (Warning!)
CSB received a request for an LKS-link connection, but the permission to use this link is not yet given (by the HOST-transputer); this warning might occur in the network startup phase; messages are queued and should appear in the logfile as soon as the LKS-links are enabled later on in the startup procedure,
$\diamond$
READOUT.OK ackn failed
READOUT.NOT.OK ackn failed
communication problem between the CSB-message receiver process and the main process (both processes run on the HOST-transputer),
$\diamond$
TRP.ERROR1
device connected to the TRP-ARI connector set its error flag,
$\diamond$
TRP.ERROR2
device connected to the TRP-ARO connector set its error flag,
$\diamond$
TRP.NONEMPTY.EVTREG after init: trp.requestin.reg=$<$n$>$
the event register of the TRP-board is not zero after initialization (as it should be),
$\diamond$
TRP.UNEXP.EVENT expected=$<$n$>$ received=$<$m$>$
the TRP-board received an unexpected interrupt or interrupts; n and m are interrupt bitmasks (see [5] for their definition),
$\diamond$
unknown CSB source
the CSBREPORT process received a message with an unknown sender identifier; this is unlikely ever to happen, but if it does it is serious, because it probably means there is a CSB hardware problem,
$\diamond$
unknown message tag $<$n$>$
the CSBREPORT process did not understand message identifier n it received from the CSB (it wasn't any of the identifiers described below).
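Several of the messages listed above carry parameters of the form name=n, some of which are bitmasks. The sketch below shows one possible way to extract the parameters and to list the set or cleared bits of a mask. It rests on assumptions that are not documented here: that a value written with a leading '#' is hexadecimal and any other value is decimal, that bit 0 corresponds to the first connection on a board, and that the number of connections per board (the width argument) is known to the reader. All function names are hypothetical.

```python
import re

# Assumption: parameters look like "name=value"; the value is either
# '#' followed by hex digits, or a plain decimal number.
PARAM_RE = re.compile(r"([A-Za-z][\w.]*)=(#[0-9A-Fa-f]+|\d+)")


def parse_number(text):
    """Convert '#1F' (assumed hexadecimal) or '31' (assumed decimal) to int."""
    return int(text[1:], 16) if text.startswith("#") else int(text)


def parse_csb_message(message):
    """Split a message into its tag (first word) and a dict of parameters."""
    tag = message.split()[0]
    params = {name: parse_number(value)
              for name, value in PARAM_RE.findall(message)}
    return tag, params


def set_bits(mask, width):
    """Bit positions that are 1, e.g. interrupt sources in an ARE/EVT mask."""
    return [bit for bit in range(width) if (mask >> bit) & 1]


def clear_bits(mask, width):
    """Bit positions that are NOT 1, e.g. the transputers that did not
    accept a byte in an LKCn.OINT.TIMEOUT mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


if __name__ == "__main__":
    # Hypothetical message text; width=4 is an arbitrary illustration.
    tag, params = parse_csb_message("LKC1.OINT.TIMEOUT mask=#0D bytes.sent=5")
    print(tag, params, clear_bits(params["mask"], width=4))
```

Which transputer belongs to a given bit position still has to be looked up in chapter 'CSB Connections'.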



Error messages originating from different processes on the READOUT transputers are also worth noting; they are sent via the CSB as well (name of the message sender in this case also: CSBREPORT), to enable notification on the RunControl screen. These messages are accompanied by a more explanatory message in the logfile, written there directly by the READOUT transputer (through its monitor link).

$\diamond$
CAMAC.INIT.ERROR
initialization of the CAMAC hardware (for LASER in RCAL crate 9) failed (procedure is cccz, ccci, cccc, cclm (ADC, N=2), cclm (TDC, N=5)),
$\diamond$
CAMAC.STATUS.ERROR
error occurred while reading out data from CAMAC for LASER (one or more of: LAM timeout ADC (N=2), data read error ADC (N=2), LAM timeout TDC (N=5), data read error TDC (N=5)),
$\diamond$
CALIB.XMIT.ERROR
an XOR-checksum error occurred in the downloading of calibration constants blocks from host to READOUT transputer (compare to DC.XOR.ERROR),
$\diamond$
CALIB.CNST.MISSING
calibration constants for one or more of the Digital Cards in the crate are missing from the download from host to READOUT transputer,
$\diamond$
DC.BRC.BOOT.FAIL
booting the Digital Cards in a crate by broadcast method failed,
$\diamond$
DC.DATA.MISMATCH
a mismatch was found between the DSP calculated time and energy sums and the transputer calculated sums (the check is performed on a regular basis during runs for CAL Digital Cards only),
$\diamond$
DC.DATA.TIMEOUT
a timeout occurred while waiting for an event to appear in the Digital Card DPM (although the GSLT-decision has been received already),
$\diamond$
DC.DOWNLOAD.FAIL
downloading of one or more blocks of calibration constants to one or more Digital Cards failed,
$\diamond$
DC.GLOBALEXEC.FAIL
giving the Digital Card exec command failed (during means&sigmas readout in calibration runs),
$\diamond$
DC.GLOBALREAD.FAIL
setting the Digital Card read flag failed (during means&sigmas readout in calibration runs),
$\diamond$
DC.HEADER.ERROR
a mismatch occurred between the headerword of the first Digital Card in the crate and another in this crate,
$\diamond$
DC.PAGENO.ERROR
the page number from a Digital Card page header does not match the page number read from the Digital Card OFDR,
$\diamond$
DC.PAGENO.ORDER
the page number read from the Digital Card OFDR does not match the expected number,
$\diamond$
DC.PARITY.ERROR
a parity error occurred on the Digital Card for this event (the least significant bit of the control byte (byte 3) of the header word is set),
$\diamond$
DC.XOR.ERROR
the Digital Card reports a mismatch between the XOR-checksum(s) of calibration constants blocks received and the one(s) calculated by the DSP,
$\diamond$
EVENT.TOO.BIG
an event was generated with a size larger than the available buffer size (this can only happen if the code is compiled in undefined mode),
$\diamond$
GSLT.BUFFER $<$description$>$
the GSLT buffering process detected a corrupted GSLT-decision; the nature of the corruption is explained in description,
$\diamond$
GSLT.DC.MISMATCH
a mismatch was found between the FLT-number or the triggertype provided by the Digital Card and the FLT-number or triggertype provided by the GSLT-decision,
$\diamond$
HWPARAMS.ERROR
an error was detected in the hardware parameters received from the host,
$\diamond$
POLY.CONST.ERROR
an error was detected in the format of the downloaded polynomial constants (needed for calibration runs),
$\diamond$
RO.DATASEND.FAIL
the sender process has now been trying for about 30 seconds to send event data to the ROCOLLECT transputer,
$\diamond$
TWOTP.BERRL.EVT
the transputer detected VME-bus errors,
$\diamond$
TWOTP.TIMEOUT.EVT
the transputer detected refresh timeout errors (probably caused by not getting VME access).



At regular intervals during a run a STATUS.RUN command is given; when the CAL-SLT is taking part in the run and the 'status' has not changed between two STATUS.RUN commands, the status of the CAL-SLT is printed in the logfile.



For a READOUT transputer the status printed looks like this (example):

TP.ID #201D #000031F7 #000031EC #000031EC #00000000
            #00000546 #00000000 #00000091

in which #201D is the transputer identifier; the 7 numbers following are respectively:

  1. the FLT-trigger number processed by the read.trigger.data() process
  2. the FLT-trigger number of the last GSLT-decision received by the gslt.decision.buffer() process
  3. the FLT-trigger number of the last GSLT-decision processed by the read.cal.data() process
  4. the total number of polls needed in this run while waiting for Digital Card data after the GSLT-decision had already been received
  5. the GSLT-trigger number of events sent by the send.data() process to the ROCOLLECT transputer
  6. the total number of times in this run a timeout or transmission error occurred while sending from READOUT to ROCOLLECT transputer, when using the 'SECURE' send option in sender.opp
  7. the total number of times the send.data() process has had to wait for permission to send an event to the ROCOLLECT transputer in this run
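The READOUT status block can be turned into named counters with a few lines of Python. The sketch below assumes that every value prefixed with '#' is hexadecimal and that the block always consists of the transputer identifier followed by exactly seven numbers; the field names are invented for this example and do not occur in the CALDAQ code.

```python
import re

# Hypothetical field names, in the order of the list above.
READOUT_FIELDS = (
    "flt_read_trigger_data",    # 1
    "flt_last_gslt_received",   # 2
    "flt_last_gslt_processed",  # 3
    "dc_data_polls",            # 4
    "gslt_number_sent",         # 5
    "send_errors",              # 6
    "send_waits",               # 7
)


def parse_readout_status(text):
    """Parse a (possibly multi-line) 'TP.ID ...' block.

    Returns (transputer_id, counters) where counters maps the
    hypothetical field names to integers. '#' is assumed to mark
    a hexadecimal value.
    """
    values = [int(tok, 16) for tok in re.findall(r"#([0-9A-Fa-f]+)", text)]
    if len(values) != len(READOUT_FIELDS) + 1:
        raise ValueError("expected identifier plus 7 numbers, got %d values"
                         % len(values))
    return values[0], dict(zip(READOUT_FIELDS, values[1:]))


if __name__ == "__main__":
    block = ("TP.ID #201D #000031F7 #000031EC #000031EC #00000000\n"
             "            #00000546 #00000000 #00000091")
    tp_id, counters = parse_readout_status(block)
    print(hex(tp_id), counters)
```

Applied to the example above, this gives 12791 (#31F7) for the first counter and 145 (#91) for the number of times the send.data() process had to wait.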



For a LAYER1 trigger transputer the status printed looks like this (example):

TP.ID #101D #00000000 #000031F8 #00003206 #00000000
            #00000000 #00000000 #00000000

in which #101D is the transputer identifier; the following 7 numbers are respectively:

  1. not used
  2. the number of events processed by the LAYER1 algorithm
  3. the number of data blocks sent to LAYER2 (includes all events plus 13 'FORWARD.CONSTANTS' data blocks plus 1 'BECOME.ACTIVE' data block)
  4. not used
  5. not used
  6. the number of arithmetic errors/overflows that occurred in the LAYER1 algorithm process (in which part(s) of the algorithm an error occurred can be found per event in a word in the CAL-SLT offline databanks)
  7. not used
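Items 2 and 3 of this list are related: the number of data blocks sent equals the number of events plus 13 'FORWARD.CONSTANTS' blocks plus 1 'BECOME.ACTIVE' block. The short Python check below applies this to the example values, assuming '#' denotes hexadecimal; the function name is hypothetical, and the equality can only be expected to hold when the two counters were sampled with no event in flight between them.

```python
FORWARD_CONSTANTS_BLOCKS = 13  # from the description of item 3
BECOME_ACTIVE_BLOCKS = 1       # from the description of item 3


def expected_blocks_sent(events_processed):
    """Number of data blocks LAYER1 should have sent to LAYER2."""
    return events_processed + FORWARD_CONSTANTS_BLOCKS + BECOME_ACTIVE_BLOCKS


if __name__ == "__main__":
    events = 0x000031F8  # item 2 in the example status
    blocks = 0x00003206  # item 3 in the example status
    print(blocks == expected_blocks_sent(events))  # prints True
```

In the example, #3206 minus #31F8 is 14, which matches the 13 + 1 extra data blocks.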



For a LAYER2 trigger transputer the status printed looks like this (example):

TP.ID #1214 #31F8 #0000 #0000 #3206 #3206 #0000 #3206 #0000
            #0000 #0000 #0000 #0000 #0000 #0000 #0000

in which #1214 is the transputer identifier; the following 15 numbers are respectively:

  1. the number of events processed by the LAYER2 algorithm
  2. the number of arithmetic errors/overflows that occurred in the LAYER2 algorithm process (in which part(s) of the algorithm an error occurred can be found per event in a word in the CAL-SLT offline databanks)
  3. the number of events received by the input process (on link 0); includes all events plus 13 'FORWARD.CONSTANTS' plus 1 'BECOME.ACTIVE'
  4. idem (on link 1)
  5. idem (on link 2)
  6. idem (on link 3)
  7. the number of events sent to LAYER3; includes all events plus 13 'FORWARD.CONSTANTS' plus 1 'BECOME.ACTIVE'
  8. not used
  9. not used
  10. the number of times a transmission error occurred in the reception of an event from LAYER1 (on link 0)
  11. idem (on link 1)
  12. idem (on link 2)
  13. idem (on link 3)
  14. not used
  15. not used
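A LAYER2 status block can be read the same way, with the per-link counters grouped by link number. The Python sketch below is illustrative only: the field names are invented, '#' is assumed to mark hexadecimal values, and treating a link with a zero receive count as 'not in use' is an interpretation of the example, not a documented rule.

```python
import re

# Hypothetical names for the 15 numbers, in the order of the list above.
LAYER2_FIELDS = (
    "events_processed", "arithmetic_errors",
    "received_link0", "received_link1", "received_link2", "received_link3",
    "sent_to_layer3", "unused_8", "unused_9",
    "rx_errors_link0", "rx_errors_link1", "rx_errors_link2", "rx_errors_link3",
    "unused_14", "unused_15",
)


def parse_layer2_status(text):
    """Return (transputer_id, counters) for a LAYER2 'TP.ID ...' block."""
    values = [int(tok, 16) for tok in re.findall(r"#([0-9A-Fa-f]+)", text)]
    if len(values) != len(LAYER2_FIELDS) + 1:
        raise ValueError("expected identifier plus 15 numbers, got %d values"
                         % len(values))
    return values[0], dict(zip(LAYER2_FIELDS, values[1:]))


def links_with_input(counters):
    """Links (0..3) on which at least one data block was received."""
    return [n for n in range(4) if counters["received_link%d" % n] > 0]


if __name__ == "__main__":
    block = ("TP.ID #1214 #31F8 #0000 #0000 #3206 #3206 #0000 #3206 #0000\n"
             "            #0000 #0000 #0000 #0000 #0000 #0000 #0000")
    tp_id, counters = parse_layer2_status(block)
    print(hex(tp_id), links_with_input(counters), counters["sent_to_layer3"])
```

For the example above, input was received on links 1 and 2 only, and the count sent to LAYER3 (#3206) again exceeds the number of events processed (#31F8) by 14.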

For a LAYER3 trigger transputer the status printed looks like this (example):

TP.ID #1400 #31F8 #0000 #3206 #3206 #3206 #0000 #31F8 #0000
            #0000 #0000 #0000 #0000 #0000 #0000 #0000

in which #1400 is the transputer identifier; the following 15 numbers are respectively:

  1. the number of events processed by the LAYER3 algorithm
  2. the number of arithmetic errors/overflows that occurred in the LAYER3 algorithm process (in which part(s) of the algorithm an error occurred can be found per event in a word in the CAL-SLT offline databanks)
  3. the number of events received by the input process (on link 1); includes all events plus 13 'FORWARD.CONSTANTS' plus 1 'BECOME.ACTIVE'
  4. idem (on link 2)
  5. idem (on link 3)
  6. not used
  7. the number of events sent to the GSLT
  8. not used
  9. not used
  10. the number of times a transmission error occurred in the reception of an event from LAYER2 (on link 1)
  11. idem (on link 2)
  12. idem (on link 3)
  13. not used
  14. not used
  15. not used


Henk Boterenbrood 2005-01-06