The CALDAQ transputer network logfile (named iserver.log, found in
directory calec_rc/log) contains mostly messages of the format:
sender-name: message
in which sender-name is the name of the process, routine or
transputer sending the message.
A complete description of the messages present in the logfile is not
available; why and where the messages are generated can be found in the
source code and is mostly of concern to the code expert.
However, a few message types are explained here in more detail.
Messages starting with '###' signify that something serious or
possibly fatal occurred.
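The line format and the '###' marker can be checked mechanically. The following is a minimal sketch, not part of the CALDAQ software: the names LogEntry and parse_log_line are hypothetical, and it assumes that the sender name itself contains no colon and that '###' may lead either the whole line or the message part.

```python
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    sender: str    # process, routine or transputer name
    message: str   # text after the first colon
    serious: bool  # True when the '###' marker is present


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Split one logfile line into sender and message.

    Returns None for lines without a colon, since the logfile contains
    mostly (not only) lines of the 'sender-name: message' format.
    """
    text = line.strip()
    sender, sep, message = text.partition(":")
    if not sep:
        return None
    sender, message = sender.strip(), message.strip()
    # Assumption: '###' may lead either the whole line or the message part.
    serious = text.startswith("###") or message.startswith("###")
    return LogEntry(sender, message, serious)
```

For instance, parse_log_line("CSBREPORT: TRP.ERROR1") yields sender CSBREPORT and message TRP.ERROR1, with the serious flag not set.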
Error messages originating from the CSBs (name of the message sender
in this case: CSBREPORT) are worth noting;
some of these messages are accompanied by a more explanatory message on
the RunControl screen. Here is a list of these messages:

- ARE1.ERROR mask=n
ARE2.ERROR mask=n
ARE3.ERROR mask=n
CSB received an interrupt from ARE1-, ARE2- or ARE3-board
respectively, with the source(s) of the interrupt in bitmask n,
meaning that the connected transputer(s) set the error flag
(but did not necessarily halt, if the code was compiled in undefined mode,
e.g. the CAL-SLT code);
which transputer is connected to which ARE-connection can be found
in chapter 'CSB Connections',

- could not send NEXT TRIGGER
serious problem with process that controls NEVIS frontend control electronics
(on HOST-transputer, via Serial Cards);
it does not accept the command to generate the next trigger
(in standalone runs),

- CSB.MESS.FAIL dest=n
the CSB failed to pass on a message to the next CSB
(destination id = n),

- CSB.UNKNOWN.COMMAND cmd=n
the CSB received an unknown command n from HOST,

- CSB.UNKNOWN.DESTINATION dest=n
the CSB received an unknown destination transputer number n from HOST,

- EVT1.EVENT mask=n
EVT2.EVENT mask=n
EVT3.EVENT mask=n
CSB received an (unexpected) interrupt from EVT1-, EVT2- or EVT3-board
respectively, with the source(s) of the interrupt in bitmask n;
a transputer in panic can draw attention this way;
which transputer is connected to which EVT-connection can be found
in chapter 'CSB Connections',

- EVT1.TIMEOUT tp.id=n
EVT2.TIMEOUT tp.id=n
EVT3.TIMEOUT tp.id=n
CSB timed out on an expected event (interrupt) from the EVT1-, EVT2-
or EVT3-board respectively, from transputer with identifier n,

- LKC1.OINT.TIMEOUT mask=n bytes.sent=m
LKC2.OINT.TIMEOUT mask=n bytes.sent=m
one or more transputers connected to the LKC1- or LKC2-board respectively
did not accept a byte sent via the LKC-board; the bits of bitmask n that
are NOT 1 indicate which connected transputers did not accept it;
the number of bytes of the message sent via LKC before the failure
occurred is m,

- LKC.UNKNOWN.COMMAND cmd=n
CSB received an unknown message byte n via one of its LKCs,

- LKS1.CONFIG.FAIL
LKS2.CONFIG.FAIL
LKS3.CONFIG.FAIL
CSB failed to configure its LKS1-, LKS2- or LKS3-board because
of a configuration link communication problem,

- LKS.REQUEST.DISABLED (Warning!)
CSB received a request for an LKS-link connection, but the permission to
use this link is not yet given (by the HOST-transputer); this warning might
occur in the network startup phase; messages are queued and should appear
in the logfile as soon as the LKS-links are enabled later on in the startup
procedure,

- READOUT.OK ackn failed
READOUT.NOT.OK ackn failed
communication problem between the CSB-message receiver process and the main
process (both processes run on the HOST-transputer),

- TRP.ERROR1
device connected to the TRP-ARI connector set its error flag,

- TRP.ERROR2
device connected to the TRP-ARO connector set its error flag,

- TRP.NONEMPTY.EVTREG after init: trp.requestin.reg=n
the event register of the TRP-board is not zero after initialization
(as it should be),

- TRP.UNEXP.EVENT expected=n received=m
the TRP-board received an unexpected interrupt or interrupts;
n and m are interrupt bitmasks (see [5] for their
definition),

- unknown CSB source
the CSBREPORT process received a message with an unknown sender identifier;
unlikely that this will ever happen, but if it happens it is serious because
it means there probably is a CSB hardware problem,

- unknown message tag n
the CSBREPORT process did not understand message identifier n
it received from the CSB (it wasn't any of the identifiers described below).
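Several of the messages above carry a bitmask. For the ARE and EVT messages the bits that are 1 give the interrupt sources; for the LKC timeout messages the bits that are NOT 1 give the connections that did not accept the byte. A minimal sketch of decoding such a mask follows. The function names are hypothetical, the number format in the logfile is an assumption (a leading '#' is read as hexadecimal, anything else as decimal), and the number of connections per board (width) is a hypothetical parameter. Which transputer sits on which bit position has to be looked up in chapter 'CSB Connections'.

```python
from typing import List


def parse_mask(text: str) -> int:
    """Convert a mask string to an integer.

    Assumption: a leading '#' marks hexadecimal; anything else is
    read as decimal.
    """
    text = text.strip()
    if text.startswith("#"):
        return int(text[1:], 16)
    return int(text)


def bits_set(mask: int) -> List[int]:
    """Bit positions that are 1 (interrupt sources in ARE/EVT masks)."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def bits_clear(mask: int, width: int) -> List[int]:
    """Bit positions below `width` that are 0.

    For the LKC timeout masks these are the connections that did NOT
    accept the byte. `width` is a hypothetical parameter.
    """
    return [bit for bit in range(width) if not (mask >> bit) & 1]
```

With a mask of 5, bits_set reports positions 0 and 2; with a mask of binary 1011 and an assumed width of 4, bits_clear reports position 2.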
Error messages originating from different processes on the READOUT
transputers are also worth noting; they are likewise sent via the CSB
(name of the message sender in this case also: CSBREPORT) to enable
notification on the RunControl screen.
These messages are accompanied by a more explanatory message in
the logfile, written there directly by the READOUT transputer (through its
monitor link).

- CAMAC.INIT.ERROR
initialization of the CAMAC hardware (for LASER in RCAL crate 9) failed
(procedure is cccz, ccci, cccc, cclm (ADC, N=2), cclm (TDC, N=5)),

- CAMAC.STATUS.ERROR
error occurred while reading out data from CAMAC for LASER (one or more of:
LAM timeout ADC (N=2), data read error ADC (N=2), LAM timeout TDC (N=5),
data read error TDC (N=5)),

- CALIB.XMIT.ERROR
an XOR-checksum error occurred in the downloading of calibration constants
blocks from host to READOUT transputer (compare to DC.XOR.ERROR),

- CALIB.CNST.MISSING
calibration constants for one or more of the Digital Cards in the crate
are missing from the download from host to READOUT transputer,

- DC.BRC.BOOT.FAIL
booting the Digital Cards in a crate by broadcast method failed,

- DC.DATA.MISMATCH
a mismatch was found between the DSP calculated time and energy sums
and the transputer calculated sums (the check is performed on a regular
basis during runs for CAL Digital Cards only),

- DC.DATA.TIMEOUT
a timeout occurred while waiting for an event to appear in the Digital
Card DPM (although the GSLT-decision has been received already),

- DC.DOWNLOAD.FAIL
downloading of one or more blocks of calibration constants to one or more
Digital Cards failed,

- DC.GLOBALEXEC.FAIL
giving the Digital Card exec command failed (during means&sigmas readout
in calibration runs),

- DC.GLOBALREAD.FAIL
setting the Digital Card read flag failed (during means&sigmas readout
in calibration runs),

- DC.HEADER.ERROR
a mismatch occurred between the headerword of the first Digital Card in the
crate and another in this crate,

- DC.PAGENO.ERROR
the page number from a Digital Card page header does not match the
page number read from the Digital Card OFDR,

- DC.PAGENO.ORDER
the page number read from the Digital Card OFDR does not match the expected
number,

- DC.PARITY.ERROR
a parity error occurred on the Digital Card for this event
(the least significant bit of the control byte (byte 3) of the header
word is set),

- DC.XOR.ERROR
the Digital Card reports a mismatch between the XOR-checksum(s) of calibration
constants blocks received and the one(s) calculated by the DSP,

- EVENT.TOO.BIG
an event was generated with a size larger than the available buffer size
(this can only happen if the code is compiled in undefined mode),

- GSLT.BUFFER description
the GSLT buffering process detected a corrupted GSLT-decision; the nature
of the corruption is explained in description,

- GSLT.DC.MISMATCH
a mismatch was found between the FLT-number or the triggertype provided by
the Digital Card and the FLT-number or triggertype provided by the
GSLT-decision,

- HWPARAMS.ERROR
an error was detected in the hardware parameters received from the host,

- POLY.CONST.ERROR
an error was detected in the format of the downloaded polynomial constants
(needed for calibration runs),

- RO.DATASEND.FAIL
the sender process has been trying for about 30 seconds to send event data to
the ROCOLLECT transputer,

- TWOTP.BERRL.EVT
the transputer detected VME-bus errors,

- TWOTP.TIMEOUT.EVT
the transputer detected refresh timeout errors (probably caused by
not getting VME access).
At regular intervals during a run a STATUS.RUN command is given;
when the CAL-SLT is taking part in the run and the 'status' hasn't
changed between two STATUS.RUN commands, the status of the CAL-SLT is
printed in the logfile.
For a READOUT transputer the status printed looks like this
(example):
TP.ID #201D #000031F7 #000031EC #000031EC #00000000
#00000546 #00000000 #00000091
in which #201D is the transputer identifier; the 7 numbers
following are respectively:
- the FLT-trigger number processed by the read.trigger.data()
process
- the FLT-trigger number of the last GSLT-decision received by the
gslt.decision.buffer() process
- the FLT-trigger number of the last GSLT-decision processed by the
read.cal.data() process
- the total number of polls necessary in this run while waiting for
Digital Card data when the GSLT-decision had already been received
- the GSLT-trigger number of events sent by the send.data() process
to the ROCOLLECT transputer
- the total number of times in this run a timeout or transmission error
occurred while sending from READOUT to ROCOLLECT transputer, when
using the 'SECURE' send option in sender.opp
- the total number of times the send.data() process has had to wait
for permission to send an event to the ROCOLLECT transputer in this run
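The READOUT status can be turned into named counters as sketched below. This is an illustration only: the field names and function names are hypothetical labels for the seven numbers listed above, the '#' prefix is assumed to denote hexadecimal, and the input is assumed to be the status text starting at TP.ID with the two printed lines joined.

```python
from typing import Dict, List, Tuple

# Hypothetical labels for the 7 READOUT counters, in printed order.
READOUT_FIELDS = (
    "flt_number_read_trigger_data",
    "flt_number_last_gslt_received",
    "flt_number_last_gslt_processed",
    "dc_data_polls",
    "gslt_number_sent",
    "secure_send_errors",
    "send_permission_waits",
)


def parse_status(text: str) -> Tuple[int, List[int]]:
    """Return (transputer identifier, counters) from a TP.ID status."""
    tokens = text.split()
    if len(tokens) < 2 or tokens[0] != "TP.ID":
        raise ValueError("not a TP.ID status")
    # Assumption: every number is hexadecimal with a leading '#'.
    values = [int(token.lstrip("#"), 16) for token in tokens[1:]]
    return values[0], values[1:]


def readout_status(text: str) -> Dict[str, int]:
    """Label the counters of a READOUT transputer status."""
    tp_id, counters = parse_status(text)
    if len(counters) != len(READOUT_FIELDS):
        raise ValueError("expected 7 counters for a READOUT transputer")
    status = dict(zip(READOUT_FIELDS, counters))
    status["tp_id"] = tp_id
    return status
```

Applied to the example above, the transputer identifier is #201D and the last GSLT-decision received and the last one processed carry the same FLT-trigger number (#31EC).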
For a LAYER1 trigger transputer the status printed looks like this
(example):
TP.ID #101D #00000000 #000031F8 #00003206 #00000000
#00000000 #00000000 #00000000
in which #101D is the transputer identifier; the following 7 numbers
are respectively:
- not used
- the number of events processed by the LAYER1 algorithm
- the number of data blocks sent to LAYER2 (includes all events plus 13
'FORWARD.CONSTANTS' data blocks plus 1 'BECOME.ACTIVE' data block)
- not used
- not used
- the number of arithmetic errors/overflows that occurred in the
LAYER1 algorithm process (in which part(s) of the algorithm an error occurred
can be found per event in a word in the CAL-SLT offline databanks)
- not used
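The relation between the second and third LAYER1 number can be verified with the example values: #31F8 events plus 13 'FORWARD.CONSTANTS' blocks plus 1 'BECOME.ACTIVE' block gives #3206. A minimal sketch of this check follows; the constant and function names are hypothetical, and the equality is only expected to hold when all processed events have also been sent, as in the example.

```python
FORWARD_CONSTANTS_BLOCKS = 13  # 'FORWARD.CONSTANTS' data blocks
BECOME_ACTIVE_BLOCKS = 1       # 'BECOME.ACTIVE' data block


def expected_blocks_sent(events_processed: int) -> int:
    """Data blocks LAYER1 sends to LAYER2 for a given event count."""
    return events_processed + FORWARD_CONSTANTS_BLOCKS + BECOME_ACTIVE_BLOCKS


def layer1_counts_consistent(events_processed: int, blocks_sent: int) -> bool:
    """True when the blocks-sent counter matches the event counter."""
    return expected_blocks_sent(events_processed) == blocks_sent
```

For the example status, layer1_counts_consistent(0x31F8, 0x3206) holds.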
For a LAYER2 trigger transputer the status printed looks
like this (example):
TP.ID #1214 #31F8 #0000 #0000 #3206 #3206 #0000 #3206 #0000
#0000 #0000 #0000 #0000 #0000 #0000 #0000
in which #1214 is the transputer identifier; the following
15 numbers are respectively:
- the number of events processed by the LAYER2 algorithm
- the number of arithmetic errors/overflows that occurred in the
LAYER2 algorithm process (in which part(s) of the algorithm an error occurred
can be found per event in a word in the CAL-SLT offline databanks)
- the number of events received by the input process (on link 0);
includes all events plus 13 'FORWARD.CONSTANTS' plus 1 'BECOME.ACTIVE'
- idem (on link 1)
- idem (on link 2)
- idem (on link 3)
- the number of events sent to LAYER3;
includes all events plus 13 'FORWARD.CONSTANTS' plus 1 'BECOME.ACTIVE'
- not used
- not used
- the number of times a transmission error occurred in the reception of
an event from LAYER1 (on link 0)
- idem (on link 1)
- idem (on link 2)
- idem (on link 3)
- not used
- not used
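Because the LAYER2 list repeats the same two quantities for links 0 to 3, it can help to regroup the 15 numbers per link. The sketch below does this for an already converted list of integers; the function name and dictionary keys are hypothetical, and the positions are taken from the list above (numbers 3 to 6 are the received counts, numbers 10 to 13 the transmission errors).

```python
from typing import Dict, List


def layer2_per_link(counters: List[int]) -> Dict[int, Dict[str, int]]:
    """Group the 15 LAYER2 counters by input link (0..3)."""
    if len(counters) != 15:
        raise ValueError("expected 15 counters for a LAYER2 transputer")
    received = counters[2:6]   # numbers 3..6: received on link 0..3
    errors = counters[9:13]    # numbers 10..13: errors on link 0..3
    return {
        link: {"received": received[link], "errors": errors[link]}
        for link in range(4)
    }


# The 15 numbers following the identifier in the example status.
example = [0x31F8, 0, 0, 0x3206, 0x3206, 0, 0x3206, 0,
           0, 0, 0, 0, 0, 0, 0]
per_link = layer2_per_link(example)
```

In the example only links 1 and 2 received data (#3206 each), and no transmission errors were counted on any link.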
For a LAYER3 trigger transputer the status printed looks
like this (example):
TP.ID #1400 #31F8 #0000 #3206 #3206 #3206 #0000 #31F8 #0000
#0000 #0000 #0000 #0000 #0000 #0000 #0000
in which #1400 is the transputer identifier; the following
15 numbers are respectively:
- the number of events processed by the LAYER3 algorithm
- the number of arithmetic errors/overflows that occurred in the
LAYER3 algorithm process (in which part(s) of the algorithm an error occurred
can be found per event in a word in the CAL-SLT offline databanks)
- the number of events received by the input process (on link 1);
includes all events plus 13 'FORWARD.CONSTANTS' plus 1 'BECOME.ACTIVE'
- idem (on link 2)
- idem (on link 3)
- not used
- the number of events sent to the GSLT
- not used
- not used
- the number of times a transmission error occurred in the reception of
an event from LAYER2 (on link 1)
- idem (on link 2)
- idem (on link 3)
- not used
- not used
- not used
Henk Boterenbrood
2005-01-06