/* ------------------------------------------------------------------------- */
DATE : February 1999
PROBLEM DESCR: GSLT <> DC: the DC FLT number is out of sync with the GSLT.
SOLUTION: Stuck bit in the communication between the CFLTP and the
    NIM electronics; after moving some cables and resetting the NIM
    electronics the problem went away.
/* ------------------------------------------------------------------------- */
DATE : January 1999
PROBLEM DESCR: CAL-SLT error report:
        RCAL: ###ARE3.ERROR mask #0080
    at a regular rate of about 1 per second for the duration of every
    ZEUS run. This lasted for about 4 days.
    The GSLT investigated the type of error that occurred: bit 15 of the
    SLT-error was set, meaning an error in global summing at CAL-SLT Layer3.
SOLUTION: From a mail from Nichol Brummer:
    "It turns out that the CAL laser has been in a wrong setting from
    friday till last night. After it was switched back to its proper
    setting, it is not anymore firing into random physics events, causing
    overflows in the CAL SLT energy summing."
/* ------------------------------------------------------------------------- */
DATE : October 1998
PROBLEM DESCR: Data corruption in FCAL crate data (in data from all crates,
    even crates that do not exist: crates 7, 10, 12), reported offline(?).
SOLUTION: Not caused by the CALDAQ system, but further down the chain:
    all corrupted data passed through TLT branch #2
    => probable error in the 'RADSTONE' EVB-TLT board.
/* ------------------------------------------------------------------------- */
DATE : October 1998
PROBLEM DESCR: At startup all DCs report a 'DC.BOOT.FAIL'.
SOLUTION: While some crates were switched off in an attempt to fix a flaky
    transputer link, the power to the NEVIS fan-out to the DC crates had
    tripped, but this went unnoticed. Switching the power back on solved
    the problem immediately.
/* ------------------------------------------------------------------------- */
DATE : June 1998
PROBLEM DESCR: The system comes to a halt in the 'GAINS' run of a
    calibration; the logfile shows that the readout tp in RCAL crate 5
    crashes.
SOLUTION: The debugger shows an integer overflow in the calculations done
    at the end of the 'GAINS' run (calculating the 'cross gain spread')
    for DC 13. The input to DC 13 was unplugged and the calibration runs
    went fine; the input was replugged and the calibration runs still ran
    fine... ==> bad connection at the connector?
/* ------------------------------------------------------------------------- */
DATE : Jan 1998
PROBLEM DESCR: FCAL kept hanging in calibration runs; more precisely, the
    new FPC crate (FCAL crate 5) was hanging while outputting data at the
    end of a calibration run; no obvious reports in the logfile that point
    to a hardware problem.
SOLUTION: After long investigations (code/debugger) I determined that the
    interrupts between the 2 transputers on the 2TP were not working;
    a test with the CSB test program (a bit late, but I didn't expect a
    hardware problem) showed a faulty TRIGGER tp; exchanging the 2TP did
    not help, exchanging the 64-wire cable did; it got damaged while being
    handled a lot (why, I don't know).
/* ------------------------------------------------------------------------- */
DATE : Jan 1998
PROBLEM DESCR: Many boot problems in the FCAL crate when starting up;
    the CSB test showed many "LKC boot problems"; this was one of the
    first startups after a winter stop, to test the addition of the
    FPC readout crate (FCAL crate 5).
SOLUTION: Many module swaps with the RCAL-CSB showed problems in an ARE,
    an LKS and an LKC module; all 3 very probably have a hardware failure;
    a 2tp broke around the same time.
    Possible cause: 64-wire cable to the CSB put in a DigCard P2 connector
    (which has its own pin definitions) instead of a normal P2 (only
    slot 1 and 2 of a Digital Card VME crate!).
    April 1998, modules are at NIKHEF:
      2TP:               had a linkbuffer chip with its cover blown off;
                         repaired by INCAA.
      LKS (ser.no.#5):   had a roasted buffer chip and a few C004 channels
                         broken; board damaged; patched with wires;
                         C004 linkswitch chip replaced.
      ARE (ser.no.#7):   buffer chip (LS31) replaced.
      LKC (ser.no.#10):  burnt-out buffer chip and 1 PCB track acted as
                         fuse; 3 connected C012 broken; chips replaced,
                         PCB track fixed with wire.
    ### This was a very serious incident which cost a lot of time and
    ### effort to fix; care should be taken when moving 2tp and CSB
    ### modules about !!
/* ------------------------------------------------------------------------- */
DATE : Sep 6 1997
PROBLEM DESCR: GSLT complaining "waiting for data from CAL", GSLT CAL
    buffer empty. However, CALDAQ complains about "RCAL buffers to EVB
    full", and indeed iserver.log shows that the ROCOLLECT buffer in TPM
    is full, but the EVB for some reason does not empty it.
    The RCAL ROCOLLECT is clearly blocked because of full buffers.
    Note that FCAL and BCAL are also not empty:
      >>>> FCAL ROCOLLECT STATUS<<<
        head.ptr, tail.ptr = 9093 71017
        calec.tail         = 9093
        space.requirement  = 2236
        evb.events.written = 98370
        data.len           = 223,240,205,220
        trigger.no         = 98368,98368,98368,98368
        frontend.no        = 5,7,8,10
      >>>> BCAL ROCOLLECT STATUS<<<
        head.ptr, tail.ptr = 99369 22982
        calec.tail         = 99369
        space.requirement  = 2529
        evb.events.written = 98370
        data.len           = 180,168,199,183
        trigger.no         = 98368,98368,98368,98368
        frontend.no        = 0,4,8,10
      >>>> RCAL ROCOLLECT STATUS<<<
        head.ptr, tail.ptr = 48058 50917
        calec.tail         = 48058
        reserve.result     = -1
        space.requirement  = 2949
        evb.events.written = 98334
        data.len           = 168,175,278,458
        trigger.no         = 98340,98340,98340,98340
        frontend.no        = 3,2,10,8
        event.no(collector)= 98335,98336,98337,98338,98339,98333,98334
        event.complete     = #00FFF,#00FFF,#00FFF,#00FFF,#00FFF,#00FFF,#00FFF
SOLUTION: Unknown. Ulf Behrens has been asked about a possible problem with
    the EVB: he says he knows about this effect, doesn't know the reason,
    and we have to wait for it to disappear....
/* ------------------------------------------------------------------------- */
DATE : Aug-Sep 1997
PROBLEM DESCR: No data, no triggers; sometimes a few, then stops.
SOLUTION: In iserver.log the 'RCAL ROCOLLECT STATUS' sometimes showed
        event.no(collector)= 0, -1, -9, -9, -9, -9, -9, -9
        event.complete     = #005FF,#005FF,#00000,#00000,#00000,#00000,#00000
    although the boot/readout mask for RCAL was #0DFF, meaning that the
    crate with maskbit #0800, RCAL crate #12, was not sending its data.
    Just that afternoon the delay box (for components FNC and PRT in this
    crate) had been exchanged; it was exchanged back for the old one and
    things ran again (excluding the crate from the boot/readout mask was
    tried first).
/* ------------------------------------------------------------------------- */
DATE : Aug 30 1997
PROBLEM DESCR: GFLT complains about fatal error from component CAL.
SOLUTION: This had nothing to do with the CALDAQ transputer network, but
    was a problem in the connection CFLT-->GFLT; solved by
    'restarting CFLT'.
/* ------------------------------------------------------------------------- */
DATE : Aug 1997
PROBLEM DESCR: Errors of the following kind reported in the logfile,
    sometimes with hangs of the system, most certainly when the
    'UNKNOWN ERROR' occurs:
      ###LAYER2/3 tp #1213 link 2 SYNC.WORD=#AABB8CDD: BITFLIP ERROR
      ###LAYER2/3 tp #1213 link 2 SYNC.WORD=#00000000: UNKNOWN ERROR
SOLUTION: This is a link communication problem with link 2 of the
    FCAL LAYER2 #3 transputer (which comes, via the CSB, from the FCAL
    LAYER1 tp in FCAL crate 1; see the CALDAQ network drawing).
    Actions:
    1. switch the crates involved off and on; see if that helps...
    2. unplug/plug the cable from the CSB to the LAYER2 tp, and
       unplug/plug the LAYER2 2tp module and the READOUT/TRIGGER
       2tp module.
    Only 2. was tried and didn't seem to help, but soon after (after a
    general power failure?) the problem was gone, at least for the
    moment...
/* ------------------------------------------------------------------------- */
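NOTE (illustration only, not CALDAQ code): the two SYNC.WORD messages in the
Aug 1997 entry can be told apart by counting how many bits differ from the
expected sync word. The sketch below assumes the expected word is #AABBCCDD
and that at most 1 differing bit counts as a bitflip; neither value is stated
in this log, and the function name is hypothetical.

```python
# Hypothetical reference value: the log never states the expected sync word.
EXPECTED_SYNC_WORD = 0xAABBCCDD


def classify_sync_word(received, expected=EXPECTED_SYNC_WORD, max_flips=1):
    """Return (label, differing_bit_positions) for a 32-bit sync word."""
    diff = (received ^ expected) & 0xFFFFFFFF
    # Bit positions (0 = least significant) where received and expected differ.
    flipped = [bit for bit in range(32) if (diff >> bit) & 1]
    if not flipped:
        return "OK", flipped
    if len(flipped) <= max_flips:
        # Assumed threshold: a small number of differing bits is a bitflip.
        return "BITFLIP ERROR", flipped
    return "UNKNOWN ERROR", flipped


if __name__ == "__main__":
    # The two sync words quoted in the Aug 1997 entry.
    for word in (0xAABB8CDD, 0x00000000):
        label, bits = classify_sync_word(word)
        print(f"SYNC.WORD=#{word:08X}: {label} ({len(bits)} differing bits)")
```

Under these assumptions #AABB8CDD differs from the reference in a single bit,
which points at one bad line or a marginal connection, while #00000000 differs
in many bits, which looks more like a link that delivered no valid word at all.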