Present : K. Korcyl, I. Mandjavidze, S. Wheeler, R. Scholte, R. Slopsema, R. Bock, R. Cranfield, P. Le Du, S. Gonzalez, S. Tapprogge, G. Lehmann, J. R. Hansen, M. Huet, J. Bystricky, R. Blair, J. Vermeulen
In order to simplify the minutes R. Blair requested that speakers provide him with a brief summary of their talks and that the talks themselves be made available in electronic form for posting on the Modeling master working document web page. These minutes consist mostly of the notes provided, with a few things added where it seemed necessary.
1. ATLAS specific parameters (e.g. number of ROBIns, ROB mapping, etc.) : status of availability, open questions
>>> J. Vermeulen : overview (.pdf or PowerPoint file (better on-screen display) )
Trigger menus have been taken from COM-DAQ-99-010, processing sequences are the same as used earlier, reduction factors have been provided by T.Hansl. Total RoI rates per subdetector, derived from the trigger menus, processing sequences and reduction factors, were shown. The average number of processing steps was found to be 1.75 for the low luminosity trigger and 1.37 for the high luminosity trigger. Possible LVL2 RoI positions and sizes were shown, as well as a compact textual representation of the mapping of the ROBs of the different subdetectors on the detector. The RoI request probabilities for each ROB and for each RoI type were calculated with a new version of the "ROBsPerRoI" program. In the new version the mapping of the ROBs is no longer hardcoded; instead the program reads in the textual representation of the mapping referred to above.
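As a rough illustration of the kind of calculation a program such as "ROBsPerRoI" performs, the sketch below estimates, for each ROB, the probability that an RoI of a given size overlaps the eta-phi region read out by that ROB. The mapping format, the ROB names and the regions are invented for the example and are not the actual ATLAS mapping; RoI centres are assumed to be uniformly distributed and phi wrap-around is ignored.

```python
import random

# Hypothetical mapping: ROB name -> (eta_min, eta_max, phi_min, phi_max).
# These regions are invented for illustration only.
ROB_MAP = {
    "rob0": (0.0, 1.0, 0.0, 3.2),
    "rob1": (0.0, 1.0, 3.2, 6.4),
    "rob2": (1.0, 2.0, 0.0, 6.4),  # covers the full phi range
}

def overlaps(region, eta, phi, half_eta, half_phi):
    """True if the RoI window centred at (eta, phi) overlaps the region."""
    eta_min, eta_max, phi_min, phi_max = region
    return (eta - half_eta < eta_max and eta + half_eta > eta_min and
            phi - half_phi < phi_max and phi + half_phi > phi_min)

def hit_probabilities(rob_map, roi_size, eta_range, phi_range, n=20000, seed=1):
    """Monte Carlo estimate of P(ROB is requested | one RoI)."""
    rng = random.Random(seed)
    half = roi_size / 2.0
    hits = dict.fromkeys(rob_map, 0)
    for _ in range(n):
        eta = rng.uniform(*eta_range)
        phi = rng.uniform(*phi_range)
        for name, region in rob_map.items():
            if overlaps(region, eta, phi, half, half):
                hits[name] += 1
    return {name: count / n for name, count in hits.items()}

probs = hit_probabilities(ROB_MAP, roi_size=0.2,
                          eta_range=(0.0, 2.0), phi_range=(0.0, 6.4))
```

In this toy mapping the ROB covering the full phi range is requested roughly twice as often as the other two, and the sum of the probabilities gives the average number of ROBs addressed per RoI.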
Points addressed in the discussion were :
It had been assumed that it does not make sense to generate individual RoI requests for the pixel and SCT subdetectors on the basis of the results of the TRT scan, in view of the large number of RoI requests that would be generated and the relatively small number of ROBs for these subdetectors. Therefore the TRT scan is assumed to cause requesting all data from the pixel and SCT subdetectors. No objections were raised.
P. Le Du emphasized that the question has to be asked whether the B-physics trigger in its present form (i.e. the TRT scan) is necessary : it has much higher requirements than the high luminosity triggers and it is not clear whether the physics potential makes it worthwhile to construct a B-physics trigger.
There is probably no need for running a second level missing energy trigger and hence it should be considered as optional.
The size for the jet RoIs had been chosen to be 0.08 x 0.08 in eta-phi space; 1.0 x 1.0 is to be preferred.
The event fragment data sizes used for the calorimeters (2000 bytes) are too large. P. Le Du takes care of updating these numbers (new values now : 500 bytes for em calorimeter, 250 bytes for hadron calorimeter). For the calorimeter a small number of ROBs have considerably higher RoI request hit rates than the other ROBs. This is due to mapping of the full phi range (for a small eta interval) onto these ROBs. A small increase of the number of ROBs seems to be possible, an increase from 738 to 754 was mentioned. P. Le Du will provide updated information.
S. Wheeler has been studying the mapping of the pixels and SCT. Updated information is now (June 26) available. It was unclear how the average event fragment sizes were determined, and whether these can be assumed to be the same for high as well as low luminosity. Further investigation is necessary. Ultimately event fragment size distributions will be needed for the computer model.
R. Bock emphasized his interest in using the information on the RoI request hit probabilities to find groupings of ROBs that match well with the architecture of commercially available multi-PCI bus systems.
2. Pilot project model : paper and computer models : status, results
>>> P. Le Du / J. Bystricky : paper model
J. Bystricky presented new results obtained with the spreadsheet used
by A. Amadon before, that was derived from the spreadsheet of J. Vermeulen.
The mapping of the em calorimeter was as documented by P. Le Du, for some
of the other subdetectors the mapping of DAQ-note 70 is used. Trigger menus
are similar to those documented in COM-DAQ-99-010. A LVL1 accept rate of
75 kHz has been used, EF traffic has been included. Two different studies
have been done on (i) the consequences of different choices made for the
calorimeter mapping, and (ii) the consequences of the TRT full scan trigger.
It has been found that the layer mapping of the calorimeter is acceptable.
Sequential processing of the data of the different layers of the calorimeter
also can help in reducing the RoI request rate per ROB, results of S. Gonzalez
are used for studying this issue. The TRT full scan trigger algorithm,
if not executed by dedicated hardware, is found for the low luminosity
trigger to require a processor farm that is an order of magnitude larger
(about 650 processors under the assumptions made) than for the high luminosity
trigger (60 - 70 processors). If this trigger is taken care of by dedicated
hardware the size of the processor farm required for the low luminosity
trigger would be about the same as the size required for the high luminosity
trigger. It was remarked that the scan will produce more than 20 RoI requests
for the SCT and pixels; 50 - 60 seems to be a reasonable estimate. P. Le
Du emphasized that the estimated minimum number of processors as determined
from the processing requirements for the high luminosity trigger is now
much lower than expected before (~1000 => ~100), but that for the low luminosity
trigger it is much higher !
Next steps planned to be taken consist of adding a second sequential
step in the treatment of em calorimeter data, making a model of the 32
node ATM testbed and studying the consequences of sequential simple higher
level trigger menus (i.e. menus for a combination of LVL2 trigger and EF)
for the size of a higher level trigger system and for choices to be made
with respect to ROB mapping and grouping.
>>> J. Vermeulen : paper model (slides 23 - 51 in .pdf or PowerPoint file (same file as for 1.) )
Results obtained with a new version (3.0) of the Excel spreadsheet were presented. In the new version support for the pixel detector and analysis of jets in the tracker has been added. Tables for 1 - 128 ROBIns per ROBOut (as generated with the "ROBsPerRoI" program) are included. Also lower and upper limits of RoI request rates are calculated (assuming eta-phi independence of the probability of finding a RoI request at a certain eta and phi). A model of the DAQ-1 Read Out Crate with VME and PVIC busses is included as well. A number of graphs have been added; the graphs presented were taken from the spreadsheet. The total average, lower limit and upper limit of the RoI request rate, the volume of data to be sent per second to the LVL2 system, and the CPU utilization per ROBIn were shown for both low and high luminosity menus. The parameters for the ROBIns were as discussed in the January meeting. Also the average total data volume per subdetector to be sent to the LVL2 system and the associated message rate were shown. The number of ROBIns per ROBOut was set to 4 for the muon detectors and for the em calorimeter and to 2 for the hadron calorimeter. For the minimum number of processors ("1000 MIPs" processors) in the LVL2 processor farm determined from the processing requirements, using the parameters as discussed in the January meeting, a number of 654 was found for the low luminosity trigger and of 71 for the high luminosity trigger. The input bandwidth required per processor was 9.6 MByte/s for the low luminosity trigger and 39.6 MByte/s for the high luminosity trigger, i.e. considerably more than the 15 MByte/s assumed to be available per link.
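The farm-size estimate in a paper model of this kind is essentially a rate multiplied by an average CPU time per event. The sketch below shows that arithmetic only. The step probabilities and per-step processing times are placeholders invented for the example (chosen so that the mean number of steps comes out at 1.37); they are not the parameters agreed in the January meeting, and the result is not one of the spreadsheet numbers.

```python
import math

def mean_steps(steps):
    """Average number of processing steps per event."""
    return sum(p for p, _ in steps)

def min_processors(lvl1_rate_hz, steps):
    """Minimum number of fully loaded processors.

    steps: list of (probability that the step is executed, CPU time in s).
    """
    mean_cpu_time = sum(p * t for p, t in steps)
    return math.ceil(lvl1_rate_hz * mean_cpu_time)

# Placeholder sequence: every event runs step 1, 37 % also run step 2.
example_steps = [(1.0, 500e-6), (0.37, 1000e-6)]
n_proc = min_processors(40e3, example_steps)   # 40 kHz LVL1 accept rate
```

With these placeholder values the mean CPU time is 870 microseconds per event, so a 40 kHz accept rate needs at least 35 processors running at full occupancy.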
In the discussion it was mentioned that sequential processing of the em and hadron calorimeter data for em / hadron RoIs may reduce the RoI request rate for the hadron calorimeter by a factor of 6. (A few days after the meeting P. Le Du provided the following information : processing the data from the middle layer of the em calorimeter leads to a reduction factor of 1.7 - 2, processing of the data of the front layer leads to another reduction factor of 3.3.)
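The two layer-wise reduction factors can be combined as a quick consistency check, assuming the two steps are applied in sequence and are independent, so that the factors multiply:

```python
middle_layer = (1.7, 2.0)   # reduction from the middle layer (range quoted)
front_layer = 3.3           # further reduction from the front layer

# Combined reduction for the lower and upper end of the quoted range.
combined = tuple(f * front_layer for f in middle_layer)
print("combined reduction: %.1f - %.1f" % combined)
```

Under that assumption the combined reduction is 5.6 - 6.6, which brackets the factor of 6 mentioned in the discussion.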
>>> J. Vermeulen : computer model (slides start with slide 53 in same file as slides for 1.- .pdf or PowerPoint file)
The generic model of the full pilot project system is now implemented
in simdaq and has been debugged, together with the paper model. Average
rates, data volumes and processor occupancies computed with both models
compare closely, provided that the LVL1 accept rate is chosen low enough
to prevent building up of the queues in the computer model. An overview
of maximum queue sizes, minimum and maximum occupancies, etc. has been
added to the output of the program. The results obtained so far show that
the LVL1 accept rate for the current model can be at maximum 30 kHz for
low as well as high luminosity. The number of processors in the LVL2 farm
was chosen to be 768, the number of supervisor processors was chosen to
be 5. The nominal LVL1 accept rate for the trigger menus used is about
40 kHz. The bandwidth of the links in the system of 15 MByte/s is limiting
this to 30 kHz. For low luminosity this is due to the data to be transferred
from the SCT ROBs to the LVL2 farm, for high luminosity the data to be
transferred from the hadron calorimeter ROBOuts to the LVL2 farm may be
causing a problem (note : an event fragment size of 2000 bytes has been
used, which is too large, 250 bytes seems to be a more reasonable estimate).
A number of decision time distributions was shown for both low and high
luminosity and for different LVL1 accept rates. All these distributions
show peaks and a fine substructure. Processing with fixed processing times
of the TRT scan and associated SCT and pixel scan and of the missing energy
triggers probably can explain the large peaks, while the substructure may
be attributable to the round-robin assignment of the processors in the
farm. Further investigation is necessary to confirm this explanation of
the behaviour observed.
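The way a link bandwidth caps the LVL1 accept rate can be sketched as follows: the busiest link saturates when the accept rate times the average number of bytes it carries per LVL1 accept reaches the link bandwidth. The function, the link names and the per-link byte counts are illustrative assumptions; the 500 bytes is simply the value implied by 15 MByte/s at 30 kHz, not a number taken from the simdaq output.

```python
def max_lvl1_rate(link_bandwidth_bytes_per_s, bytes_per_accept_by_link):
    """Highest LVL1 rate (Hz) at which no link exceeds its bandwidth."""
    busiest = max(bytes_per_accept_by_link.values())
    return link_bandwidth_bytes_per_s / busiest

# Hypothetical average load per LVL1 accept on three links (bytes).
load = {"sct_rob": 500.0, "em_calo_robout": 300.0, "muon_robout": 120.0}
limit = max_lvl1_rate(15e6, load)   # 15 MByte/s links
```

Here `limit` evaluates to 30 kHz. Reducing the average load on the busiest link, for example through a smaller event fragment size, raises the limit in inverse proportion.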
3. High-level models of test set-ups and relevant parameters
>>> R. Blair : modelling the ATM test setup
As a starting point for comparisons between actual test setups and SIMDAQ, results of a set of runs done at Argonne on a small ATM switch (16 port) and a group of ATM equipped PCs were used. The exercise was mostly useful to indicate some of the details that need to be addressed to make meaningful comparisons. In the tests there were 1-8 Pentium IIs (300 MHz) acting as Global Processors and 1-2 PowerPCs (330 MHz) acting as supervisors. There was a fixed algorithm time and a fixed queue depth per Global PC. No actual requests for data were sent to the ROBs in this configuration.
This exact configuration could not be modeled in SIMDAQ, but something
close could. The detailed differences included:
- SIMDAQ had to make a data request of the ROBs, but the network transit
and transaction times were fixed at zero
- SIMDAQ had a different model of the supervisor which included a single
system receiving Global responses and multiple systems sending requests,
the transaction times were fudged for the single receiver to try to compensate
- SIMDAQ was only able to set a queue depth that corresponded to all
Global systems rather than each Global CPU, to compensate the overall SIMDAQ
queue depth was set at the product of the number of Global CPUs times the
actual per processor depth
None of these should be difficult to modify in a future version of SIMDAQ. J. Vermeulen pointed out that the most difficult one was in fact already dealt with (the supervisor structure) and could be made available.
In order to get stable results the scheduling time had to be set to 5 microseconds. This was tuned to make the run with 8 global systems and one supervisor agree in rate between SIMDAQ and the measurements. With the exception of the run with 8 global processors and 2 supervisors the rates were in agreement for various runs at the 20% level. The 8/2 run was measured to run at 86 kHz and SIMDAQ predicted 56 kHz. The latency distributions for the 8/1 runs with two choices of algorithm time show a much narrower distribution from SIMDAQ than the measured distribution.
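The size of the 8/2 discrepancy relative to the 20% agreement level can be checked with a short calculation. Taking the measured rate as the reference for the relative difference is an assumption made here.

```python
def relative_difference(measured, predicted):
    """Relative difference, using the measurement as the reference."""
    return abs(measured - predicted) / measured

diff_8_2 = relative_difference(86.0, 56.0)   # rates in kHz
within_20_percent = diff_8_2 <= 0.20
print("8/2 run: %.0f %% difference" % (100 * diff_8_2))
```

With this definition the 8/2 run differs by about 35 %, clearly outside the 20% level found for the other runs.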
More work needs to go into making the details agree in SIMDAQ, but the initial comparisons do not look too discouraging.
>>> S. Wheeler : LVL2 trigger simulation framework using Ptolemy
A status report (available as a .pdf file) on the ongoing work to simulate the ethernet testbed with Ptolemy was given. The system has been factorised into a number of nodes, e.g. Supervisor, with well-defined boundaries. The boundaries are defined by the messages the node can send and receive. High-level specification documents have been written for the nodes and the message library. The message library has been written in C++. The nodes have been implemented as "stars" in Ptolemy. A simple "proof-of-principle" simulation configuration has been built up using these stars and it runs. The switch node in the configuration is still a "null switch". The correct implementation will be available later in June. More work is also underway to split the processor into FEX and steering components and will be ready in June. This simulation already represents a large fraction of the work required to model the testbed. It was noted that although a simple ROB emulator should be sufficient for the testbed modelling, in the current simulation ROBs are represented as ROB complexes. This was done in order to help with PCI bus measurements in the UK. W.Li of RHBNC and G.Crone of UCL were acknowledged for the large amount of effort they have put into the PCI bus model.
The project is fully documented on the web at http://www.hep.ucl.ac.uk/atlas/simulation/.
The overview document on the web page is available for inclusion in the MWD. The web page also contains schematics which can be downloaded to allow anyone with a working version of Ptolemy to run the simulation. In conclusion, the goal of the prototype configuration has been achieved. Further work involves a detailed understanding of the reference software to refine the nodes and a big effort on parallel work with the Ethernet Testbed to calibrate nodes on a simple configuration. The model will then be used to predict behaviour on a large system and comparisons will be made, hopefully allowing us to gain faith in the ability to scale to even larger systems.
4. Models of technologies : switches, ROBComplex, etc.
>>> K. Korcyl : Modeling Ethernet switches
Fast and Gigabit Ethernet are candidates for the ATLAS trigger network.
Due to the network's size it has to be constructed as a layered structure
of smaller units. To assess the scalability of such a structure we evaluated
a single switch unit. We have chosen a commodity Fast Ethernet switch,
modeled its behaviour in detail, and simulated it.
The following steps were taken :
1) We took a commodity switch and calibrated it by making a set of measurements (Slide 2).
We measured the time needed to send a message from PC1 to PC2. The time is a function of the message length. We took measurements up to 1500 bytes (the maximum user data length for an Ethernet frame). The results are shown on Slide 3.
4) Measurements made on the detailed model were compared with those made on the real switch (Slide 5). Good agreement was obtained up to the saturation point.
5) A simplified (Slide 6) model which still retained the characteristic behaviour of the detailed model was constructed for the following reasons:
8) To measure the limits P2 and P3 we used:
The above conditions allowed us to avoid queueing which could interfere with the measurements.
10) The parameterised model has been built and verified (Slide 9). Slide 9 shows the results of measurements and model for 1500 bytes with different loads on the switch (the percentage load is calculated with respect to the measured limits). For the measurements we used random traffic (random means each node could send to many) and inter-packet times taken from an exponential distribution.
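Two of the ingredients described in these steps lend themselves to a short sketch: a straight-line parameterisation of the transfer time versus message length, and a generator of random traffic with exponentially distributed inter-packet times at a given fraction of a measured limit. The sample points, the limit value and all function names below are invented for illustration; they are not the measurements or the model shown on the slides.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def packet_times(load_fraction, limit_packets_per_s, n, seed=0):
    """Send times (s) of n packets with exponential inter-packet gaps."""
    rng = random.Random(seed)
    rate = load_fraction * limit_packets_per_s  # mean packets per second
    t, times = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        times.append(t)
    return times

def random_destination(source, n_nodes, rng):
    """Pick any node other than the source (each node may send to many)."""
    dest = rng.randrange(n_nodes - 1)
    return dest if dest < source else dest + 1

# Made-up points: message length (bytes) vs. transfer time (microseconds).
lengths = [100, 500, 1000, 1500]
times_us = [58.0, 90.0, 130.0, 170.0]
overhead_us, us_per_byte = fit_line(lengths, times_us)

# 50 % load relative to a made-up limit of 8000 packets/s.
sends = packet_times(load_fraction=0.5, limit_packets_per_s=8000.0, n=5000)
```

The fitted intercept plays the role of a fixed per-message overhead and the slope that of an inverse bandwidth. Expressing the load as a fraction of a measured limit mirrors the way the percentage load is defined in step 10.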
Conclusions
The parameterised model reflects the behaviour of the real switch with an accuracy of a few percent. Thanks to its simplicity, larger networks can be modelled without the dramatic increase in modelling time that was observed with the detailed model. For future (fully non-blocking) switches, the parameters can be set to values such that only output queueing will be dominant.
>>> J. Vermeulen : ROBIn model
Simple equations for computing the maximum LVL1 accept rate that the CRUSH (SHARC based ROBIn design, developed at NIKHEF) can handle were shown. Three limiting factors can be distinguished : (i) the bandwidth of the Read-Out Link (link between ROD and ROBIn), (ii) the CPU resources available and (iii) the bandwidth of the SHARC link used for transporting the data to the output part of the ROBComplex. Depending on the fragment size and the RoI request and LVL2 accept rates one of these three limiting factors applies. For most cases calculated and measured maximum LVL1 accept rates differ by less than 5 % (see results presented at the June 3 ROBComplex meeting).
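The structure of such an estimate can be sketched as the minimum of three rate limits. The formulas and every parameter value below are simplifying assumptions made for illustration; they are not the equations or the CRUSH parameters shown at the meeting.

```python
def robin_max_rate(fragment_bytes, rol_bw, sharc_bw,
                   cpu_per_fragment, cpu_per_request,
                   roi_request_fraction, lvl2_accept_fraction):
    """Maximum LVL1 accept rate (Hz) and the name of the limiting factor.

    Bandwidths are in bytes/s, CPU costs in seconds. A fragment is assumed
    to be sent out once for every RoI request and every LVL2 accept, so the
    two fractions must not both be zero.
    """
    out_fraction = roi_request_fraction + lvl2_accept_fraction
    limits = {
        "read-out link": rol_bw / fragment_bytes,
        "cpu": 1.0 / (cpu_per_fragment + out_fraction * cpu_per_request),
        "sharc link": sharc_bw / (out_fraction * fragment_bytes),
    }
    name = min(limits, key=limits.get)
    return limits[name], name

# Hypothetical parameter set, for illustration only.
params = dict(rol_bw=100e6, sharc_bw=40e6,
              cpu_per_fragment=5e-6, cpu_per_request=20e-6,
              roi_request_fraction=0.10, lvl2_accept_fraction=0.01)
rate_large, limit_large = robin_max_rate(fragment_bytes=1000, **params)
rate_small, limit_small = robin_max_rate(fragment_bytes=100, **params)
```

With these made-up values the Read-Out Link is the limit for 1000-byte fragments and the CPU for 100-byte fragments, which illustrates how the limiting factor can change with fragment size.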
5. Modelling of DAQ components (ROC, EB, EF)
>>> G. Lehmann : Modeling of the DAQ-1 Event Builder
A small event builder including one or two source and one or two destination nodes was modeled with PTOLEMY. For the model of the network interfaces and the switch a parameterized model of the ATM/AAL5 technology was applied. Results were shown for 1X1 and 2X2 measurements and with a range of ROC fragment sizes (100 bytes to 16 kbytes). The simulation and measurements (DAQ-1 EB) agreed to a few percent. Further work includes a study of the scalability and of the functionality and robustness of the Event Builder protocol.
>>> J. Vermeulen : Paper model ROC (slides 46 - 50 in same file as slides for 1. .pdf or PowerPoint file)
The model of the DAQ-1 Read Out Crate (ROC), as available in version 3.00 of the Excel spreadsheet, was presented and results were shown. In the model the utilization of CPUs and of the VME and PVIC busses in the crate due to data transfer and associated protocol handling is computed. Since the meeting, M. Joos and J. Petersen have provided the input on which the model found in version 3.00 of the spreadsheet is based.
6. Master Working Document (MWD) and workplan, including a discussion of modelling for the TP and what should be done for it and when it must be done.
J. Vermeulen reported that a second draft of the MWD is in preparation. Pointers to relevant measurement results should be included. I. Mandjavidze has provided input with respect to ATM technology, K. Korcyl will do the same for Ethernet technology. It was remarked that the reference software may produce estimates of processing times.
The pilot project modelling workplan consists of 5 tasks, all but the first one in the original planning to be finished before October 1999 :
1. Review of parameters and update of paper model, original external milestone : spreadsheet should be available in December 1998
Although delayed, it was agreed that now the first round of parameter
collection is nearing an end and that there are several areas that need
to begin next.
The spreadsheet results are well advanced and having two groups pursuing
this (Saclay and NIKHEF) is good. The spreadsheet results highlight
the need to better come to grips with the big difference in hardware requirements
between low and high luminosity. This should be addressed at the
collaboration level and resolved as soon as possible. A number of the parameters
have to be updated (see the report on the discussion concerning agenda
point 1).
2. Implementation and study of behaviour of the full but generic system
model.
With the implementation now available and the detailed study of the behaviour of the model started, execution of this task is more or less proceeding as planned.
3. Implementation and study of behaviour of generic models (i.e. without detailed models of different technologies used) of test systems : original planning : until October 1999.
The work on modelling the ATM testbed (see report R. Blair) falls within this category. This task also serves for providing evidence for the credibility of results obtained for the full system with SIMDAQ.
4. Implementation and study of behaviour of full models (i.e. with detailed models of different technologies used) of test systems
Progress is being made here, see the reports by S. Wheeler and K. Korcyl. Parameter extraction for specific technologies is also producing results.
5. Refinement and study of behaviour of full models, with detailed models of different technologies, of the LVL2 system, as needed
This is not yet addressed.
Since the TP is not intended to specify a technology it was considered that completion of the first four tasks before the December trigger/DAQ workshop would adequately support the TP and provide a starting point for the next round of technology specific decisions. The next modelling meeting is planned for September (just before the ATLAS week).