S-LINK performance measurements in the PC environment
NIKHEF, September 1998
Introduction:
At NIKHEF, Windows NT drivers have been written for the AMCC-based PCI
S-LINK interfaces. Using these drivers, the performance of an S-LINK
connection over a ~2 m long SCSI cable was determined. We have also
obtained from S. Luitz a memory-mapping driver for Linux for the same
hardware, together with example driving applications [1]. We used this
software to measure the performance of PCs running the source application
under Linux. This page presents the results of measurements on two types
of PC, with Pentium Pro (200 MHz) and Pentium II (300 MHz) processors.
Timing of the S-link drivers:

Fig.1 Source driver for Windows NT
Fig. 1 shows the results for the directly mapped, DMA-based, interrupt-driven
source driver on the Pentium Pro (200 MHz) PC. The total elapsed time (in s),
the message rate (in units of 1000 messages/s) and the bandwidth (in MBytes/s)
are given as a function of the message size (in bytes).
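The three quantities in Fig. 1 are linked: the rate is the number of messages divided by the elapsed time, and the bandwidth is the rate multiplied by the message size. A minimal Python sketch of this bookkeeping follows; it assumes 1 MByte = 10^6 bytes (the page does not state which convention was used), and the helper name `throughput` and the sample numbers are hypothetical, not measured values.

```python
def throughput(n_messages: int, size_bytes: int, elapsed_s: float) -> tuple[float, float]:
    """Return (rate in 1000 messages/s, bandwidth in MBytes/s).

    Hypothetical helper, not part of the drivers. Assumes 1 MByte = 1e6 bytes.
    """
    rate_kmsg = n_messages / elapsed_s / 1000.0
    bandwidth_mb = n_messages * size_bytes / elapsed_s / 1e6
    return rate_kmsg, bandwidth_mb


# Hypothetical run: 100000 messages of 1024 bytes sent in 8 s.
rate, bw = throughput(100_000, 1024, 8.0)
print(f"{rate:.1f} kmsg/s, {bw:.2f} MBytes/s")  # 12.5 kmsg/s, 12.80 MBytes/s
```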

Fig.2 Differences in the modelling of the NT driver
Fig. 2 shows the influence of different design strategies on the performance
of the source driver. The line marked "DMA" represents the standard model of a
directly mapped, interrupt-driven device driver. The poll driver does not use
interrupts, but actively polls for DMA completion. In the poll-fixed model the
poll driver was modified to map the user's buffer only when it is first used;
after that the mapped buffer is reused. Results for the DMA-based,
interrupt-driven destination driver are also shown. These measurements were
done with a source sending messages of the same length as those received, and
the throughput was limited by the source. As shown below (Fig. 3), the
destination driver achieves a higher throughput than the source driver for the
same message length.
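The difference between the poll and poll-fixed variants can be illustrated with a small simulation. This is only a sketch: `FakeDevice`, `map_buffer`, `start_dma` and `dma_done` are hypothetical stand-ins, not the interface of the NT driver. The sender busy-waits on a DMA-done flag instead of waiting for an interrupt, and maps each user buffer only the first time it is seen.

```python
class FakeDevice:
    """Hypothetical stand-in for the S-LINK PCI interface (not a real API)."""

    def __init__(self, polls_until_done: int = 3):
        self.polls_until_done = polls_until_done
        self.map_calls = 0
        self._remaining = 0

    def map_buffer(self, buf: bytearray) -> int:
        self.map_calls += 1          # a real driver would lock pages and get a bus address
        return id(buf)               # pretend bus address

    def start_dma(self, bus_addr: int, length: int) -> None:
        self._remaining = self.polls_until_done

    def dma_done(self) -> bool:
        self._remaining -= 1         # the simulated DMA "completes" after a few status reads
        return self._remaining <= 0


class PollFixedSender:
    """Polling sender that maps each user buffer once and then reuses the mapping."""

    def __init__(self, dev: FakeDevice):
        self.dev = dev
        self._mapped = {}            # id(buf) -> (buf, bus_addr); holding buf keeps its id unique

    def send(self, buf: bytearray) -> int:
        key = id(buf)
        if key not in self._mapped:  # map only on first use ("poll-fixed")
            self._mapped[key] = (buf, self.dev.map_buffer(buf))
        _, bus_addr = self._mapped[key]
        self.dev.start_dma(bus_addr, len(buf))
        polls = 0
        while not self.dev.dma_done():   # busy-wait instead of sleeping until an interrupt
            polls += 1
        return polls


dev = FakeDevice()
sender = PollFixedSender(dev)
buf = bytearray(1024)
for _ in range(5):
    sender.send(buf)
print(dev.map_calls)  # 1: the buffer was mapped only once
```

In the plain poll driver the mapping step would be repeated on every send; the measurements suggest that saving it changes the performance only slightly.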

Fig.3 Bandwidth comparisons
Fig. 3 shows the bandwidth as a function of the message length for the DELL
Pentium Pro (200 MHz) and Pentium II (300 MHz) machines. The line labelled
"Linux" represents results for the source application running under Linux on
the Pentium Pro machine. Under Linux the application communicates directly
with the S-LINK interface, which is memory mapped by the device driver. The
line labelled "Lin-300" represents the same for the Pentium II machine. NT
results for the source driver are labelled "NT-src300" for the Pentium II
machine and "NT-src" for the Pentium Pro machine; results for the DMA-based,
interrupt-driven NT destination driver on the Pentium Pro are labelled
"NT-dest". For the measurements with the NT destination driver, the Linux
source application running on another Pentium Pro PC was used as the sender.
The results clearly show that the NT destination driver achieves a higher
throughput than the NT source driver over the whole range of message sizes
studied (up to 20 kBytes). They also show that the asymptotic values for long
messages depend more on the machine used than on the driving method.
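One common way to read curves like those in Fig. 3 is a simple cost model with a fixed per-message overhead t0 and an asymptotic bandwidth B_inf: a message of n bytes takes t(n) = t0 + n/B_inf, so the achieved bandwidth is n/t(n). This model is an assumption, not something fitted on this page, and the parameter values below are illustrative only. It shows why long messages approach B_inf whatever t0 is (a property of the machine), while t0 (a property of the driving method) dominates for short messages.

```python
def bandwidth(n_bytes: float, t0_us: float, b_inf_mb_s: float) -> float:
    """Achieved bandwidth in MBytes/s for an n-byte message.

    Assumed model: t(n) = t0 + n / B_inf, using 1 MByte/s = 1 byte/us.
    """
    t_us = t0_us + n_bytes / b_inf_mb_s
    return n_bytes / t_us


B_INF = 35.0              # illustrative asymptote in MBytes/s
for t0 in (10.0, 40.0):   # illustrative per-message overheads in microseconds
    row = [round(bandwidth(n, t0, B_INF), 1) for n in (256, 1024, 20480)]
    print(f"t0={t0:>4} us: {row}")
```

In this picture a driving method with less overhead per message mainly lifts the short-message end of the curve, while both settings converge towards the same B_inf for long messages.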
Analysing bus actions:
We connected our VMETRO PCI bus analyser to look for clues explaining these
results. Typical waveforms are shown in Fig. 4. The bus analyser sampled
every 30 ns; each dot of the upper trace represents one sampling point.

Fig.4 Waveform results obtained with the PCI bus analyser
Roughly speaking, the PCI bus is busy transferring data when both IRDY# and
TRDY# are low (asserted). As Fig. 4 shows, this is the case for only about a
third of the time during the transfer of a block of data, which accounts for
the observed bandwidth of only about 1/3 of the maximum 133 MByte/s. We
suspected that the host bridge of the PC could be partly responsible for
these results, as the PC is optimized for devices that use short bursts.
However, changing the latency of the host bridge did not improve the
performance appreciably.
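The 133 MByte/s figure is the peak of a 32-bit PCI bus at 33 MHz: 4 bytes per clock at 33.3 MHz. A data word completes only in a clock where both IRDY# and TRDY# are asserted (low), so the effective bandwidth is roughly the fraction of such clocks times the peak. A minimal sketch, assuming the analyser samples (one per 30 ns clock) are available as lists of 0/1 levels; this list format and the function name are hypothetical, not the VMETRO export format.

```python
PEAK_MB_S = 4 * 33.33  # 32-bit bus at 33.33 MHz -> ~133 MBytes/s


def effective_bandwidth(irdy_n: list[int], trdy_n: list[int]) -> float:
    """Estimate bandwidth from per-clock samples of active-low IRDY# and TRDY#."""
    if len(irdy_n) != len(trdy_n) or not irdy_n:
        raise ValueError("need two equally long, non-empty sample lists")
    # A data phase completes only when both signals are low (asserted).
    busy = sum(1 for i, t in zip(irdy_n, trdy_n) if i == 0 and t == 0)
    return busy / len(irdy_n) * PEAK_MB_S


# Hypothetical pattern: one transfer clock followed by two idle clocks.
irdy = [0, 1, 1] * 100
trdy = [0, 1, 1] * 100
print(round(effective_bandwidth(irdy, trdy), 1))  # 44.4
```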
The following trace shows how, during short bursts, a word is read from the
PCI bus every 30 ns, while between bursts there are relatively long periods
of inactivity. In this trace the CPU polls the status of the data transfer
quite frequently; for the measurement results presented above, polling was
considerably less frequent. The second column of the trace contains the time
intervals between successive PCI cycles.
TRIG    0ns  ......  AD32 MemWri FFBDFC2C 002D3000 Ok     --
   1   30ns  ......  AD32 MemWri FFBDFC2C 00002710 TRetry --
   2  120ns    30ns  AD32 MemWri FFBDFC30 00002710 Ok     --
   3   90ns    30ns  AD32 MemWri FFBDFC3C 0000D000 Ok     --
   4  120ns    60ns  AD32 MemRd  FFBDFC3C 0000D0A6 Ok     --
   5  390ns   330ns  AD32 MRdMul 002D3000 00000000 Ok     -- 1st word transferred
This was the starting procedure of the DMA.
   6   30ns  ......  AD32 MRdMul 002D3000 00000001 Ok     --
   7   30ns  ......  AD32 MRdMul 002D3000 00000002 Ok     --
   8   30ns  ......  AD32 MRdMul 002D3000 00000003 Ok     --
   9   30ns  ......  AD32 MRdMul 002D3000 00000004 Ok     --
  10   30ns  ......  AD32 MRdMul 002D3000 00000005 Ok     --
  11   30ns  ......  AD32 MRdMul 002D3000 00000006 Ok     --
  12  120ns  ......  AD32 MRdMul 002D3000 00000007 Ok     --
  13   30ns  ......  AD32 MRdMul 002D3000 00000008 Ok     --
  14   30ns  ......  AD32 MRdMul 002D3000 00000009 Ok     --
  15   30ns  ......  AD32 MRdMul 002D3000 0000000A Ok     --
  16   30ns  ......  AD32 MRdMul 002D3000 0000000B Ok     --
  17  210ns    60ns  AD32 MemRd  FFBDFC3C 0000D0A0 Ok     -- polling
  18  510ns   450ns  AD32 MRdMul 002D3030 0000000C Ok     --
  19   30ns  ......  AD32 MRdMul 002D3030 0000000D Ok     --
  20   30ns  ......  AD32 MRdMul 002D3030 0000000E Ok     --
  21   60ns  ......  AD32 MRdMul 002D3030 0000000F Ok     --
  22   30ns  ......  AD32 MRdMul 002D3030 00000010 Ok     --
  23   30ns  ......  AD32 MRdMul 002D3030 00000011 Ok     --
  24   30ns  ......  AD32 MRdMul 002D3030 00000012 Ok     --
  25   30ns  ......  AD32 MRdMul 002D3030 00000013 Ok     --
  26   30ns  ......  AD32 MRdMul 002D3030 00000014 Ok     --
  27   30ns  ......  AD32 MRdMul 002D3030 00000015 Ok     --
  28  150ns    60ns  AD32 MemRd  FFBDFC3C 0000D0A0 Ok     -- polling
  29  480ns   420ns  AD32 MRdMul 002D3058 00000016 Ok     --
  30   90ns  ......  AD32 MRdMul 002D3058 00000017 Ok     --
  31   30ns  ......  AD32 MRdMul 002D3058 00000018 Ok     --
  32   30ns  ......  AD32 MRdMul 002D3058 00000019 Ok     --
  33   30ns  ......  AD32 MRdMul 002D3058 0000001A Ok     --
  34   30ns  ......  AD32 MRdMul 002D3058 0000001B Ok     --
  35   30ns  ......  AD32 MRdMul 002D3058 0000001C Ok     --
  36   30ns  ......  AD32 MRdMul 002D3058 0000001D Ok     --
  37   30ns  ......  AD32 MRdMul 002D3058 0000001E Ok     --
  38   30ns  ......  AD32 MRdMul 002D3058 0000001F Ok     --
  39  540ns   450ns  AD32 MRdMul 002D3080 00000020 Ok     --
  40   30ns  ......  AD32 MRdMul 002D3080 00000021 Ok     --
  41   30ns  ......  AD32 MRdMul 002D3080 00000022 Ok     --
  42   30ns  ......  AD32 MRdMul 002D3080 00000023 Ok     --
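As a cross-check, the effective DMA rate can be estimated directly from the trace above. This is a rough sketch under two assumptions: the second column is the interval between the starts of successive cycles, and every MRdMul data phase carries one 32-bit (AD32) word. Cycles 5 to 42 are transcribed by hand.

```python
# Cycles 5..42 of the trace: (interval since previous cycle in ns, command).
TRACE = (
    [(390, "MRdMul")] + [(30, "MRdMul")] * 6                    # cycles 5-11
    + [(120, "MRdMul")] + [(30, "MRdMul")] * 4                  # cycles 12-16
    + [(210, "MemRd")]                                          # cycle 17: CPU polling
    + [(510, "MRdMul"), (30, "MRdMul"), (30, "MRdMul"), (60, "MRdMul")]  # cycles 18-21
    + [(30, "MRdMul")] * 6                                      # cycles 22-27
    + [(150, "MemRd")]                                          # cycle 28: CPU polling
    + [(480, "MRdMul"), (90, "MRdMul")] + [(30, "MRdMul")] * 8  # cycles 29-38
    + [(540, "MRdMul")] + [(30, "MRdMul")] * 3                  # cycles 39-42
)

WORD_BYTES = 4   # AD32: one 32-bit word per data phase (assumption)
CLOCK_NS = 30    # back-to-back words arrive every 30 ns

data_words = sum(1 for _, cmd in TRACE if cmd == "MRdMul")
elapsed_ns = sum(dt for dt, _ in TRACE[1:])   # start of cycle 5 -> start of cycle 42

# Words after the first one, divided by the time they took; 1 byte/ns = 1000 MBytes/s.
effective_mb_s = (data_words - 1) * WORD_BYTES / elapsed_ns * 1000
peak_mb_s = WORD_BYTES / CLOCK_NS * 1000

print(data_words, elapsed_ns)  # 36 3030
print(f"{effective_mb_s:.1f} MBytes/s, {effective_mb_s / peak_mb_s:.2f} of peak")
```

The result, roughly 46 MBytes/s or about a third of the 133 MByte/s peak, is consistent with the bus occupancy seen in Fig. 4.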
Conclusions:
The preliminary results lead to the following conclusions:
- The asymptotic bandwidth does not depend on the approach and is about the
same for Linux and for Windows NT on the same PC. The lower value for the
Pentium II than for the Pentium Pro could be due to the faster on-chip cache
memory of the Pentium Pro.
- The NT source driver can send (sink) up to ~35 MBytes/s.
- For messages of 1 kByte, ~15 MBytes/s is achieved.
- For 2 kByte messages the speed is almost doubled.
- For shorter messages the performance degrades quickly with decreasing
message length.
- The destination driver seems to be able to receive everything the sender
is sending.
- Varying the design of, and optimizations within, the driver seems to have
only an insignificant impact on its performance; the impact of the host
bridge is more significant.
- Memory-mapped approaches (Linux) have less overhead per message and give
better results for shorter messages.
The performance is lower than that reported by the LynxOS group: at a
message length of 1 kByte we observe about 30% of the bandwidth reported
there (see [2]). We are trying to develop a better understanding of the
results, also with the help of the MicroEnable board from Silicon Software
[3].
References:
[1] S. Luitz, Linux approach towards PC solutions of destination S-link
based on the PCI controller S5933 of AMCC, software handed over by e-mail.
[2] M. Niculescu, S-LINK Performance measurements in the environment of
ATLAS DAQ/EF prototype -1,
http://atddoc.cer.ch/Atlas/Notes/070/Note070-1.html
[3] MicroEnable, user manual & hardware applet microSlate, Silicon Software,
Jan. 98, http://www.silicon-software.com