Copyright 2016 Society of Photo-Optical Instrumentation Engineers. This paper was published in Proceedings of SPIE (Proc. SPIE Vol. 10445, 1044549, DOI: http://dx.doi.org/10.1117/12.2280938) and is made available as an electronic reprint (preprint) with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, duplication of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited.

# Selection of hardware platform for CBM Common Readout Interface

Wojciech M. Zabołotny<sup>a</sup>, Grzegorz H. Kasprowicz<sup>a</sup>, Adrian P. Byszuk<sup>a</sup>, David Emschermann<sup>b</sup>, Marek Gumiński<sup>a</sup>, Krzysztof T. Poźniak<sup>a</sup>, and Ryszard Romaniuk<sup>a</sup>

<sup>a</sup>Institute of Electronic Systems, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warszawa, Poland

<sup>b</sup>GSI-Helmholtzzentrum für Schwerionenforschung GmbH, Planckstraße 1, 64291 Darmstadt, Germany

## ABSTRACT

The implementation of the CBM readout and DAQ chain is currently being reconsidered. The proposed changes include the replacement of two sets of FPGA-based boards with a single one: the Common Readout Interface (CRI) board. The paper presents the analysis performed to select the optimal hardware platform for the CBM CRI, considering the cost and the number of input links serviced by a single board.

Keywords: FAIR, CBM, STS, readout chain, FPGA, DAQ

### 1. INTRODUCTION

The Compressed Baryonic Matter (CBM) experiment is being prepared at the new accelerator complex FAIR in Darmstadt. It is a fixed-target experiment aimed at the analysis of high-energy nucleus-nucleus collisions, allowing the exploration of the QCD phase diagram in the region of high baryon densities.<sup>1,2</sup>

The products of the collisions will be recorded by multiple particle detectors: the Micro Vertex Detector (MVD), Silicon Tracking System (STS), Ring Imaging Cherenkov Detector (RICH), Muon Chamber (MUCH), Transition Radiation Detector (TRD), Time of Flight Detector (TOF), and Projectile Spectator Detector (PSD).

The CBM readout chain is a distributed system connecting the detectors' Front End Boards (FEBs) with the Data Acquisition System. The electronics controlling the FEBs and receiving the measurement data will be placed in the service building, located in the vicinity of the detector. In the case of the STS and MUCH readout, the FEBs will be equipped with STS/MUCH-XYTER2 (SMX2)<sup>3</sup> ASICs, connected via copper links to the Common Readout Boards (CROBs).<sup>4</sup> The CROBs implement a bridge between the SMX2 links and the GBT<sup>5,6</sup> links used to transmit the control signals and the measurement data to the CBM service building. There, the data received from the CROBs should be concentrated and transmitted via long-distance (ca. 700 m) links to the First Level Event Selector (FLES) computer grid located in the computer center. The first proposed solution assumed the use of two different kinds of FPGA-based boards. The first kind, the Data Processing Boards (DPBs),<sup>7,8</sup> should control the FEBs, receive data via the GBT links, and send the concentrated data via higher-speed long-distance optical links. The second kind, the FLES Interface Boards (FLIBs),<sup>9,10</sup> will receive the data in the form of so-called microslices and transfer them via the PCIe interface into the memory of the FLES input nodes. Further processing, including the building of so-called time slices, is performed by the FLES nodes interconnected via InfiniBand. The block diagram of the readout chain based on that concept is shown in Fig. 1.

In recent years, however, a trend toward using computers in the earlier stages of the data acquisition chain has become visible. That approach has been proposed for the new upgrades of the LHCb<sup>11–13</sup> and ATLAS<sup>14</sup> detectors at the LHC. A similar solution has also been proposed for CBM. In this approach,<sup>15</sup> the functionalities of the DPB and FLIB are combined in a single PCIe board, further denoted as the Common Readout Interface (CRI). That solution reduces the number of specialized boards and allows the use of standard high-speed network interfaces (e.g., InfiniBand) for the long-distance links. This approach also makes better use of technological progress: the fact that the CRI boards must be designed and manufactured earlier does not prevent us from using the newest available technology for the long-distance links. The use of standard network adapters may also further reduce costs. The block diagram of the modified DAQ chain is shown in Fig. 2.

Further author information: (Send correspondence to W.M.Z.)

W.M.Z.: E-mail: wzab@ise.pw.edu.pl, Telephone: +48 22 234 7717

G.H.K.: E-mail: gkasprow@elka.pw.edu.pl, Telephone: +48 22 234 7729



Figure 1: The first proposal of the CBM readout chain, based on separate DPB and FLIB boards (based on [7]).

The design of the CRI boards, however, requires a thorough check of the available technology. The preliminary analyses were briefly described in [16].

### 2. REQUIREMENTS FOR THE CRI BOARD

The CRI board should be connected to the readout boards via GBT links. The readout board proposed for CBM<sup>4</sup> provides three GBT links: one duplex and two simplex. Therefore, to support *N* readout boards, the CRI should use $3 \cdot N$ multi-gigabit transceivers supporting the 4.8 Gb/s bit rate.

To transmit the concentrated data to the host PC, the CRI should also provide multi-gigabit transceivers able to implement the PCIe interface. The availability of a hardware PCIe core is also desired. Both leading FPGA vendors, Xilinx<sup>17</sup> and Altera,<sup>18</sup> offer FPGAs with hardware-implemented PCIe cores. However, Xilinx promises hardware support for PCIe 4.0. Therefore, the newest Xilinx family, UltraScale+,<sup>19,20</sup> was selected for further evaluation.

The most important criterion for selecting the right FPGA chip is the availability of a sufficient number of multi-gigabit transceivers. Of course, it would be good to handle as many CROBs as possible with a single board, but there are certain limiting factors:

- the bandwidth of the PCIe interface to the host,
- the amount of logic resources, which must be sufficient to concentrate the data.

#### 2.1 Available optical transceivers

Fitting multiple optical connections on a PCIe card may be a difficult task. Therefore, miniature multichannel transceivers are needed. A quick search shows the following options (prices taken from Octopart<sup>21</sup>):

- QSFP+: Finisar FTL410QD2C<sup>22</sup> with an approximate price of \$300 for 4 duplex channels.
- 12-channel Foxconn MiniPOD<sup>23</sup> modules with an approximate price of \$360 for 12 simplex channels (either Tx or Rx).
- 12-channel Foxconn MicroPOD<sup>24</sup> modules with an approximate price of \$130 for 12 Tx channels and \$280 for 12 Rx channels.
- 24-channel Finisar FBOTD10SL1C00<sup>25</sup> with an approximate price of \$860 for 24 duplex channels. The Rx-only FBRTP10SL1C00 modules are also available.

Figure 2: Proposal of the modified readout chain with the CRI (based on [15]).

It seems that the best cost per channel may be achieved with the 12-channel MicroPOD modules. Since each CROB uses three GBT uplinks and one downlink, the CRI needs three receive channels and one transmit channel per CROB. The smallest configurations with a reasonable utilization of optical channels are the following:

- With two 12-channel receivers and one 12-channel transmitter, we can connect 8 CROBs to a single CRI. In this case, 4 transmit channels remain unused (they may serve as spares).
- With three 12-channel receivers and one 12-channel transmitter, we can connect 12 CROBs to a single CRI. In this case, the optical channels are fully utilized.
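
The channel bookkeeping behind these two configurations, and the per-fiber comparison behind the choice of MicroPOD modules, can be checked with a short Python sketch. It assumes, as stated above, three CRI receive channels and one CRI transmit channel per CROB; it also assumes that the MiniPOD price refers to one 12-channel module and counts a duplex channel as two fibers. The helper names are illustrative only.

```python
import math

# Approximate prices quoted above: (USD per module, channels, channel kind).
# Assumption: the MiniPOD price is for one 12-channel module.
MODULES = {
    "QSFP+ FTL410QD2C": (300, 4, "duplex"),
    "MiniPOD": (360, 12, "simplex"),
    "MicroPOD Tx": (130, 12, "simplex"),
    "MicroPOD Rx": (280, 12, "simplex"),
    "FBOTD10SL1C00": (860, 24, "duplex"),
}

def cost_per_fiber(price, channels, kind):
    """Price per one-directional optical channel; a duplex channel counts twice."""
    return price / (channels * (2 if kind == "duplex" else 1))

def micropod_config(n_crobs, channels_per_module=12):
    """Return (Rx modules, Tx modules, spare Rx, spare Tx) for n_crobs CROBs,
    assuming 3 uplinks (CRI Rx) and 1 downlink (CRI Tx) per CROB."""
    rx_needed, tx_needed = 3 * n_crobs, n_crobs
    n_rx = math.ceil(rx_needed / channels_per_module)
    n_tx = math.ceil(tx_needed / channels_per_module)
    return (n_rx, n_tx,
            n_rx * channels_per_module - rx_needed,
            n_tx * channels_per_module - tx_needed)

for name, spec in MODULES.items():
    print(f"{name}: ${cost_per_fiber(*spec):.2f} per fiber")
print(micropod_config(8), micropod_config(12))
```

With these inputs, `micropod_config(8)` yields two Rx modules, one Tx module and four spare Tx channels, while `micropod_config(12)` leaves no spares. The FBOTD10SL1C00 has a lower per-fiber price than a MicroPOD Rx module, but in this receive-heavy configuration two thirds of its transmit channels would stay unused.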

#### 2.2 Required PCIe bandwidth

The number of handled CROBs also defines the required PCIe bandwidth. In the pessimistic approach, we may assume that the required PCIe bandwidth is simply the sum of the data bandwidths of all connected GBT links. In the "wide bus" mode used in the STS readout, each GBT link may deliver 4.48 Gb/s of data. That gives 13.44 Gb/s of data for each CROB, resulting in 108 Gb/s for 8 CROBs and 162 Gb/s for 12 CROBs.

The optimistic approach considers the format of the data delivered by the STS FEE.<sup>26</sup> Each SMX2 link delivers up to 9.41 Mhits/s, and each hit contains 20 bits of information that cannot be aggregated. Assuming that each CROB may service up to 40 SMX2 links, we get a data bandwidth of 7.53 Gb/s from each CROB and, finally, 60.22 Gb/s for 8 CROBs and 90.34 Gb/s for 12 CROBs.
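
Both estimates are simple products of the figures quoted above. The following Python sketch reproduces them; the constant and function names are illustrative. Note that the exact pessimistic products are 107.52 Gb/s and 161.28 Gb/s, which the text rounds up to 108 Gb/s and 162 Gb/s.

```python
# Figures quoted in Section 2.2.
GBT_WIDEBUS_GBPS = 4.48    # payload of one GBT link in "wide bus" mode
GBT_LINKS_PER_CROB = 3
SMX2_MHITS_PER_S = 9.41    # maximum hit rate of one SMX2 link
BITS_PER_HIT = 20          # bits per hit that cannot be aggregated
SMX2_LINKS_PER_CROB = 40   # worst case

def pessimistic_gbps(n_crobs):
    """Every GBT link is assumed to be fully loaded."""
    return n_crobs * GBT_LINKS_PER_CROB * GBT_WIDEBUS_GBPS

def optimistic_gbps(n_crobs):
    """Only the non-aggregatable hit payload of all SMX2 links is counted.
    Mhits/s times bits gives Mb/s; dividing by 1e3 gives Gb/s."""
    per_crob_gbps = SMX2_LINKS_PER_CROB * SMX2_MHITS_PER_S * BITS_PER_HIT / 1e3
    return n_crobs * per_crob_gbps

for n in (8, 12):
    print(f"{n} CROBs: pessimistic {pessimistic_gbps(n):.2f} Gb/s, "
          f"optimistic {optimistic_gbps(n):.2f} Gb/s")
```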

PCIe 3.0 or 3.1 offers 7.877 Gb/s per lane, and UltraScale+ devices may use up to 16 lanes. The margins given below are the fractions of the link bandwidth left unused. Assuming the pessimistic assessment, we need 16 lanes for 8 CROBs (bandwidth of 126 Gb/s, margin of 14%) and two links, each with 16 lanes, for 12 CROBs (bandwidth of 252 Gb/s, margin of 36%). Assuming the optimistic assessment, a single link with 8 lanes should be sufficient for 8 CROBs (bandwidth of 63 Gb/s), but the margin of 4% is probably too low, so a 16-lane link will be needed anyway. The same link will be needed in the case of 12 CROBs (margin of 28%).

PCIe 4.0 offers 15.752 Gb/s per lane, and UltraScale+ devices may use up to 8 lanes. A similar analysis gives, for the pessimistic assessment, one 8-lane link for 8 CROBs and two 8-lane links for 12 CROBs, and, for the optimistic assessment, one 8-lane link for both 8 and 12 CROBs.
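
The lane counts follow from comparing the required bandwidth with the link capacity, using margin = (capacity - required) / capacity. In the Python sketch below, the 10% minimum margin is a hypothetical threshold, chosen only to separate the 4% case (rejected in the text) from the 14% case (accepted). Links are assumed to be x4, x8 or x16 wide, and several maximum-width links are used when one is not enough.

```python
# Usable throughput per lane (after 128b/130b encoding) and the widest
# link available in UltraScale+ devices, as quoted in Section 2.2.
LANE_GBPS = {3: 7.877, 4: 15.752}
MAX_LANES = {3: 16, 4: 8}
MIN_MARGIN = 0.10  # hypothetical threshold, not taken from the text

def margin(required_gbps, lanes, gen):
    """Fraction of the link capacity left unused."""
    capacity = lanes * LANE_GBPS[gen]
    return (capacity - required_gbps) / capacity

def lanes_needed(required_gbps, gen, min_margin=MIN_MARGIN):
    """Smallest lane count meeting min_margin: a single x4/x8/x16 link if
    that suffices, otherwise several links of maximum width."""
    for width in (4, 8, 16):
        if width <= MAX_LANES[gen] and margin(required_gbps, width, gen) >= min_margin:
            return width
    n_links = 2  # a single maximum-width link was already tried above
    while margin(required_gbps, n_links * MAX_LANES[gen], gen) < min_margin:
        n_links += 1
    return n_links * MAX_LANES[gen]

# Required bandwidth from the text: (CROBs, assessment) -> Gb/s.
REQUIRED = {(8, "opt."): 60.22, (8, "pess."): 108, (12, "opt."): 90.34, (12, "pess."): 162}
for (n, kind), gbps in REQUIRED.items():
    print(n, kind, {gen: lanes_needed(gbps, gen) for gen in (3, 4)})
```

With this threshold, the sketch reproduces the PCIe lane counts of Table 1 and the 14%, 36%, 4% and 28% margins quoted for PCIe 3.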

The above results are summarized in Table 1 and Table 2.

Table 1: Required number of multi-gigabit transceivers for different interfaces depending on bandwidth assessment and PCIe version.

| # CROBs | # GBT links | # PCIe 3 lanes (opt.) | # PCIe 3 lanes (pess.) | # PCIe 4 lanes (opt.) | # PCIe 4 lanes (pess.) |
|---------|-------------|-----------------------|------------------------|-----------------------|------------------------|
| 8       | 24          | 16                    | 16                     | 8                     | 8                      |
| 12      | 36          | 16                    | 32                     | 8                     | 16                     |

Table 2: The total required number of multi-gigabit transceivers depending on bandwidth assessment and PCIe version. One multi-gigabit transceiver is added for a possible TFC connection.

| # CROBs | PCIe 3 (opt.) | PCIe 3 (pess.) | PCIe 4 (opt.) | PCIe 4 (pess.) |
|---------|---------------|----------------|---------------|----------------|
| 8       | 41            | 41             | 33            | 33             |
| 12      | 53            | 69             | 45            | 53             |
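
Table 2 is obtained by adding the GBT transceivers, the PCIe lanes of Table 1, and one transceiver for the TFC link. A minimal Python sketch of that bookkeeping, with the Table 1 lane counts entered by hand (the names are illustrative):

```python
GBT_PER_CROB = 3  # one duplex and two simplex GBT links per CROB
TFC_LINKS = 1     # one transceiver reserved for a possible TFC connection

# PCIe lanes from Table 1: (PCIe generation, assessment) -> {CROBs: lanes}.
PCIE_LANES = {
    (3, "opt."): {8: 16, 12: 16},
    (3, "pess."): {8: 16, 12: 32},
    (4, "opt."): {8: 8, 12: 8},
    (4, "pess."): {8: 8, 12: 16},
}

def total_transceivers(n_crobs, gen, assessment):
    """GBT links + PCIe lanes + TFC link."""
    return GBT_PER_CROB * n_crobs + PCIE_LANES[(gen, assessment)][n_crobs] + TFC_LINKS

for n in (8, 12):
    print(n, [total_transceivers(n, gen, a) for (gen, a) in PCIE_LANES])
```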

The results of the comparison of the above requirements with the available UltraScale+ chips<sup>19</sup> are shown in Table 3. Virtex chips were excluded to keep the cost of the CRI board as low as possible.

Table 3: Available chips and their resources, prepared on the basis of [19]. Example prices found on Octopart.<sup>21</sup> Prices are for chips in speed grade "1" and temperature range "E", except for the XCKU15P in the FFVA1760 package (marked \*), for which only the price for speed grade "L2" was available. Assuming the same price ratio between the "1" and "L2" parts as for the XCKU15P in the FFVE1760 package, we may estimate the price of the speed grade "1" part at ca. \$5000.

| Device                       | KU11P      | KU15P      | KU15P                  | ZU11EG     | ZU17EG     | ZU17EG     | ZU19EG     | ZU19EG     |
|------------------------------|------------|------------|------------------------|------------|------------|------------|------------|------------|
| System logic cells           | 653,100    | 1,143,450  | 1,143,450              | 653,100    | 926,194    | 926,194    | 1,143,450  | 1,143,450  |
| Block RAM blocks             | 600        | 984        | 984                    | 600        | 796        | 796        | 984        | 984        |
| UltraRAM blocks              | 80         | 128        | 128                    | 80         | 102        | 102        | 128        | 128        |
| Package                      | FFVE1517   | FFVE1517   | FFVA1760               | FFVC1760   | FFVC1760   | FFVD1760   | FFVC1760   | FFVD1760   |
| Transceivers (GTH+GTY)       | 32+20      | 32+24      | 44+32                  | 32+16      | 32+16      | 44+28      | 32+16      | 44+28      |
| Total number of transceivers | 52         | 56         | 76                     | 48         | 48         | 72         | 48         | 72         |
| Example price                | ca. \$3500 | ca. \$4600 | ca. \$7900\* (\$5000?) | ca. \$3900 | ca. \$4400 | ca. \$4900 | ca. \$4900 | ca. \$5400 |
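
Matching the totals of Table 2 against Table 3 amounts to picking, for each requirement, the cheapest package with enough transceivers. The Python sketch below does only that: it ignores logic resources, the GTH/GTY split and the number of PCIe hard blocks, and it uses the estimated \$5000 price for the XCKU15P in FFVA1760. The function name is hypothetical.

```python
# (device, package, total GTH+GTY transceivers, example price in USD) from Table 3.
CHIPS = [
    ("KU11P", "FFVE1517", 52, 3500),
    ("KU15P", "FFVE1517", 56, 4600),
    ("KU15P", "FFVA1760", 76, 5000),  # estimated price for speed grade "1"
    ("ZU11EG", "FFVC1760", 48, 3900),
    ("ZU17EG", "FFVC1760", 48, 4400),
    ("ZU17EG", "FFVD1760", 72, 4900),
    ("ZU19EG", "FFVC1760", 48, 4900),
    ("ZU19EG", "FFVD1760", 72, 5400),
]

def cheapest_chip(required, kintex_only=False):
    """Cheapest (device, package, transceivers, price) offering at least
    `required` transceivers, or None if no listed chip is large enough."""
    candidates = [c for c in CHIPS
                  if c[2] >= required and (not kintex_only or c[0].startswith("KU"))]
    return min(candidates, key=lambda c: c[3], default=None)

# Totals from Table 2.
for required in (33, 41, 45, 53, 69):
    print(required, cheapest_chip(required, kintex_only=True), cheapest_chip(required))
```

Without the Kintex-only filter, the 69-transceiver case selects the ZU17EG in FFVD1760, which agrees with the slightly lower per-CROB cost of that variant in Section 3.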

Interestingly, the Zynq UltraScale+ chips are available at prices similar to those of the comparable Kintex UltraScale+ chips. Therefore, it is worth considering whether the CRI design may benefit from the additional quad-core 64-bit ARM CPU.

#### 2.3 Usability of the ARM processor in the CRI

The usefulness of the ARM processor in the CRI is unclear. To fully exploit its potential, e.g., for data preprocessing, the board would have to be equipped with DDR memory, which may complicate the PCB layout and increase the cost. Additionally, such use of the CPU would require passing the transmitted data through the DDR memory to make them available to the CPU. However, at the DDR4 speed of 2400 MT/s (according to [27]), even a 64-bit-wide memory does not allow the whole data stream to be transferred to and from the memory, if we consider the necessary margin resulting from the fact that this memory will also be used as code memory. Therefore, it seems that the ARM processor may be used only in a "bare metal" configuration running from the on-chip 1 MB L2 cache, for control and diagnostic purposes<sup>\*</sup>.
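
The DDR4 argument can be quantified with a rough Python sketch. It computes the theoretical peak bandwidth of a 64-bit DDR4 interface at 2400 MT/s and the fraction of it consumed if the data stream is written to memory and read back once. Refresh, bus turnaround and other overheads, which lower the usable bandwidth further, are ignored; the stream rates are the estimates from Section 2.2.

```python
DDR4_MT_PER_S = 2400   # mega-transfers per second, from [27]
BUS_WIDTH_BITS = 64

def ddr_peak_gbps(mt_per_s=DDR4_MT_PER_S, width_bits=BUS_WIDTH_BITS):
    """Theoretical peak bandwidth in Gb/s (no refresh or turnaround overhead)."""
    return mt_per_s * width_bits / 1000

def ddr_load(stream_gbps):
    """Fraction of peak bandwidth used if the stream is written once and read once."""
    return 2 * stream_gbps / ddr_peak_gbps()

# Stream estimates from Section 2.2, in Gb/s.
for label, gbps in [("8 CROBs, opt.", 60.22), ("12 CROBs, opt.", 90.34),
                    ("8 CROBs, pess.", 108), ("12 CROBs, pess.", 162)]:
    print(f"{label}: {ddr_load(gbps):.0%} of the {ddr_peak_gbps():.1f} Gb/s peak")
```

Even the most favorable case (8 CROBs, optimistic) consumes about 78% of the 153.6 Gb/s theoretical peak, leaving little room for code execution and overheads; all other cases exceed the peak.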

#### 2.4 Clocks and synchronization

The CROB modules (and, further on, the FEBs) need precise synchronization, provided both by the delivery of a dedicated clock signal and by synchronous control commands. For this purpose, all CRI modules need to be synchronized with each other. There are several ways to achieve it:

- To embed a clock recovery circuit and a synchronization engine utilizing, e.g., White Rabbit<sup>28, 29</sup> or another protocol on every CRI board. The clock is recovered from the optical signal received by the SFP transceiver.
- To build a dedicated receiver board<sup>30</sup> or use a standard WR node in the form of a PCIe card,<sup>31</sup> and then distribute the clock and timing signals using peer-to-peer or daisy-chain connections, e.g., via the popular and low-cost miniSAS connectors.
- To deliver both the clock and the control messages to each CRI board from an external distribution unit, using dedicated optical or copper links. For example, it may be possible to reuse the TTC-PON system developed at CERN.<sup>32,33</sup>

The selected FPGA chips are capable of supporting all the above options. However, additional hardware resources (e.g., a clock jitter cleaner) must be included in the board design.

#### 2.5 Requirements related to FPGA resources

A simplified analysis of the STS data<sup>34,35</sup> shows that the critical logic resource may be the BRAM blocks, which will be used for sorting the data received from the SMX2 chips. Using the simple "stream merger" approach, we need a single BRAM for each connected SMX2 link.<sup>3</sup> In the worst case, there may be up to 40 SMX2 links connected to a single CROB, resulting in the consumption of 320 BRAMs in the 8-CROB configuration and 480 BRAMs in the 12-CROB configuration. All the above chips offer significantly more BRAM blocks. The consumption of other FPGA resources will be evaluated after the creation of the prototype version of the CRI firmware, which should be possible once the GBT-compatible version of the SMX2 controller is created.<sup>16</sup> The PCIe transmission of data may be accomplished either by modified PCIe cores from the FLIB firmware<sup>9,10</sup> or by a dedicated DMA block<sup>36</sup> based on the Xilinx XDMA IP core.<sup>37</sup>
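
The BRAM estimate is a simple product; the Python sketch below also relates it to the Block RAM counts of Table 3. It assumes, hypothetically, that the "single BRAM" per SMX2 link is the same block type that Table 3 counts, and it ignores BRAMs used by any other firmware blocks.

```python
SMX2_LINKS_PER_CROB = 40  # worst case quoted above
BRAMS_PER_LINK = 1        # "stream merger" approach

# Block RAM blocks from Table 3.
BRAM_BLOCKS = {"KU11P": 600, "KU15P": 984, "ZU11EG": 600, "ZU17EG": 796, "ZU19EG": 984}

def merger_brams(n_crobs):
    """BRAMs needed for sorting the data of all SMX2 links of n_crobs CROBs."""
    return n_crobs * SMX2_LINKS_PER_CROB * BRAMS_PER_LINK

for chip, available in BRAM_BLOCKS.items():
    usage = ", ".join(f"{n} CROBs: {merger_brams(n) / available:.0%}" for n in (8, 12))
    print(f"{chip} ({available} BRAMs): {usage}")
```

For the larger devices considered for the 12-CROB variant (KU15P, ZU17EG, ZU19EG), the sorting buffers would occupy roughly 49% to 60% of the BRAMs.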

### 3. COST ANALYSIS

The CRI board connected to 8 CROBs may be implemented as a PCIe board with a single PCIe Gen3 x16 or Gen4 x8 connector. In the case of a CRI supporting 12 CROBs, if the optimistic bandwidth assessment cannot be applied, the board must be equipped with two PCIe connectors. That may lead to design and mechanical problems, and flex PCB technology would have to be used in such a case. However, it must be taken into account that, because of the MiniPOD or MicroPOD heatsinks, it is unclear whether all PCIe slots on the computer mainboard can be used, even with a single PCIe connector. The estimated cost per serviced CROB is (assuming \$130 for a MicroPOD Tx and \$280 for a MicroPOD Rx):

- $(2 \cdot \$280 + \$130 + \$3500) / 8\ \text{CROBs} \approx \$524$ for the 8-CROB version with the XCKU11P in the FFVE1517 package.
- $(3 \cdot \$280 + \$130 + \$4600) / 12\ \text{CROBs} \approx \$464$ for the 12-CROB version with the XCKU15P in the FFVE1517 package (optimistic assessment).
- $(3 \cdot \$280 + \$130 + \$5000) / 12\ \text{CROBs} \approx \$498$ for the 12-CROB version with the XCKU15P in the FFVA1760 package (pessimistic assessment).

<sup>\*</sup>It may also be possible to connect a slower 32-bit LPDDR memory to the SoC. That should allow using Linux for control and diagnostic purposes without a significant increase in PCB complexity.

It is worth mentioning that replacing the Kintex UltraScale+ with the Zynq UltraScale+ does not change the cost significantly:

- $(2 \cdot \$280 + \$130 + \$3900) / 8\ \text{CROBs} \approx \$574$ for the 8-CROB version with the ZU11EG in the FFVC1760 package.
- $(3 \cdot \$280 + \$130 + \$4900) / 12\ \text{CROBs} \approx \$489$ for the 12-CROB version with the ZU17EG in the FFVD1760 package.
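
The per-CROB figures above can be reproduced with the following Python sketch, which adds the MicroPOD prices to the FPGA price and divides by the number of serviced CROBs. As in the estimates above, PCB, assembly and other component costs are not included; the names are illustrative.

```python
MICROPOD_RX_USD = 280
MICROPOD_TX_USD = 130

def cost_per_crob(fpga_usd, n_crobs, n_rx_modules, n_tx_modules=1):
    """Optics plus FPGA cost divided by the number of serviced CROBs."""
    optics = n_rx_modules * MICROPOD_RX_USD + n_tx_modules * MICROPOD_TX_USD
    return (optics + fpga_usd) / n_crobs

# Variant -> (FPGA price in USD, serviced CROBs, MicroPOD Rx modules).
VARIANTS = {
    "8 CROBs, XCKU11P FFVE1517": (3500, 8, 2),
    "12 CROBs, XCKU15P FFVE1517 (opt.)": (4600, 12, 3),
    "12 CROBs, XCKU15P FFVA1760 (pess.)": (5000, 12, 3),
    "8 CROBs, ZU11EG FFVC1760": (3900, 8, 2),
    "12 CROBs, ZU17EG FFVD1760": (4900, 12, 3),
}

for name, args in VARIANTS.items():
    print(f"{name}: ${cost_per_crob(*args):.0f} per CROB")
```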

In the case of the 8-CROB version, it should be possible to upgrade the board by mounting the XCKU15P instead of the XCKU11P in the same package (or the ZU17EG instead of the ZU11EG) if the amount of resources proves insufficient. Similarly, in the case of the Zynq-based 12-CROB version, it is possible to slightly increase the amount of logic resources by replacing the ZU17EG with the ZU19EG.

The visualization of a possible CRI board in a PCIe format is shown in Fig. 3.



Figure 3: Visualization of a possible CRI board in the PCIe format. MiniPOD transceivers are used in the visualization; MicroPOD transceivers would be even smaller. The 12-CROB variant is shown with three Rx and one Tx MiniPODs. The SFP connector is intended for the TFC connection.

### 4. CONCLUSIONS

The performed preliminary analysis shows that it is possible to create the CRI board as a PCIe card using the currently available Xilinx Kintex UltraScale+ or Zynq UltraScale+ chips. There are two possible solutions: one in which a single CRI board services 8 CROBs, and another in which a single CRI board services 12 CROBs. Both solutions seem to have a similar cost per serviced CROB. The 12-CROB version offers a slightly lower (ca. 12%) cost, but it may pose more design and technological problems (a higher number of multi-gigabit links resulting in more complex PCB routing, a higher number of optical transceivers, and probably two PCIe connectors). Therefore, the 8-CROB version seems to be a safer choice. Before the final decision is taken, it is necessary to negotiate the component prices with vendors, as the prices offered for a scientific project may be lower. Additionally, it is necessary to take into consideration the computer platform in which the CRI boards will be used, as well as the results of the PCB design and of the implementation of the prototype firmware.

### ACKNOWLEDGMENTS

The authors would like to thank the CBM collaborators, especially Dr. W.F.J. Müller and Dr. Jörg Lehnert from GSI, and Dirk Hutter and Dr. Jan de Cuveland from FIAS, for their cooperation and for sharing the ideas used in this work.

## REFERENCES

- [1] Ablyazimov, T., Abuhoza, A., Adak, R. P., et al., "Challenges in QCD matter physics – the scientific programme of the Compressed Baryonic Matter experiment at FAIR," *The European Physical Journal A* **53**(3), 60 (2017).
- [2] "CBM The Compressed Baryonic Matter experiment." http://www.fair-center.eu/for-users/experiments/cbm.html.
- [3] Kasinski, K., Szczygiel, R., and Zabolotny, W., "Back-end and interface implementation of the STS-XYTER2 prototype ASIC for the CBM experiment," *Journal of Instrumentation* **11**(11), C11018 (2016).
- [4] Lehnert, J., Byszuk, A., Emschermann, D., Kasinski, K., Müller, W., Schmidt, C., Szczygiel, R., and Zabolotny, W., "GBT based readout in the CBM experiment," *Journal of Instrumentation* **12**(02), C02061 (2017).
- [5] Moreira, P., et al., "The GBT-SerDes ASIC prototype," *Journal of Instrumentation* **5**(11), C11022 (2010).
- [6] Moreira, P. R. S., "The radiation hard GBTX link interface chip," (Nov. 2013). https://indico.cern.ch/event/267408/.
- [7] Zabołotny, W. M. and Kasprowicz, G., "Data processing boards design for CBM experiment," *Proc. SPIE* **9290**, 929023–929023–11 (2014).
- [8] Zabolotny, W., Kasprowicz, G., Byszuk, A., Emschermann, D., Gumiński, M., Juszczyk, B., Lehnert, J., Müller, W., Poźniak, K., and Romaniuk, R., "Versatile prototyping platform for data processing boards for CBM experiment," *Journal of Instrumentation* **11**(02), C02031 (2016).
- [9] Hutter, D., de Cuveland, J., and Lindenstruth, V., "CBM FLES input interface developments," in [*CBM Progress Report 2015*], Friese, V., Sturm, C., and Toia, A., eds., 112, GSI, Darmstadt (2016). http://repository.gsi.de/record/186952.
- [10] Hutter, D., de Cuveland, J., and Lindenstruth, V., "Evaluation of the CBM FLES demonstrator," in [*CBM Progress Report 2016*], Selyuzhenkov, I. and Toia, A., eds., 157, GSI, Darmstadt (2017). http://repository.gsi.de/record/201318/.
- [11] Cachemiche, J.-P., Duval, P.-Y., Hachon, F., Gac, R. L., and Marin, F., "Study for the LHCb upgrade read-out board," *Journal of Instrumentation* **5**(12), C12036 (2010).
- [12] Bellato, M., Collazuol, G., D'Antone, I., Durante, P., Galli, D., Jost, B., Lax, I., Liu, G., Marconi, U., Neufeld, N., Schwemmer, R., and Vagnoni, V., "A PCIe Gen3 based readout for the LHCb upgrade," *Journal of Physics: Conference Series* **513**(1), 012023 (2014).
- [13] Cachemiche, J., Duval, P., Hachon, F., Gac, R. L., and Réthoré, F., "The PCIe-based readout system for the LHCb experiment," *Journal of Instrumentation* **11**(02), P02013 (2016).
- [14] Anderson, J., Bauer, K., Borga, A., Boterenbrood, H., Chen, H., Chen, K., Drake, G., Dönszelmann, M., Francis, D., Guest, D., Gorini, B., Joos, M., Lanni, F., Miotto, G. L., Levinson, L., Narevicius, J., Vazquez, W. P., Roich, A., Ryu, S., Schreuder, F., Schumacher, J., Vandelli, W., Vermeulen, J., Whiteson, D., Wu, W., and Zhang, J., "FELIX: a PCIe based high-throughput approach for interfacing front-end and trigger electronics in the ATLAS upgrade framework," *Journal of Instrumentation* **11**(12), C12023 (2016).
- [15] Emschermann, D., "The CBM STS DAQ chain." Presentation on the CBM ASIC workshop, Darjeeling 2017 (2017).
- [16] Zabołotny, W., Kasprowicz, G., Byszuk, A., Gumiński, M., Poźniak, K., and Romaniuk, R., "Towards the common readout interface boards for STS," in [*CBM Progress Report 2016*], Selyuzhenkov, I. and Toia, A., eds., 150–151, GSI, Darmstadt (2017). http://repository.gsi.de/record/201318/.
- [17] Xilinx, "PCI Express (PCIe)." https://www.xilinx.com/products/technology/pci-express.html.
- [18] Altera, "PCI Express hard IP." https://www.altera.com/solutions/technology/transceiver/protocols/pro-hard-ip.html.
- [19] Xilinx, "UltraScale architecture and product data sheet: Overview." https://www.xilinx.com/support/documentation/data\_sheets/ds890-ultrascale-overview.pdf.
- [20] Xilinx, "UltraScale+ FPGA product tables and product selection guide." https://www.xilinx.com/support/documentation/selection-guides/ultrascale-plus-fpga-product-selection-guide.pdf.
- [21] Octopart, "Datasheets, electronic parts, components, search Octopart." https://octopart.com/.
- [22] Finisar, "40GBASE-SR4/4x10GBASE-SR 300m Gen2 QSFP+ Optical Transceiver FTL410QD2C." https://www.finisar.com/optical-transceivers/ftl410qd2c.
- [23] Foxconn, "MiniPOD Embedded Optical Modules." http://www.fit-foxconn.com/Product/SearchByFamily?topClassID=Electronic%20Module&ProductClass=Fiber%20Optics&ProductFamily=Embedded%20Optical%20Modules&ProductSeries=MiniPOD%20Embedded%20Optical%20Modules.
- [24] Foxconn, "MicroPOD Embedded Optical Modules." http://www.fit-foxconn.com/Product/SearchByFamily?topClassID=Electronic%20Module&ProductClass=Fiber%20Optics&ProductFamily=Embedded%20Optical%20Modules&ProductSeries=MicroPOD%20Embedded%20Optical%20Modules.
- [25] Finisar, "10G BOA (Board-Mount Optical Assembly)." https://www.finisar.com/optical-engines/fbotd10s11c00.
- [26] Kasinski, K., Szczygiel, R., Zabolotny, W., Lehnert, J., Schmidt, C., and Müller, W., "A protocol for hit and control synchronous transfer for the front-end electronics at the CBM experiment," *Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment* **835**, 66–73 (2016).
- [27] Xilinx, "Zynq UltraScale+ MPSoC Data Sheet: Overview." https://www.xilinx.com/support/documentation/data\_sheets/ds891-zynq-ultrascale-plus-overview.pdf.
- [28] "The White Rabbit Project," (2013). http://www.ohwr.org/attachments/2528/IBIC2013\_WR.pdf.
- [29] Cota, E. G., Lipiński, M., Włostowski, T., van der Bij, E., and Serrano, J., "White Rabbit specification: Draft for comments." http://www.ohwr.org/attachments/1169/WhiteRabbitSpec.v2.0.pdf.
- [30] Meder, L., Dreschmann, M., Sander, O., and Becker, J., "A signal distribution board for the timing and fast control master of the CBM experiment," *Journal of Instrumentation* **11**(02), C02001 (2016).
- [31] "Simple PCIe FMC carrier (SPEC)." http://www.ohwr.org/projects/spec.
- [32] Baron, S. and Mendes, E., "Recent Developments in the TTC PON." https://indico.cern.ch/event/465344/attachments/1281633/1904473/TTC-PON\_ESE\_seminar\_31\_05\_16\_.pdf.
- [33] Mendes, E., Baron, S., Kolotouros, D., Soos, C., and Vasey, F., "The 10G TTC-PON: challenges, solutions and performance," *Journal of Instrumentation* **12**(02), C02041 (2017).
- [34] Zabołotny, W. M., "Implementation of the STS readout concentrator in the newest FPGA technology Preliminary Study." Presentation on the CBM Collaboration Meeting, GSI Darmstadt 2016 (2016).
- [35] Zabołotny, W. M., "PCIe & GBT-FPGA based Common Readout Interface (CRI) for STS." Presentation on the CBM Collaboration Meeting, GSI Darmstadt 2017 (2017).
- [36] Zabołotny, W. M., "DMA implementations for FPGA-based data acquisition systems." this volume.
- [37] Xilinx, "DMA for PCI Express (PCIe) Subsystem." https://www.xilinx.com/products/intellectual-property/pcie-dma.html.