As the financial industry increasingly needs faster processing of large volumes of data, data processing systems based on clusters of general-purpose CPUs show a number of limitations. Although cluster approaches rely on inexpensive hardware and provide tools that simplify development, they suffer from constraints that become more significant as high-performance computing requirements grow: high electricity consumption, costly maintenance, and large data-center floor space. Further, the overall performance of a cluster does not increase proportionally with the number of machines in the cluster. Unlike the cluster approach, data processing systems based on FPGAs allow complex tasks to be executed in parallel with high throughput, using a limited number of machines equipped with FPGAs. Accordingly, this hardware approach appears particularly suitable for developing applications in the financial and investment industries, where fast calculation is key to remaining competitive.
An FPGA (field-programmable gate array) is an integrated circuit that can be configured after manufacturing. The configuration is generally specified using a hardware description language (HDL). FPGAs contain a large number of programmable logic components (“logic blocks”) and a hierarchy of reconfigurable interconnections that allow the blocks to be “wired together”. Logic blocks can be configured to perform complex combinational functions or simple logic operations (Boolean AND, OR, NAND, XOR, etc.). Because an FPGA can perform calculations in parallel, the same algorithm can be executed simultaneously on a number of independent inputs in only a few clock cycles. FPGAs are thus particularly suited to executing complex computations very quickly.
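In many FPGA families, the configurable element inside a logic block is a small lookup table (LUT) that stores one output bit for every input combination. The sketch below is a simplified software model of a 4-input LUT, assuming nothing about any particular vendor's architecture (real logic blocks also contain flip-flops, carry logic and other resources). The class and method names are hypothetical. It shows why "configuring after manufacturing" is possible: the same element becomes an AND gate or an XOR gate depending only on the table contents loaded into it.

```python
from itertools import product
from typing import Callable, List


class LUT4:
    """Simplified software model of a 4-input lookup table (LUT)."""

    def __init__(self) -> None:
        # One stored output bit per input combination: 2**4 = 16 entries.
        self.truth_table: List[int] = [0] * 16

    def configure(self, fn: Callable[[int, int, int, int], int]) -> None:
        # "Programming" the LUT means filling its table with fn's outputs.
        # product() varies the last input fastest, so enumerate() yields
        # the index a*8 + b*4 + c*2 + d for each input combination.
        for idx, (a, b, c, d) in enumerate(product((0, 1), repeat=4)):
            self.truth_table[idx] = fn(a, b, c, d) & 1

    def evaluate(self, a: int, b: int, c: int, d: int) -> int:
        # Evaluation is a single table lookup, whatever function was configured.
        return self.truth_table[(a << 3) | (b << 2) | (c << 1) | d]


lut = LUT4()
lut.configure(lambda a, b, c, d: a & b & c & d)  # behave as a 4-input AND
print(lut.evaluate(1, 1, 1, 1), lut.evaluate(1, 0, 1, 1))
lut.configure(lambda a, b, c, d: a ^ b ^ c ^ d)  # same element, now a 4-input XOR
print(lut.evaluate(1, 0, 1, 1))
```

Because evaluation cost does not depend on which function is stored, a more complex combinational function costs no more time per LUT than a single gate; larger functions are built by wiring many LUTs together through the interconnect.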
For these reasons, more and more market data processing systems are designed using FPGAs.
Existing market data processing systems receive data from external sources (such as exchanges), publish financial data of interest to their subscribers (such as traders at workstations), and route trade data to various exchanges or other venues.
They generally comprise at least one decoder that interacts with the feed sources to handle real-time data streams in a given format (FAST, FIX, binary) and decodes them, converting the streams from source-specific formats into an internal format (a process known as data normalization). Following the message structure of each data feed, the decoder processes each field value with a specified operation, fills in missing data using the values and state of its cached records, and maps the result to the format used by the system.
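The per-field operations and cached records described above can be sketched as follows. This is a simplified model loosely following FAST-style field operators (copy, increment, delta, default), assuming that fields have already been extracted from the wire into a dictionary; presence maps, nullability, byte-level encoding and the full operator semantics of the FAST specification are deliberately left out. `FieldSpec`, `FieldDecoder` and the field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FieldSpec:
    name: str
    op: str                        # "none", "copy", "increment", "delta" or "default"
    initial: Optional[int] = None  # initial/default value, if the template defines one


@dataclass
class FieldDecoder:
    template: List[FieldSpec]
    # Previous value of each field, kept across messages (the "cached records").
    cache: Dict[str, int] = field(default_factory=dict)

    def decode(self, wire: Dict[str, int]) -> Dict[str, int]:
        """Rebuild a full record from the fields actually present on the wire."""
        out: Dict[str, int] = {}
        for spec in self.template:
            present = spec.name in wire
            prev = self.cache.get(spec.name, spec.initial)
            if spec.op == "none":          # value must always be transmitted
                value = wire[spec.name] if present else None
            elif spec.op == "copy":        # absent -> repeat the previous value
                value = wire[spec.name] if present else prev
            elif spec.op == "increment":   # absent -> previous value + 1
                value = wire[spec.name] if present else (None if prev is None else prev + 1)
            elif spec.op == "delta":       # wire carries a difference from the previous value
                value = (prev or 0) + wire[spec.name] if present else None
            elif spec.op == "default":     # absent -> template default, cache not used
                value = wire[spec.name] if present else spec.initial
            else:
                raise ValueError(f"unknown operator {spec.op!r}")
            if value is None:
                raise ValueError(f"no value available for field {spec.name!r}")
            if spec.op in ("copy", "increment", "delta"):
                self.cache[spec.name] = value  # update the cached state
            out[spec.name] = value
        return out


decoder = FieldDecoder([
    FieldSpec("MsgSeqNum", "increment"),
    FieldSpec("SecurityID", "copy"),
    FieldSpec("Price", "delta", initial=0),
    FieldSpec("Qty", "none"),
])
first = decoder.decode({"MsgSeqNum": 100, "SecurityID": 42, "Price": 10050, "Qty": 5})
second = decoder.decode({"Price": -25, "Qty": 3})  # sequence number and instrument come from the cache
print(second)  # {'MsgSeqNum': 101, 'SecurityID': 42, 'Price': 10025, 'Qty': 3}
```

The second message carries only two fields on the wire, yet the decoder emits a complete normalized record; this dependence on state left by earlier messages is one reason such decoding is naturally performed message by message.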
Currently, input data streams are decoded in software or in hardware in a purely sequential way, without any parallelization. Software decoders often suffer from bandwidth limitations because the processor cannot decode packets fast enough. This is because a software decoder must decode every message to determine whether it concerns an instrument of interest to the application(s). Furthermore, when the rest of the processing is done in hardware, two transfers are required: one from hardware to software and one back from software to hardware. These transfers are very time-consuming compared with the typical processing time and add significant latency.
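Why every message must be decoded before it can be filtered can be illustrated with FAST's stop-bit integer encoding, in which each byte carries 7 data bits and the most significant bit is set on the last byte of a field. Because fields are variable-length, the position of a later field is only known once the preceding fields have been decoded. The sketch below assumes a hypothetical two-field message layout (sequence number, then instrument identifier) and omits template identifiers, presence maps and nullable values; the function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def encode_uint(value: int) -> bytes:
    """Stop-bit encode a non-negative integer (7 data bits per byte)."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()      # most significant 7-bit group first
    groups[-1] |= 0x80    # stop bit marks the last byte of the field
    return bytes(groups)


def read_uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one stop-bit-encoded integer; return (value, next position)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80:   # stop bit set: field complete
            return value, pos


def filter_messages(messages: Iterable[bytes], wanted: Set[int]) -> List[Tuple[int, int]]:
    kept = []
    for msg in messages:
        seq, pos = read_uint(msg, 0)              # must be decoded just to find the next field
        security_id, pos = read_uint(msg, pos)    # only now is the instrument known
        if security_id in wanted:
            kept.append((seq, security_id))
        # otherwise the decoding work for this message is discarded
    return kept


feed = [encode_uint(1) + encode_uint(42),
        encode_uint(2) + encode_uint(300),
        encode_uint(3) + encode_uint(42)]
print(filter_messages(feed, {42}))
```

In a sequential software loop like this one, the cost of decoding is paid for every message, including the majority that are discarded.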
Market data rates have increased dramatically over the past few years, approaching a peak of 1 million messages per second. As market data rates continue to increase, high-speed, ultra-low-latency and reliable market data processing systems are becoming increasingly critical to the success of financial institutions. In particular, there is a need for high-performance decoders capable of processing market data feeds at up to 10 Gb/s and supplying the order management core with normalized commands that are independent of the market being processed, with the lowest possible latency.
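A back-of-envelope calculation from the figures above gives a sense of the time budget involved. It assumes, purely for illustration, that messages are handled one at a time at the quoted peak rate and that the hardware uses a 64-bit internal datapath at the 10 Gb/s line rate, ignoring protocol overhead and bursts above the average rate:

```latex
t_{\text{msg}} = \frac{1}{10^{6}\ \text{messages/s}} = 1\ \mu\text{s}

f_{\text{clk}} = \frac{10 \times 10^{9}\ \text{bit/s}}{64\ \text{bit/cycle}} = 156.25\ \text{MHz}

N_{\text{cycles}} = t_{\text{msg}} \cdot f_{\text{clk}} = 10^{-6}\ \text{s} \times 156.25 \times 10^{6}\ \text{s}^{-1} \approx 156\ \text{cycles per message}
```

Under these assumptions, a purely sequential decoder would have on average only about 156 clock cycles to decode, filter and normalize each message at peak load.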
Furthermore, market data formats change quite often, especially FAST-based formats. This does not raise any major issue for classic software decoders, which can usually be modified easily. In the case of FAST formats, the exchange provides an updated templates file, and the software either loads this file dynamically or has its code (or part of it) regenerated automatically from these templates.
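The dynamic-loading approach available to software decoders can be sketched as follows: the message layout is read from an XML templates file at run time, so an updated file from the exchange changes the decoded layout without any code change. The XML below is a simplified, hypothetical subset of FAST template syntax (no groups, sequences or decimal sub-fields), and `load_templates` is a hypothetical helper built only on Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Simplified subset of FAST template element names handled by this sketch.
FIELD_TYPES = {"int32", "uInt32", "int64", "uInt64", "decimal", "string"}
OPERATORS = {"constant", "default", "copy", "increment", "delta", "tail"}


def _local(tag: str) -> str:
    # Drop an XML namespace prefix such as "{uri}template".
    return tag.rsplit("}", 1)[-1]


def load_templates(xml_text: str) -> Dict[int, List[Tuple[str, str, str]]]:
    """Map template id -> [(field name, field type, operator), ...] in declaration order."""
    templates: Dict[int, List[Tuple[str, str, str]]] = {}
    for tmpl in ET.fromstring(xml_text).iter():
        if _local(tmpl.tag) != "template":
            continue
        fields = []
        for child in tmpl:
            ftype = _local(child.tag)
            if ftype not in FIELD_TYPES:
                continue  # groups, sequences, etc. are not handled here
            op = next((_local(g.tag) for g in child if _local(g.tag) in OPERATORS), "none")
            fields.append((child.get("name", ""), ftype, op))
        templates[int(tmpl.get("id", "0"))] = fields
    return templates


V1 = """<templates>
  <template name="MDIncRefresh" id="1">
    <uInt32 name="MsgSeqNum"><increment/></uInt32>
    <uInt32 name="SecurityID"><copy/></uInt32>
    <int64 name="Price"><delta/></int64>
  </template>
</templates>"""

# Hypothetical updated file from the exchange: a new field is appended.
V2 = V1.replace('<int64 name="Price"><delta/></int64>',
                '<int64 name="Price"><delta/></int64>\n    <uInt32 name="Qty"/>')

print(load_templates(V1)[1])
print(load_templates(V2)[1])
```

For a software decoder, supporting the new format amounts to reloading a data file; the difficulty discussed next is that an FPGA decoder has no equally cheap equivalent.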
However, decoders built on reconfigurable platforms (FPGAs) find it difficult to adapt efficiently to such format changes. While a general-purpose CPU can easily be updated to execute any task, once an FPGA has been programmed for a particular task, it is quite complicated to update it so that it can execute another task. This would require reprogramming the FPGA, which is both expensive and complex.