The invention relates to a data processing system and method for distributing messages received in streams of data packets to software consumers.
There are an increasing number of applications that require streams of network messages to be consumed and processed at very high data rates. For example, electronic financial exchanges typically provide several data feeds comprising messages carrying market parameters such as security buy/sell values or trade orders. Each of these feeds can have data rates that currently peak at over two million messages per second and present a considerable processing challenge to a computer system configured to receive the feeds, such as a bank trading system. Other examples of receivers that are required to process message streams at high data rates include servers hosting databases, file caches and webservers.
Generally, the size of each message in such systems will be relatively small compared to the size of data packets in which data is carried over a network to the receivers. Many messages are therefore packed into each data packet in order to maximise the efficiency of message transmission. This requires the receiver to parse the streams of data packets it receives in order to identify each message and pass it on to the appropriate consumer at the receiver.
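By way of illustration only, the parsing step described above can be sketched as follows. The sketch assumes a hypothetical framing in which each message in a packet payload is preceded by a 2-byte big-endian length field; real feeds define their own framing, and the names `pack_messages` and `parse_packet` are invented for this example.

```python
import struct


def pack_messages(messages):
    """Pack several small messages into one packet payload.

    Assumed (hypothetical) framing: each message is preceded by a
    2-byte big-endian length field.
    """
    return b"".join(struct.pack("!H", len(m)) + m for m in messages)


def parse_packet(payload):
    """Walk the payload from start to end and return its messages.

    Each message boundary is only known once the previous length
    field has been read, so the walk is inherently sequential.
    """
    messages = []
    offset = 0
    while offset < len(payload):
        if offset + 2 > len(payload):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        if offset + length > len(payload):
            raise ValueError("truncated message body")
        messages.append(payload[offset:offset + length])
        offset += length
    return messages
```

Because the position of each message depends on the lengths of all the messages before it, a single payload cannot readily be split across several threads for parsing.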
Typically, the consumers running in software at a computer system configured to receive such data streams will each require only a subset of the messages contained in the streams. For example, trading software at a bank computer system would typically be configured to normalise and re-publish messages relating to, or to trade on the basis of, only a limited number of securities.
So as not to overload the consumers with irrelevant messages, a dispatcher process is required to parse each data packet stream to identify the messages and then forward on to each consumer only those messages that it requires. However, because of the sequential nature of parsing an individual data stream, the dispatcher for a given data feed will be a process thread running at a single core and is therefore the limiting factor on the speed of the overall system. In other words, whilst the consumer software might be designed to efficiently distribute its processing across multiple cores of the system by executing threads in parallel, the dispatcher, and hence the consumer software it serves, remains limited by the speed of the core at which the dispatcher is running. This limitation is described more generally by Amdahl's Law, which states that the speed-up achievable by parallelising a computation is bounded by the sequential portion of that computation. Even where the streams are provided as separate data feeds (e.g., on separate IP multicast addresses), so that multiple dispatcher threads may operate in parallel on some of the cores of the computer system, the processing of each feed remains bottlenecked at the single core running its dispatcher.
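As a worked illustration of Amdahl's Law, the following sketch computes the theoretical speed-up for a workload in which a given fraction of the work (here standing in for the dispatcher) is sequential. The function name and the example figures are assumptions chosen for illustration and are not measurements of any particular system.

```python
def amdahl_speedup(serial_fraction, cores):
    """Theoretical speed-up under Amdahl's Law.

    serial_fraction: share of the work that cannot be parallelised (0..1)
    cores:           number of cores available to the parallel portion
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Illustrative figures only: if a quarter of the work is sequential
# dispatching, adding cores can never yield more than a 4x speed-up.
for n in (1, 4, 16, 64):
    print(n, round(amdahl_speedup(0.25, n), 2))
```

However many cores serve the consumers, the speed-up approaches but never exceeds the reciprocal of the sequential fraction.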
Furthermore, the bottleneck resulting from the use of a software dispatcher is generally aggravated by the fact that all of the messages of a data stream are provided to the host for handling at the dispatcher process. In most cases, however, not all of the messages of a data stream are required by the consumers of the host, and the dispatcher process therefore wastes resources and increases latency in the system by handling messages that are not wanted by the host.
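The cost of handling unwanted messages in software can be sketched as follows. This is a minimal illustration that assumes a hypothetical message format in which the security symbol is the first comma-separated field; the `dispatch` function and the subscription table are invented for the example and do not represent any particular dispatcher.

```python
from collections import defaultdict


def dispatch(messages, subscriptions):
    """Forward each message only to the consumers that subscribed to it.

    messages:      iterable of bytes, assumed to start with b"SYMBOL,"
    subscriptions: dict mapping consumer name -> set of wanted symbols
    Returns (delivered, discarded), where delivered maps each consumer
    to its messages and discarded counts messages no consumer wanted.
    """
    delivered = defaultdict(list)
    discarded = 0
    for message in messages:
        # Every message must be inspected, whether or not it is wanted.
        symbol = message.split(b",", 1)[0]
        wanted = False
        for consumer, symbols in subscriptions.items():
            if symbol in symbols:
                delivered[consumer].append(message)
                wanted = True
        if not wanted:
            discarded += 1
    return dict(delivered), discarded
```

Each discarded message has still consumed host resources, since it was transferred to the host and inspected by the dispatcher before being dropped.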
There is therefore a need for an improved method for handling streams of data packets at a data processing system.