There are over 50,000 securities trading in the United States. Every second, up to 100,000 quotes and trades are generated. As shown in FIG. 1, most of the quotes and trades occur soon after trading begins (100) and shortly before the close of trading (105). These quotes and trades are then distributed to a variety of organizations within the financial services industry.
FIG. 2 is a flow diagram that illustrates a system for market data distribution. This distribution mechanism can be viewed as a continuous data stream of quotes (bids and asks) (200) delivered by a feed handler 235 to multiple consumer applications (205, 210, 215), where each consumer application includes logic to receive the data stream containing encapsulated data (220), decode the data stream (225), and filter the contents of the data stream (230). In this model, market data is viewed as a monotonic stream of time-series data. The data stream 200 is treated as a distributed data resource similar to a relational database table. The data stream 200 may be expressed in terms of its logical data model (i.e., the data layout or structure of the stream 200). The stream 200 itself is named based on a partitioning scheme and represented as a Uniform Resource Identifier (URI).
FIG. 3 is a flow diagram that illustrates the market data distribution system of FIG. 2 in more detail. As shown in FIG. 3, a feed handler 310 receives encapsulated market data from a data source 300 via a first network 308. Network stack 305 de-encapsulates the market data for use by the feed handler 310. A publisher 315 publishes the market data to a second network 325, using network stack 320 to encapsulate the published market data according to a network protocol. Each of multiple consumers (335, 345, 355, 360, 375, 385, 395, 398) is associated with a network stack for de-encapsulating the encapsulated published market data. Each of the multiple consumers (335, 345, 355, 360, 375, 385, 395, 398) also includes logic to filter the de-encapsulated published market data and further process the published market data that passes the filter.
Consumers of the data streams described above benefit from being able to receive and process the data streams as fast as possible. In “programmatic” trading applications, this means that there is an economic advantage in receiving the data in real-time. In this context, the term “real-time” means as close to zero-latency as possible. In these scenarios, the value of quote/trade data increases as the time it takes to be delivered from its source to its destination decreases.
Latency is introduced in networks in many ways. Since the speed of communications is ultimately limited by the speed of light, the physical distance that a message must travel affects latency. Also, any processing done on the message affects latency. Such processing may be performed by, for example, switches, routers, firewalls, etc. Processing done at the message source and the message destination hosts also affects latency. This processing includes protocol overhead and transmission time, buffer copies, context switches, and synchronization.
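The latency contributions listed above can be combined in a rough one-way estimate, sketched below in Python. The function names, the default velocity factor of 0.67 (a common approximation for optical fiber), and the simple additive model are assumptions for illustration. Real paths also involve queuing and other effects that this sketch ignores.

```python
from typing import Iterable

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in vacuum


def propagation_delay_us(distance_km: float, velocity_factor: float = 0.67) -> float:
    """One-way propagation delay in microseconds over the given distance.

    velocity_factor scales the vacuum speed of light for the medium;
    0.67 is an assumed, approximate value for optical fiber.
    """
    speed_m_per_s = SPEED_OF_LIGHT_M_PER_S * velocity_factor
    return distance_km * 1_000 / speed_m_per_s * 1e6


def transmission_delay_us(size_bytes: int, link_bps: float) -> float:
    """Time in microseconds to place a message of size_bytes on the link."""
    return size_bytes * 8 / link_bps * 1e6


def one_way_latency_us(
    distance_km: float,
    size_bytes: int,
    link_bps: float,
    hop_delays_us: Iterable[float],
    host_delay_us: float,
    velocity_factor: float = 0.67,
) -> float:
    """Sum of propagation, transmission, per-hop, and host processing delays.

    hop_delays_us models switches, routers, and firewalls; host_delay_us
    models protocol overhead, buffer copies, context switches, and
    synchronization at the source and destination hosts.
    """
    return (
        propagation_delay_us(distance_km, velocity_factor)
        + transmission_delay_us(size_bytes, link_bps)
        + sum(hop_delays_us)
        + host_delay_us
    )
```

Under this model, shortening the physical path reduces only the propagation term, while the per-hop and host terms remain regardless of how close the consumer is to the source.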
Traditional reflective methods for data distribution typically consume data, apply some process, and then trigger an action. Such methods require that potential candidates for input into the process be delivered before the process can begin. Thus, the process can be viewed as discontinuous and discrete.
Internet Protocol (IP) Multicast is designed to provide support for wide area distribution of streaming data. Originally designed for delivery of video streams over the Internet, the technology has been widely applied in a diverse set of industries, including Energy, Financial Services, etc. Within the financial services industry specifically, use of IP multicast for the distribution of pricing data being published out of a variety of markets is pervasive. This has been driven primarily by the need to deliver this data to individual desktops where end users use it as input to a variety of analytical models running in a spreadsheet. Recently, a confluence of trends has started to erode the value of this approach. Driven by price/performance considerations, many automated analytical and trading applications that have traditionally run on a Sun/SPARC computer platform have started to migrate to an Intel/AMD computer platform. The scale-out model inherent in this architecture lends itself to parallelism achieved by fanning out the distribution of tasks, not data, across many systems. IP Multicast, in contrast, distributes the same data to every “listener” on the channel. Filtering this data requires some type of intermediate, in-line content inspection mechanism.
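The listener-side behavior described above can be sketched in Python using the standard `socket` module. A listener joins a multicast group, receives every datagram sent to that group, and must then inspect each payload to discard what it does not need. The pipe-delimited payload format and the helper names are assumptions for illustration. The group address and port passed to `open_listener` would be supplied by the caller.

```python
import socket
import struct
from typing import Set


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte group/interface structure for IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_listener(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join the given IPv4 multicast group.

    After joining, the socket receives all traffic sent to the group;
    the network does no content-based filtering on the listener's behalf.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    return sock


def is_wanted(datagram: bytes, symbols: Set[bytes]) -> bool:
    """Content inspection on an assumed payload of the form b'SYMBOL|bid|ask'."""
    return datagram.split(b"|", 1)[0] in symbols
```

A listener built this way would call `sock.recvfrom(...)` in a loop and pass each payload through `is_wanted`, which is the per-listener filtering cost that the scale-out model tries to avoid.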
Stochastic methods for data distribution have been gaining in popularity. In stochastic methods, the process is continuous, with portions of the requisite computation migrating towards the source of the data. Applications that use these methods are simple queries that are applied against a known data model. Such applications represent a form of communication in which software agents, such as a long-running Monte Carlo simulation hosted on a distributed HPC cluster or an interactive spreadsheet, interact with a dynamic, “live” system. This type of communication allows for evolutionary control within time-critical environments.
Beowulf is a concept of clustering commodity computers to form a parallel, virtual supercomputer. The communications subsystem is the clustering technology that harnesses the computing power of a collection of computer systems and transforms them into a high-performance cluster. The combination of the physical interconnection, the communications protocol, and the message passing interface comprises the communications subsystem. It allows the processes of a parallel application to exchange messages during their collaborative execution.
Current networking is based on Ethernet and Wide-Area/Internet distribution assumptions. These assumptions include that consumers of data are spread over large areas and are characterized by open-loop control. Cluster-based computing, characterized by high-density servers and message passing, invalidates many of these Ethernet and Wide-Area assumptions. This is because in cluster-based computing, consumers of data are centralized, are located near the data source, and are characterized by closed-loop control.
Accordingly, a need exists in the art for a solution that provides relatively low latency and relatively high throughput access to data. A further need exists for such a solution that provides access to market data. A further need exists for such a solution that provides a utility execution environment for access to “tick” data. Yet a further need exists for such a solution that facilitates relatively fast turnaround for analytical programs that consume the data.