Time-based data from a real-time feed usually arrives in chronological order. One example of a real-time data feed is financial trade information. The user's software accepts the data in feed format, maintains it in memory for manipulation and queries from the user, then loads it into a database. The user expects to view the data in time order, according to time of occurrence or time stamp. In addition, the user expects the data to be available for queries immediately after it is received from the data stream.
Occasionally, however, a data point arrives out of order. This may occur because a single data element comes out of place. Data may also arrive out of order if it must be “replayed” from the data stream for some duration of time. A “replay” might be required when the system was unable to handle the data as it was received, e.g., because the data feed system was down.
The occurrence of an out-of-order data point is problematic when treating these feeds as ordered streams of data. Typically, the data is being received from the feed at such a high rate that inserting an out-of-order data element in the correct ordered position in the stream is not possible. In prior systems, any out-of-order point was discarded, i.e., not stored in the database. This presents the disadvantage of losing data. In addition, any attempt to insert the out-of-order point reduced the system's ability to respond to the data feed, slowing the system down. When the system slows down, data is lost because the system cannot keep up with the data feed. Either way, data is lost.
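The cost of ordered insertion can be illustrated with a minimal sketch. It assumes, purely for illustration, that an entity's time stamps are held in a contiguous in-memory array; the function names `append_in_order` and `insert_out_of_order` are hypothetical and do not come from any particular loader. An in-order point is appended at the end, whereas a late point forces every later element to be shifted.

```python
import bisect

def append_in_order(points, ts):
    """In-order arrival: append at the end, no existing element moves."""
    points.append(ts)

def insert_out_of_order(points, ts):
    """Late arrival: find the sorted position and insert there.

    Returns the number of existing elements that had to be shifted,
    which grows with how far back in time the late point belongs.
    """
    idx = bisect.bisect_right(points, ts)  # binary search for the position
    points.insert(idx, ts)                 # shifts every element after idx
    return len(points) - 1 - idx

# Assumed sample data: 100 time stamps, 0, 10, ..., 990.
points = list(range(0, 1000, 10))
append_in_order(points, 1000)            # normal case
shifted = insert_out_of_order(points, 15)  # late point near the start
```

Under these assumptions, the single late point at time 15 shifts nearly the whole array, while the in-order append moves nothing. At feed rates, repeated shifts of this kind are one way the loader can fall behind.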
Current real-time loaders generally recognize this problem but address it in a manner that is likely to lose data and increase resource usage. These loaders discard out-of-order data elements. The incoming data stream is separated into entities; for example, each stock in a stream of stock trades is a separate entity. When data must be “replayed,” duplicate entities with their own ordered lists are created, consuming additional memory resources. These separate “replay” entities still discard any out-of-order data elements. Attempts have been made to insert out-of-order data elements in their correct ordered positions, but this requires too many processing resources, making it difficult to keep up with ingesting all the data sent in the feed.
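The discarding behavior described above can be sketched as follows. This is a hypothetical illustration, not the code of any actual loader: the class name `DiscardingLoader`, its `ingest` method, and the sample feed are all assumptions made for the example. Each entity keeps its own time-ordered list, and any element older than the newest stored element for that entity is dropped.

```python
from collections import defaultdict

class DiscardingLoader:
    """Hypothetical loader: one ordered list per entity, late points dropped."""

    def __init__(self):
        self.series = defaultdict(list)  # entity -> [(timestamp, value), ...]
        self.discarded = 0               # count of lost data elements

    def ingest(self, entity, timestamp, value):
        points = self.series[entity]
        # A point older than the newest stored point is out of order.
        if points and timestamp < points[-1][0]:
            self.discarded += 1
            return False
        points.append((timestamp, value))
        return True

# Assumed sample feed of (entity, timestamp, price); the fourth trade is late.
feed = [
    ("IBM", 10, 100.0),
    ("IBM", 20, 100.5),
    ("MSFT", 10, 50.0),
    ("IBM", 15, 100.2),  # out of order for IBM: arrives after timestamp 20
    ("IBM", 30, 101.0),
]

loader = DiscardingLoader()
for entity, ts, price in feed:
    loader.ingest(entity, ts, price)
```

In this sketch the late IBM trade at time 15 never reaches the stored series, which is the data loss described above. A “replay” handled by creating a second set of entities would duplicate these lists and still apply the same discard rule.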
What is therefore needed is a system for handling out-of-order data that neither discards data elements nor slows data stream processing. The need for such a system and associated method has heretofore remained unsatisfied.