This description relates to managing data feeds.
A data feed provides a set of data units that have a well-defined order and are transmitted sequentially in that order on a substantially regular basis. The data units may be transmitted over a network such that the data units are broadcast to multiple nodes in the network. Certain data sources output real-time broadcast data feeds of ordered data units. An example of such a real-time broadcast data feed is a time series. This data feed might contain, for example, the price of a commodity at successive times.
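As a rough illustration of the kind of feed described above, the following Python sketch models a time series of commodity prices as an ordered sequence of data units. The names `DataUnit` and `price_feed`, the integer sequence number, and the fixed interval between units are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class DataUnit:
    seq: int          # position in the feed's well-defined order (assumed field)
    timestamp: float  # time of the observation, in seconds (assumed field)
    price: float      # e.g. the price of a commodity at that time


def price_feed(
    prices: Sequence[float], start: float = 0.0, interval: float = 1.0
) -> Iterator[DataUnit]:
    """Yield data units in order, one per regular interval (assumed spacing)."""
    for seq, price in enumerate(prices):
        yield DataUnit(seq=seq, timestamp=start + seq * interval, price=price)
```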
A node in a network can capture a data feed and store it so that when a client needs a selected portion of the data, the node can retrieve it from storage and provide it to the client. There may be certain requirements that the node must satisfy when managing the captured data. For example, one set of requirements is that the data be available all the time, and that no data be lost.
A difficulty that arises is that a node may fail to capture and store some of the data feed. When this happens to a real-time broadcast data feed from a data source that is configured to broadcast the data feed only in real time (i.e., without re-transmission), the missing data is lost forever to that node and its clients.
This failure can happen, for example, because the node temporarily loses its network connection, becomes inoperative, or runs out of buffer capacity. Consequently, when a client asks the node for a particular portion of the data feed, and that portion spans a time during which the node was unable to capture and store data from the data feed, the node will be unable to fulfill the request.
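The failure mode described above can be sketched as follows. This minimal Python example assumes, purely for illustration, that each data unit carries a consecutive integer sequence number; the `CapturingNode` class and its methods are hypothetical and are not taken from this description.

```python
from typing import Dict, List, Optional


class CapturingNode:
    """Toy node that stores whatever data units it manages to capture."""

    def __init__(self) -> None:
        # Maps an assumed integer sequence number to the captured value.
        self._store: Dict[int, float] = {}

    def capture(self, seq: int, value: float) -> None:
        self._store[seq] = value

    def missing(self, first: int, last: int) -> List[int]:
        """Sequence numbers in [first, last] that were never captured."""
        return [s for s in range(first, last + 1) if s not in self._store]

    def request(self, first: int, last: int) -> Optional[List[float]]:
        """Return the requested portion, or None if any unit in it is missing."""
        if self.missing(first, last):
            return None
        return [self._store[s] for s in range(first, last + 1)]


node = CapturingNode()
for seq, price in enumerate([10.0, 10.5, 10.25, 10.75, 11.0]):
    if seq in (2, 3):  # simulate an outage: these units are broadcast but not captured
        continue
    node.capture(seq, price)
```

In this sketch, `node.request(0, 1)` returns the stored values, while `node.request(1, 4)` returns `None` because units 2 and 3 were broadcast during the simulated outage and never captured.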