In a stream-based application, data is collected and pushed to the system in real time in the form of a stream. Examples of stream-based applications include sensor monitoring applications, such as alarm systems and weather monitoring systems, financial applications, and the like. In stream-based applications, large amounts of information are collected, often from remote sites. These data streams are typically received in real time into a memory cache of a processing node or server.
A data stream can be viewed as a sequence of units called “ticks.” Each tick carries some basic data describing itself and a timestamp identifying when the tick occurred. Each tick is associated with a symbol, and a data stream can contain ticks for many symbols. A symbol has a name, and aggregate data is associated with the symbol. For example, aggregate data can indicate the number of ticks for that symbol, and the maximum, minimum and sum of data in the ticks for the symbol. The aggregated data for a symbol is also called metadata.
As an example, a stream-based application can be used for collecting stock trading information. Here, a tick may represent a trade of a number of shares of a stock at a particular price and at a particular time. The symbol is the stock that is named by its ticker symbol. Associated with the symbol is metadata, which contains the last price at which the stock traded, the maximum and minimum price at which it traded, the sum of the number of shares traded and the number of trades, which is also the total number of ticks received for that symbol. Given the nature of stock trading, the data stream is transmitted at a high rate and contains a large amount of information.
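As a minimal illustrative sketch only, the relationship between ticks, symbols, and metadata in the stock trading example might be expressed in Python as follows. The names Tick, SymbolMetadata, and aggregate, as well as the field names, are hypothetical and chosen for illustration; the sketch also assumes that ticks arrive in timestamp order, so that the most recently processed tick supplies the last price.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Tick:
    """One trade: a number of shares of a stock at a price and a time."""
    symbol: str       # ticker symbol naming the stock (hypothetical field name)
    price: float      # price at which the trade occurred
    shares: int       # number of shares traded
    timestamp: float  # when the tick occurred


@dataclass
class SymbolMetadata:
    """Aggregate data (metadata) kept for one symbol."""
    name: str
    last_price: Optional[float] = None
    max_price: Optional[float] = None
    min_price: Optional[float] = None
    total_shares: int = 0
    tick_count: int = 0  # number of trades, i.e. ticks received for the symbol

    def apply(self, tick: Tick) -> None:
        # Assumes ticks are applied in arrival order, so the latest tick
        # processed carries the last traded price.
        self.last_price = tick.price
        if self.max_price is None or tick.price > self.max_price:
            self.max_price = tick.price
        if self.min_price is None or tick.price < self.min_price:
            self.min_price = tick.price
        self.total_shares += tick.shares
        self.tick_count += 1


def aggregate(ticks: Iterable[Tick]) -> Dict[str, SymbolMetadata]:
    """Fold a stream of ticks into per-symbol metadata held in memory."""
    table: Dict[str, SymbolMetadata] = {}
    for tick in ticks:
        if tick.symbol not in table:
            table[tick.symbol] = SymbolMetadata(name=tick.symbol)
        table[tick.symbol].apply(tick)
    return table
```

In this sketch each tick updates the metadata of exactly one symbol in place, without any transaction around the update, which reflects the per-tick update pattern described above rather than any particular implementation.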
However, the failure of a single node or server can significantly disrupt a stream-based application, such as a stock trading application. For example, because the data held in memory is volatile, a system crash could cause all of the data in that node to be lost. Therefore, many stream-based applications provide a high-availability feature that allows the stream processing to continue even in the event of a single node failure.
Unfortunately, the known techniques for implementing high availability are ill-suited for stream-based applications. As noted, many stream-based applications must handle high data rates and support large amounts of information that must be updated in real time. Many of the known techniques are unable to continue receiving and processing data streams while recovering from a failure. In addition, many of the known techniques rely on transaction-based updates, which are too slow for the pace of tick-based updates found in stream-based applications.
Accordingly, it may be desirable to provide methods and systems for a high-availability configuration that is able to continue receiving and processing data streams even in the event of a failure. In addition, it may be desirable to provide methods and systems that can perform this backup or failsafe function in a manner that is easily integrated into existing systems.