Computer databases have become extremely sophisticated, and the computing demands placed on database systems have increased at a rapid pace. Database systems are typically configured to separate the process of storing data from the process of accessing, manipulating, or using data stored in the database. More specifically, databases use a model in which data is first stored, then indexed, and then queried. However, this model cannot meet the performance requirements of some real-time applications. For example, the rate at which a database system can receive and store incoming data can limit how much data can be processed or otherwise evaluated, which, in turn, limits the utility of database applications configured to process large amounts of data in real time.
To address this issue, stream-based computing and stream-based database computing are emerging as technologies for database systems. Products are available that allow users to create applications that process and query streaming data before it reaches a database file. With this emerging technology, users can specify processing logic to apply to inbound data records while they are “in flight,” with the results available in milliseconds. Constructing applications using this type of processing has opened up a new programming paradigm that will allow a broad variety of innovative applications, systems, and processes to be developed, and that also presents new challenges for application programmers and database developers.
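As a rough illustration of applying user-specified logic to records while they are in flight, the following Python sketch passes each inbound record through a function before anything is stored. The names `process_in_flight` and `flag_large_orders`, the record fields, and the threshold of 100 are hypothetical and are not taken from any particular stream computing product.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

Record = Dict[str, int]  # hypothetical record shape, for illustration only


def process_in_flight(
    records: Iterable[Record],
    logic: Callable[[Record], Optional[Record]],
) -> Iterator[Record]:
    """Apply user-specified logic to each record as it arrives.

    Records are handled one at a time; nothing is stored or indexed first.
    A result of None means the logic chose to drop the record.
    """
    for record in records:
        result = logic(record)
        if result is not None:
            yield result


def flag_large_orders(record: Record) -> Optional[Record]:
    """Hypothetical processing logic: keep and flag records with amount >= 100."""
    if record["amount"] < 100:
        return None
    return {**record, "flagged": 1}


# A small list stands in for an unbounded inbound stream.
inbound = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 120},
]
results = list(process_in_flight(inbound, flag_large_orders))
```

Because `process_in_flight` is a generator, each result becomes available as soon as its record has been processed, rather than after the whole input has been stored and queried.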
Measuring the performance of a stream-based application enables one to determine whether the stream-based application is operating in an optimized manner. In a stream-based application, “tuples” of data are received via a data stream and are routed across processing elements (PEs), each of which performs operations on the tuples and then forwards the tuples to a different processing element for further processing. One technique for measuring the performance of a stream-based application involves determining the throughput of the tuples received via the data stream. For example, a particular stream-based application may be considered efficient when one hundred tuples per minute are fully processed by one or more PEs. Unfortunately, this metric alone cannot indicate whether the stream-based application is running in an optimized manner, since various conditions affect the rate at which tuples arrive in the data stream. For example, the number of tuples received via a Really Simple Syndication (RSS) feed (i.e., a data stream) varies according to the time of day, since news articles are often generated more frequently at particular times of the day, e.g., in the morning and in the evening. As a result, a user might be falsely alerted that the stream-based application is experiencing performance issues even when the stream-based application is operating in an optimized manner.
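The false alert described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Window` type, the tuple counts, and the ten-minute windows are invented for the example, and only the threshold of one hundred tuples per minute comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical measurement window for a stream-based application."""
    label: str
    minutes: float
    tuples_received: int   # tuples that arrived on the data stream
    tuples_processed: int  # tuples fully processed by the PEs


def throughput_per_minute(window: Window) -> float:
    """Tuples fully processed per minute during the window."""
    return window.tuples_processed / window.minutes


def is_flagged(window: Window, threshold: float = 100.0) -> bool:
    """Throughput-only check: flag the window if throughput is below threshold."""
    return throughput_per_minute(window) < threshold


# Invented figures: a busy morning and a quiet midday for an RSS-like feed.
morning = Window("morning", minutes=10, tuples_received=1500, tuples_processed=1500)
midday = Window("midday", minutes=10, tuples_received=400, tuples_processed=400)

for w in (morning, midday):
    kept_up = w.tuples_processed == w.tuples_received
    print(w.label, throughput_per_minute(w), "flagged" if is_flagged(w) else "ok",
          "processed every tuple" if kept_up else "fell behind")
```

In this sketch the midday window is flagged even though every tuple that arrived was processed, because the drop in throughput reflects fewer arrivals rather than slower processing.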