The latter half of the twentieth century saw the beginning of a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent it more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
Modern computer systems may be used to support a variety of applications, but one common use is the maintenance of large relational databases, from which information may be obtained. A large relational database is often accessible to multiple users via a network, any one of whom may query the database for information and/or update data in the database.
Database systems are typically configured to separate the process of storing data from the processes of accessing, manipulating, or using data stored in a database. More specifically, database systems use a model in which data is first stored and indexed in a memory before it is queried and analyzed. In general, database systems may not be well suited for processing and analyzing streaming data in real time. In particular, database systems may be unable to store, index, and analyze large amounts of streaming data efficiently or in real time.
Stream-based computing, also called data streaming, has been used to more effectively handle large volumes of incoming data in real time. In a data streaming application, data moves through a connected network of “processing elements”, the network being referred to as a “graph” or “operator graph”, with each processing element performing some function or functions with respect to the data.
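By way of illustration only, a graph of processing elements of the kind described above may be sketched as follows. The class name `ProcessingElement`, its methods, and the example functions (parse, keep even values, double) are hypothetical and chosen for this sketch; they are not taken from any particular streaming product.

```python
from typing import Any, Callable, List, Optional


class ProcessingElement:
    """Hypothetical graph node: applies a function to each item and forwards the result."""

    def __init__(self, name: str, func: Callable[[Any], Optional[Any]]) -> None:
        self.name = name
        self.func = func
        self.downstream: List["ProcessingElement"] = []

    def connect(self, other: "ProcessingElement") -> "ProcessingElement":
        """Add an edge from this element to `other`; returns `other` so calls can be chained."""
        self.downstream.append(other)
        return other

    def process(self, item: Any) -> None:
        """Apply this element's function, then pass the result to every downstream element."""
        result = self.func(item)
        if result is None:  # in this sketch, None means the item was dropped
            return
        for element in self.downstream:
            element.process(result)


def run_example() -> List[int]:
    """Build a four-element linear graph and push a few items through it."""
    collected: List[int] = []

    def collect(value: int) -> None:
        collected.append(value)  # the sink stores results and forwards nothing

    source = ProcessingElement("parse", lambda text: int(text))
    keep_even = ProcessingElement("keep_even", lambda n: n if n % 2 == 0 else None)
    double = ProcessingElement("double", lambda n: n * 2)
    sink = ProcessingElement("sink", collect)

    source.connect(keep_even).connect(double).connect(sink)
    for raw in ["1", "2", "3", "4"]:
        source.process(raw)
    return collected
```

In this sketch each item is handled as soon as it arrives and is never stored and indexed first, which is the distinction from the database model drawn above. A practical streaming application would typically run the processing elements concurrently, possibly on separate hosts, rather than as nested function calls.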
Stream-based computing works on a paradigm in which all the data is live as it moves through the operator graph. In accordance with this paradigm, each processing element in the graph has at hand all the data needed to perform its function, and can perform that function rapidly enough to maintain a high rate of data flow through the graph. However, a processing element sometimes needs to access data externally, i.e., either in storage or in a remote database, an event sometimes referred to as a lookup operation. When this happens, the processing element must wait while the necessary data is retrieved. Such waits can substantially degrade the performance of the streaming application. Often, the wait has a ripple effect through the operator graph, causing other processing elements to wait unnecessarily for data and/or causing data to back up in various buffers of the streaming application.
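The ripple effect of a lookup wait may be illustrated with a simplified, tick-based simulation. The sketch below assumes a single upstream element feeding a single downstream element through a bounded buffer; the function name `simulate`, its parameters, and the timing model (one tick per item, plus a fixed number of extra ticks for a lookup) are hypothetical simplifications and do not model any real system.

```python
from collections import deque
from typing import Dict, Set


def simulate(num_items: int, lookup_keys: Set[int],
             lookup_ticks: int, buffer_capacity: int) -> Dict[str, int]:
    """Hypothetical model: a producer feeds a consumer through a bounded buffer.

    The producer emits one item per tick while the buffer has room. The consumer
    needs one tick per item, plus `lookup_ticks` extra ticks for any item whose
    key is in `lookup_keys` (standing in for an external lookup operation).
    """
    buffer: deque = deque()
    next_to_produce = 0
    consumed = 0
    busy_remaining = 0          # ticks left on the consumer's current item
    ticks = 0
    producer_blocked_ticks = 0  # ticks the producer spent waiting on a full buffer

    while consumed < num_items:
        ticks += 1

        # Upstream element: emit an item, or wait if the buffer is full.
        if next_to_produce < num_items:
            if len(buffer) < buffer_capacity:
                buffer.append(next_to_produce)
                next_to_produce += 1
            else:
                producer_blocked_ticks += 1

        # Downstream element: start a new item if idle, then work for one tick.
        if busy_remaining == 0 and buffer:
            item = buffer.popleft()
            busy_remaining = 1 + (lookup_ticks if item in lookup_keys else 0)
        if busy_remaining > 0:
            busy_remaining -= 1
            if busy_remaining == 0:
                consumed += 1

    return {"ticks": ticks, "producer_blocked_ticks": producer_blocked_ticks}


# Same workload with and without a single lookup on item 3.
baseline = simulate(num_items=10, lookup_keys=set(), lookup_ticks=5, buffer_capacity=2)
stalled = simulate(num_items=10, lookup_keys={3}, lookup_ticks=5, buffer_capacity=2)
```

Under these assumptions, a single lookup lengthens the run by the lookup latency, and once the buffer fills the upstream element is itself blocked even though it performs no lookup. This corresponds to the backing up of data in buffers described above.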
A need exists for improved techniques for managing large data flows, and in particular, for improved data streaming techniques which manage data lookup operations.