Streaming applications operate on input data which is not retrieved from persistent storage, but which arrives as one or more continuous sequences of items. Such input data might be streaming media, such as streaming audio or streaming video, or it might be another kind of data, e.g., real-time streaming text. Examples of the latter type of input data include real-time electronic stock tickers published by financial websites such as Yahoo! Finance, CNBC, Bloomberg, or NASDAQ, and real-time content streams published by websites such as Twitter and Facebook which leverage interest and/or social graphs.
As the sources of streaming data proliferate, scalability has become an issue for streaming applications that process such data and for the platforms which run those applications. Outside of the area of streaming applications, scalability has been addressed by distributed batch-processing platforms based on Map-Reduce or similar frameworks. However, these platforms typically operate on input data originating in persistent storage, e.g., the persistent storage of the commodity servers that make up a Hadoop cluster. That is to say, in terms of a stock-and-flow model, these platforms operate on a stock rather than a flow (or stream).
Performance is also an issue for streaming applications and their platforms, since it is often desirable that a streaming application operate in real time or near real time. In the past, streaming applications achieved real-time performance by sacrificing data integrity or data completeness. For distributed batch-processing platforms based on Map-Reduce and similar frameworks, real-time performance is often limited to accessing (e.g., using Pig, Scalding, Dremel, or Drill) a store of indexed results that were generated offline.
Complicating matters still further, streaming applications tend to be non-stop, almost by definition. Consequently, fault tolerance is an important issue for streaming applications and the platforms on which they run.