The data stream model has become a popular abstraction for designing algorithms that process massive data sets (for example, communication network traffic). The computational restrictions that define the data stream model are severe: algorithms must use a relatively small amount of working memory and must process input in whatever order it arrives. These restrictions capture the constraints of high-throughput data processing settings. For example, network monitoring often requires real-time (or near-real-time) response to anomalies. Thus, traffic data needs to be processed as soon as it arrives, rather than being stored and processed offline at a later time. For massive data sets stored in external memory, being able to process the data in any order avoids the I/O bottlenecks that arise with algorithms that assume random access. Unfortunately, while some problems admit efficient streaming algorithms, many others require a relatively large working memory or multiple passes over the data, neither of which is feasible in most situations.
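As a minimal sketch of a computation that fits these restrictions, the following uses the well-known Misra-Gries frequent-items summary, which reads each item once, in arrival order, and keeps at most k - 1 counters. The choice of this particular algorithm, the function name `misra_gries`, and the parameter `k` are illustrative assumptions and are not taken from the description above.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Single-pass frequent-items summary using at most k - 1 counters.

    Any item that occurs more than n / k times in a stream of n items
    is guaranteed to remain in the returned dictionary. The stored
    counts are lower bounds on the true frequencies, not exact values.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    counters: Dict[Hashable, int] = {}
    for item in stream:  # each item is seen exactly once, in arrival order
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical stream of source addresses; a generator is used so the
    # input cannot be revisited or accessed out of order.
    packets = (src for src in ["a", "b", "a", "c", "a", "d", "a"])
    print(misra_gries(packets, k=2))
```

Working memory here depends only on `k`, not on the length of the stream. A problem that needs exact counts for every distinct item would not fit in this budget, which is the kind of limitation noted above.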
When dealing with massive quantities of data, a data owner would often like to “outsource” the processing of that data to a third party. For example, the data might consist of two very large database relations, and the desired computation might be a “join” operation between the two. Computing this join can be costly, so it is desirable to engage a more powerful third party to perform the task. However, the data owner would also like to be assured that the result is correct.
In another setting, a data owner directly uses a large number of co-processors or multiple cores to process large quantities of data. Malfunctions of the hardware, the software, or both may lead to situations in which the data owner cannot trust the results of the computation.
A need exists, therefore, for a system that provides assurance to a data owner that data stream computations have been performed properly and have generated accurate results.