Many new applications process multiple data streams simultaneously. For instance, in a sensor network, data flows from a large number of embedded sensors, and in the stock market, each security corresponds to a stream of quotes and trades. Compared with such unbounded, high-speed incoming data, applications that handle multiple streams are constrained by limited resources (e.g., CPU cycles, bandwidth, and memory).
To solve this problem, much previous work has focused on allocating resources in a best-effort way so that performance degrades gracefully. Naturally, resource allocation can be formulated as an optimization problem. For instance, if the data characteristics from a sensor exhibit a predictable trend, then the precision constraints might be satisfied by transmitting only a fraction of the sensor data to the remote server.
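The sensor example can be illustrated with a minimal sketch. It assumes, purely for illustration, that the remote server extrapolates linearly from the last two readings it received, and that the sensor transmits a new reading only when that extrapolation would miss the true value by more than the precision bound. The function name `filter_readings`, the linear predictor, and the `precision` parameter are hypothetical and are not taken from any particular prior system.

```python
def filter_readings(readings, precision):
    """Return the (index, value) pairs a sensor would transmit.

    Assumption: the server predicts each reading by linear extrapolation
    from the last two transmitted points. A reading is sent only when the
    prediction error would exceed `precision`.
    """
    sent = []
    for i, value in enumerate(readings):
        # The first two readings are always sent so a trend can be formed.
        if len(sent) < 2:
            sent.append((i, value))
            continue
        (i0, v0), (i1, v1) = sent[-2], sent[-1]
        slope = (v1 - v0) / (i1 - i0)
        predicted = v1 + slope * (i - i1)
        # Transmit only when the server-side prediction is not good enough.
        if abs(value - predicted) > precision:
            sent.append((i, value))
    return sent


readings = [0, 1, 2, 3, 4, 5, 10, 11]
transmitted = filter_readings(readings, precision=0.5)
```

In this example the readings at indices 2 through 5 follow the established trend and are suppressed, so only four of the eight readings are transmitted while the server's estimate stays within the stated precision.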
Other approaches assume that a set of Quality-of-Service (QoS) specifications is available. A load shedding scheme derived from these specifications decides when and where to discard data, as well as how much data to discard, so that the system achieves the highest utility under the resource constraints.
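As a rough sketch of the "how much to discard" decision only, suppose each stream has a known arrival rate, a processing cost per tuple, and a utility per tuple, and that utility grows linearly with the number of tuples processed. Under those assumptions the problem reduces to a fractional knapsack, and a greedy plan by utility per unit cost is optimal. The function `plan_shedding`, its inputs, and the linear-utility model are illustrative assumptions; actual QoS specifications may be nonlinear and would also govern when and where in the query plan data is dropped.

```python
def plan_shedding(streams, capacity):
    """Compute the fraction of tuples to drop from each stream.

    streams:  dict mapping name -> (rate, cost_per_tuple, utility_per_tuple)
    capacity: total processing budget per unit time
    Assumption: utility is linear in the number of tuples processed.
    """
    # Serve the streams that yield the most utility per unit of cost first.
    order = sorted(
        streams,
        key=lambda name: streams[name][2] / streams[name][1],
        reverse=True,
    )
    remaining = capacity
    drop = {}
    for name in order:
        rate, cost, _utility = streams[name]
        load = rate * cost
        # Keep as much of this stream as the remaining budget allows.
        kept = min(1.0, remaining / load) if load > 0 else 1.0
        remaining -= kept * load
        drop[name] = 1.0 - kept
    return drop


streams = {
    "a": (100, 1.0, 5.0),
    "b": (100, 2.0, 4.0),
    "c": (50, 1.0, 0.5),
}
plan = plan_shedding(streams, capacity=200)
```

Here stream `a` is kept in full, half of stream `b` is discarded, and stream `c` is dropped entirely, since it contributes the least utility per unit of processing cost.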
However, a need has been recognized in connection with providing a more intelligent load shedding scheme for data mining tasks.