1. Technical Field
The present invention relates to protecting sensitive data in stream processing service environments (SPSEs), and more particularly to systems and methods for keeping sensitive data in a distributed job from leaving a secure host, and for protecting inquiries that may be mined for sensitive or revealing data.
2. Description of the Related Art
Systems for processing streams of data take continuous streams of data as input, process the data according to prescribed operations, and produce ongoing results. Examples of stream processing systems include System S from IBM®; StreamBase™ from StreamBase Systems™, Inc.; and Borealis™ from MIT and Brown University. In such a system, applications are composed of independent processing elements that operate on streams of data objects by filtering, combining, transforming, and otherwise analyzing the data. These operations can take the form of database operations, such as merging streams and selecting tuples that match specific criteria, or they can implement more general, application-specific logic.
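The following Python sketch illustrates how independent processing elements of the kind described above can be composed. Each element is modeled as a generator over a stream of tuples. A combining element merges two time-ordered streams, a filtering element selects tuples that match a criterion, and a transforming element rewrites each tuple. The tuple schema (`StreamTuple`) and the functions `select`, `transform`, and `merge_streams` are hypothetical illustrations. They are not the API of System S, StreamBase, or Borealis.

```python
from heapq import merge
from typing import Callable, Iterable, Iterator, NamedTuple


class StreamTuple(NamedTuple):
    """One data object flowing on a stream (hypothetical schema)."""
    ts: int        # event timestamp, used to order merged streams
    sensor: str
    value: float


def select(stream: Iterable[StreamTuple],
           pred: Callable[[StreamTuple], bool]) -> Iterator[StreamTuple]:
    """Filtering element: pass through only tuples that match the criterion."""
    return (t for t in stream if pred(t))


def transform(stream: Iterable[StreamTuple],
              fn: Callable[[StreamTuple], StreamTuple]) -> Iterator[StreamTuple]:
    """Transforming element: apply fn to every tuple on the stream."""
    return (fn(t) for t in stream)


def merge_streams(*streams: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Combining element: interleave already time-ordered streams by ts."""
    return merge(*streams, key=lambda t: t.ts)


# Usage: merge two sources, keep readings above 50, then rescale them.
a = [StreamTuple(1, "a", 10.0), StreamTuple(3, "a", 55.0)]
b = [StreamTuple(2, "b", 70.0), StreamTuple(4, "b", 5.0)]

pipeline = transform(
    select(merge_streams(a, b), lambda t: t.value > 50.0),
    lambda t: t._replace(value=t.value / 10.0),
)
result = list(pipeline)
print(result)
# [StreamTuple(ts=2, sensor='b', value=7.0), StreamTuple(ts=3, sensor='a', value=5.5)]
```

Because each element consumes and yields tuples lazily, the elements remain independent and can be rearranged. In a distributed system, they could also be placed on different hosts, which is the situation the following paragraphs are concerned with.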
A stream processing system can be owned, operated, and used for the benefit of a single entity, such as a corporation or government organization. It can also be owned and operated as a service, in which one organization operates the system for the benefit of other organizations that pay for the use of the stream processing system.
In a stream processing service environment such as System S, data streams from one processing element to another in near real time. It is imperative to protect sensitive data, which may be present in an inquiry, a processing element, the streaming data itself, or the results. If stream processing is handled entirely within a single infrastructure (that is, a single provider), sensitive data can be protected relatively easily and securely by many methods, including the use of security labels. However, in a mixed infrastructure, i.e., one that distributes stream processing components (processing elements, job planners, data sources) across multiple hosts or providers, the data becomes much more difficult to protect.
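As a rough illustration of the security-label approach mentioned above for a single infrastructure, the following Python sketch applies a generic lattice-based label check in the Bell-LaPadula style. A data item may be routed to a host only if the host's clearance dominates the item's label. The level names, the `Label` type, and the functions `dominates` and `may_send` are hypothetical. This sketch shows a conventional related-art technique and is not the method of the present invention.

```python
from typing import FrozenSet, NamedTuple

# Hypothetical ordered sensitivity levels (higher number = more sensitive).
LEVELS = {"public": 0, "confidential": 1, "secret": 2}


class Label(NamedTuple):
    """Security label: a sensitivity level plus a set of categories."""
    level: str
    categories: FrozenSet[str] = frozenset()


def dominates(a: Label, b: Label) -> bool:
    """True if label a is at least as high as b in the label lattice."""
    return LEVELS[a.level] >= LEVELS[b.level] and a.categories >= b.categories


def may_send(data: Label, host_clearance: Label) -> bool:
    """A data item may flow to a host only if the host clearance dominates it."""
    return dominates(host_clearance, data)


record = Label("secret", frozenset({"finance"}))
trusted = Label("secret", frozenset({"finance", "hr"}))
external = Label("confidential", frozenset({"finance"}))

print(may_send(record, trusted))   # True
print(may_send(record, external))  # False
```

A check like this works well when a single provider assigns and enforces every clearance. It offers much less assurance when hosts belong to different providers whose clearances cannot be verified.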
To protect sensitive data, the data must be secured at several points. First, it is challenging to keep highly sensitive data in a distributed stream processing job from leaving the secure hosts in the first place. Second, it is challenging to prevent the inquiries themselves from revealing sensitive data.