Technical Field
The present invention relates to protecting sensitive data in stream processing service environments (SPSEs), and more particularly to systems and methods for managing encryption keys and for stream data encryption.
Description of the Related Art
Stream processing systems take continuous streams of data as input, process the data in accordance with prescribed processes, and produce ongoing results. Examples of stream processing systems include System S from IBM®; StreamBase™ from StreamBase Systems™, Inc.; and Borealis™ from MIT and Brown University. In such a system, applications are composed of independent processing elements that operate on streams of data objects by filtering, combining, transforming, and otherwise analyzing the data. These operations can take the form of database operations, such as merging streams and selecting tuples that match specific criteria, or they can be more general application-specific logic.
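By way of illustration only, the following minimal Python sketch shows how independent processing elements might be composed over streams of tuples. It is not taken from System S, StreamBase, or Borealis; the function names (`select`, `transform`, `merge`), the round-robin merge order, and the sample records are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]  # hypothetical tuple representation


def select(stream: Iterable[Record],
           predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Processing element that passes only tuples matching a criterion."""
    for record in stream:
        if predicate(record):
            yield record


def transform(stream: Iterable[Record],
              fn: Callable[[Record], Record]) -> Iterator[Record]:
    """Processing element that rewrites each tuple."""
    for record in stream:
        yield fn(record)


def merge(*streams: Iterable[Record]) -> Iterator[Record]:
    """Processing element that interleaves several streams round-robin."""
    iterators = [iter(s) for s in streams]
    while iterators:
        for it in list(iterators):  # iterate over a copy so removal is safe
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)


# Two hypothetical input streams.
trades = [{"sym": "A", "price": 10}, {"sym": "B", "price": 5}]
quotes = [{"sym": "C", "price": 7}]

# Compose the elements: merge, then select, then transform.
pipeline = transform(
    select(merge(trades, quotes), lambda r: r["price"] > 6),
    lambda r: {**r, "flagged": True},
)
results = list(pipeline)
```

Each element consumes and produces a lazy stream, so results are emitted as input arrives rather than after the whole input has been read, which is the property the passage above describes as producing ongoing results.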
A stream processing system can be owned, operated, and used for the benefit of a single entity, such as a corporation or government organization. It can also be owned and operated as a service, in which one organization operates the system for the benefit of other organizations that pay for the use of the stream processing system.
A key characteristic of a stream processing service environment (SPSE) is the existence of data and/or processing belonging to multiple organizations. In a stream processing service environment, such as System S, data streams from one processing element to another in near real time. It is imperative to protect sensitive data, which may exist in the inquiry, the processing element, the streaming data itself, or the results. If the stream processing is handled entirely within a single infrastructure (a single provider), sensitive data can be protected fairly easily and securely by any of several methods, including security labels. However, in a mixed infrastructure, i.e., one that uses stream processing components (processing elements, job planners, data sources) across multiple hosts or providers, the data becomes much more difficult to protect.
One method of protecting sensitive data as it is transferred within a remote, potentially insecure host is encryption. In a mixed infrastructure environment, however, encryption is not as simple as creating a single key and encrypting and decrypting the data just once. Because the data may be processed by processing elements on one host and then transferred to processing elements on another host, there must be a way to intelligently encrypt the data and provide the decryption keys to the appropriate processing elements, without allowing processing elements not involved in the processing of the job to access those keys.
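The key distribution requirement described above can be sketched, under stated assumptions, as a key manager that releases a job's key only to processing elements registered for that job. This is an illustrative sketch and not the method of any particular system: the `KeyManager` class, its methods, and the host and element identifiers are hypothetical. The sketch also omits the encryption step itself and any authentication of the requesting element, both of which a real deployment would need.

```python
import secrets
from typing import Dict, Set


class KeyManager:
    """Hypothetical manager that gates key release by job membership."""

    def __init__(self) -> None:
        self._job_keys: Dict[str, bytes] = {}
        self._job_members: Dict[str, Set[str]] = {}

    def register_job(self, job_id: str, pe_ids: Set[str]) -> None:
        """Create a fresh 256-bit key for a job and record which
        processing elements take part in it."""
        self._job_keys[job_id] = secrets.token_bytes(32)
        self._job_members[job_id] = set(pe_ids)

    def get_key(self, job_id: str, pe_id: str) -> bytes:
        """Return the job key only to a processing element in the job."""
        if pe_id not in self._job_members.get(job_id, set()):
            raise PermissionError(f"{pe_id} is not part of {job_id}")
        return self._job_keys[job_id]


manager = KeyManager()
# Assumed job plan: one element on host A and one on host B.
manager.register_job("job-1", {"hostA/pe-filter", "hostB/pe-merge"})

key_on_a = manager.get_key("job-1", "hostA/pe-filter")
key_on_b = manager.get_key("job-1", "hostB/pe-merge")

# An element on host B that is not part of the job is refused.
try:
    manager.get_key("job-1", "hostB/pe-other")
    outsider_denied = False
except PermissionError:
    outsider_denied = True
```

In this sketch both participating elements obtain the same key even though they run on different hosts, while a co-located element outside the job does not. A per-hop or per-host key scheme would be an equally plausible design; the single per-job key is chosen here only to keep the example short.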