1. Technical Field
The present invention relates to stream processing and more particularly to a system and method for designing a secure and lightweight stream processing system (SPS).
2. Description of the Related Art
A stream processing system (SPS) normally includes a network(s) of thousands of processing units (PUs) and the packets/information units (IUs) that flow between them. A processing unit (PU) normally includes an input port, analytics and an output port. The input port acts as the receiver of IUs from upstream PU(s) and the output port acts as a sender of IUs to downstream PU(s). The PU typically processes the input IU and extracts some additional information or attributes from the IU through its analytics. Analytics is a generic term for algorithms, transformation techniques or logical operations employed by the PU to process the IU before forwarding it downstream.
An IU typically carries two fields of information: payload and derived information. Payload is the basic information generated from the source of the stream, which may include segments of images, video, audio, speech transcript files, etc. Payload can be in binary and/or coded format. Derived information is meta-data derived from the payload of IUs. Derived information can be added before the IU enters the SPS, through automated pre-processing or through manual techniques.
Alternatively, the derived information field may be progressively enriched by the PUs as the IU traverses through the SPS. The derived information mainly includes tuples of the form <attribute name, attribute value>.
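For illustration only, the IU and PU structure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the class names InformationUnit and ProcessingUnit, the payload_length analytics function and the sample attribute values are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One derived-information entry: <attribute name, attribute value>
Attribute = Tuple[str, str]


@dataclass
class InformationUnit:
    """Hypothetical IU model: a payload plus a list of derived tuples."""
    payload: bytes
    derived: List[Attribute] = field(default_factory=list)


@dataclass
class ProcessingUnit:
    """Hypothetical PU model: receives an IU, applies analytics, emits an IU."""
    name: str
    analytics: Callable[[InformationUnit], List[Attribute]]

    def process(self, iu: InformationUnit) -> InformationUnit:
        # Input port: receive the IU; analytics: extract new attributes.
        new_attributes = self.analytics(iu)
        # Output port: forward an IU whose derived information is enriched.
        return InformationUnit(iu.payload, iu.derived + new_attributes)


def payload_length(iu: InformationUnit) -> List[Attribute]:
    # Assumed example analytics: derive the payload length as an attribute.
    return [("payload_length", str(len(iu.payload)))]


pu = ProcessingUnit("length_pu", payload_length)
iu_in = InformationUnit(b"frame-bytes", [("source", "camera-1")])
iu_out = pu.process(iu_in)
print(iu_out.derived)
```

In this sketch the payload passes through unchanged while the derived information field grows by one tuple per PU, which corresponds to the progressive enrichment described above.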
Though the IU carrying the derived information and payload flows through the SPS, not all the PUs encountered by the IU in its path are interested in each and every piece of information carried by the IU. Some PUs may be interested in processing only the derived information, while others are interested only in the payload and do not care about the derived information carried by the IU. Among the PUs interested only in the derived information, different PUs may access different subsets of the derived information. This may be because the PUs either do not need the information carried by other fields of the derived information, or do not have permission to access other fields due to security and privacy considerations.
A naïve solution is to have access control labels to different fields of information carried by an IU. Thus when the IU arrives at a PU, security and privacy (S&P) checks are done locally at the PU and only those fields of the IU to which the PU has access rights are disclosed to the PU. The PU then processes the IU and may add more derived information to the IU.
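The local S&P check of this naïve solution can be sketched as follows, for illustration only. The function name disclose, the field names, the PU identifiers and the representation of labels as sets of permitted PU identifiers are all assumptions; in particular, denying access to a field that carries no label is a choice made for this sketch and is not stated above.

```python
from typing import Any, Dict, Set


def disclose(iu_fields: Dict[str, Any],
             labels: Dict[str, Set[str]],
             pu_id: str) -> Dict[str, Any]:
    """Return only the IU fields whose label grants access to pu_id.

    Assumption: a field with no label is not disclosed to any PU.
    """
    return {
        name: value
        for name, value in iu_fields.items()
        if pu_id in labels.get(name, set())
    }


# Hypothetical IU with a payload and two derived attributes.
iu_fields = {"payload": b"raw-audio", "speaker": "alice", "language": "en"}

# Hypothetical S&P labels: field name -> PUs permitted to read that field.
labels = {
    "payload": {"pu_101"},
    "speaker": {"pu_102"},
    "language": {"pu_101", "pu_102"},
}

view_101 = disclose(iu_fields, labels, "pu_101")
view_102 = disclose(iu_fields, labels, "pu_102")
print(view_101)
print(view_102)
```

Note that the labels travel with the IU and are evaluated at every PU, so the check is only as trustworthy as the PU that performs it.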
The disadvantages of such a solution include: 1) The SPS is more vulnerable to hacking, since S&P checks are performed locally at each PU and a single compromised PU can negatively affect the performance of all the downstream PUs; 2) Having S&P labels for different fields inflates the size of the IU and thus can contribute to congestion in the SPS; and 3) Since each PU may add more derived information to the IU, the size of the IU progressively expands along its path in the SPS, further contributing to congestion in the resource-constrained environment in which most SPSs typically operate.
Referring to FIG. 1, such a naïve solution is illustratively depicted in which security and privacy (S&P) labels 106 are embedded in IUs 103, 104 and 105. The S&P labels provide information for access controls to the derived information and payload or to portions thereof. S&P checks are performed at each PU 101 and 102 which the IU encounters in its path 100 in the SPS 110. The SPS 110 includes PU 101 and PU 102 and flow path 100.
The IU 103 initially includes S&P labels 106, derived information including a single <attribute name, attribute value> tuple 107 and payload 108. When IU 103 reaches PU 101, PU 101 performs S&P checks, processes the information fields (in this case either the derived information or payload or both) to which it has access, and writes a new tuple 107a in the derived information field. The modified IU 104 leaving PU 101 is greater in size than IU 103. Similarly, PU 102, after processing IU 104, adds more meta-data 107b. The original IU 103 thus progressively expands in size as it passes through PU 101 and PU 102 to become modified IU 105.
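The growth of the IU along path 100 can be illustrated with a short size calculation. This is a sketch under assumed values: the iu_size function, the byte-counting rule, the label and payload contents and the attribute names are hypothetical and chosen only to show the trend from IU 103 to IU 105.

```python
from typing import List, Tuple

Attribute = Tuple[str, str]  # <attribute name, attribute value>


def iu_size(labels: bytes, derived: List[Attribute], payload: bytes) -> int:
    """Assumed size model: label bytes + attribute text bytes + payload bytes."""
    derived_bytes = sum(len(n.encode()) + len(v.encode()) for n, v in derived)
    return len(labels) + derived_bytes + len(payload)


labels_106 = b"\x01\x02"   # hypothetical S&P labels 106
payload_108 = b"p" * 64    # hypothetical 64-byte payload 108

derived_103 = [("source", "camera-1")]            # tuple 107
derived_104 = derived_103 + [("object", "car")]   # PU 101 writes tuple 107a
derived_105 = derived_104 + [("color", "red")]    # PU 102 writes tuple 107b

sizes = [
    iu_size(labels_106, derived, payload_108)
    for derived in (derived_103, derived_104, derived_105)
]
print(sizes)  # sizes of IU 103, IU 104 and IU 105 under this model
```

Under this assumed model the payload and labels are constant, so all of the growth comes from the derived information appended at each PU.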