Many businesses and government organizations face the need to collect, store, and process personally identifiable information (“PII”) such as personal information associated with employees, customers, or, in the case of a government, its citizens. Privacy protection laws and common business practices require these organizations to develop and adhere to a privacy policy that governs the use of PII. In particular, a privacy policy establishes the purposes for which personal information can be used within the organization, and the conditions under which it may be accessed by the organization's employees or by other organizations.
Furthermore, many businesses and organizations use information processing systems that can be modeled as networks of interconnected processing elements. In general, a network of processing elements accepts certain entities through input channels, which are referred to as primal sources within the network. Processing elements may accept entities via one or more input channels, and may modify received entities or produce new entities and release one or more entities via one or more output channels. Exemplary processing elements may include businesses, manned workstations, factory machinery, software programs, agents, services, components, and the like. Exemplary primal entities may include, but are not limited to, business documents, machine parts, news feeds, data obtained from computer networks, and the like. The entities may include private information such as employee information, trade secrets, other confidential information, and the like. Therefore, disclosure of private information is a concern when using networks of processing elements.
FIG. 1 represents a typical processing graph 100 illustrating the interconnection between processing elements in a network of processing elements. Entities enter the network system through primal sources 102, 104, and 106.
The input entities, which can include private information, can be documents that come from various sources, including databases, archives, or sensory inputs. Entities produced by processing elements within the network can also be used as input data for other elements. The entities can then be processed by one of the processing elements, such as PE A 108, PE B 110, and PE C 112. The entities can also be presented directly to other parties through one of the output channels 114 and 116. Entities that were processed by one of the processing elements 108, 110, or 112 can similarly be processed again by other processing elements, or submitted to one of the output channels 114 and 116. At any point in time, the data can be stored within the network of processing elements.
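The flow described for FIG. 1 can be sketched as a directed graph. The following is a minimal illustration only: the text names the nodes of the processing graph 100 but does not state how they are connected, so every edge in `EDGES`, the node labels, and the helper `reachable_outputs` are assumptions introduced here.

```python
from collections import deque

# Hypothetical wiring: the text names the nodes of FIG. 1 but not its edges,
# so the connections below are assumed for illustration only.
EDGES = {
    "source_102": ["PE_A_108"],
    "source_104": ["PE_A_108", "PE_B_110"],
    "source_106": ["PE_C_112"],
    "PE_A_108": ["PE_C_112", "output_114"],
    "PE_B_110": ["PE_C_112"],
    "PE_C_112": ["output_116"],
    "output_114": [],
    "output_116": [],
}


def reachable_outputs(start, edges):
    """Return the sorted output channels an entity entering at `start` can reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Output channels are identified by the (assumed) "output_" label prefix.
    return sorted(n for n in seen if n.startswith("output_"))
```

Under this assumed wiring, a breadth-first traversal from a primal source lists every output channel through which an entity, or data derived from it, could leave the network. That set is the one a privacy analysis would need to examine.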
Although information processing systems based on networks of processing elements are very useful for processing data, privacy risks exist when private information is being used in a workflow. Users, organizations, and the components of the business process itself are being trusted with private information. All of the participants of the business process are usually required to adhere to a privacy policy, non-disclosure agreements, and the like. However, the participants often have the potential to violate these policies and agreements regulating the use of PII. If an information processing system does not implement privacy control, unauthorized access to private information can occur. For example, a user requesting a specific data product to be produced by the information processing system may not be authorized to view the resulting data. Also, one or more of the processing elements may not be authorized to accept specific data as an input.
Existing compositional systems based on networks of processing elements use planning techniques to mitigate and manage security risks. Planning techniques allow for automatically creating workflows of processing elements according to product requirements. Planning techniques are useful in applications related to semantic web, web services, workflow composition, and component-based software where manual analysis is inefficient. However, current compositional systems implementing planning techniques do not consider privacy control in the planning. Examples of planning systems are further described in A. Keller, “The CHAMPS System: A Schedule-optimized Change Manager”, USENIX'04 Ann. Technical Conf., June 2004; J. Blythe, et al., “The Role of Planning in Grid Computing”, ICAPS 2003; P. Doshi, et al., “Dynamic Workflow Composition using Markov Decision Processes”, Proceedings of IEEE Second International Conference on Web Services, June, 2004; and B. Srivastava “A Decision-support Framework for Component Reuse and Maintenance in Software Project Management”, CSMR'04, which are hereby incorporated herein by reference in their entirety.
Although there are similarities between information security and privacy protection, privacy risks are different from security risks. While security is mainly concerned with access control at a coarse granularity of data, privacy controls are more fine-grained. For example, security access control policies used for securing information flows, such as Mandatory Access Control (“MAC”), Multi-Level Secure systems (“MLS”), and Role-Based Access Control (“RBAC”), typically evaluate the risk of large pieces of information, such as entire documents or a database table. In many instances, a security access control policy allows certain privacy-sensitive data, such as level of income or medical history, to be published and used for research. Consequently, planning systems that mitigate security risks do not take privacy risks into consideration. A few examples of security access control models are further described in the following references: D. Bell, et al., “Computer security model: Unified exposition and Multics interpretation”, Technical Report ESD-TR-75-306, The MITRE Corporation, Bedford, Mass., HQ Electronic Systems Division, Hanscom AFB, MA, June 1975 and D. Ferraiolo, et al., “Role Based Access Control”, Proceedings of the 15th NIST-NSA National Computer Security Conference, Baltimore, Md., 13-16 Oct. 1992, which are hereby incorporated herein by reference in their entirety.
Further description of workflow security can be found in the following references: E. Bertino, et al., “An XML-Based Approach to Document Flow Verification”, In Proc. of the 7th International Information Security Conference (ISC 2004), Palo Alto, Calif., USA, Sep. 27-29, 2004, Lecture Notes in Computer Science, Volume 3225, 2004, pp. 207-218; R. Botha, et al., “Separation of duties for access control enforcement in workflow environments”, IBM Systems Journal, Volume 40, Issue 3 (March 2001), Pages: 666-682; R. Botha, et al., “A framework for access control in workflow systems”, Information Management and Computer Security 9 (3), 2001, and the commonly owned U.S. patent application Ser. No. 11/328,589, filed Jan. 10, 2006, entitled “Method of Managing and Mitigating Security Risks Through Planning”, which are hereby incorporated herein by reference in their entirety.
In contrast, privacy protection policies are focused on disclosure risks associated with releasing personally identifiable information. Privacy protection policies may restrict access to certain records within a database table, or certain fields in a document. For example, a privacy protection policy may state that personal information about minors should not be accessed for a given purpose. Further, privacy protection policies may place restrictions on filtering and combining data. For example, combining bank account number with social security number within one document can generate a high privacy risk.
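The two kinds of restriction mentioned above, a limit on combining fields and a limit on accessing records of minors for a given purpose, can be illustrated with a short sketch. It is not a method taken from this document: the field names, the age threshold of 18, the purpose labels, and the function `policy_violations` are all hypothetical.

```python
# Hypothetical policy: field names and rules are illustrative assumptions.
FORBIDDEN_COMBINATIONS = [
    frozenset({"bank_account_number", "social_security_number"}),
]
MINOR_AGE_LIMIT = 18  # assumed threshold, not stated in the text


def policy_violations(document, purpose, purposes_barring_minors):
    """Return a list of human-readable violations for one document."""
    violations = []

    # Field-level check: only fields that actually carry a value count.
    fields = {name for name, value in document.items() if value is not None}
    for combo in FORBIDDEN_COMBINATIONS:
        if combo <= fields:  # every field of the combination is present
            violations.append("combined fields: " + ", ".join(sorted(combo)))

    # Record-level check: records about minors are barred for some purposes.
    age = document.get("age")
    if (
        age is not None
        and age < MINOR_AGE_LIMIT
        and purpose in purposes_barring_minors
    ):
        violations.append("record of a minor accessed for purpose: " + purpose)

    return violations
```

Unlike a document-level access decision, both checks inspect individual fields and records, which is the finer granularity that distinguishes privacy control from the security models discussed earlier.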
Current workflow systems do not include an automatic mechanism for preserving privacy. Typically, human experts are used to ensure that privacy risks do not exceed acceptable levels. However, in large workflow systems, relying on a human expert to compose the workflows and manage privacy risks is very difficult and inefficient. In addition to privacy concerns, other criteria, such as output quality and resource utilization, must be considered in workflow composition, which makes the composition even more difficult.
Composing workflows is a labor-intensive task that requires the person building the workflow to have extensive knowledge of component functionality and compatibility. In many cases, this makes it necessary for end-users of these systems to contact system or component developers each time a new output information stream is requested and a new configuration is needed. This process is costly, error-prone, and time-consuming.
Additionally, in large practical systems both changes in the input supplied to the system and changes in the system configuration (availability of processing units, primal streams, and the like) can invalidate deployed and running workflows. With time, these applications can start producing output that no longer satisfies output requirements. In particular, the original estimate of privacy risk can become invalid. Timely reconfiguration of workflows to account for these changes is extremely hard to achieve if the workflow composition requires human involvement.