Sensitive Data
Computers are used to store and manage many types of data, including sensitive data. Sensitive data refers broadly to any non-public information that, if revealed to persons who should not be trusted with it, might adversely affect the privacy or security of a person or organization. An information access policy is a statement of the conditions under which a particular user may access sensitive data. For example, a business may declare that an employee can access project information only for the projects to which he or she is assigned.
Database Systems
Increasingly, businesses and organizations are storing sensitive data in databases managed by database management systems. Database management systems often implement information access policies by filtering the data returned to a user in response to the user's request (e.g., query) for sensitive information from the database. Database management systems are often used to enforce information access policies because they are typically deployed as a software-based or hardware-based intermediary between the users of the system seeking access to sensitive data and the actual physical database itself (i.e., the sensitive data stored on a storage device). As an intermediary between users and sensitive data, database management systems are suited to enforce mandatory access control.
When used to enforce information access policy on sensitive data, database management systems are typically deployed in either a two-tier client/server environment or in a three-tier client/server architecture. In a two-tier client/server environment, a client process receives a query from a user and connects directly to a database server process of the database management system. In such two-tier architectures, the database server process is capable of executing the user's query directly against the database. In a three-tier client/server architecture, the client process is indirectly connected to the database server process through an application process. In such three-tier architectures, the application process submits database queries to the database server process on the user's behalf.
In either case, whether the client process is connected directly to the database server process or connected indirectly through an application process, the database server process typically establishes database session data (“session data”) that identifies the user of the system. Typically, the identity of the user is established when the user is authenticated. For example, the database server process may authenticate a user by comparing a username and password received from the user against a list of known usernames and passwords stored in the database.
Session data typically identifies a user of the system individually or by one or more roles or groups with which the user is associated. For example, session data may comprise a unique user identifier. In addition to or instead of a unique user identifier, session data may indicate one or more roles or groups assigned to the user such as, for example, “employee”, “shareholder”, or “vice-president”. The database server process may store the session data in a computer memory for a period of time such as, for example, until the client process or application process disconnects from the database server process. When a query is executed, the database server process may use the stored session data to identify the user making the query. Having established the identity of the user making the query, the database server process may use that established identity to enforce information access policies that depend on the identity or role of the user.
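The authentication and session-data handling described above can be sketched as follows. This is a minimal illustration only, not the mechanism of any particular database server: the names `Server`, `SessionData`, `connect`, `disconnect`, and `current_user` are hypothetical, and the sketch stores salted password hashes (PBKDF2) rather than the plain list of passwords mentioned in the text.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionData:
    """Hypothetical session record: who the user is and which roles apply."""
    user_id: int
    roles: frozenset


def _hash(password: str, salt: bytes) -> bytes:
    # Assumption: salted PBKDF2 instead of comparing stored plaintext passwords.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class Server:
    """Toy stand-in for a database server process (illustrative only)."""

    def __init__(self):
        self._users = {}     # username -> (user_id, salt, password_hash, roles)
        self._sessions = {}  # connection_id -> SessionData

    def add_user(self, username, password, user_id, roles):
        salt = os.urandom(16)
        self._users[username] = (user_id, salt, _hash(password, salt), frozenset(roles))

    def connect(self, connection_id, username, password):
        """Authenticate; on success, establish session data for the connection."""
        record = self._users.get(username)
        if record is None:
            return False
        user_id, salt, stored, roles = record
        if not hmac.compare_digest(stored, _hash(password, salt)):
            return False
        self._sessions[connection_id] = SessionData(user_id, roles)
        return True

    def disconnect(self, connection_id):
        # Session data is kept only until the client or application disconnects.
        self._sessions.pop(connection_id, None)

    def current_user(self, connection_id):
        """Return the session data used to identify the user making a query."""
        return self._sessions.get(connection_id)
```

A query executed on a given connection would look up `current_user(connection_id)` and pass the resulting identity or roles to whatever component enforces the information access policy.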
Possible Approaches in Database Systems for Enforcing Information Access Policies
One possible approach for a database management system to enforce an information access policy is to perform query rewriting. Generally, query rewriting is employed by a database system to achieve one or both of two objectives. One objective is to rewrite a user's query so that the rewritten query, when executed, returns the same result set as it would if it had not been rewritten but does so in a more efficient manner. A second objective is to rewrite a user's query so that the rewritten query, when executed, potentially returns a different result set than that originally intended by the querier. Query rewriting as described herein generally refers to query rewriting performed to achieve the second objective.
Query rewriting is the process of intercepting and rewriting a user query so that the query, when executed against sensitive data, returns or modifies only the sensitive data that the user is permitted to access according to the information access policy. Typically, the query rewriting process limits access to sensitive data by adding query predicates to the user's query. For example, consider a relational database that stores employee records as rows in an ‘Employee’ table. The table may have ‘employee_id’ and ‘salary’ columns. A user may submit the Structured Query Language (SQL) query select * from Employee to retrieve all employee records. To enforce an information access policy specifying that employees may view only their own records, a database management system may add the query predicate where employee_id=<user's_employee_id> to the user's query before the query is executed against the database, so that the user obtains access to her own employee record only. Similarly, the same predicate may be appended to an update request to restrict modification of personal information. For example, the request update Employee set HOME_PHONE='555-123-4567' would be rewritten so that it changes only the requesting user's own row. The database management system may use session data to derive the value of <user's_employee_id> at the time the user's query is executed.
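The predicate-appending form of query rewriting described above can be illustrated with a short sketch. It is deliberately naive and rests on stated assumptions: the helper names `add_predicate` and `run_as` are hypothetical, the rewrite works on query text rather than on a parsed query (so it handles only simple single-table statements without ORDER BY or GROUP BY clauses), the session data is modeled as a plain dictionary, and SQLite stands in for the database.

```python
import sqlite3


def add_predicate(sql: str, predicate: str) -> str:
    """Append an access predicate to a simple single-table statement.

    Text-based and naive on purpose; a real rewriter would operate on a
    parsed representation of the query.
    """
    stripped = sql.strip().rstrip(";")
    idx = stripped.lower().find(" where ")
    if idx == -1:
        return f"{stripped} WHERE {predicate}"
    head, cond = stripped[:idx], stripped[idx + len(" where "):]
    return f"{head} WHERE ({cond}) AND ({predicate})"


def run_as(conn, session, sql, params=()):
    """Rewrite the user's statement using session data, then execute it."""
    rewritten = add_predicate(sql, "employee_id = ?")
    # The user's own parameters come first; the session-derived value is last.
    return conn.execute(rewritten, (*params, session["employee_id"]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (employee_id INTEGER PRIMARY KEY, "
    "salary INTEGER, home_phone TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, 50000, None), (2, 60000, None)],
)

session = {"employee_id": 2}  # hypothetical session data for the current user

# The user asks for every record but sees only her own.
rows = run_as(conn, session, "select * from Employee").fetchall()
print(rows)  # [(2, 60000, None)]

# The same predicate restricts an update to the user's own row.
run_as(conn, session, "update Employee set home_phone = ?", ("555-123-4567",))
```

Binding the employee identifier as a query parameter, instead of splicing it into the SQL text, keeps the session-derived value from being interpreted as SQL.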
Often, the implementation of an information access policy in a database system is tightly coupled to the structure of the sensitive data stored in the database. For example, if sensitive data is stored relationally, then information access policies are typically implemented in terms of the tables and columns defined for the database. One problem with such tight coupling is that if there is a change to the metadata (e.g., table and column definitions) that defines the structure of sensitive data, then the implementation of the policy associated with the structure may no longer be effective. Thus, changing the structure of sensitive data may cause violations of information access policy if the implementation of the policy is not also changed when the structure is changed.
Another problem with typical approaches for implementing information access policies in database systems is that the policies are often implemented as policy functions that are hand-coded by a security administrator. A policy function is typically associated with a database structure and is invoked when a query attempts to access data contained in the structure. For example, a policy function may be associated with a table in a relational database and invoked when a query attempts to retrieve data from the table. Typically, the policy function, when invoked, returns any additional query predicates to be added to the user's query so that the user's query returns only data that complies with the information access policy. To determine the additional query predicates, the policy function may perform other tasks such as querying other tables in the database and mapping session data to the generated query predicates. Such policy functions can easily become quite complex, with many conditional programming statements such as if-then-else statements, including nested conditional programming statements. As such, complex policy functions can make it difficult for businesses and organizations to determine and demonstrate compliance with information governance regulations such as, for example, Sarbanes-Oxley and the Health Insurance Portability and Accountability Act (HIPAA). Further, such complex policy functions cannot easily be updated when information access policy is revised.
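A hand-coded policy function of the kind described above might look like the following sketch. Everything in it is hypothetical and chosen only for illustration: the function names, the `POLICIES` registry, the role names `hr_admin` and `manager`, and the `manager_id` column are assumptions, not part of any described system.

```python
def employee_policy(session):
    """Hypothetical policy function for an Employee table.

    Returns (predicate, params). A predicate of None means the query is
    left unrestricted.
    """
    roles = session.get("roles", set())
    if "hr_admin" in roles:
        # Assumed rule: HR administrators may see every record.
        return None, ()
    elif "manager" in roles:
        # Assumed rule: managers see their own record and their reports'.
        me = session["employee_id"]
        return "employee_id = ? OR manager_id = ?", (me, me)
    else:
        # Default rule from the text: employees see only their own record.
        return "employee_id = ?", (session["employee_id"],)


# Policy functions are associated with database structures (here, by table name).
POLICIES = {"employee": employee_policy}


def predicate_for(table, session):
    """Invoke the policy function registered for a table, if there is one."""
    policy = POLICIES.get(table.lower())
    if policy is None:
        return None, ()
    return policy(session)
```

Even this small example embeds column names and role checks directly in code, so a change to the table definition or to the policy itself requires editing and re-reviewing the function.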
What is needed is an alternative mechanism for implementing information access policies for sensitive data in data processing systems. Ideally, the solution should allow the implementation of information access policies that are more loosely coupled to the storage structure of sensitive data and should more easily facilitate the expression of complex information access policies, as compared with the usual approaches for implementing information access policies in data processing systems. These and other needs are addressed by the invention described herein.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.