Organizations today use database systems as the primary repository for their most valuable information. As the volume of data stored in these repositories has increased, protecting that data has become increasingly important. Furthermore, the responsible management of sensitive data is mandated by laws such as the Sarbanes-Oxley Act, the United States Fair Information Practices Act, the European Union Privacy Directive and the Health Insurance Portability and Accountability Act (HIPAA).
One important component of the security infrastructure is an auditing system that can be used to investigate, a posteriori, potential security breaches related to a database system. Such auditing systems monitor various operations such as user logins, queries, data updates and data definition language (DDL) statements to obtain an audit trail. The audit trail is analyzed offline, either periodically or when needed, to answer questions about access to schema objects, such as: (1) which login attempts failed, and (2) which queries, issued by which users, accessed columns containing personally identifiable information (PII).
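By way of illustration only, the two offline questions above can be sketched in Python as follows. The record layout of the audit trail (the `user`, `event`, `ok`, `sql` and `columns` fields), the sample records and the set of PII column names are all assumptions made for this sketch; actual auditing products define their own trail formats.

```python
from collections import Counter

# Hypothetical set of column names treated as PII (an assumption).
PII_COLUMNS = {"ssn", "date_of_birth"}

# Hypothetical audit trail; real products use their own record formats.
AUDIT_TRAIL = [
    {"user": "alice", "event": "login", "ok": False},
    {"user": "alice", "event": "login", "ok": True},
    {"user": "bob", "event": "query",
     "sql": "SELECT ssn FROM employees", "columns": {"ssn"}},
    {"user": "alice", "event": "query",
     "sql": "SELECT name FROM employees", "columns": {"name"}},
    {"user": "mallory", "event": "login", "ok": False},
    {"user": "mallory", "event": "login", "ok": False},
]


def failed_logins(trail):
    """Count failed login attempts per user."""
    return Counter(
        e["user"] for e in trail
        if e["event"] == "login" and not e["ok"]
    )


def pii_queries(trail, pii_columns):
    """Return (user, sql) pairs for queries that referenced a PII column."""
    return [
        (e["user"], e["sql"]) for e in trail
        if e["event"] == "query" and e["columns"] & pii_columns
    ]


if __name__ == "__main__":
    print(failed_logins(AUDIT_TRAIL))
    print(pii_queries(AUDIT_TRAIL, PII_COLUMNS))
```

Note that this kind of analysis works at the level of schema objects (columns), and says nothing about which individual rows a query touched.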
An important class of auditing is data auditing. A simple example is single tuple auditing, in which the objective is to identify all queries and update statements that “accessed” a particular tuple, e.g., the PII of a specific individual. Such queries potentially reveal sensitive information.
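What it means for a query to have “accessed” a tuple depends on the semantics chosen. As a minimal sketch, assuming one possible data-dependent reading (a query accessed the audited tuple if deleting that tuple from the current data changes the query result), the check could look as follows. The `patients` table, its rows, the query log and the helper names are hypothetical, and this is not presented as the semantics of any particular auditing system.

```python
import sqlite3

# Hypothetical sample data: (id, name, disease).
ROWS = [(1, "Alice", "flu"), (2, "Bob", "cancer"), (3, "Carol", "flu")]

# Hypothetical log of previously issued queries.
QUERY_LOG = [
    "SELECT name FROM patients WHERE disease = 'cancer'",
    "SELECT name FROM patients WHERE disease = 'flu'",
    "SELECT COUNT(*) FROM patients",
]


def make_db(rows):
    """Build an in-memory database holding the given patient rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, disease TEXT)"
    )
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)
    return conn


def run(conn, sql):
    """Run a query and return its rows sorted, so row order is ignored."""
    return sorted(conn.execute(sql).fetchall())


def accessed(rows, audited_id, sql):
    """Assumed semantics: the query accessed the tuple if removing
    the tuple from the data changes the query result."""
    full = make_db(rows)
    without = make_db([r for r in rows if r[0] != audited_id])
    try:
        return run(full, sql) != run(without, sql)
    finally:
        full.close()
        without.close()


# Audit the tuple with id 2 against every logged query.
flagged = [q for q in QUERY_LOG if accessed(ROWS, 2, q)]

if __name__ == "__main__":
    for q in flagged:
        print(q)
```

Under this reading, the query over 'cancer' patients and the `COUNT(*)` query are flagged for tuple 2, while the query over 'flu' patients is not, because its result is the same with or without that tuple.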
It is not known whether any commercial database auditing systems actually support this functionality. Single tuple auditing has, however, been the subject of research, which has proposed two fundamentally different semantic approaches that can be classified broadly as (data) instance dependent and (data) instance independent.
The instance independent approach has been shown to provide strong privacy guarantees. However, there is only a limited range of query classes for which it can audit efficiently. Additionally, subsequent research has shown that the instance dependent approach suffers from severe privacy limitations.
It should be appreciated that real-world queries, such as the Transaction Processing Performance Council ad-hoc (TPC-H) benchmark queries, are often complex, using constructs like grouping, aggregation and correlated subqueries that can pose a risk to security. Thus, while it may be acceptable for an auditing system to consider a restricted class of audit expressions, an auditing system that considers only a restricted class of queries is fundamentally incomplete. In addition, auditing systems without clearly defined privacy guarantees can encourage serious breaches of privacy. Consequently, the narrow applicability and privacy shortcomings of known approaches to single tuple data auditing significantly limit their real-world utility.