A database server stores data in one or more data containers. Each container contains records, and the data within each record is organized into one or more fields. In a database system that stores data in a relational database, the data containers are referred to as tables, the records are referred to as rows, and the fields are referred to as columns. In object-oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the fields are referred to as object attributes. Other database architectures may use other terminology.
The present invention is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms “table”, “row” and “column” shall be used herein to refer respectively to the data container, record, and field.
A row in a table maintained by a database server may contain confidential information about individuals, such as census data or medical information. Access to such rows needs to be controlled to protect the confidential information. In fact, many countries impose laws that restrict access to confidential information. However, while information in a row about a particular individual may be confidential, aggregate information derived from many such rows may not be confidential. For example, while the individual salaries of persons living in a zip code are confidential, the average salary of persons living in the zip code is not confidential information. Aggregate information of this type is very valuable, and public access to it is important.
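By way of illustration only, the distinction between row-level information and aggregate information may be sketched as follows. The sketch uses Python's standard sqlite3 module; the table name person, its columns, and all values are hypothetical and were chosen solely for this example.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, zip_code TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("A", "94065", 50000.0),
        ("B", "94065", 70000.0),
        ("C", "94065", 90000.0),
    ],
)

# Row-level query: returns each individual's salary, which is confidential.
individual_rows = conn.execute(
    "SELECT name, salary FROM person WHERE zip_code = ?", ("94065",)
).fetchall()

# Aggregate query: returns a single value derived from all rows in the zip code.
average_salary = conn.execute(
    "SELECT AVG(salary) FROM person WHERE zip_code = ?", ("94065",)
).fetchone()[0]
```

In this sketch, individual_rows holds one salary per named person, whereas average_salary holds a single figure that is not tied to any one of the three individuals.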
Many countries, especially in Europe, impose strong privacy requirements on confidential demographic data (e.g., census data). Exposing a database that stores such data for public analysis, while protecting confidentiality in order to conform with privacy laws, is a very challenging task. For example, many census bureaus around the world have attempted to develop systems that permit users to run only queries that request aggregated information and that do not return data that can be identified with a particular individual. Unfortunately, these specialized systems have been very expensive to develop and to evolve in response to changing user needs and the laws and regulations of many countries.
One approach that has been attempted to protect databases that hold confidential information, while allowing public access to aggregate information, is to allow users to access data only by running a query selected from a library of queries. No user-specific query is allowed. This is the approach used by most census bureaus today. The biggest disadvantage of this approach is that, too often, information needed by a user cannot be retrieved or derived from any of the queries in the library.
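A minimal sketch of the query-library approach is given below for explanatory purposes. The names QUERY_LIBRARY and run_library_query, the table person, and the data are all hypothetical; an actual system of this kind may be organized quite differently.

```python
import sqlite3

# Hypothetical fixed library of approved aggregate queries.
QUERY_LIBRARY = {
    "avg_salary_by_zip": (
        "SELECT zip_code, AVG(salary) FROM person "
        "GROUP BY zip_code ORDER BY zip_code"
    ),
    "population_by_zip": (
        "SELECT zip_code, COUNT(*) FROM person "
        "GROUP BY zip_code ORDER BY zip_code"
    ),
}


def run_library_query(conn, name):
    """Run a query only if it appears in the library; reject anything else."""
    if name not in QUERY_LIBRARY:
        raise KeyError(f"query {name!r} is not in the library")
    return conn.execute(QUERY_LIBRARY[name]).fetchall()


# Hypothetical demonstration data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (zip_code TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("94065", 50000.0), ("94065", 70000.0), ("94070", 80000.0)],
)

averages = run_library_query(conn, "avg_salary_by_zip")
```

A request such as run_library_query(conn, "median_salary_by_zip") is rejected in this sketch even though it asks only for aggregate information, which illustrates the disadvantage noted above: a need that the library's authors did not anticipate cannot be served.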
Another approach is to pre-build summary data (e.g., materialized views) that includes information aggregated in all the ways needed by users. The overhead of this approach is onerous because the amount of summary data that must be computed to meet the needs of all users who access it is enormous.
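One way to see why the amount of summary data can become very large is to count the groupings that could be requested. The following sketch assumes, purely for illustration, that each summary is defined by the set of columns it is grouped by; the function name grouping_sets and the column names are hypothetical.

```python
from itertools import combinations


def grouping_sets(columns):
    """Return every non-empty combination of columns a summary could be grouped by."""
    return [
        combo
        for size in range(1, len(columns) + 1)
        for combo in combinations(columns, size)
    ]


# Hypothetical demographic columns, chosen only for illustration.
columns = ["zip_code", "age_band", "occupation", "household_size"]
summaries = grouping_sets(columns)

# n columns yield 2**n - 1 non-empty groupings, before accounting for the
# different aggregate functions or filter ranges that users might request.
```

Under this assumption, four columns already give fifteen possible groupings, and each additional column roughly doubles the count.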
Another approach is to allow users to request information through a user interface that limits the type of information users can request. The user interface allows a user to specify criteria by which to return aggregated information. For example, a user could request the average salary of individuals who live in a particular area and who fall within a particular age range. The user interface contains user controls that allow the user to specify a region and an age range. The user interface would not contain controls that allow a user to specify a particular street or address.
This approach has several disadvantages. First, it is often overly protective of confidentiality. The user interface does not provide the ability to specify criteria for any attribute or classification that could potentially be used to return information about specific individuals. For example, one street may have hundreds of individuals while another street may have only one. The user interface does not allow a user to make a request that specifies a particular street, because of the possibility that the returned information may be limited to a street with one or a few individuals, even though a request for a street with hundreds of individuals would reveal nothing about any one of them.
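The street example can be made concrete with the following sketch, which is offered for explanation only. The table resident, the street names, and the salaries are hypothetical.

```python
import sqlite3

# Hypothetical data: one street with three residents, one street with one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resident (street TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO resident VALUES (?, ?)",
    [
        ("Main St", 40000.0),
        ("Main St", 60000.0),
        ("Main St", 80000.0),
        ("Hill Ln", 95000.0),
    ],
)


def average_salary_by_street(conn):
    """Map each street to (number of residents, average salary)."""
    rows = conn.execute(
        "SELECT street, COUNT(*), AVG(salary) FROM resident "
        "GROUP BY street ORDER BY street"
    ).fetchall()
    return {street: (count, avg) for street, count, avg in rows}


result = average_salary_by_street(conn)
```

In this sketch the average for the street with three residents does not correspond to any single resident, whereas the average for the street with one resident is exactly that resident's salary. A user interface that omits the street criterion altogether blocks both requests, although only the second exposes an individual.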
Another disadvantage of this approach is that it limits user access to the information that can be obtained through the user interface. Under this approach, users cannot access the database more directly, and do not have the kind of access needed to use powerful database tools.
Another disadvantage of the approaches mentioned above is that they do not prevent access to confidential information in a database by users who are able to access the database directly, without having to go through, for example, a user interface or API (“Application Programming Interface”).
Clearly, there is a need for a mechanism that protects the confidentiality of data while avoiding the disadvantages attendant to the approaches discussed above.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.