The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, whether or not a given approach is prior art, the problems identified with that approach should not be assumed to have been recognized in the prior art.
A database server stores data in one or more data containers. Each container contains records. The data within each record is organized into one or more fields. In a database system that stores data in a relational database, the data containers are referred to as tables, the records are referred to as rows, and the attributes are referred to as columns. In object oriented databases, the data containers are referred to as object classes, the records are referred to as objects, and the attributes are referred to as object attributes. Other database architectures may use other terminology.
The description herein is not limited to any particular type of data container or database architecture. However, for the purpose of explanation, the examples and the terminology used herein shall be that typically associated with relational databases. Thus, the terms “table”, “row” and “column” shall be used herein to refer respectively to the data container, record, and field.
A database server retrieves and manipulates data in response to receiving a database statement. Typically the database statement conforms to a database language, such as Structured Query Language (SQL). A database statement can specify a query operation, a data manipulation operation, or a combination thereof. A database statement that specifies a query operation is referred to herein as a query. The discussion herein is not limited to database statements that specify a particular type of operation. However, for the purpose of explanation, examples discussed herein are illustrated using queries.
One function of a database server is to control access to database data. Security mechanisms on database servers control what data may be accessed by a query issued by a user. One type of security mechanism is referred as a fine-grained access control mechanism. An example of fine-grained access control is described in U.S. Pat. No. 6,487,552, issued Nov. 26, 2002 to Chon Hei Lei et al, which is incorporated herein by reference in its entirety. Fine-grained access control may be used to grant and/or deny access to one or more rows of a table.
Legal and industry compliance requirements are forcing companies to mask or redact security sensitive data that are displayed to both authorized and unauthorized users on production systems. A common example of this masking can be seen on a credit card receipt where all but the last four digits of the credit card number are replaced with the X character.
Data redaction solutions mask out data dynamically and/or persistently. Dynamic data masking techniques redact data in the application level, on the network, or through views and triggers in the database. Persistent data masking techniques create a redacted copy of the data that may reside on the same database as the original data or may be exported to a separate test or development database where none of the original data exists. These approaches have several drawbacks.
Persistent masking techniques require modification of the original data and/or creation of new persistent database objects. Any masking technique that directly modifies the original data cannot be used on production databases where the integrity of the original data must be maintained. Some have attempted to work around this problem by creating shadow tables (i.e., duplicate tables) that store the redacted versions of the original data. This forces applications to modify their queries to select from the shadow tables. Doing so incurs additional time and development expense. Furthermore, each application needs to implement its own logic to decide whether to query from a redacted table or the original table
Techniques that redact at the application level can produce inconsistent redaction behavior across different applications. Companies use a variety of applications provided by multiple vendors with multiple development teams. As a result, the data that are redacted and the method used to redact the data can vary greatly across applications. In some cases, the results may not fully address customer compliance needs because the application failed to mask or redact data in a manner required for a given industry or region.
Some applications may be designed without masking or redaction capabilities. Many applications designed before current data privacy compliance requirements do not incorporate data masking or redaction capabilities for all required data. The cost to modify these applications to support masking can be prohibitive as it may involve significant changes to the application architecture.
Persistently redacted data and data redaction performed through views and triggers in the database may produce incorrect relational processing. When data is redacted too early, basic relational operations like JOINs will operate on data after it has been masked or redacted. This happens when redaction is implemented through triggers or if data is persistently redacted. In permanently redacted databases, redacted values are required to have a one-to-one mapping with the original values so that data relationships can be preserved. This approach is not only slow and difficult to implement in a reliable way, but it also creates redacted data that is reversible, which is an undesirable property in data redaction in a production environment.