Computer databases are a common mechanism for storing vast amounts of information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. For example, a customer database may have a record for each customer. Each record contains fields designating specifics about the customer, such as first name, last name, date of birth, home address, credit information, credit card information, social security number, customer history, and the like. As can be seen, much of this information is of a sensitive nature and thus databases are often maintained in an encrypted form.
A database management system (DBMS) typically provides and manages access to the database (i.e., the data actually stored on a storage device) for the users of the system. In essence, the DBMS shields the database user from knowing or even caring about underlying details involved in management of the database. Typically, all requests from users for access to the data are processed by the DBMS. For example, records can be searched, information may be added or removed from data files, information retrieved from, or updated in, such files, and so forth, all without user knowledge of underlying system implementation.
In this manner, the DBMS provides users with a conceptual view of the database that is removed from the implementation level. The general construction and operation of a database management system is known in the art. See e.g., Date, C., “An Introduction to Database Systems, Volume I and II,” Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
For enhancing the speed in which the DBMS searches, stores, retrieves, and presents particular data records, the DBMS usually maintains database indexes on one or more database tables. Database tables may include the complete contents of a database, but usually the DBMS creates one or more tables consisting only of those records that are most frequently accessed. Take for example FIG. 2 illustrating a table 30 having N records arranged in columns of fields which are row id 32, first name 34, last name 36, and date of birth (DOB) 38. One can readily appreciate that within a customer database, this type of information may be frequently accessed and searched.
Although the data in a database table may be of a sensitive nature, to enable fast searching such database tables are not typically encrypted. To search an encrypted table, the DBMS would first have to expend computing power decrypting the table, then perform the search and finally encrypt (again) the table. Hence, encrypting the database table prohibits fast searching as currently there is no mechanism for searching encrypted data.
A database index, typically maintained as a B-Tree (or B+Tree) data structure, allows the records of a non-sorted database table to be organized via nodes of the database index in many different ways, depending on a particular user's needs. A database index may be constructed as a single file storing ordered nodes of index key values together with unique index data. The index key values are a data quantity composed of one or more fields from a record which are used to arrange (logically) the database table records in some desired order (index expression). The index data are unique pointers or identifiers to the actual storage location of each record in the database table. Both are referred to internally by the system for locating and displaying records in a database table. Like the tables, for ease of searching and data management, current schemes require that the data within the database index be maintained in an unsecured transparent form.
FIG. 3 illustrates one type of b-tree index, a binary tree index 50, according the prior art. The binary tree index 50 includes a plurality of nodes 52 arranged in a hierarchical order. In the example of FIG. 3, each node 52 includes a transparent index key 54 and a transparent index data 56. The transparent index key 54 contains data that is used for hierarchical ordering of the index 50. In the case of the table 30, e.g., data from the DOB column 38 would be well suited for use as the transparent index key 54, the index 50 arranged according to age. The transparent index data 56 contains the payload data of the node 52. In the case of the table 30, e.g., data from the row id column 32 would be well suited for use as the contents of the transparent index data 56.
Searching for a particular record in a B-Tree index occurs by traversing a particular path in the tree. To find a record with a particular key value, one would maneuver through the tree comparing key values stored at each node visited with the key value sought. The results of each comparison operation, in conjunction with the pointers stored with each node, indicate which path to take through the tree to reach the record ultimately desired. Ultimately, a search will end at a particular leaf node which will, in turn, point to (i.e., will store a pointer to or identifier for) a particular data record for the key value sought. Alternatively, the leaf nodes may for “clustered indexes” store the actual data of the data records on the leaf nodes themselves.
An index allows a database server to find and retrieve specific rows much faster than it could without using the index. A sequential or linear scan from the beginning of a database table, comparing each record along the way, is exceedingly slow compared to using an index. There, all of the blocks of records would have to be visited until the record sought is finally located. For a table of even moderate size, such an approach yields unacceptable performance. As a result, virtually all modern-day relational database systems employ B-Tree indexes or a variant.
FIG. 1 illustrates a method 10 for generating and searching a table index utilizing transparent data according to the prior art. An initial step 12 populates a transparent database with a variety of data, including sensitive and possibly non-sensitive data. A next step 14 generates one or more transparent tables consisting of commonly searched data extracted from the transparent database. See FIG. 2 and the preceding description for more details on tables.
Continuing on with the method 10 of FIG. 1, a next step 14 generates a transparent index suitable for searching a transparent table. Once indexes have been formed, a step 18 provides for fast searching of the transparent tables utilizing the indexes formed. Then a step 20 encrypts the database for permanent storage.
The methods involving database tables and indexes, like that of FIG. 1, provide for fast table searching of data extracted from a database. Although indexes are widely used to improve DBMS performance, they expose sensitive data to attack. This is because searching methods of the prior art require that the database tables and database indexes be maintained in a transparent format in order to facilitate fast searching. Encrypting the indexes and tables would secure such data, but the computing power required to encrypt and decrypt these tables and indexes whole scale each time a search is required renders such a method unfeasible.