Database management systems (DBMS's) are systems designed to store and manage data. DBMS's may receive data to be stored and may allow updating or deleting previously stored data. DBMS's also provide functionality to retrieve, from the stored data, the specific data relevant to a particular purpose. Many possible types of data may be stored at a DBMS, and the data may be structured in many possible ways. Notably, the quantity of data located at a DBMS may be large. Databases containing gigabytes of data are common, and databases containing terabytes of data are known in the art. Conversely, using a DBMS even for comparatively small quantities of data may be advantageous because the functionality provided by the DBMS may be applied to that data.
DBMS's beneficially allow a wide variety of organizations to effectively and accurately manage their data. The use of DBMS's by businesses is particularly widespread. Many business goals are advanced by effectively managing data, a task which DBMS's are specialized to achieve. For example, a business may increase profitability by maintaining accurate information about customers in a DBMS. DBMS's also benefit organizations which do not have a profit motive, such as educational institutions, research facilities and government agencies. Furthermore, computer program products, computing systems and other technological processes are frequently coupled to a DBMS in order to manage relevant data. This coupling is particularly widespread in Internet-based applications because environments known in the art for executing such applications typically offer limited capabilities for maintaining state information and persistent data.
To retrieve data located at a DBMS, a system may submit a query to the DBMS. A query is a request to transmit specific data located at a DBMS to the system making the request. A query may be submitted to a DBMS in order to retrieve data relevant to a specific task. Querying is important because only a fraction of the total data located at a DBMS is relevant to most tasks; querying allows retrieving this fraction relatively quickly. A querying operation may comprise selecting, from the entire body of stored data, only that subset of the data which fulfills specific criteria. Specifically, data matching the criteria are selected and data not matching the criteria are not selected. A wide variety of criteria may be used in queries. One of the most common criteria is that a specific element within the data must contain a specific value. Another possible criterion is that a specific element must contain a value falling within a specified range. A range criterion may include an upper bound, a lower bound, or both. Additionally, Boolean logic may be used to combine multiple criteria so that the query returns only that subset of the data for which the Boolean expression evaluates to True.
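The kinds of criteria described above may be illustrated with a minimal sketch. The sketch assumes Python's standard sqlite3 module and an in-memory database; the table name, column names and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table, columns and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Boston", 120.0), (2, "Denver", 40.0),
     (3, "Boston", 15.0), (4, "Austin", 300.0)],
)


def select_ids(where_clause, params=()):
    """Return the ids of the tuples that satisfy the given criterion."""
    sql = f"SELECT id FROM customers WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql, params)]


# Equality criterion: a specific element must contain a specific value.
equal_ids = select_ids("city = ?", ("Boston",))

# Range criterion having both a lower bound and an upper bound.
range_ids = select_ids("balance >= ? AND balance <= ?", (30.0, 200.0))

# Range criterion having only a lower bound.
lower_ids = select_ids("balance >= ?", (100.0,))

# Boolean combination of criteria using AND and NOT.
combined_ids = select_ids("city = ? AND NOT balance < ?", ("Boston", 100.0))
```

In each case only the tuples for which the WHERE expression evaluates to True are returned; all other tuples are not selected.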
Queries which select data matching one or more logical condition predicates are known in the art as identity queries. It is noted that not all queries are identity queries. For example, many database systems known in the art allow selecting all records for which a certain element matches the result of a validating query, known as a subquery. Such a query, although it may be accepted by a specific DBMS, is not an identity query. Those skilled in the art will appreciate that while such complex queries may be useful in certain cases, they typically take a longer time to execute and require more computing resources than identity queries. Furthermore, queries which are not identity queries can frequently be rewritten as one or more identity queries which retrieve the same data.
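As one possible illustration of such a rewrite, the following sketch retrieves the same data first with a single query containing a subquery and then with two queries which each select only on simple predicates. It assumes Python's standard sqlite3 module; the tables, columns and values are hypothetical.

```python
import sqlite3

# Hypothetical tables and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Boston"), (2, "Denver"), (3, "Boston")])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(10, 1), (11, 2), (12, 3), (13, 2)])

# Form 1: a single query whose criterion depends on a subquery.
with_subquery = [row[0] for row in conn.execute(
    "SELECT id FROM orders WHERE customer_id IN "
    "(SELECT id FROM customers WHERE city = ?) ORDER BY id",
    ("Boston",))]

# Form 2: two queries, each using only simple predicates.
# Step 1: an equality predicate on the customers table.
customer_ids = [row[0] for row in conn.execute(
    "SELECT id FROM customers WHERE city = ?", ("Boston",))]

# Step 2: equality predicates on the orders table, combined with OR.
# (An empty customer_ids list would need separate handling; it cannot
# occur with the sample data above.)
predicate = " OR ".join("customer_id = ?" for _ in customer_ids)
rewritten = [row[0] for row in conn.execute(
    f"SELECT id FROM orders WHERE {predicate} ORDER BY id",
    customer_ids)]
```

Both forms return the same order identifiers for this sample data. Whether the rewritten form is faster depends on the specific DBMS and data, so the sketch demonstrates only the equivalence of the results.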
The data stored at a DBMS does not need to be static. In fact, an advantageous feature of DBMS's is that data not only may change, but generally may do so in real time. To allow data to be dynamic, most DBMS's known in the art provide functionality to manipulate the data stored at a DBMS via inserting, updating and deleting operations. An inserting operation causes new data to be stored at the DBMS. An updating operation modifies existing data stored at the DBMS such that existing values are replaced by updated values. A deleting operation removes data from the DBMS which is presently stored there but is no longer desired.
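The three manipulation operations may be sketched as follows, assuming Python's standard sqlite3 module. The table name, column names and values are hypothetical.

```python
import sqlite3

# Hypothetical table and values, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT, quantity INTEGER)")


def snapshot():
    """Return the current contents of the table, ordered by item."""
    return conn.execute(
        "SELECT item, quantity FROM inventory ORDER BY item").fetchall()


# Inserting: new data is stored at the DBMS.
conn.execute("INSERT INTO inventory VALUES (?, ?)", ("widget", 5))
conn.execute("INSERT INTO inventory VALUES (?, ?)", ("gadget", 2))
after_insert = snapshot()

# Updating: an existing value is replaced by an updated value.
conn.execute("UPDATE inventory SET quantity = ? WHERE item = ?",
             (7, "widget"))
after_update = snapshot()

# Deleting: data which is no longer desired is removed.
conn.execute("DELETE FROM inventory WHERE item = ?", ("gadget",))
after_delete = snapshot()
```

Each snapshot reflects the state of the table immediately after the corresponding operation, which illustrates that the stored data need not be static.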
One type of database commonly used in the art is the relational database, which is managed by a relational DBMS (RDBMS). In a relational database, a table stores data having a common structure and representing a similar type of entity. Specifically, a table contains units of data known in the art as tuples. It is noted that tuples are alternatively known as records or rows; all three terms have an identical meaning. The number of tuples stored in a table may at one extreme be very large, and may at the other extreme be zero. A tuple frequently contains a coherent, atomic unit of data, often corresponding to a single entity. However, those skilled in the art will appreciate that many exceptions to this broad guideline exist. Each tuple contains one or more fields, each of which is configured to contain data of a specified type. Common types of fields include integers, real numbers (often having a defined number of digits to the right of a decimal point), text (often subject to a maximum number of characters) and Booleans (values which may be either logically True or False). A field may be configured to allow a special value called Null which indicates the non-existence of a value for that field. Generally, all tuples within a table will include the same fields, although the values of the fields generally vary from tuple to tuple. It is emphasized, however, that not all DBMS's follow this relational paradigm. Other types of DBMS's known in the art include object-oriented databases and hierarchical databases.
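A table having typed fields, one of which permits Null, may be sketched as follows. The sketch assumes Python's standard sqlite3 module; the table and field names are hypothetical. It is noted that SQLite treats declared types as affinities and does not enforce the declared text length, so the declarations below serve only to illustrate the field types discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table definition, used only for illustration.
conn.execute("""
    CREATE TABLE employees (
        id      INTEGER NOT NULL,  -- integer field; Null not allowed
        salary  NUMERIC,           -- real number field
        name    VARCHAR(40),       -- text field with a declared maximum
        active  BOOLEAN,           -- Boolean field (stored as 0 or 1)
        manager INTEGER            -- field which may be Null
    )
""")

# A table may contain zero tuples.
empty_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# Every tuple has the same fields; only the values vary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?)",
    [(1, 52000.50, "Ada", True, None),   # None is stored as Null
     (2, 48000.00, "Grace", False, 1)],
)

# Null indicates the non-existence of a value and is tested with IS NULL.
no_manager = [row[0] for row in conn.execute(
    "SELECT name FROM employees WHERE manager IS NULL")]

# A field not configured to allow Null rejects a Null value.
try:
    conn.execute("INSERT INTO employees (id, name) VALUES (NULL, 'Eve')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```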
Relational databases may be queried using a specialized programming language called Structured Query Language, or SQL. It is noted that other querying languages exist in the art. Furthermore, even among those DBMS's known in the art which accept SQL queries, noticeable differences may exist in SQL syntax from DBMS to DBMS.
DBMS's known in the art generally require sophisticated hardware and software. Furthermore, effective administration of a DBMS generally requires a high degree of expertise. Many organizations which may benefit from the data management functionality of DBMS's lack these resources.
Database outsourcing can help bridge this gap. Database outsourcing is the contracting of an organization's database management tasks to an outside database service provider. This beneficially allows organizations to realize the benefits of DBMS's while decreasing the need for in-house expertise, hardware and software. Database outsourcing is therefore beneficial for organizations having limited capabilities for managing their own data. Even when an organization possesses database management expertise, database outsourcing confers many other benefits. In particular, database outsourcing may reduce costs. Database outsourcing may also help organizations to focus on their core tasks.
Database outsourcing is becoming more feasible from a cost standpoint. Historically, transmitting data over wide distances has been expensive. This fact encouraged locating DBMS's in close physical proximity to their users, thus discouraging database outsourcing. However, during a recent five-year period, the cost to transmit a quantity of data over a large geographic area decreased by approximately 75 percent. As a result, the costs of database outsourcing have fallen while its benefits remain as significant as ever.
For database outsourcing to succeed, organizations must be assured of the integrity of queries performed against the outsourced database. Data authenticity, meaning that the data returned in response to a query is the same data that was transmitted to the database, must be guaranteed. Query completeness, meaning that all records which should be matched by a query are in fact returned, is critical. Database outsourcing presents other challenges as well. The privacy of data must be ensured. Ideally, even the outside database service provider itself should have no access to the plaintext (unencrypted) version of the data stored therewith. Performance, scalability and ease of use, which have traditionally been important issues in DBMS's, have now gained a new dimension in the database outsourcing paradigm.
Furthermore, it is important to ensure query integrity without incurring unduly high costs. For example, some techniques in the prior art for ensuring query integrity involve computationally intensive security schemes. Because a database query typically requires only a fraction of a second to execute, such techniques may add significant overhead. Other prior art techniques involve storing data at a client to assist in verifying query integrity. However, such techniques inherently require data management capability at the client side. For many clients, such as Personal Digital Assistants (PDA's), mobile phones and other thin clients, local data management may not be possible due to storage limitations. More generally, local data storage is precisely what database outsourcing seeks to minimize. Therefore, the utility of this category of prior art techniques is inherently limited.
Database outsourcing is facilitated by encryption. Encryption is a set of techniques known in the art for modifying data so that it is difficult to determine the unmodified data even if an entity (such as an outside database service provider) has access to the modified data. The original, unmodified data input to an encryption algorithm is known in the art as plaintext.
Notably, encryption does not attempt to make it impossible to determine the content of plaintext based solely on its encrypted form. In fact, all encryption techniques known in the art can be defeated if unlimited computing resources are available. Instead, encryption attempts to make it computationally infeasible to determine the content of plaintext based solely on its encrypted form. This means that the amount of resources required to determine the plaintext data from its encrypted form exceeds the value of the plaintext data. For example, suppose that the maximum potential loss resulting from unauthorized access to a given plaintext data set is $10,000. Suppose that the plaintext data are encrypted in a manner such that the expected value of the quantity of computing power required to reverse the encryption without authorization is 1,000 processor years. Suppose also that the cost of 1,000 processor years of computing power is $500,000. It is computationally infeasible to reverse the encryption in this case because even if an entity is unethical, a business case cannot be made for reversing the encryption without authorization.
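The arithmetic of the foregoing example may be expressed as a short sketch. The function name is hypothetical, and the figures are those of the example above.

```python
def is_computationally_infeasible(cost_to_break, value_of_plaintext):
    """Return True when reversing the encryption costs more than the
    plaintext is worth, which is the criterion stated in the text."""
    return cost_to_break > value_of_plaintext


max_potential_loss = 10_000   # dollars, from the example
processor_years = 1_000       # expected effort to reverse the encryption
cost_of_attack = 500_000      # dollars for 1,000 processor years

# Implied cost of a single processor year.
cost_per_processor_year = cost_of_attack / processor_years

# How many times the cost of the attack exceeds the potential gain.
cost_to_value_ratio = cost_of_attack / max_potential_loss

infeasible = is_computationally_infeasible(cost_of_attack,
                                           max_potential_loss)
```

Under these figures the attack costs $500 per processor year and fifty times the maximum potential loss, so no business case for the attack exists.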
Many encryption techniques known in the art employ a secret key to encrypt plaintext data. Specifically, the plaintext and the secret key are input to an encryption function. The result of the encryption function is the encrypted form of the plaintext data. If an entity has access to the secret key, the entity may reverse the encryption by inputting the encrypted data and the secret key to a decryption function. If an entity does not have access to the secret key, it is computationally infeasible for the entity to obtain the plaintext data based on the encrypted data.
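The relationship between the encryption function, the decryption function and the secret key may be illustrated with a deliberately simplified sketch. The construction below is a toy stream cipher assumed solely for illustration: it derives a keystream from the key using SHA-256 and combines it with the data by exclusive-or. It uses no nonce and no authentication, is not secure, and is not any particular technique referred to in this document. All function names are hypothetical.

```python
import hashlib


def _keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out += block
        counter += 1
    return bytes(out[:length])


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Encryption function: inputs are the plaintext and the secret key."""
    stream = _keystream(key, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Decryption function: inputs are the encrypted data and the key.
    Exclusive-or with the same keystream restores the plaintext."""
    return encrypt(ciphertext, key)


secret_key = b"hypothetical secret key"
message = b"account 1234, balance 120.00"

ciphertext = encrypt(message, secret_key)
recovered = decrypt(ciphertext, secret_key)
wrong_guess = decrypt(ciphertext, b"some other key")
```

An entity holding the secret key recovers the original message, whereas decryption with a different key yields unrelated bytes.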
Another notable cryptographic technique is one way hashing. One way hashing may be achieved by employing a one way hash function. A one way hash function may receive as input data having a variable length and may return as a result data having a fixed length. A one way hash function may also receive a secret key as input. If an entity does not have access to the secret key, it is computationally infeasible for the entity to determine whether a specific result was generated from specific input data. One way hash functions are deterministic. Accordingly, for the same one way hash function, the same input value and the same key will always yield the same result value. For most one way hash functions, it is computationally infeasible to determine an input value for which the function will output a particular result value. For many one way hash functions, it is computationally infeasible to find two different input values for which the hash function returns the same result.
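These properties of a keyed one way hash function may be observed in a minimal sketch. The sketch assumes HMAC with SHA-256, as provided by Python's standard hmac and hashlib modules, as one example of such a function; this document does not prescribe a particular function. The helper name, keys and inputs are hypothetical.

```python
import hashlib
import hmac


def keyed_hash(key: bytes, data: bytes) -> str:
    """Return the HMAC-SHA256 of `data` under `key` as a hex string."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


key = b"hypothetical secret key"

# Inputs of variable length yield results of fixed length.
short_result = keyed_hash(key, b"a")
long_result = keyed_hash(key, b"a" * 10_000)

# The same function, input and key always yield the same result.
repeat_result = keyed_hash(key, b"a")

# A different key yields a different result for the same input.
other_key_result = keyed_hash(b"a different key", b"a")
```

Each result is 64 hexadecimal characters (256 bits) regardless of the length of the input.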