Computers are used today to store large amounts of data. Such information is often stored in information storage and retrieval systems referred to as databases. This information is stored to, and retrieved from, a database using an interface known as a database management system (DBMS).
One type of DBMS is called a Relational Database Management System (RDBMS). An RDBMS employs relational techniques to store and retrieve data.
Relational databases are organized into tables, wherein tables include both rows and columns, as is known in the art. Each row of a table may be referred to as a record. Each column defines a respective data item, or data value, that may be saved within each of the records. For example, a “business name” column may store data that identifies a business name that is associated with the record. The remaining columns may store additional data for the business identified within the business name column.
Many DBMS systems support the concept of a “transaction”, which is a single logical operation that accomplishes an indivisible action on the data within the database. Such DBMS systems enforce the transaction properties of Atomicity, Consistency, Isolation, and Durability, also referred to as the “ACID” properties. The following definitions for these terms are provided in Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter, Morgan Kaufmann Publishers, Inc., 1993, p. 6:
Atomicity. A transaction's changes to the state are atomic; either all happen or none happen. These changes include database changes, messages, and actions on transducers.
Consistency. A transaction is a correct transformation of the state. The actions, taken as a group, do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.
Isolation. Even though transactions execute concurrently, it appears to each transaction, T, that other transactions execute either before T or after T, but not both.
Durability. Once a transaction completes successfully, (commits), the state of the data is changed to reflect the successful completion of the transaction, and the state of the data will survive system failures.
DBMS systems that enforce the ACID properties generally use a command language to transform a database from one state to a next. As an example, all DBMS systems that enforce ACID properties include a command to create a new entry within the database. This type of command, sometimes referred to as an INSERT statement, adds a new record to the database. Another similar type of command, sometimes referred to as an UPDATE, is used to modify a record that already exists. This type of command locates an existing record of interest within the database, and then modifies the contents of one or more of the record columns.
The types of commands described above may only be issued after a current state of the database has been ascertained. In other words, an entity such as an application program must determine whether the target record exists before that program will know which command to issue. This is necessary because if an INSERT statement is issued to add a target record to the database when that record already exists, an error will occur. Similarly, an error will occur if an UPDATE statement is used to modify a non-existing record.
In view of the foregoing, application programs generally perform a query to determine the state of a database table before an INSERT or UPDATE statement is issued. This may be accomplished by issuing a SELECT statement, (sometimes referred to as a “Singleton SELECT”) to see if a target record already exists. If a record is returned in response to execution of this statement, an UPDATE may be performed. Otherwise, an INSERT operation is executed. Other similar types of database operations may be used to determine the state of the database prior to issuing an INSERT or UPDATE statement.
Several limitations exist with the above-described mechanism. First, it requires the execution of several statements. An initial statement must be issued to obtain the database state. Next, decisional logic is required to test that state. Thereafter, logic is required to handle each of the two possible outcomes of the test operation, with one outcome involving issuance of the INSERT statement, and the other resulting in execution of the UPDATE statement. This logic sequence, which may be required in a large number of places throughout the software, adds complexity to the system, and may be difficult and time-consuming to test.
Another limitation associated with the foregoing approach is that it requires the database to be accessed twice. An initial database access is necessary to obtain the state of the database, and a subsequent access is needed to execute the UPDATE or INSERT statement. This adversely impacts system performance.
Still another problem involves the fact that there is no guarantee that the decisional logic will result in issuance of the correct statement. This is because a lock is not maintained on the database between the time the database state is obtained and the time the UPDATE or INSERT statement is issued. It is therefore possible for a different transaction (that is, a different, unrelated request issued to the database) to change the database state by executing an insert, update, or delete statement in association with a record of interest. This may result in the issuance of the wrong statement and the subsequent receipt of an error or some other condition returned by the DBMS that requires the calling program to handle via the execution of additional code. For example, if an attempt is made to update a record that does not exist, a “NO DATA” exception may be returned by DBMS to the calling program, indicating that the record was not located.
The problems discussed above are exacerbated by the fact that a class of application programs exists that does not recognize the distinction between an INSERT and UPDATE statement. This application class attempts to modify a database to a new state that includes a record containing specified data. These types of operations, which are performed without regard to whether that record previously existed within the database, frequently result in the receipt of errors. These errors are then handled by the error handling mechanisms of the DBMS. The invocation of the error handling procedures slows system throughput.
Another type of functionality that manifests problems similar to those discussed above involves MERGE-type operations. Whereas the INSERT and UPDATE statements are used to insert or update a single row of a table, respectively, a MERGE-type operation uses decisional logic to determine whether to merge a row from a first table into the row of a second table. An example of this type of operation is the MERGE statement supported by databases commercially available from the Oracle Corporation. As is the case with the INSERT and UPDATE statements, a user employing a MERGE statement is required to provide code for both a test condition, and code for each of the two alternative paths that may be taken following the test condition. As described above, this increases the complexity of the application, and makes testing more time-consuming and difficult.
The above-described problems have not been addressed because of the difficulty associated with providing a generic approach that is able to change a database from any one of multiple possible states to a second known state. This can best be understood by re-considering the distinctions between the INSERT and UPDATE statements. When an INSERT statement is executed, data is provided for each of the columns of the database, since each column will be “empty” at the time of record creation. In contrast, when an UPDATE statement is executed, it is generally desirable to provide data for only the selected columns that are to be modified, leaving the remaining columns unchanged. Without prior knowledge as to the database state, it cannot be determined which of these approaches should be used, and no generic solution has been devised that can equally apply to either type of situation.
Yet another related problem associated with transforming a database from a first unknown state to a second known state involves the prior art locking mechanisms that are associated with data retrieval in a DBMS. Generally, when a data retrieval operation such as a SELECT is initiated to read a record, a READ lock is activated on that record. This READ lock allows the transaction to read from, but not update, the record. After the SELECT is completed, that READ lock is deactivated.
As noted above, a SELECT is often performed to attempt to read the contents of a record. Thereafter, some decisional logic is executed to determine whether to issue an UPDATE or an INSERT statement for the record. Before the UPDATE or INSERT statement can actually be executed, however, a WRITE lock must be requested and obtained for the target record. This will allow the transaction that obtained the lock to both read from, and write to, that record. However, it is possible that between the time the SELECT was issued and the WRITE lock is requested, some other transaction was able to acquire a WRITE lock on the same target record, preventing the original transaction from continuing execution. When execution does resume on that transaction, the SELECT operation must be re-executed, since the contents (and even the very existence) of the record can no longer be assumed. This is a time-consuming process.
What is needed, therefore, is an improved system and method for addressing the foregoing problems associated with transitioning a database from a first unknown state to a second state.