This present invention relates to databases and, more specifically, to methods and apparatus for reducing costs associated with manipulating data.
In a client-server architecture, the client (application program) relies on a server to perform some operations. In the context of database servers, the application program passes data to the server, the server stores the data, and the application program manipulates and retrieves the data by sending commands to the server.
Application programs typically interact with database servers using a query language such as the Structure Query Language (SQL). SQL is a powerful non-procedural language which serves as the industry standard for relational database management systems. Data Manipulation Language (DML) statements are SQL statements that retrieve or manipulate data in tables.
Frequently, database applications must manipulate large sets of records. For example, an application program APP1 may insert an array of data xe2x80x9cemployee_statsxe2x80x9d to a database server, where employee_stats consists of employee information for each of one thousand employees. Statements used to manipulate a series of records are referred to herein as array DML statements. The operations performed by database servers in response to array DML statements are referred to as array DML operations.
Various approaches may be used to pass data from an application program to a server. In one approach, APP1 passes employee_stats in the form of literal data (a xe2x80x9cliteralxe2x80x9d). A literal is a value written exactly as it is meant to be interpreted. A literal can be a number, a character, or a string. For example, in the expression,
Dept_Number=5
xe2x80x9cDept_Numberxe2x80x9d is a variable and xe2x80x9c5xe2x80x9d is a literal.
An application is able to pass literal data to the server only if the literal data is built into the code of the application program. Building literal data into an application program is known as hard coding. The drawback to hard coding is that the code of the application program needs to be changed and re-compiled each time there is a change to the literal data.
In another approach to providing data to a server, an application program passes data to a server using variables. For example, assume that APP1 uses an array xe2x80x9cINSERTxe2x80x9d statement to insert employee ID numbers into an array xe2x80x9cA( )xe2x80x9d. In database systems that support the Structured Query Language (SQL), such a statement may take the following form:
INSERT INTO A (e_id)
In this statement, xe2x80x9ce_idxe2x80x9d is a variable of data type number. In our example, xe2x80x9ce_idxe2x80x9d is a unique value used in identifying each employee. The array A( ) has a size of 1000, indicating that there are 1000 employee ID numbers. In response to this statement, the database server performs an INSERT operation 1000 times (one INSERT per row of the array). Each of the 1000 operations is referred to herein as a xe2x80x9cper-itemxe2x80x9d DML operation, because it only operates on a single item. Thus, a per-item DML operation means changing a single row of a given array.
Typically, a database server must perform a series of per-item DML statements in response to a single array DML statement. In one approach to handling array DML statements, the database server parses the INSERT statement before each per-item INSERT operation is executed in order to insert a row in array, A(1000). The database server parses the array DML statement to create an actual executable statement to perform the desired per-item operation on the data.
For example, the process of parsing involves passing the SQL statement, INSERT INTO A (e_id), from the user process to the database server (one round trip). During the parsing process, the database server, among other things, verifies that the SQL statement is syntactically and semantically valid, determines an execution plan for the statement, and allocates memory from a private region in memory (xe2x80x9cprivate SQL areaxe2x80x9d) to the verified SQL statement. Thus, parsing and execution is an expensive process because a roundtrip to the server is made each time a data manipulation language statement is to be parsed and executed and thus, the example above would result in 1000 roundtrips to the database server. However, the database may also allocate memory from a shared region in memory (xe2x80x9cshared SQL areaxe2x80x9d) to the verified SQL statement. The xe2x80x9cshared SQL areaxe2x80x9d is an area in memory used for processing subsequent occurrences of the same SQL statement. The database server parses a SQL statement only if a shared SQL area for an identical SQL statement does not exist. If a shared SQL area for an identical SQL statement exists, there is no need to completely parse the identical SQL statement again.
However, even in the case when the data manipulation language statement is parsed once but executed multiple times (i.e., only one roundtrip to the server is needed), there is another drawback when a failure occurs during the execution of an array DML statement. When a failure occurs, the performance of the array data manipulation operation is automatically aborted, even though the operation is incomplete. A new DML statement has to be re-issued in order to proceed with the operation. For example, assume that a failure occurs when performing the INSERT operation for the nth row in A(1000). An example of a failure is when the attempt to insert the nth row involves inserting a key value into a column that is subject to a uniqueness constraint, and the key value to be inserted matches a key value that already exists in the constrained column. Such an insertion attempt results in a xe2x80x9cUNIQUE KEY CONSTRAINT VIOLATIONxe2x80x9d. Assume that xe2x80x9cemployee IDxe2x80x9d is a unique value assigned to each employee and is stored in a unique key column of employee table. Thus, an attempt to insert a duplicate value in A(1000) would result in a Unique Key Constraint violation. In order to continue inserting values in the rest of the rows of A(1000), the INSERT statement must be re-issued, re-parsed and executed by the database server, resulting in at least an additional roundtrip to the database server. Thus, each failure that occurs during insertion of a row into A(1000) results in an additional roundtrip to the database server.
Based on the foregoing, there is a need for a method or mechanism for reducing the costs associated with manipulating data.
Techniques are provided for reducing the costs associated with manipulating data. According to one aspect of the invention, the techniques involve binding data by reference, performing data manipulation operation on a data item, and if an error occurs, then storing the error information and continuing the process by performing the operation on the next data item. In one feature, the error information is used to create corrective data for the data manipulation operation.