1. Field of the Invention
This invention relates to structured query language (SQL) enhancements, and more particularly relates to efficiently supporting generic SQL data manipulation statements.
2. Description of the Related Art
Updating and inserting data records into a database is a common data operation. Often, a user will update many data records with a common set of data fields, or columns, but not every field will be updated on every data record, and not every field will have information on a data insertion. For example, a database operator may make an annual update an employee records database. During the annual update, the operator may update 95% of the employee years-of-service values, 90% of the employee salary values, 50% of the employee benefit selections, 20% of the employee job titles, 10% of the employee work addresses, and 3% of the employee name fields. Additionally, the operator may insert new employee records into the database, and the new employees may not have information available for all fields at the time of the annual update. Most database applications utilize the SQL standard as the high level language for controlling database manipulations. For the annual update in the example, the majority of data records have a few changes, but the fields in which the changes occur for each data record vary widely.
Currently available SQL implementations do not efficiently support a generic SQL statement for a circumstance like the example where the field to be updated varies from record to record. A generic SQL statement is a statement referencing all of the data columns that may experience a change when the SQL statement is executed, and/or a statement referencing all of the data columns that may be involved in a selection criteria (i.e. a predicate) for selecting records to be manipulated. However, a given data record update or insertion may actually have information available for only some of the data columns—for example only a change to the years-of-service and salary values, for all of the data columns, or even for none of the data columns. Currently available SQL implementations do not efficiently support running an update operation to a data record, where no data is supplied for one or more columns of the update. Likewise, currently available SQL implementations do not efficiently support selecting data records based upon criteria, where no data is supplied for one or more columns of the selection criteria.
There are currently two approaches for handling generic SQL statements in situations like the example presented above. In a first approach, a client application captures the generic SQL statement, and provides a previous value for any data that is not available. This approach works, but imposes significant overhead on the system. For example, if the employee name is not provided in the example annual update, the client application has to retrieve the employee name from the database and send the employee name back to the database when the SQL statement is executed. The database management system has to check if the current user is authorized to write to the employee name field, run any referential integrity checks and/or triggered actions based on a write action to the employee name field, and then write the value of the employee name back over the previous value. This method imposes a significant network and processor overhead.
In the second approach, the client application creates a specific SQL statement that writes to only the data columns where information is available for a particular data record. In the example, there are six data columns, and potentially sixty-four (i.e. 26) different possibilities of specific SQL statements depending upon how many different combinations of data columns will be written to for the data records. Therefore, the system must have a large memory area available for caching execution plans for the SQL statements, or the system must not cache execution plans and just create a new execution plan for each data record. Whether the system utilizes a large memory cache or creates a new execution plan for each data record, the second approach imposes a significant burden on the system, and causes degraded performance of the database operations.
Even where the burdens of supporting generic SQL statements for the example are overcome, an additional complication is imposed by selection predicates with generic SQL statements where a data record on the client application does not provide some of the selection data, and where absence of that data indicates that its related predicate is to be ignored. In the example, an SQL statement may select only employee records where the years of service are greater than a given value (e.g. in an SQL WHERE clause), but a given data record on the client application may not provide a years of service value, which in this example indicates that all employee records are to be selected. There are no methods in the current technology to manage selection predicates with a generic SQL statement where some of the data records on a client application do not supply values to use in the predicate.