1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to a method of providing traditional update semantics when updates change the location of records in a database.
2. Description of Related Art
Computer systems incorporating Relational DataBase Management System (RDBM) software using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Organization (ANSI) and the International Standards Organization (SO).
In RDBMS software, all data is externally structured into tables comprised of rows (records) and columns (fields). SQL operations upon the data are generally row-oriented and include insert, update and delete operations, as well as select (retrieve) operations, join operations to concatenate data retrieved from multiple tables, etc.
When tables are large, containing many rows of data, it is preferable to divide the table into partitions, where each partition corresponds to a separate unit of storage and contains a portion of the rows in the table. By partitioning tables, the speed and efficiency of data access can be improved.
For example, partitions containing more frequently accessed data can be placed on faster devices and parallel processing of data can be improved by spreading partitions over different DASD (direct access storage device) volumes, with each I/O (input/output) stream on a separate channel path. Partitioning also promotes high data availability, enabling application and utility activities to progress in parallel on different partitions of data. This is referred to as “partition independence”.
Data may be divided among the partitions of a table by a variety of schemes. For example, one scheme is to partition data based upon a range of values for a designated set of columns, herein called the partitioning key columns. When a row is inserted into a table so partitioned, the values in the partitioning key column(s) of the row dictate the partition of the table into which the row is inserted. Similarly, an update that changes the value in a partitioning key column of a row could change the partition to which the row belongs, thereby requiring a change in the location of the data row.
The prior art teaches that when an updated row does not fit in the space occupied by the original row, the updated row should be placed elsewhere and the original row changed into a pointer record that locates the updated row. By providing this level of indirection, the updated row has not conceptually moved. Indexes continue to reference the original row's location and need not be updated to locate the updated row directly. By allowing access to the updated row only via its pointer record, other applications that are retrieving rows from the table by scanning the table in row sequence will not “see” a given row multiple times or not at all because of row movement.
It is not attractive to adopt this technique for dealing with an updated row that no longer belongs to its current partition because the pointer record would point from one partition into another. This is not attractive because it erodes partition independence, hampers parallel processing techniques, and hinders data access techniques that rely on identifying the partition to which a data row belongs based on the row location referenced in the rows index entries.
Moving a row from its original to its new partition does not have the above problems. However, row movement can cause access paths that retrieve data in row sequence (or in row sequence within key value) to see a data row multiple times or not at all. For example, the row could be retrieved from it original position, then again from its new position. These are not the traditional semantics of the effect of update operations on concurrent readers.
Row movement also makes the notion of cursor position less straightforward. For purposes of “operate on the current row again” operations, the position after an update that moves the row should be on the moved row. For purposes of “operate on the next row” operations, the position should be at the original row location.
Due to these or other difficulties, products on the market today either disallow the update of values in partitioning key columns, or they allow the update provided the update does not alter the partition to which the row belongs. Thus, there is a need in the art for improved techniques for updating values in partitioning key columns.