1. Field of the Invention
This invention is related generally to the restructuring of a database and more specifically to changing the storage format of a database table while maintaining database operability to service transaction requests.
2. Description of the Related Art
FIG. 1 is a diagram of a computer system 10 that includes a database management system (DBMS). The system is a distributed computer system having multiple computers 12, 14, 16 interconnected by local area and wide area network communication media 18. The system 10 generally includes at least one database server 12 and many user workstation computers 14, 16.
The database server 12 includes a central processing unit 20, a primary memory 22, a communications interface 24 for communicating with user workstations 14, 16 as well as other system resources (not shown). The secondary memory 26-1, 26-2, typically a magnetic disk storage system, in the database server 12 contains the tables 30, 32 and other support information managed by the database management system 34. The secondary memory 26-1, 26-2 of the database server 12 also includes programs that manage the database tables and applications programs that users invoke to perform transactions on the database tables. A transaction manager 44 is typically included in the programs that manage the database tables. Not only does the transaction manager 44 coordinate the execution of transactions, it also creates audit entries for each transaction, which are durably stored in an audit trail file 48 in the secondary memory.
The end user workstations 14, 16 typically include as central processing unit (CPU) 50, primary memory 52, a communications interface 54 for communicating with the database server 12, and other system resources, a secondary memory 56, and a user interface 58. The user interface 58 typically includes a keyboard and display device, and may include additional resources such as a pointing device and printer. Secondary storage 56 is used for storing computer programs, such as communications software used to access the database server 12. Some end user workstations 16 may not include secondary memory 56, relying on software that is downloaded to their primary memory 52 from a server computer such as the database server 12.
Database management systems (DBMS) 34 typically manage the storage of database information independently from any co-resident operating system 70, in FIG. 2, in the computer system or systems 68 on which the DBMS 34 is installed. The DBMS 34 conventionally has several layers of software. FIG. 2 shows the layers of the DBMS 34 and their relationship to the general computing system. The DBMS includes a User Interface 72, an Application Programming Interface (API) 74, a Catalog Manager 76, a Relational Engine 78, and a Data Access Manager 80, which has access to a disk volume 82.
Requests for DBMS functions originate from a user via the User Interface 72 or from a program via the API 74. The request usually takes the form of a command expressed in the SQL language.
The Catalog Manager 76 accepts requests for the creation, deletion or modification to the definition of database objects such as tables. The metadata for these database objects is under the control of the Catalog Manager 76 and includes a catalog 92 (in FIG. 3A) and one or more file labels 96a-c (in FIG. 3A); the catalog contains a set of tables 98a-c (in FIG. 3A) describing the various data tables, indexes and views of the database; and the file labels 96a-c are encoded forms of file and catalog information.
The Relational Engine 78 accepts requests from the API 74 for manipulation of the contents of the database objects and supports the concept of a logical table of database records.
The Data Access Manager 80 accepts requests from the Relational Engine 78, and the Catalog Manager 76. These requests cause the contents of portions of the tables to be inserted, deleted, or modified. In addition, requests from the Catalog Manager 76 allow the metadata contained in the file labels 96a-c to be modified.
FIG. 3A is a diagram of the pertinent logical structures in the DBMS 34. The DBMS contains a data dictionary 90 that includes a catalog 92 and a directory 94. The data in the DBMS includes base tables 100, 102, 104 for storing user data, and index tables 106, 108, 110 which facilitate the access to the base tables 100, 102, 104. Base tables 100, 102, 104 and index tables 106, 108, 110 are divided into partitions, p1, p2 p3 as shown, the division being based on the records (rows) that satisfy a certain constraint. In some cases, a table has only one partition, which means that the partition includes all of the records of the table. The catalog 92 includes one or more tables 98a-c for describing the data tables and index tables in the data portion 112 of the DBMS.
One of these tables is the PARTNS table 98b and another is the FILES table 98a. Referring to FIG. 3B, the PARTNS table 98b includes the following fields, FILENAME, PRIMARYPARTITION, PARTITIONNAME, CATALOGNAME, and FIRSTKEY. The FILENAME field contains the name of the file in which the partition is stored, the PRIMARY field contains a flag indicating whether the named partition is a single partition or the first in a set of partitions, the field PARTITIONNAME contains the name of the partition, the CATALOGNAME contains the name of the catalog in which the PARTNS catalog table is found, and the FIRSTKEY field defines the group of records of the table that are in the partition.
In the example shown in FIG. 3B, there are two partitions of the table, partition A and partition B. These partitions are each stored in a file having the same filename as the partition, file A for partition A and file B for partition B. The first and second records of the table pertain to partition A and the third and fourth records of the table pertain to partition B.
The FILES table 98a includes the following fields, among others that are not pertinent to the present invention, FILENAME, PRIMARYEXT, PRIMARYEXT2, SECONDARYEXT, SECONDARYEXT2, PARTITIONARRAY, and FILEFORMAT. The FILENAME contains the name of the file that holds a partition, the PRIMARYEXT contains a number telling the Data Access Manager 80 how to manage the initial allocation of physical space on the disk volume for a Format 1 file, the PRIMARYEXT2 is similar to the previous field except allocation is for a Format 2 file, the SECONDARYEXT field contains a number telling the Data Access Manager how to manage subsequent allocations of physical space on the disk volume for a Format 1 file, SECONDARYEXT2 is the same as the previous field except allocation is for a Format 2 file, PARTITIONARRAY indicates whether the partition array is Format1Enabled or Format2Enabled, and FILEFORMAT contains the format of the file named in the record. In the example shown, files A and B are both Format2Enabled but file A is stored in Format 1, whereas file B is stored in Format 2.
The directory part 94 of the data dictionary includes one or more file labels 96a-c which, as mentioned above, store encoded information about the tables, partitions thereof, and files that contain a table partition.
FIG. 4A is a diagram showing a partition array 120 within the catalog. The partition array 120 is an ordered list of partition information that describes the criterion by which the records of a table are grouped together to create a partition. The criterion is based upon the primary key of the base table, the primary key being one or more of the fields of a database table record whose values uniquely identify the row of a table and determine the order in the table in which the records are stored. All partitions of a database table have the same partition array encoding, independent of the fact that a particular partition of the partition array may be stored in Format 1 or Format 2. Within a Format1Enabled table, all partitions are stored as Format 1. Within a Format2Enabled table, individual partitions are stored as either Format 1 or Format 2.
Each partition array element includes fields from the PARTNS catalog table 98b and the FILES table 98a, In particular, an element of the partition array 120 includes the format of the partition array 122, the name of the partition of the table 124, the primary 126 and secondary 128 extent allocation numbers, the format of the partition 130, and the first key value 132 for the partition.
FIG. 4B is a diagram showing the partition array 134 within the file labels of a Format1Enabled table. The format of the partition array 136 itself is stored separate from the partition array elements 138a-c as it applies identically to each element of the array while each partition array element 138a-c includes the name of the partition of the table 140, the primary 142 and secondary 144 extent allocation numbers, the format of the partition 148, and the first key value for the partition 150.
FIG. 5 is a diagram showing index blocks and data blocks in a file 170, which is defined as a collection of fixed-sized blocks 172a-j. A file 170 is a unit of physical storage and collections of files are stored on disk volumes 82 (in FIG. 2). A file 170 is completely contained on a single disk volume and is therefore limited to the size of the disk volume. A partition, as defined above, is contained in a single file and a single file contains a partition, so that there is a 1:1 relationship between a file and a partition. Therefore, the size of a partition is limited by the size of a single disk volume. FIG. 5 shows the index blocks 172a-g arranged to form a tree, such as a B-tree, with blocks at the end nodes of the trees pointing to the data blocks 172h-j, though this organization of index blocks is not essential to the invention. Files 170 are tracked by the directory 94 (in FIG. 3A) which contains one file label for each file.
Currently, the size of the fields in the catalog, directory and the index and data blocks themselves, limit the size of a partition of an SQL table to about two gigabytes (2 GB). However, it is desirable to have partitions of tables greater than two gigabytes to create larger tables, which are limited by the size of the partitions multiplied by the number of partitions. Creating larger tables poses a significant problem, though, for existing, operating databases because the size of critical fields in the metadata must be increased without significantly interfering with the operation of the database. It also means that the software must be able to recognize and handle multiple control field formats, one for existing data and one for new data.
Conventionally, one way to address these problems has been to take the DBMS down while data is converted from the old format to the new format. Conversion is typically performed by a custom program that reads the old data format and writes the new data format. For large databases, the time to convert the data is sufficiently long that the amount of downtime is unacceptable. To minimize this impact, a second system is sometimes utilized. The computer hardware for the original DBMS system is replicated, a snapshot of the user data is taken and the data is converted to the new format on the replica system. Both the original DBMS system and the replica DBMS system are operated in parallel, with each system processing user transactions. After a period of time, a switch is made so that the replica system becomes the primary system in support of the DBMS.
The conventional approach to restructuring the database to change the size of the tables clearly has drawbacks. Either there is a long downtime during which users cannot use the DBMS application or there is the expense of replicating and operating a system in parallel to minimize the downtime. Furthermore, a custom program is required to restructure the tables.
There is, thus, a need for a method of restructuring a database that avoids the expense of replicating and operating a duplicate machine, and a custom conversion program, while guaranteeing a smooth transition to the new data format without significantly interfering with the users"" access to the DBMS.
The present invention is directed towards the above need. A method in accordance with the present invention is a method of restructuring the physical storage of a table in a database. The table has a plurality of records and one or more partitions into which records of the table are grouped. Each partition is stored in a file made up of a number of fixed-size blocks. The database further includes control information describing the table, associated files and partitions, and an audit trail describing updates, if any, to the records of the table. The steps of restructuring the physical storage of a table include altering the control information to indicate that each partition of the table has a first format in which each partition is currently stored in a first storage unit and updating the control information to indicate that each partition of the table is enabled to be stored in a second format. Next, a partition to be restructured is selected. The blocks of a first file, stored in a first location of the first storage unit and containing the selected partition of the table, are copied from the first storage location to a second file at a second storage location of the second storage unit, while allowing updates to the table, including updates to the selected partition in the first file. The blocks in the second file are altered to have the second format and any records in the selected partition in the second file are updated based on the audit trail, while allowing updates to the table including updates to the selected partition at the first storage location. After updating any records in the second file based on the audit trail, the table is locked to prevent updates to the table. The records in the selected partition at the second storage location are updated, as needed, based on the audit trail, while the table is locked and the table is then unlocked to allow updates to the table including updates to the selected partition at the second storage location.
Another method, in accordance with the present invention includes selecting a partition to be restructured, where the selected partition is stored in a second file with a second format, and copying the blocks of the second file, stored in a second location of a second storage unit, from the second storage location to a first file at a first storage location of a first storage unit, while allowing updates to the table, including updates to the selected partition in the second file. The blocks in the first file are altered to have a first format and any records in the selected partition in the first file are updated based on the audit trail, while allowing updates to the table including updates to the selected partition at the second storage location. Next, the table is locked to prevent updates to the table, after updating any records in the first file based on the audit trail and any records in the selected partition at the first storage location are updated, as needed, based on the audit trail, while the table is locked. The table is then unlocked to allow updates to the table including updates to the selected partition at the first storage location.
The above method includes changing the format of a partition from a second format back to a first format, assuming the partition has not increased beyond the size permitted by the first format. This is useful when the user needs to return to an earlier release of software that does not allow tables with partition in the second format.
One advantage of the present invention is that the database system remains operational during the restructuring of the physical storage of a partition of the database.
Another advantage is that the number of blocks in a file containing a partition and the size of each block are not fixed.
Yet another advantage is that a large table need not be restructured at one time or before the table is again usable. Restructuring of a large table having many partitions occurs one partition at a time, thereby minimizing the impact to the users of the large table.
Yet another advantage is that a large table can be created using larger partitions rather than more smaller partitions. Creating a large table with larger partitions rather than more partitions minimizes the complexity of managing the table and thus minimizes the impact to the users of the large table.
Yet another advantage of the present invention is that the table restructuring is not one way, thus making a return to earlier software releases possible. Return to the first format is possible as long as the partitions involved meet the first format size constraints.