1. Field of the Invention
The present invention relates to a method, system, and program for extracting data from one or more columns in a record having a first format and inserting the extract data into columns in a table in a second format using dynamic code generated from tables defining the first and second formats.
2. Description of the Related Art
Data records in a relational database management system (RDBMS) are maintained in tables, which are a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. One or more indexes may be associated with each table. An index is an ordered set of pointers to data records in the table based on the data in one or more columns of the table. In some cases, all the information needed by a query may be found in the index, making it unnecessary to search the actual table. An index is comprised of rows or index entries which include an index key and a pointer to a database record in the table having the key column values of the index entry key. An index key is comprised of key columns that provide an ordering to records in a table. The index key columns are comprised of the columns of the table, and may include any of the values that are possible for that particular column. Columns that are used frequently to access a table may be used as key columns.
A database management system (DBMS) includes control tables that define the format of tables and indexes within the database. A table definition table (TDT) includes information defining the format of the columns in a database table, such as the data type in each column (field) and the length of each column. An index definition table (IDT) defines the columns that comprise each index key, i.e., one or more columns from a table. One IDT may maintain information for multiple indexes on the same table.
In International Business Machines Corporation""s (IBM) DB2 product,** a load utility is used to add data from records to an index. To add index keys to an index, the prior art load utility would have to extract the data from the columns in the record that correspond to the index key and then add such data to the index columns. During this operation, the load utility looks-up the definition of the table in the TDT and the index in the IDT, and uses those definitions to guide the extraction of data from columns from the rows of the table to the columns in the index keys. In very large databases, the load utility may insert data from millions or billions of rows into index keys. For each load operation, the load utility would look up the table and index definitions to extract columnar data from a table record to add to an index key for each record inserted into the index.
**Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation; DB2, AIX, OS/390, and OS/2 are registered trademarks of IBM, MVS is a trademark of IBM; and Oracle8 is a trademark of Oracle Corporation. 
Below are the steps performed by prior art load utilities to extract data from records in a table to add as index keys to an index:
1. Look-up IDT for the index to which data will be added and find the name of the first column in the key.
2. Look-up the TDT to find the location of the table column corresponding to the index key column, and that column""s starting position and length in a record.
3. Using the column""s starting position, locate the column within the record from the table, and use the column length to copy all the data from the start to end positions in the column to the position of the index key to which the data should be copied. The first copy operation to the index would copy the data to the first position of the index, which is the starting position of the first key. The index definition table would be used to determine the location of further key columns in the index to receive the copied data.
4. Repeat steps 1 to 3 for each additional column in the index key.
5. Repeat steps 1 to 4 for each additional index that will be updated from the record. (A single record can be added to multiple indexes that have key columns corresponding to the same table).
With the above prior art technique for adding records to an index, the load utility must look-up information in the TDT and IDT for each record from the table from which data is inserted into the index. Further, the TDT and IDT tables are looked-up for each column in the index because data is extracted from the record and added to the index on a column-by-column basis. Thus, when adding a large amount of records to index keys, the processing time to look-up the table and index definition information from the TDT and IDT can consume a significant amount of processing cycles and time, thereby degrading the performance of load and insert operations.
Thus, there is a need in the art for an improved technique for adding records from a database table to index keys.
To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, and program for extracting data from a record in a first format and inserting data into a record in a second format during an insert operation. A determination is made of location information for data in at least one record in at least one column in the first format involved in the insert operation from which data will be extracted and of location information for data in at least one record in at least one column in the second format involved in the insert operation into which the data extracted from a corresponding column in the first format will be inserted. For each column from which data will be extracted, the determined location information is used to generate at least one instruction to move the data in each column in the record in the first format to each corresponding column of one record in the second format. The location information of the columns in records in the first and second formats involved in the data movement operations is encoded in the at least one instruction. The instruction may be repeated for multiple records in the first format from which data will be extracted into multiple records in the second format.
In further embodiments, each column from which data is extracted is from one record in a table including records in the first format and each column into which data is inserted comprises a new record in an index in the second format.
In yet further embodiments, a search is made of at least one table for records that match a search criteria Each record from which data is extracted comprises records located in the search that match the search criteria. In such case, the columns into which the extracted data is inserted comprises a search key. The search key is used to query records in another table.
Still further, the at least one instruction is generated in response to a call from an insertion process. In such case, the at least one instruction extracts data from at least one column of a record in the first format in memory and inserts the data into a corresponding key column in an index in the second format.
Preferred embodiments provide a technique for inserting data from a column in a record to a column in another record that avoids having to look-up information in definition tables that define the format of the records. Instead, preferred embodiments dynamically generate code that embeds the format of the records from which data is extracted and to which data is inserted into instructions that can be repeatedly executed. This allows the insertion operation to be performed for multiple records without having to return to the definition tables to determine the format of the records subject to the move operation.
Thus, the preferred embodiment technique generates instructions to carry out multiple extraction and insertion operations from the same source table to the same target table, such as an index, by using the table and column definition information encoded in the instructions without having to repeatedly look-up the information in definition tables. New dynamic code would be generated for each source and target table subject to the move operations to allow multiple operations to be performed with respect to the same source and target using the generated dynamic code.
Preferred embodiments provide a faster and more efficient method for extracting data from columns in numerous records from one table and inserting the data into columns in another table than current techniques that repeatedly refer to the definition tables for each individual data movement operation.