A database management system (DBMS) manages large databases and provides an operational environment in which stored information can be retrieved or updated. In one type of DBMS, referred to as a relational database system, information is stored in tables, with each table having one or more columns and one or more rows. Each column in a table is referred to as an attribute of the table, and each row in a table is referred to as a record. One well-known example of a relational database management system is the DB2® database management system. (DB2® is a registered trademark of International Business Machines Corporation of Armonk, N.Y.).
When transactions are processed in a DB2 database system (i.e., when there is an insert, update or delete operation made to a record), a DB2 subsystem creates log records and writes them out to a DB2 Log. Once written, the log records can be used for various purposes, such as data recovery, migration, and reporting. In general, the log records contain control information (i.e., Log Sequence Number, transaction identifier connecting the log record to other log records for the same transaction, etc.) and different types of content that depend on what is being logged. One of the most common types of content is an update for which the log record contains data from the first changed byte to the last changed byte or the end of the row.
An example of a software product that can use DB2 log records for various purposes is Log Master™ by BMC. (LOG MASTER is a trademark of BMC Software, Inc. of Houston, Tex.) Log Master is a software tool that can perform transaction log analysis for DB2 and may be used to back-out problem transactions using SQL statements generated from the information in the log records as well as being used to migrate data or report on the same log records. For example, Log Master can analyze the DB2 Log and locate transactions for the purpose of auditing, migration or data recovery. For data recovery, Log Master can generate SQL statements to correct the problem by performing an UNDO or a REDO action, Log Master may also be used to migrate data from a production database to other databases used for backup, historical information, decision support or warehousing.
FIG. 1A shows one process 10 in which Log Master may be used in conjunction with a DB2 Log according to the prior art. In the process 10, Log Master post-processes the DB2 Log 12, which has already been written by techniques known in the art. Log Master adds context to log records, provides control information and complete log images, and allows for decompression of data, among other capabilities beyond those available from the DB2 Log 12. In processing the DB2 Log 12, log readers 14 read the files of the DB2 Log 12 directly and buffer those log records of interest in buffers 16 for log assemblers 18. Generally, there is a log reader and assembler for each member of a data sharing group. A log factory 20 controls the merging of records from the log assemblers 18 and invokes log handlers 22 that are configured to handle the various log records. A logical log builder 24 is then called to produce logical log records that are fed into a base log record output pipe 26, which is also referred to as a Logical Log data pipe. Output processes 30 handle and sort the log records from the Logical Log Data pipe 26 to produce various outputs 40 for the user.
FIG. 1B shows the output processes 30 in more detail. Various pipe handlers 32 handle and sort the logical log records from the Logical Log Data pipe 26 depending on the desired output. For example, Logical Log Data pipe handlers 32 can handle logs for SQL generation, logs generated from DDL operations, logs for load data generation and creation of load control files, logs for reporting, and logs for recovery analysis. The pipe handlers 32 instantiate sorted pipes to feed logical log records to sort routines 34 and 36 so that the logical log records may be processed in correct order for various output handlers 38. The first sort routines 34 are E15 sorts that are written to operate concurrently with one another. The second sort routines 36 are E35 sorts that process their input data serially. After the sorting routines 34 and 36, various sorted output handlers 38 produce outputs 40, which can include (but not limited to), SQL file and template outputs, catalog record details, load data and control outputs, report outputs, recovery analysis outputs, and logical log data outputs.
Like most modern database management systems, the DB2 database system is able to store and manage large data using Large Objects (LOBs). There are three general types of LOBs: Binary Large Object (BLOB) for binary data; Character Large Object (CLOB) for single byte character data; and Double Byte Character Large Object (DBCLOB) for double byte character data. Some examples of LOBs include images of checks, bonds, claim forms, and the like. LOBs can also be XML or HTML documents, audio or video files, etc. Some LOBs may have very structured data, while others may have unstructured or simple data. The primary use of LOBs is to store column data that is larger than the limit for a base table.
FIG. 1C shows how the DB2 database system 100 manages LOBs according to the prior art. LOB data is stored differently than conventional data in the DB2 database system 100 because LOB data is large (up to 1-byte less than 2-GB) and is computationally expensive to manage and manipulate. To help manage the LOBs, the actual data for the LOBs is stored separately in LOB tablespaces 150 and 152, each having auxiliary tables 151 and 153 respectively. The base table 120 for the LOB data merely contains information identifying how to index into and access the LOB in the corresponding auxiliary table 151, 153. For example, the base table 120 defines one or more LOB columns 122 that only store information about the LOB data. The auxiliary tables 151, 153 store data for the LOB columns 122 outside the base table 120 in which they are defined.
There are a number of constraints on the way LOB information is arranged in the base tables 120. Each type of LOB (i.e., BLOB, CLOB and DBCLOB) that is identified in the base table 120 requires a separate LOB column 122. Regardless of how many there are, each LOB column 122 contains a two-byte indicator field 124 and a two-byte version field 126. The indicator field 124 indicates whether there is LOB data or not. If this field 124 is NULL or has zero length, there is no data stored in the auxiliary table 153 for the LOB column 122. The version field 126 identifies a version of the current LOB data for a given row in the base table 120.
In addition to the LOB columns 122, the base table 120 also contains one ROWID column 128 that stores usually system-generated values to identify and index the actual LOB data for a given row in the base table 120. During operation, an application program 140 can use a LOB identifier 130 to access the actual LOB data stored in the auxiliary tables 151, 153 in the LOB tablespaces 150, 152.
There are a number of constraints on the way LOB data can be stored in LOB tablespaces 150, 152 and auxiliary tables 151, 153. For example, each LOB tablespace 150 can only contain information for one LOB column 122 of a corresponding base table 120, and the LOB tablespace 150 cannot be partitioned. If the base table 120 is partitioned, then one LOB tablespace 150, 152 and auxiliary table 151, 153 must be created for each partition of the base table 120 and for each LOB column 122 of the partitioned base table 120. Moreover, each auxiliary table 151, 153 can have only one index (not shown).
As with logging of other data, changes to LOB columns 122 in the base table 120 can be logged by the DB2 database system 100. Using standard techniques discussed previously, the DB2 Log 12 (see FIG. 1A) can store log records of column inserts, updates, and deletes in the DB2 Log files. However, logging LOB data can require large amounts of storage due to the size of the LOBs and can present a number of complex issues. For example, logging must be disabled if the LOB column 122 is larger than 1 GB. Furthermore, BMC's transaction log analysis tool Log Master, does not fully support LOB data types. If logging is enabled on LOB tablespaces, only the REDO (new) value of actual data is logged, and any before images of updates and any data being deleted is not logged for LOBs. Therefore, UNDO processing (resetting to the previous value) cannot be done, but REDO or MIGRATION of these transactions is possible. Thus, Log Master cannot generate SQL undo operations if an SQL change (update or delete) references column data for LOB data type.
Accordingly, there is a need in the art for an efficient means of managing logged data for LOBs in a relational database system, such as the DB2 database system.