The present invention relates generally to information processing environments and, more particularly, to the process of logging transactions which are posted in a data processing system, such as a Database Management System (DBMS).
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as "records" having "fields" of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added or removed from data files, information retrieved from or updated in such files, and so forth, all without user knowledge of underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level.
DBMS systems have long since moved from a centralized mainframe environment to a de-centralized or distributed environment. Today, one generally finds database systems implemented as one or more PC "client" systems, for instance, connected via a network to one or more server-based database systems (SQL database server). Commercial examples of these "client/server" systems include Powersoft.TM. clients connected to one or more Sybase SQL Server.TM. database servers. Both Powersoft.TM. and Sybase SQL Server.TM. are available from Sybase, Inc. of Emeryville, Calif. The general construction and operation of a database management system, including "client/server" relational database systems, is known in the art. See e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
Traditionally, database management systems (e.g., the above-described client/server database systems) have been employed for on-line transaction processing (OLTP)--posting data from "transactions" to a database table. As part of this process, OLTP systems typically employ a logging system to log changes which occur to the system. In a commercial embodiment such as Sybase SQL Server System 11, this is done by posting log records to a transaction log. Every transactional operation, including inserts, updates, and deletes, causes a log record to be written to the transaction log or simply "log." In System 11, the transaction log is referred to as "syslogs." Each particular log record characterizes the change which has occurred to the database during processing of a transaction. This information can be used, for instance, in error recovery, to restore the database to a pre-existing, consistent state.
Consider a scenario where a transaction performs updates to a table but then the transaction "rolls back"--that is, aborts. In such a case, the system will undo the updates by reading backwards from the log and reversing the changes which were made (as a result of the updates). The recovery system of databases, therefore, employs the logging system and log records when performing the work of rolling back a transaction. In a similar fashion, the log can be used in the face of a failure, such as when a machine "crashes." As the log is read during recovery, some transactions are re-done on the one hand, while incomplete transactions are undone on the other. In addition to rolling back transactions and supporting error recovery, the log also provides an archive for the database, which documents the specific actions which have led to the current state of the database. All told, the log plays a critical part in the design and implementation of present-day relational database systems.
The logging system itself permits reading from and writing to the log. Write access is typically performed by "access methods" within a relational database system (i.e., a database system which presents data as tables or "relations"). In particular, these methods generate log records which describe actions occurring which affect the database. Read access, on the other hand, is generally provided by a recovery system within the database. In general, therefore, a database system includes subsystems for writing log records into the log and, if needed, reading back those records.
A general description of the design and implementation of a logging system in a relational database is provided by Gray, J. and Reuter, A., Transaction Processing: Concepts and Techniques, Morgan Kaufmann Publishers, 1993, the disclosure of which is hereby incorporated by reference. For an overview of relational database systems, see the abovementioned An Introduction to Database Systems, the disclosure of which has been previously incorporated by reference.
A dichotomy exists between data structures in memory and data structures on customers' disks. Designers will often come up with very elegant designs that fall short because such designs run into conflicts with data structures already on existing customers' disks. Such conflicts include, for instance, changes in data page.
Of particular interest herein are changes to log record formats. As described above, the "log" is a special collection of data or files that is used to restore consistency to a database after an unanticipated failure. This is based on the concept of a transaction which creates bounds, a began and an end, for a set of work. The concept of a transaction is fundamental to a DBMS, presenting an all or nothing scenario. Support for this concept is provided by a "write ahead log." Before a change is made to data that can reach disk, the system logs the before and after version of the data and makes sure that that reaches disk first, that is, before the data change itself can reach disk. This concept is generally referred to as "write ahead logging." In this manner, if there is a system failure, upon system recovery the system can detect an incomplete transaction and reverse the changes in that incomplete transaction.
In a DBMS, there are many operations which occur to transform data. In Sybase SQL Server, for instance, each is represented by a log operation code or "logop." Each logop implies its own log record. Here, there is not a single log format, but instead a variety of log record formats. One log record might have several logop formats apply to it. One logop can only have to one of log record apply to it.
Many times, designers have often wanted to add new types of information to log records. As an example, consider the design of Sybase System 11. Here, designers added new fields to certain log records. The fields could not be added at the end because those log records had variable length fields which, by design, are placed at the end of a log record (and there can only be one of them per record). As a result, adding a new field necessarily displaces the pre-existing fields, since the new field must at least go in front of the variable length field. The problem which arises, therefore, is how to maintain compatibility between different versions of a database system.
One approach to the problem is to simply require that log structures not be mixed between new and old versions. In other words, the customer or user is required to have a clean upgrade, so that the user would not employ older log records (e.g., from Sybase System 10 in Sybase System 11). Such an approach is problematic, however. To export information, the prior system would dump database information using a "dump database" command. "Dump" and "load" are the means by which databases are archived against possible media failure. In such prior versions, however, the system required that a recovery phase or "Recovery" be run against the database as part of its operation; this required use of log records.
In addition to dumps for protecting against media failure, database dumps are often used as archives or retiring data. For instance, the user might want to dump data records for a particular year but then, from time to time, load the data for performing historical research. Just as after a system crash, log recovery has to be applied by a load database command. Thus, if System 10 dumps were to be used in System 11, log records would have to be read. Hence, the need for log compatibility.
What is needed are system and methods which preserve log compatibility even when such a system is updated or "upgraded" with a newer database engine, including one which changes the formats of log records. The present invention fulfills this and other needs.