1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to automatic pruning for log-based replication of tables within a relational database management system.
2. Description of Related Art
Database replication refers to the process of duplicating the data contained in source database tables and placing it in a corresponding set of target database tables. Replication may be complete or partial, local or remote, and synchronous or asynchronous. When the data in the source tables is updated, it is generally more efficient to propagate only the updates to the target tables than to duplicate all of the data contained in the source tables.
In general, relational databases record updates in a journal (hereinafter referred to as a log) for recovery purposes, in addition to writing the updates permanently to the tables stored on disk. Log-based replication refers to the asynchronous process of reading the updates to the tables from the log and propagating those updates to the target tables.
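By way of illustration only, the reading and propagating of updates described above might be sketched as follows. The record layout, the log sequence number field `lsn`, and the function `apply_log` are hypothetical names introduced for this example and are not taken from any particular database product; the sketch further assumes that the log is ordered by ascending sequence number and that a missing value denotes a delete.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogRecord:
    """One update read from the log (hypothetical layout)."""
    lsn: int               # assumed log sequence number, ascending
    table: str             # name of the table that was updated
    key: str               # key of the updated row
    value: Optional[str]   # new row value; None is assumed to mean delete


def apply_log(log: List[LogRecord],
              targets: Dict[str, Dict[str, str]],
              last_applied_lsn: int) -> int:
    """Propagate records newer than last_applied_lsn to the target tables.

    Returns the sequence number of the last record applied, so that a
    later call can resume from that position.
    """
    for rec in log:
        if rec.lsn <= last_applied_lsn:
            continue  # already replicated in an earlier pass
        table = targets.setdefault(rec.table, {})
        if rec.value is None:
            table.pop(rec.key, None)      # replicate a delete
        else:
            table[rec.key] = rec.value    # replicate an insert or update
        last_applied_lsn = rec.lsn
    return last_applied_lsn
```

Because the function returns the last applied position, the caller can run it repeatedly and asynchronously with respect to the source updates, which is the property that distinguishes log-based replication from copying the source tables in full.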
The database log requires significant disk space, since it grows as updates are made to the database tables. Therefore, old data in the log needs to be deleted (hereinafter referred to as pruned) once it is no longer needed for database recovery operations. On the other hand, log-based replication requires that the log remain available so that the updates to the database tables can be captured. As a result, log-based replication and pruning of the log have to be coordinated to prevent any updates from being missed in the target tables. Determining the optimal point at which to prune, so that the requirements of both log-based replication and log pruning are satisfied, is non-trivial.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for automatic pruning of a log used in log-based replication of database tables within a relational database management system. An optimal point at which to prune the log is periodically determined, such that the optimal point minimizes the storage space required for the log and yet ensures that all updates to the database tables can be properly replicated from the log. The log is then automatically pruned of selected records prior to the optimal point.
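By way of illustration only, one simple way to choose such a point is sketched below. The present section does not state how the optimal point is computed, so the rule used here is an assumption made for this example: the point is taken as the smaller of the oldest log position still needed for recovery and the oldest log position not yet replicated. The names `prune_point`, `prune_log`, `recovery_lsn`, and `replication_lsn` are hypothetical.

```python
from typing import List, Tuple

# A log entry is modeled as (sequence number, payload); this layout is
# an assumption made for the example.
LogEntry = Tuple[int, str]


def prune_point(recovery_lsn: int, replication_lsn: int) -> int:
    """Return the assumed prune point.

    recovery_lsn:    oldest sequence number still needed for recovery.
    replication_lsn: oldest sequence number not yet replicated.
    Entries before the smaller of the two are needed by neither consumer.
    """
    return min(recovery_lsn, replication_lsn)


def prune_log(log: List[LogEntry],
              recovery_lsn: int,
              replication_lsn: int) -> List[LogEntry]:
    """Drop every entry prior to the prune point and keep the rest."""
    point = prune_point(recovery_lsn, replication_lsn)
    return [entry for entry in log if entry[0] >= point]
```

Under this assumed rule, a replication process that lags behind recovery holds the prune point back, so no unreplicated update is discarded, while a replication process that is fully caught up lets the log shrink to what recovery alone requires. Calling `prune_log` on a timer would correspond to the periodic determination described above.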