For many database systems, it is desirable to have one or more physical copies of an original database. The database copies can be used to deploy and test applications or features prior to using them in production.
The database copy, referred to herein as a “snapshot database,” is a point-in-time copy of a base database. The snapshot database mirrors data contained in the base database up until the time the point-in-time copy is created. Once the snapshot database is created, changes to the snapshot database and the base database are made independently. However, to keep testing accurate and relevant, the snapshot database must periodically be re-synced with the base database so that it contains up-to-date data. When re-syncing the snapshot database with the base database, changes made to the snapshot database are reverted, while changes made to the base database are copied to or reproduced in the snapshot database.
One method for re-syncing the snapshot database with the base database is to create a new point-in-time copy of the base database and replace the old snapshot database with it. However, for large databases, the process of generating a new physical copy of a base database may be time-consuming. Additionally, if the previous copy of the snapshot database is not (or cannot be) deleted prior to creating the new copy, twice as much storage space is required in order to store both copies.
A second method is to compare each data block in the snapshot database with the corresponding data block in the base database. If the blocks differ, then the data in either the snapshot database or the base database has changed, and the data block is copied from the base database to the snapshot database. However, for large databases, this results in a large number of data block comparisons. Not only are the comparisons computationally expensive for large amounts of data, but it is also inefficient to compare all the data blocks if only a small portion of each database has changed. Additionally, if multiple snapshot databases have to be re-synced, then data block comparisons are performed for each copy, which significantly increases the time and resources required.
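The block-comparison method can be illustrated with a minimal sketch. It assumes, purely for illustration, that each database is modeled as an in-memory list of equally sized blocks; the function name `resync_by_block_comparison` and the sample data are hypothetical and are not taken from any particular database system. The sketch shows that every block is compared even when only a few have changed.

```python
from typing import List, Tuple


def resync_by_block_comparison(base: List[bytes], snapshot: List[bytes]) -> Tuple[int, int]:
    """Overwrite every snapshot block that differs from its base block.

    Hypothetical helper: databases are modeled as lists of blocks.
    Returns (blocks_compared, blocks_copied).
    """
    if len(base) != len(snapshot):
        raise ValueError("sketch assumes both databases have the same block count")
    compared = 0
    copied = 0
    for index, base_block in enumerate(base):
        compared += 1  # every block is compared, changed or not
        if snapshot[index] != base_block:
            # A difference means the base or the snapshot changed; either way
            # the base version wins, which reverts snapshot-side changes.
            snapshot[index] = base_block
            copied += 1
    return compared, copied


# Illustrative data: four blocks, one changed on each side after the copy.
base = [b"a", b"b", b"c", b"d"]
snapshot = list(base)      # point-in-time copy
base[1] = b"B"             # change made to the base database
snapshot[3] = b"X"         # change made to the snapshot database
compared, copied = resync_by_block_comparison(base, snapshot)
```

In this sketch the number of comparisons equals the total number of blocks regardless of how few blocks changed, and the whole loop would be repeated for each additional snapshot database.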
A third method is to track all changes to each database. For example, the database system could maintain one or more change logs with timestamps corresponding to changes made to each database. However, tracking changes adds processing cost to every write. Additionally, if the database system is a clustered database system, it is difficult to maintain consistent timestamps across nodes in the cluster.
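The change-tracking method might look roughly like the following sketch. It rests on several assumptions made only for illustration: databases are in-memory lists of blocks, the class `TrackedDatabase` and function `resync_from_logs` are hypothetical names, and a single shared counter stands in for timestamps. That shared counter is exactly what is hard to provide across nodes in a clustered system.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Set, Tuple

# Hypothetical logical clock standing in for timestamps (single process only).
_clock = count(1)


@dataclass
class TrackedDatabase:
    blocks: List[bytes]
    # Each entry is (timestamp, block_index).
    change_log: List[Tuple[int, int]] = field(default_factory=list)

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        # Extra bookkeeping performed on every write.
        self.change_log.append((next(_clock), index))

    def changed_since(self, timestamp: int) -> Set[int]:
        return {idx for ts, idx in self.change_log if ts > timestamp}


def resync_from_logs(base: TrackedDatabase, snapshot: TrackedDatabase,
                     snapshot_time: int) -> int:
    """Copy only blocks logged as changed on either side since snapshot_time."""
    dirty = base.changed_since(snapshot_time) | snapshot.changed_since(snapshot_time)
    for idx in sorted(dirty):
        snapshot.blocks[idx] = base.blocks[idx]
    return len(dirty)


# Illustrative data: one write on each side after the point-in-time copy.
base = TrackedDatabase([b"a", b"b", b"c", b"d"])
snapshot_time = next(_clock)
snapshot = TrackedDatabase(list(base.blocks))
base.write(1, b"B")
snapshot.write(3, b"X")
touched = resync_from_logs(base, snapshot, snapshot_time)
```

Here the re-sync touches only the logged blocks, but each call to `write` pays for a log append, and the comparison `ts > timestamp` is only meaningful if all writers share a consistent clock.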
Based on the foregoing, there is a need for a method that provides an efficient, high-performance re-sync of a snapshot database with a base database.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.