A snapshot is a body of data constructed of data from a "master" table. The master table may be local or remote relative to the snapshot. The data contained within a snapshot is defined by a query that references the master table and optionally other tables, views or snapshots. A snapshot can be refreshed on a periodic basis to reflect the current state of its corresponding base tables. An "updateable snapshot" is a snapshot to which updates may be directly made. Such updates are propagated from the snapshot back to the master table.
One method of refreshing snapshots is to reissue the defining query for the snapshot and simply replace the previous snapshot data with the results of the reissued query. This method is referred to as a "complete refresh." Complete refreshes are particularly disadvantageous when the master and snapshot tables are located at different sites in a network, because data for the entire refreshed snapshot table must be sent through the network.
Alternatively, another method known as a "fast refresh" can be performed to expedite the refresh operation. During a fast refresh, only those changes to the master table which have been made since the last refresh of the snapshot are transferred to the snapshot. A log file (referred to as a "master log") can be employed to track and record the rows that have been updated in the master table. A master log can be a buffer in main memory or a file stored on a persistent memory, such as a hard disk. When a snapshot is refreshed, only the appropriate rows in the master log need to be applied to the snapshot table. In a networked environment, only those rows updated or inserted at the master site since the previous refresh are transferred across the network and updated or inserted into the snapshot. Rows deleted in the master table are also deleted in the snapshot. Fast refresh is typically faster, more efficient, and involves less network traffic than a complete refresh.
A prior implementation of snapshots employs rowids to drive fast refreshes. A rowid is an address that reflects the physical location in which a corresponding row in a database table is stored. In prior fast refresh mechanisms, modified rows in the master tables and the updateable snapshots are recorded using rowids. During a fast refresh, distributed queries based upon the recorded rowids in the master log and the snapshot itself are used to reconcile differences between the snapshot and the master table.
There are several disadvantages with fast refresh methods driven by rowids. One disadvantage is that rowid-driven fast refresh methods present a problem in heterogeneous environments, since not all master sites employ rowids that are useable to the snapshot site. For example, the snapshot site may be maintained by a database system supplied by one vendor while the master site is maintained by a database system supplied by a different vendor. Since rowids are implementation-dependent, it is likely that the rowids of the master site are incompatible with the rowids of the snapshot site. Thus, fast refresh mechanisms based on rowids may not be possible with such incompatible databases.
Another disadvantage with using rowids to drive a fast refresh is that rowids typically do not persist through a reorganization of master tables. A table may be reorganized, for example, if the table becomes excessively fragmented from having numerous insert, update, or delete operations performed on that table. When a table is reorganized to defragment the table, row data in one disk block may be moved to another disk block to occupy storage that was previously used by rows that have been deleted. Since rowids represent the physical addresses of rows, after such reorganization, the rowids previously recorded for the snapshot are invalid, thereby precluding a fast refresh driven by rowids. Consequently, reorganizing a master table forces all snapshots to perform a complete refresh after the table reorganization. Another disadvantage to a rowid fast refresh method is that rowids have to be stored as part of the snapshot, consuming space for administrative overhead.
The problems described above are a barrier to large-scale deployment of snapshots and limit the usefulness of snapshots in heterogeneous environments. Thus, it is highly desirable to have a method and system for implementing snapshots and fast refreshes which can overcome such problems.