Decreasing disk costs make it possible to take frequent snapshots of past storage system states and retain them on-line for long periods. A new generation of snapshot-based applications, which use the past to draw inferences about the current state and to predict the future, is rushing to market. Snapshot systems are attracting the attention of storage systems practitioners and researchers alike, and snapshots are becoming a “must have” for a modern storage system. Existing snapshot approaches, however, offer no satisfactory solution for long-lived snapshots. Yet long-lived snapshots are important because, if the past is any predictor of the future, a longer-term prediction needs a longer-lived past.
Existing techniques for accessing versioned past data in databases and file systems rely on a “no-overwrite” update approach. In this approach, the past state remains in place and the new state is copied to a new location, so the mappings for the past state take over the mappings of the current state all at once, rather than gradually. For example, consider a storage system that has only five pages, P1, P2, P3, P4, and P5, whose current state is maintained by the database. If page P3 is modified to page P3′ after a snapshot is declared, page P3 is left in place and page P3′ is written to a later location in the database. Thus, the database may contain pages P1, P2, P3, P4, P5, and P3′. Although the past state is maintained “as is”, the current state becomes increasingly fragmented as pages change: the current version of P3 no longer sits between P2 and P4.
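The no-overwrite example above can be sketched in a few lines. This is a minimal illustration, not any particular system's implementation: it assumes pages live in a single append-only list, that a page table maps page names to positions in that list, and that a snapshot is just a frozen copy of the page table. The class and method names (`NoOverwriteStore`, `snapshot`, `write`) are hypothetical.

```python
class NoOverwriteStore:
    """Hypothetical no-overwrite store: new page versions are appended, never written in place."""

    def __init__(self, pages):
        self.storage = list(pages)                           # physical layout, in order
        self.current = {p: i for i, p in enumerate(pages)}   # current page table: page -> position
        self.snapshots = []                                  # frozen page tables, one per snapshot

    def snapshot(self):
        # The past state stays where it is, so freezing its page table is enough.
        self.snapshots.append(dict(self.current))
        return len(self.snapshots) - 1

    def write(self, page, new_version):
        # The new version is appended at the end; the old copy is left untouched.
        self.storage.append(new_version)
        self.current[page] = len(self.storage) - 1

    def read_current(self):
        return [self.storage[self.current[p]] for p in sorted(self.current)]

    def read_snapshot(self, sid):
        table = self.snapshots[sid]
        return [self.storage[table[p]] for p in sorted(table)]
```

Running the five-page example, `store = NoOverwriteStore(["P1", "P2", "P3", "P4", "P5"])`, then `s = store.snapshot()` and `store.write("P3", "P3'")`, leaves `store.storage` as `["P1", "P2", "P3", "P4", "P5", "P3'"]`. The snapshot still reads the original five pages in place, while the current state's positions become `[0, 1, 5, 3, 4]`, which is the fragmentation the text describes.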
Split snapshot systems are a recent approach that is promising because, unlike other approaches, they do not disrupt the current-state storage system in either the short or the long run, and because they allow selected unneeded snapshots to be garbage-collected at no cost, which is a useful feature for long-lived snapshots. An unsolved problem has been how to maintain an efficient access method for long-lived split snapshots without imposing undesirable overhead on the current storage system.
The problem arises because, to avoid disrupting the current state, a split snapshot system stores the past state separately from the current state. The current state is maintained in the database. Once a snapshot is declared, the system must ensure that the declared snapshot accurately reflects the state at the moment of declaration. In a split snapshot system, when a page is modified after a snapshot declaration, the unmodified page is first copied to a separate storage system, and then the page in the database is modified “in place”. For example, consider again a storage system that has only five pages, P1, P2, P3, P4, and P5, whose current state is maintained by the database. If page P3 is modified to page P3′ after a snapshot is declared, page P3 is first copied to the separate storage system. The database is then updated in place and contains pages P1, P2, P3′, P4, and P5.
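The copy-before-update step can be sketched as follows. This is a minimal illustration under stated assumptions: the database is modeled as a dictionary updated in place, the separate snapshot store as an append-only list, and a page is copied only on its first modification after the most recent snapshot declaration. The names `SplitSnapshotStore`, `declare_snapshot`, `write`, and `saved_for` are hypothetical.

```python
class SplitSnapshotStore:
    """Hypothetical split snapshot store: copy the old page out, then update in place."""

    def __init__(self, pages):
        self.db = dict(pages)       # page id -> contents; the current state, updated in place
        self.snap_store = []        # separate, append-only store of (page id, old contents)
        self.latest_snapshot = -1   # id of the most recently declared snapshot (-1 = none)
        self.saved_for = {}         # page id -> latest snapshot its pre-state was saved for

    def declare_snapshot(self):
        self.latest_snapshot += 1
        return self.latest_snapshot

    def write(self, page, contents):
        # Only the first modification of a page after a declaration needs a copy;
        # later writes before the next declaration go straight to the database.
        if self.latest_snapshot >= 0 and self.saved_for.get(page, -1) < self.latest_snapshot:
            self.snap_store.append((page, self.db[page]))   # save the unmodified page first
            self.saved_for[page] = self.latest_snapshot
        self.db[page] = contents                            # then modify the page in place
```

With the five-page example, declaring a snapshot and then calling `store.write("P3", "P3'")` leaves the database as P1, P2, P3′, P4, P5 and the snapshot store as `[("P3", "P3")]`. A second write to P3 before the next declaration adds no further copy, which keeps the overhead on the current state small.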
This greatly simplifies and speeds up access to the current state, since the current state is maintained “as is”. Access to past states, however, is complicated: because pages are copied to the separate storage only when they are modified, a snapshot's pages may be located partly in the database and partly in the separate storage. In the example above, the page table for the declared snapshot would need to indicate that pages P1, P2, P4, and P5 are in the database (since they have not changed since the snapshot's declaration), while page P3 is in the separate storage (since it changed after the snapshot's declaration). Because a snapshot page table has one entry per page, it is as large as the database page table, so managing such mutable snapshot page tables can be costly when snapshots are frequent.
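A mutable snapshot page table of this kind can be sketched as a dictionary with one entry per page, each entry saying whether that page's snapshot version is in the database or in the separate storage. This is an illustrative assumption about the representation, not a prescribed format; the helper names `new_snapshot_page_table`, `record_copy`, and `read_snapshot_page` are hypothetical.

```python
DB, SNAP = "database", "snapshot store"

def new_snapshot_page_table(page_ids):
    # At declaration nothing has been copied yet, so every entry points at the database.
    return {p: (DB, p) for p in page_ids}

def record_copy(spt, page, slot):
    # When `page` is copied into `slot` of the separate storage, the entry must be
    # redirected there: this in-place update is what makes the table mutable.
    spt[page] = (SNAP, slot)

def read_snapshot_page(spt, page, database, snap_store):
    # Resolve a snapshot page through its table entry.
    where, key = spt[page]
    return database[key] if where == DB else snap_store[key]
```

For the example, after `record_copy(spt, "P3", 0)` the table sends P1, P2, P4, and P5 to the database and P3 to slot 0 of the separate storage, and it still has exactly as many entries as the database has pages. When snapshots are frequent, several snapshots declared without an intervening change to a page share the same copy of it, so one copy may have to be recorded in several such tables.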
A “page” is defined as a virtual data block. A “mapping” is a data structure that links the logical/virtual address of a page to a physical address on a storage medium such as a hard disk drive or the like. A “snapshot mapping” is a mapping between a snapshot page and a snapshot storage. A “database mapping” is a mapping between a database page and a database storage. A “page table” is a data structure that contains mappings. A Snapshot Page Table (SPT) is a type of page table that can be created at low cost by first writing the mappings of the snapshot pages into a sequential log as the snapshot pages are copied to the snapshot store. The snapshot page table for a given snapshot may then be constructed by scanning the log to find the mappings for all of that snapshot's pages. This scan can be costly if some pages are modified infrequently, since it has to pass over many repeated mappings of the frequently modified pages before finding the mappings of the infrequently modified ones.
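The log-based construction can be sketched as a single forward scan. The sketch assumes each log record is a tuple `(snapshot id the copy was made for, page id, slot in the snapshot store)`, appended in copy order, with a copy tagged by the most recent snapshot at the time it was made. Under that assumption, the version of a page belonging to snapshot `S` is the first record for that page tagged `S` or later, and pages with no such record are still in the database. The function name `build_spt` and the record layout are hypothetical.

```python
from typing import Dict, List, Tuple

# One record per copied page: (snapshot id the copy was made for, page id, slot).
LogRecord = Tuple[int, str, int]

def build_spt(log: List[LogRecord], snapshot_id: int,
              page_ids: List[str]) -> Tuple[Dict[str, Tuple[str, object]], int]:
    """Build the snapshot page table for `snapshot_id`; also return how many records were scanned."""
    spt: Dict[str, Tuple[str, object]] = {}
    scanned = 0
    for sid, page, slot in log:                  # records are in copy order
        if sid < snapshot_id:
            continue                             # copies made before this snapshot was declared
        scanned += 1
        if page not in spt:
            spt[page] = ("snapshot store", slot) # first copy after declaration holds this snapshot's version
        if len(spt) == len(page_ids):
            break                                # every page already resolved
    for p in page_ids:
        spt.setdefault(p, ("database", p))       # never copied since declaration: still in the database
    return spt, scanned
```

For example, with `log = [(0, "P3", 0), (1, "P3", 1), (1, "P1", 2), (2, "P3", 3)]`, building the table for snapshot 1 over pages P1, P2, and P3 maps P3 to slot 1, P1 to slot 2, and P2 to the database, after scanning three records. The repeated P3 record is scanned but contributes nothing, which is the cost the text attributes to frequently modified pages.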
Skewed update workloads, in which a few pages are modified far more often than the rest, are common in databases and file systems. An application requesting to run on a snapshot has to wait until the construction of the snapshot page table is complete. It is important, therefore, to reduce the time of the construction scan. Although in-memory techniques exist that accelerate the construction scan in split snapshot systems, they support only short-lived snapshots. Thus, an access method is needed for split snapshot systems that also supports long-lived snapshots.
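The effect of skew on the construction scan can be illustrated with a small, deterministic workload. The sketch assumes the same hypothetical log record layout `(snapshot id, page id, slot)` with records in copy order, and a made-up extreme workload: one hot page is copied after every snapshot declaration, while one cold page is copied only once, at the very end. The helper names `skewed_log` and `scan_length_to_find` are hypothetical.

```python
def skewed_log(num_snapshots, cold_page="cold"):
    # Hypothetical extreme skew: the hot page is modified (and therefore copied)
    # after every snapshot declaration; the cold page changes once, at the end.
    log, slot = [], 0
    for sid in range(num_snapshots):
        log.append((sid, "hot", slot))
        slot += 1
    log.append((num_snapshots - 1, cold_page, slot))
    return log

def scan_length_to_find(log, snapshot_id, page):
    """Count the log records read, starting at `snapshot_id`, until `page` is found."""
    n = 0
    for sid, p, _slot in log:
        if sid < snapshot_id:
            continue
        n += 1
        if p == page:
            return n
    return n  # not found: the whole tail of the log was scanned
```

For `log = skewed_log(1000)`, resolving the hot page for snapshot 0 reads a single record, but resolving the cold page reads 1001 records, since the scan must pass all 1000 hot-page mappings first. For snapshot 500 the cold page costs 501 records, so the older the snapshot, the longer the scan, which is why long-lived snapshots make this problem worse.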