The disclosure generally relates to the field of data processing, and more particularly to database and file management or data structures.
An organization can specify a data management strategy in a policy(ies) that involves data recovery and/or data retention. For data recovery, an application or program creates a backup and restores the backup when needed. The Storage Networking Industry Association (SNIA) defines a backup as a “collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible; also called a backup copy.” For data retention, an application or program creates an archive. SNIA defines an archive as “A collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data.” Although creating an archive may involve additional operations (e.g., indexing to facilitate searching, compressing, encrypting, etc.) and a backup can be writable while an archive may not be, the creation of both involves copying data from a source to a destination.
The copying that creates a backup or an archive can be done in different ways. To create a “full backup,” all of a defined set of data objects are copied, regardless of whether they have been modified since the last backup. Backups can also be incremental. A system can limit copying to modified objects to create incremental backups, either a cumulative incremental backup or a differential incremental backup. SNIA defines a differential incremental backup as “a backup in which data objects modified since the last full backup or incremental backup are copied.” SNIA defines a cumulative incremental backup as a “backup in which all data objects modified since the last full backup are copied.”
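The difference between the three backup types can be illustrated with a minimal sketch. The `DataObject` type, the integer logical timestamps, and the function names below are hypothetical and chosen only for illustration; they do not describe any particular backup application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    name: str
    modified_at: int  # hypothetical logical timestamp of the last modification


def full_backup(objects):
    """Select every object in the defined set, modified or not."""
    return [o.name for o in objects]


def cumulative_incremental(objects, last_full_at):
    """Select objects modified since the last full backup."""
    return [o.name for o in objects if o.modified_at > last_full_at]


def differential_incremental(objects, last_backup_at):
    """Select objects modified since the most recent full or incremental backup."""
    return [o.name for o in objects if o.modified_at > last_backup_at]


# Assumed timeline: full backup at t=10, incremental backup at t=20.
objects = [DataObject("a", 1), DataObject("b", 12), DataObject("c", 25)]

full = full_backup(objects)
cumulative = cumulative_incremental(objects, last_full_at=10)
differential = differential_incremental(objects, last_backup_at=20)
```

In this assumed timeline, the cumulative incremental backup selects both "b" and "c" because both changed after the full backup, while the differential incremental backup selects only "c" because "b" was already captured by the incremental backup at t=20.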
A data management/protection strategy can use “snapshots,” which add a point-in-time aspect to a backup. SNIA more specifically defines a snapshot as a “fully usable copy of a defined collection of data that contains an image of the data as it appeared at a single instant in time.” In other words, a snapshot can be considered a backup at a particular time instant. Thus, the different techniques for creating a backup can include different techniques for creating a snapshot. The SNIA definition further elaborates that a snapshot is “considered to have logically occurred at that point in time, but implementations may perform part or all of the copy at other times (e.g., via database log replay or rollback) as long as the result is a consistent copy of the data as it appeared at that point in time. Implementations may restrict point in time copies to be read-only or may permit subsequent writes to the copy.”
An organization can use different backup strategies. A few backup strategies include a “periodic full” backup strategy and a “forever incremental” backup strategy. With the periodic full backup strategy, a backup application creates a full snapshot (“baseline snapshot”) periodically and creates incremental snapshots between the periodically created full snapshots. With the forever incremental backup strategy, a backup application creates an initial snapshot that is a full snapshot and creates incremental snapshots thereafter.
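As a rough illustration of how the two strategies differ, the following sketch lists the kind of snapshot created at each position in a sequence. The function names and the `period` parameter (the number of snapshots per full-snapshot cycle) are assumptions made for this example only.

```python
def periodic_full(num_snapshots, period):
    """Every `period`-th snapshot is a full (baseline) snapshot; the rest are incremental."""
    return ["full" if i % period == 0 else "incremental" for i in range(num_snapshots)]


def forever_incremental(num_snapshots):
    """Only the initial snapshot is full; every later snapshot is incremental."""
    return ["full" if i == 0 else "incremental" for i in range(num_snapshots)]


periodic_schedule = periodic_full(num_snapshots=6, period=3)
forever_schedule = forever_incremental(num_snapshots=6)
```

With these assumed parameters, the periodic full schedule contains a baseline snapshot at positions 0 and 3, whereas the forever incremental schedule contains a single full snapshot at position 0 followed only by incremental snapshots.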
Data management/protection strategies increasingly rely on cloud service providers. A cloud service provider maintains equipment and software without burdening customers with the details. The cloud service provider provides an application programming interface (API) to customers. The API provides access to resources of the cloud service provider without giving customers visibility into those resources.