As the Internet has matured, the nature and characteristics of the content available over the Internet have changed. In particular, the content stored by users over the Internet has increased in complexity. In addition to data such as text, images, video, audio, and the like, it has become increasingly common to store structured data over the Internet. Structured data refers to data that has been organized in accordance with a schema. As used herein, a “schema” generally comprises a set of rules that define how data is to be organized. The schema provides structure and context to the underlying data. Schemas vary depending on the type of data they are intended to organize. For example, a schema for an email inbox organizes data differently from a schema for a user's favorite websites, and both organize data differently from a schema for a photo album.
The loss of a user's structured data can affect not only that individual user but also other users and other applications that rely on the data. Accordingly, it is increasingly important to back up such data so that it can be recovered if it is lost, inadvertently deleted, or corrupted.
Many backup systems today operate by copying the data files stored on a computer network, file by file, to a long-term storage medium such as a tape backup system. The traditional process of backing up data to tape media is time-driven and time-dependent. That is, a backup process typically runs at regular intervals and covers a certain period of time. For example, a full system backup may run once a week on the weekend, and incremental backups may run every weekday during an overnight backup window that starts after the close of business and ends before the next business day begins.
These individual backups are then retained for a predetermined period of time according to a retention policy. To conserve tape media and storage space, older backups are gradually phased out and replaced by newer backups. Continuing the above example, after a full weekly backup is completed, the daily incremental backups for the preceding week may be discarded, and each weekly backup may be kept for a few months before being replaced by monthly backups. The daily backups are typically not all discarded on the same day. Instead, the previous Monday's backup set is overwritten on Monday, the previous Tuesday's backup set is overwritten on Tuesday, and so on. This ensures that a backup set is available that is within eight business hours of any corruption that may have occurred in the past week.
Although the backup creation process can be automated, despite frequent hardware failures and the need for ongoing maintenance and tuning, restoring data from a backup remains a manual and time-critical process. First, the appropriate backup tapes must be located, including the latest full backup and any incremental backups made since that full backup. Even when only a partial restoration is required, locating the appropriate backup tapes can take just as long.
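The tape-selection step described above can be illustrated with a short sketch. It is a minimal example only, assuming a hypothetical catalog of backup sets: the `BackupSet` record, its fields, the tape labels, and the `restore_chain` function are illustrative names, not part of any particular backup product. It also assumes true incremental backups, in which each incremental captures only the changes since the previous backup, so every incremental made after the latest full backup is needed to reach a given restore point.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackupSet:
    label: str          # hypothetical tape label
    kind: str           # "full" or "incremental"
    taken_at: datetime  # time the backup completed


def restore_chain(backups, restore_point):
    """Return the backup sets needed to restore data as of restore_point.

    Assumes each incremental depends on the backup taken just before it,
    so the chain is the latest full backup at or before restore_point plus
    every incremental taken after that full backup, up to restore_point.
    """
    usable = sorted(
        (b for b in backups if b.taken_at <= restore_point),
        key=lambda b: b.taken_at,
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise LookupError("no full backup precedes the restore point")
    base = fulls[-1]
    incrementals = [
        b for b in usable
        if b.kind == "incremental" and b.taken_at > base.taken_at
    ]
    return [base] + incrementals


# Usage: a weekend full backup followed by weekday overnight incrementals.
catalog = [
    BackupSet("FULL-W1", "full", datetime(2024, 1, 6, 22)),
    BackupSet("INC-MON", "incremental", datetime(2024, 1, 8, 23)),
    BackupSet("INC-TUE", "incremental", datetime(2024, 1, 9, 23)),
    BackupSet("INC-WED", "incremental", datetime(2024, 1, 10, 23)),
]
chain = restore_chain(catalog, datetime(2024, 1, 10, 9))
print([b.label for b in chain])  # ['FULL-W1', 'INC-MON', 'INC-TUE']
```

Under a differential scheme, where each backup captures all changes since the last full backup, only the most recent differential would be needed; the sketch above models only the incremental case described in the text.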
In general, structured data is stored in relational databases, and backups are created for each database in its entirety rather than for each user's subset of the structured data individually. As a result, if only a single user's structured data needs to be restored from the backup, the backup tapes for the entire database must be located and restored onto a secondary staging system. The requisite structured data must then be manually extracted from that system and written to the primary store. Thus, if a portion of the data is lost, it is often difficult to restore only the data that was lost, and the system administrator is often forced to decide whether recovering the lost portion of the data is worth the cost.
This Background is provided to introduce a brief context for the Summary and Detailed Description that follow. This Background is not intended to be an aid in determining the scope of the claimed subject matter nor be viewed as limiting the claimed subject matter to implementations that solve any or all of the disadvantages or problems presented above.