A primary copy of data is generally a production copy or other “live” version of the data which is used by a software application and is generally in the native format of that application. Primary copy data may be maintained in a local memory or other high-speed storage device that allows for relatively fast data access if necessary. Such primary copy data is typically intended for short-term retention (e.g., several hours or days) before some or all of the data is stored as one or more secondary copies, for example, to prevent loss of data in the event a problem occurs with the data stored in primary storage.
To protect primary copy data or for other purposes, such as regulatory compliance, secondary copies (alternatively referred to as “data protection copies”) can be made. Examples of secondary copies include a backup copy, a snapshot copy, a hierarchical storage management (“HSM”) copy, an archive copy, and other types of copies.
A backup copy is generally a point-in-time copy of the primary copy data stored in a backup format as opposed to in native application format. For example, a backup copy may be stored in a backup format that is optimized for compression and efficient long-term storage. Backup copies generally have relatively long retention periods and may be stored on media with slower retrieval times than other types of secondary copies and media. In some cases, backup copies may be stored at an offsite location.
After an initial, full backup of a data set is performed, periodic, intermittent, or continuous incremental backup operations may be subsequently performed on the data set. Each incremental backup operation copies only the primary copy data that has changed since the last full or incremental backup of the data set was performed. In this way, even if the entire set of primary copy data that is backed up is large, the amount of data that must be transferred during each incremental backup operation may be significantly smaller, since only the changed data needs to be transferred to secondary storage. One or more full backup copies and the subsequent incremental copies may be combined, periodically or intermittently, to create a synthetic full backup copy. More details regarding synthetic storage operations are found in commonly-assigned U.S. patent application Ser. No. 12/510,059, entitled “Snapshot Storage and Management System with Indexing and User Interface,” filed Jul. 27, 2009, now U.S. Pat. No. 7,873,806, which is hereby incorporated by reference herein in its entirety.
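By way of illustration only, the relationship between a full backup, incremental backups, and a synthetic full backup may be sketched as follows. The sketch assumes a data set can be modeled as a mapping from file path to content; the function names `incremental_backup` and `synthetic_full` and the use of `None` to mark a deletion are hypothetical and are not drawn from any particular system.

```python
from typing import Dict, List, Optional

# Assumption: a data set is modeled as {path: content}.
Snapshot = Dict[str, str]
# Assumption: in an incremental copy, None marks a path deleted since the last backup.
Delta = Dict[str, Optional[str]]


def incremental_backup(previous: Snapshot, current: Snapshot) -> Delta:
    """Return only the entries that changed since the previous backup."""
    delta: Delta = {}
    for path, content in current.items():
        if previous.get(path) != content:
            delta[path] = content  # new or modified data
    for path in previous:
        if path not in current:
            delta[path] = None  # deleted since the previous backup
    return delta


def synthetic_full(full: Snapshot, incrementals: List[Delta]) -> Snapshot:
    """Combine a full backup with later incremental copies, oldest first."""
    merged = dict(full)
    for delta in incrementals:
        for path, content in delta.items():
            if content is None:
                merged.pop(path, None)
            else:
                merged[path] = content
    return merged


# Example: one full backup followed by two incremental backups.
full = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
day1 = {"a.txt": "v2", "b.txt": "v1", "c.txt": "v1"}
day2 = {"a.txt": "v2", "c.txt": "v1", "d.txt": "v1"}

inc1 = incremental_backup(full, day1)  # only a.txt changed
inc2 = incremental_backup(day1, day2)  # d.txt added, b.txt deleted
synthetic = synthetic_full(full, [inc1, inc2])
```

In this example each incremental copy holds only the changed entries, yet the synthetic full copy matches the state of the data set at the time of the last incremental backup without reading the primary copy again.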
An archive copy is generally a copy of the primary copy data, but typically includes only a subset of the primary copy data that meets certain criteria and is usually stored in a format other than the native application format. For example, an archive copy might include only that data from the primary copy that is larger than a given size threshold or older than a given age threshold and that is stored in a backup format. Often, archive data is removed from the primary copy, and a stub is stored in the primary copy to indicate its new location. When a user requests access to the archive data that has been removed or migrated, systems use the stub to locate the data and often make recovery of the data appear transparent, even though the archive data may be stored at a location different from the remaining primary copy data.
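As a minimal sketch of the stub mechanism described above, and not a description of any specific product, the following example archives objects that exceed a size threshold or an age threshold and leaves a stub in their place. The classes `FileObject` and `Stub`, the functions `archive` and `read`, the key format, and the in-memory dictionaries standing in for primary and secondary storage are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class FileObject:
    data: bytes
    age_days: int


@dataclass
class Stub:
    archive_key: str  # hypothetical pointer to the archived object's new location


Primary = Dict[str, Union[FileObject, Stub]]


def archive(primary: Primary, archive_store: Dict[str, bytes],
            min_size: int, min_age_days: int) -> None:
    """Move objects meeting the size or age criteria and leave a stub behind."""
    for path, obj in list(primary.items()):
        if isinstance(obj, Stub):
            continue  # already archived
        if len(obj.data) > min_size or obj.age_days > min_age_days:
            key = f"archive/{path}"  # assumed key format
            archive_store[key] = obj.data
            primary[path] = Stub(archive_key=key)


def read(primary: Primary, archive_store: Dict[str, bytes], path: str) -> bytes:
    """Return the data for a path, following a stub when one is present."""
    obj = primary[path]
    if isinstance(obj, Stub):
        return archive_store[obj.archive_key]
    return obj.data


primary: Primary = {
    "big.bin": FileObject(b"x" * 100, age_days=1),
    "old.txt": FileObject(b"hi", age_days=400),
    "new.txt": FileObject(b"ok", age_days=2),
}
store: Dict[str, bytes] = {}
archive(primary, store, min_size=50, min_age_days=365)
```

Because `read` follows the stub, a caller receives the same bytes whether or not the object has been archived, which is the sense in which recovery appears transparent.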
Archive copies are typically created and tracked independently of other secondary copies, such as backup copies. For example, to create a backup copy, a conventional data storage system transfers a secondary copy of primary copy data to secondary storage and tracks the backup copy using a backup index that is separate from the archive index. To create an archive copy, the conventional data storage system transfers the primary copy data to be archived to secondary storage, replaces the primary copy data with a stub, and tracks the archive copy using an archive index. Accordingly, a primary copy data object that is both archived and backed up is transferred to secondary storage two separate times.
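The cost of tracking backup and archive copies independently can be illustrated with simple arithmetic. The following sketch assumes each data object has a known size in bytes and that backup and archive operations each transfer every object they cover; the function names and sample sizes are hypothetical.

```python
from typing import Dict, Set


def bytes_transferred_separately(sizes: Dict[str, int],
                                 backed_up: Set[str],
                                 archived: Set[str]) -> int:
    """Total bytes sent when backup and archive run as independent operations."""
    backup_bytes = sum(sizes[name] for name in backed_up)
    archive_bytes = sum(sizes[name] for name in archived)
    return backup_bytes + archive_bytes


def bytes_transferred_twice(sizes: Dict[str, int],
                            backed_up: Set[str],
                            archived: Set[str]) -> int:
    """Bytes belonging to objects that are both backed up and archived."""
    return sum(sizes[name] for name in backed_up & archived)


sizes = {"report.doc": 10, "video.mp4": 500, "notes.txt": 1}
backed_up = {"report.doc", "video.mp4", "notes.txt"}
archived = {"video.mp4"}

total = bytes_transferred_separately(sizes, backed_up, archived)
redundant = bytes_transferred_twice(sizes, backed_up, archived)
```

Here the object that is both backed up and archived accounts for all of the redundant transfer, since its bytes are sent to secondary storage once by each operation.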
Users often need to access files in secondary or backup storage with a specific search context in mind. For example, a user may need to access photo files from the user's last Hawaii trip that are archived on a storage system, or all documents that include the word “taxes.” Such a context-sensitive search is cumbersome using presently available techniques, in which a user has to speculatively mount archived file folders to the user's computer and then sift through all files in the mounted drive to look for files of interest.
In other operational scenarios, a user may want to access a specific portion of an archived media file, such as a home video. Alternatively, a user may want to access the archived home video starting at a specific point in the video file. A user may experience long delays in fulfilling such requests using conventional techniques in which mounting of the video file may take a significant amount of time. Furthermore, such user activities may tie up valuable computational resources needed for mounting files from archives to a local memory and transferring data between the user device and a secondary storage location.
The need exists for systems and methods that overcome the above problems, as well as systems and methods that provide additional benefits. Overall, the examples herein of some prior or related systems and methods and their associated limitations are intended to be illustrative and not exclusive. Other limitations of existing or prior systems and methods will become apparent to those of skill in the art upon reading the following detailed description.