Various forms of network-based data storage systems are known today. These forms include network attached storage (NAS), storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data and backing up critical data (e.g., by data mirroring). A network-based storage system typically includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (“clients”).
In the context of NAS, a storage server may be a file server, which is sometimes called a “filer”. A filer operates on behalf of one or more clients to store and manage shared files. The files may be stored in one or more arrays of mass storage devices, such as magnetic or optical disks or tapes, using RAID (Redundant Array of Inexpensive Disks) techniques. Hence, the mass storage devices in each array may be organized into one or more separate RAID groups.
In a SAN context, a storage server provides clients with block-level access to stored data, rather than file-level access. Some storage servers are capable of providing clients with both file-level access and block-level access, such as certain Filers made by Network Appliance, Inc. of Sunnyvale, Calif. (NetApp®).
A network storage system can be configured to operate as (or in) a hierarchical storage management (HSM) system. HSM is a data storage technique in which data is automatically moved between high-cost/high-performance storage media and low-cost/lower-performance storage media. HSM was developed mainly because high-speed storage devices, such as magnetic hard disk drives, are more expensive (per byte stored) than slower devices, such as optical disks and magnetic tape. While it would be preferable to have all data available on high-speed devices at all times, that can be too expensive for many organizations. Therefore, HSM systems instead store most of the data on slower (less expensive) devices and then copy data to faster devices on an as-needed basis. An HSM system typically monitors how data is used and estimates which data can be safely moved to slower storage devices and which data should stay on the faster storage devices.
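The monitoring and migration behavior described above can be illustrated with a minimal sketch. It assumes a simple idle-time policy, in which a file not accessed within a threshold is moved to the slower tier and is copied back when next accessed. The names `FileRecord`, `migrate_cold_files`, `access`, and `max_idle_seconds` are hypothetical and are used only for illustration; actual HSM systems may use different or more elaborate policies.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical bookkeeping entry for one file managed by the sketch."""
    name: str
    last_access: float      # time of last access, in seconds
    tier: str = "primary"   # "primary" (faster) or "secondary" (slower)


def migrate_cold_files(files, now, max_idle_seconds):
    """Move files idle longer than max_idle_seconds to the secondary tier.

    Assumed policy: idle time is the only criterion. Returns the names of
    the files that were migrated, in input order.
    """
    migrated = []
    for f in files:
        if f.tier == "primary" and now - f.last_access > max_idle_seconds:
            f.tier = "secondary"
            migrated.append(f.name)
    return migrated


def access(f, now):
    """Record an access, copying the file back to primary if it was migrated."""
    if f.tier == "secondary":
        f.tier = "primary"  # as-needed copy back to the faster tier
    f.last_access = now
```

In this sketch, calling `migrate_cold_files` periodically plays the role of the usage monitoring, and `access` plays the role of the as-needed copy to the faster devices.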
FIG. 1 shows an example of a configuration of an HSM system at a high level. A client 1 requires access to data stored in a primary storage facility 2. The primary storage facility 2 includes the faster storage devices in the HSM system, such as magnetic hard disks. The secondary storage facility 3 provides the slower storage devices in the system, such as optical disks or magnetic tape. Movement of data between the primary storage facility and the secondary storage facility is controlled by an HSM server. Data may be communicated between the primary and secondary storage facilities directly, through the HSM server, or both.
Some HSM systems provide “transparent” remote file access to archived data, which is also called “transparent recall”. This means that the data is fetched from the secondary storage facility and delivered to the requesting client without committing it to disk in the primary storage facility. This feature is particularly desirable when the client needs to read only a portion of a very large file, since otherwise, normal HSM practice would be to recall the entire file and store it in the primary storage facility.
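The difference between a normal recall and a transparent recall can be sketched as follows. This is an illustrative model only: both storage facilities are represented as in-memory dictionaries, and the class `HsmSketch` and its methods `full_recall_read` and `transparent_read` are hypothetical names, not part of any particular HSM product.

```python
class HsmSketch:
    """Toy model of an HSM system with two storage facilities (assumed layout)."""

    def __init__(self, secondary):
        self.primary = {}                 # name -> bytes held on faster storage
        self.secondary = dict(secondary)  # name -> bytes held on slower storage

    def full_recall_read(self, name, offset, length):
        """Normal recall: copy the entire file to primary, then serve the read."""
        if name not in self.primary:
            self.primary[name] = self.secondary[name]
        return self.primary[name][offset:offset + length]

    def transparent_read(self, name, offset, length):
        """Transparent recall: serve the requested range without storing the
        file in primary storage."""
        if name in self.primary:
            return self.primary[name][offset:offset + length]
        return self.secondary[name][offset:offset + length]
```

With a large archived file, `transparent_read` returns only the requested byte range and leaves the primary storage facility unchanged, whereas `full_recall_read` first commits the whole file to primary storage.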
However, transparent recall may not be practical in some HSM systems. For example, in a system such as that shown in FIG. 1, the primary and secondary storage facilities often have very different architectures and protocols. These differences tend to make it difficult and/or expensive to implement true transparent recall.