Hierarchical storage systems (also referred to as tiered storage systems) are increasingly being used to optimize storage resources and reduce the cost of data storage. Hierarchical or tiered storage reduces the cost of storing data by differentiating among various types of data, and then storing each type in storage devices that are selected to provide an appropriate level of reliability and performance. For example, a hierarchical storage system may include plural storage tiers such as a high reliability, high performance, and premium cost first tier that may be used for important data that is accessed often, and a lower reliability, lower performance, and less expensive second tier that may be used for archive data or other infrequently-accessed data. Data can be stored according to a classified type, particular owner, service level agreement, or the like, and also may be migrated between tiers based on various situations and contingencies. In some applications, three or more tiers of storage may be provided for even greater efficiency of storage resource use. Thus, by using these various levels of tiered storage resources, the total cost of storage can be reduced, while the required access speed or reliability for specific data can still be maintained.
The storage hierarchy in a tiered storage system can be made up of Network Attached Storage (NAS) and/or Content Addressable Storage (CAS). A NAS is typically a file-based storage system accessible over a network via a file-based protocol, such as NFS (Network File System) protocol or CIFS (Common Internet File System) protocol, and may include a NAS head that manages input/output (I/O) operations received from users, and a storage portion including storage media that store the actual data. A CAS is special purpose storage that may be used for online archiving, and which may use an address in the storage system that is based upon the content of the data being stored. Regulatory compliance requirements have resulted in the usage of CAS for archiving, and the like. For example, files or objects in a CAS system typically may be stored without any update for a specified period of time referred to as a “retention period”. There are two conventional file access methods for CAS. One method uses general network file system protocols such as NFS protocol or CIFS protocol. The other method uses a content ID as a storage address, where the content ID is calculated from the file name of a file or from the content of the file. The present invention is not limited to a particular access method.
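By way of illustration only, the content ID access method described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the addressing scheme of any particular CAS product: the use of SHA-256 as the hash function, the in-memory dictionary standing in for the storage portion, and the names `content_id_from_content`, `content_id_from_name`, and `ContentAddressableStore` are all hypothetical and chosen for the example.

```python
import hashlib


def content_id_from_content(data: bytes) -> str:
    """Derive a content ID from the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def content_id_from_name(file_name: str) -> str:
    """Derive a content ID from the file name instead of the content."""
    return hashlib.sha256(file_name.encode("utf-8")).hexdigest()


class ContentAddressableStore:
    """Hypothetical in-memory stand-in for a CAS storage portion."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data under its content ID and return that ID."""
        cid = content_id_from_content(data)
        # Objects are write-once: an existing object is never overwritten.
        self._objects.setdefault(cid, data)
        return cid

    def get(self, cid: str) -> bytes:
        """Retrieve data by content ID."""
        return self._objects[cid]
```

In this sketch, storing identical content twice yields the same content ID, so the address itself identifies the stored data, which is consistent with the write-once archiving use described above.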
Storage hierarchies (also referred to as storage tiers) are typically created for several reasons, such as optimizing storage cost or because of a functional difference between tiers, such as performance and/or reliability. For example, data which is expected to be infrequently accessed can be stored in a lower cost and lower performance storage tier. In such a situation, a first tier of storage (e.g., “tier 1”) may be a high performance NAS (typically having a lower capacity), and a second tier of storage (e.g., “tier 2”) may be a standard performance NAS (typically having a larger capacity). Additionally, data which does not have to be accessed quickly, but which for compliance reasons must be stored in a write-once manner some days after its generation, should be moved into an archive storage such as a CAS (e.g., a second tier), even if the data was generated on a NAS (e.g., a first tier).
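The two placement situations described above may be sketched as a simple policy function. The following is illustrative only and rests on assumptions not stated in this description: the `FileInfo` fields, the function name `select_tier`, and the threshold values of 30 idle days and 7 days before archiving are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    """Hypothetical per-file attributes consulted by the policy."""
    days_since_last_access: int
    days_since_creation: int
    needs_write_once_archive: bool  # set when compliance rules apply


def select_tier(info: FileInfo, idle_days: int = 30, archive_days: int = 7) -> int:
    """Return 1 for the first tier or 2 for the second tier.

    The thresholds are assumptions chosen for the example.
    """
    # Compliance data moves to the archive tier some days after generation.
    if info.needs_write_once_archive and info.days_since_creation >= archive_days:
        return 2
    # Infrequently accessed data moves to the lower cost tier.
    if info.days_since_last_access >= idle_days:
        return 2
    # Everything else stays on the high performance first tier.
    return 1
```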
HSM (Hierarchical Storage Management) is software implemented to manage storage tiers and to move or migrate data between tiers. Some implementations of HSM realize client-transparent data migration using what are known as stub files or “stubs”. A stub is a data file that stands in for the original file, and is similar to a shortcut in the Windows® operating system or a symbolic link in the Unix® operating system. For example, when a file is migrated from a first storage tier to a second storage tier, the migrated file is replaced with a small stub file that indicates where the migrated file was moved to. Thus, after the migrated file is moved from an upper tier to a lower tier, for example, a small stub file is placed at the same location (i.e., same path name or address) that was previously occupied by the migrated file. In some implementations, HSM software constantly monitors storage medium capacity and moves data from one storage level to the next based on age, category and other criteria as specified by the network or system administrator. HSM often includes a system for routine backup as well. Embodiments of the present invention are related to an HSM system implementing the stub file mechanism.
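The stub file mechanism described above can be illustrated with the following minimal sketch. It is not the implementation of any particular HSM product. It assumes that both tiers are ordinary directories on a local file system, that the stub is a small JSON file, and that the names `STUB_MARKER`, `migrate_with_stub`, and `read_through_stub` are hypothetical.

```python
import json
import os
import shutil

STUB_MARKER = "HSM_STUB"  # hypothetical tag identifying a stub file


def migrate_with_stub(path: str, lower_tier_dir: str) -> str:
    """Move a file to the lower tier and leave a stub at the original path."""
    target = os.path.join(lower_tier_dir, os.path.basename(path))
    shutil.move(path, target)
    # The stub occupies the same path name and records the new location.
    with open(path, "w", encoding="utf-8") as stub:
        json.dump({"marker": STUB_MARKER, "target": target}, stub)
    return target


def read_through_stub(path: str) -> bytes:
    """Return file content, following the stub if the file was migrated."""
    with open(path, "rb") as f:
        raw = f.read()
    try:
        parsed = json.loads(raw.decode("utf-8"))
    except ValueError:
        # Not valid UTF-8 or not JSON, so this is an ordinary file.
        return raw
    if isinstance(parsed, dict) and parsed.get("marker") == STUB_MARKER:
        with open(parsed["target"], "rb") as migrated:
            return migrated.read()
    return raw
```

Because a client in this sketch always opens the original path, the migration is transparent to the client: the same call returns the same data before and after the file is moved.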
Another technology that is widely used is remote replication of data from one storage system to another storage system. Remote replication is often used to enable recovery of data following a disaster, or the like. It would be desirable to use remote replication with an HSM system to make disaster recovery possible. However, when current remote replication technology is combined with HSM technology, numerous problems can arise which prevent proper operation, such as corruption of link information, corruption of the tier configuration, or the like.
Related art includes US Pat. Appl. Pub. No. 2007/0239803 to Mimatsu, US Pat. Appl. Pub. No. 2007/0266059 to Kitamura, and US Pat. Appl. Pub. No. 2007/0192551 to Hara et al., the entire disclosures of which are incorporated herein by reference.