1. Field
Embodiments of the invention relate to data migration and, in particular, to systems and methods for managing access to primary or migrated data in a clustered file system environment.
2. Description of the Related Art
Current information management systems employ a number of different methods to perform storage operations on electronic data. For example, data can be stored in primary storage as a primary copy or in secondary storage as various types of secondary copies (e.g., backup copies, archive copies, hierarchical storage management (“HSM”) copies), which are typically intended for long-term retention before some or all of the data is moved to other storage or discarded.
In certain storage systems, when the data of a file is moved from primary to secondary storage, the file in primary storage is replaced with a stub file that indicates the new location of the migrated data on secondary storage. In certain examples, the stub comprises a relatively small, truncated file (e.g., several kilobytes) having the same name as the original file. The stub file can also include metadata that identifies the file as a stub and that can be used by the storage system to locate and restore the migrated data to primary storage. This stubbing process is generally performed transparently to the user by a storage service and file system driver.
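The stubbing process described above can be illustrated with a minimal sketch. It assumes a stub is a small JSON file carrying a marker and the secondary-storage location; the marker `STUB_MAGIC` and the functions `migrate`, `read_stub`, and `recall` are hypothetical names chosen for illustration and do not describe any particular storage service or file system driver.

```python
import json
import os
import shutil

# Hypothetical marker identifying a file as a stub (illustrative only).
STUB_MAGIC = "HSM-STUB-v1"


def migrate(primary_path, secondary_dir):
    """Move a file's data to secondary storage and leave a same-named stub."""
    secondary_path = os.path.join(secondary_dir, os.path.basename(primary_path))
    shutil.move(primary_path, secondary_path)
    # The stub keeps the original name and records where the data now lives.
    stub = {"magic": STUB_MAGIC, "location": secondary_path}
    with open(primary_path, "w", encoding="utf-8") as f:
        json.dump(stub, f)


def read_stub(path):
    """Return the stub metadata if the file is a stub, otherwise None."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            meta = json.loads(f.read(4096))  # stubs are small; read a bounded prefix
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if isinstance(meta, dict) and meta.get("magic") == STUB_MAGIC:
        return meta
    return None


def recall(path):
    """Restore migrated data to primary storage if the file is a stub."""
    meta = read_stub(path)
    if meta is not None:
        os.remove(path)  # drop the stub before restoring the data
        shutil.move(meta["location"], path)
```

In this sketch every call to `read_stub` opens and parses the file, which is the per-operation cost that motivates caching stub status.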
Reading each file following a file system operation (e.g., a read, write, or rename request) to determine whether the file is a stub or an actual file can be unduly time-consuming. As a result, certain stand-alone file systems can utilize an index or cache to record whether a recently accessed file is a stub. However, such a configuration becomes unworkable in a clustered file system because the same file can be independently accessed and modified (e.g., migrated) by any one of various cluster nodes. That is, in the cluster configuration, caching of file/stub information becomes more cumbersome because a file system driver associated with one node of the cluster does not necessarily control or monitor all I/O paths to the stored files' data and does not know when a file has been modified or migrated by another node.
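The staleness problem described above can be shown with a small in-memory simulation. This is a sketch under stated assumptions, not a model of any real clustered file system: `SharedFileSystem` and `Node` are hypothetical classes, and each node keeps a private cache that no other node can see or invalidate.

```python
class SharedFileSystem:
    """Stands in for storage visible to every node in the cluster."""

    def __init__(self):
        self.is_stub = {}  # path -> bool; the actual on-disk state


class Node:
    """A cluster node with its own private stub cache (hypothetical)."""

    def __init__(self, fs):
        self.fs = fs
        self.cache = {}  # path -> bool; visible only to this node

    def check_stub(self, path):
        # Consult the local cache first; fall back to the costly file read.
        if path not in self.cache:
            self.cache[path] = self.fs.is_stub[path]
        return self.cache[path]

    def migrate(self, path):
        # Replace the file with a stub and update only this node's cache.
        self.fs.is_stub[path] = True
        self.cache[path] = True


fs = SharedFileSystem()
fs.is_stub["/data/a.txt"] = False
node_a, node_b = Node(fs), Node(fs)

before = node_a.check_stub("/data/a.txt")  # node A caches "not a stub"
node_b.migrate("/data/a.txt")              # node B migrates the same file
after = node_a.check_stub("/data/a.txt")   # node A still answers from its cache
actual = fs.is_stub["/data/a.txt"]         # the state actually on shared storage
```

Here `after` remains `False` while `actual` is `True`: node A's cached answer is stale because nothing informed it of the migration performed by node B.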