Today, many (if not all) organizations conduct substantial amounts of business electronically and consequently depend on reliable, continuous access to information technology systems, applications, and resources in order to effectively manage business endeavors. At the same time, information technology threats ranging from viruses, malware, and data corruption to application failures and natural disasters are growing in number, type, and severity, while current trends in technology have presented information technology departments with a plethora of recurring challenges. For example, the need to do business at an increasingly fast pace with larger critical data volumes has amplified the pressure on information technology, which has led to efforts to consolidate, migrate, or virtualize servers and resources hosted thereon without disrupting operations or damaging resources. As such, even isolated failures have the potential to render information technology resources unavailable, which may cause organizations to lose substantial amounts of revenue or information that could impede or even cripple business. Although certain organizations have attempted to utilize backup solutions to protect the information that applications create, many backup solutions lack the restoration granularity required to quickly restore important data, while other backup solutions demand full restoration to temporary disk space, even to recover a single file. Moreover, tape backup systems impose additional time burdens to find and mount the correct tape before the recovery process can even begin. Consequently, many organizations have turned to complementary solutions, including virtual snapshots, replication and continuous data protection systems, and high availability technologies to minimize downtime and protect critical applications and data.
However, because many applications are using new and emerging technologies, replication and backup solutions tend to leave many gaps in how to manage these technologies, while implementing separate point solutions typically results in significant cost and complexity. For example, many organizations have increasingly been adopting UNIX and Linux operating system implementations, which typically use the Network File System (NFS) protocol to provide client computers with transparent and remote access to shared file systems on a server over a network to meet local storage needs. In actuality, however, client computers and servers jointly perform every NFS operation. In particular, the NFS protocol uses a supporting mount protocol to perform operating-system functions that allow a client computer to attach remote directory trees to a point within a local file system, and the mount process further allows the server to grant remote access privileges to restricted client computers via export controls. In response to the remote server exporting the mount point that the client computer uses to mount the remote server file system, the server then passes file handles that represent file objects in the file system to the client computer, which uses the file handles to communicate with the server and perform all subsequent operations on the file system. Although NFS can provide many advantages to manage file system operations because its design is independent of the underlying machines, operating systems, network architectures, and transport protocols used therewith, the mechanisms that the NFS protocol uses to represent file objects in the file system do not easily lend themselves to replication and backup solutions.
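The mount and file handle exchange described above may be illustrated with a toy in-memory model. The sketch below is not the NFS wire protocol; the class `ToyNfsServer`, its method names, and the eight-byte handle encoding are hypothetical and serve only to show that the server checks export controls at mount time and thereafter hands the client handles that the client passes back without interpreting them.

```python
import struct
from typing import Dict, Set


class ToyNfsServer:
    """Hypothetical in-memory stand-in for a server that exports a directory tree."""

    def __init__(self,
                 exports: Dict[str, Set[str]],
                 export_roots: Dict[str, int],
                 directories: Dict[int, Dict[str, int]]) -> None:
        self._exports = exports            # exported path -> clients allowed to mount it
        self._export_roots = export_roots  # exported path -> inode number of its root
        self._directories = directories    # directory inode -> {entry name: child inode}

    @staticmethod
    def _encode(inode: int) -> bytes:
        # Assumed encoding: a real handle carries more fields than the inode number.
        return struct.pack("!Q", inode)

    @staticmethod
    def _decode(handle: bytes) -> int:
        return struct.unpack("!Q", handle)[0]

    def mount(self, client: str, export_path: str) -> bytes:
        """Apply export controls, then return the handle of the exported root."""
        if client not in self._exports.get(export_path, set()):
            raise PermissionError(f"{client} may not mount {export_path}")
        return self._encode(self._export_roots[export_path])

    def lookup(self, dir_handle: bytes, name: str) -> bytes:
        """Resolve one name inside a directory and return the child's handle."""
        children = self._directories[self._decode(dir_handle)]
        return self._encode(children[name])
```

In this model, a client that has mounted `/export` would walk to a file by calling `lookup` once per path component, holding only byte strings whose contents it never decodes.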
For example, NFS file handles are data structures that uniquely identify file objects within the file system, but the NFS file handles are typically encoded and decoded on the server, meaning that the NFS file handles are opaque to client computers (i.e., only the server can correctly interpret the file handles). In many NFS implementations, the file handle contains an inode number to index information about the represented file object, an inode generation number to add a history to the inode number, a device identifier that indicates where the file object resides, and if configured, parent information associated with the file object. However, NFS was designed to be a stateless protocol, meaning that the server only uses the file handle (inode number) to operate on the represented file object, which does not provide sufficient information to enable replication and backup operations. For example, continuous data protection products typically have a master host track changes to metadata associated with protected file objects. The master host then writes the tracked metadata changes to a journal file and sends the journal file to a replica host, which applies the changes to a replica of the protected file objects to maintain consistency with the master host. Importantly, all changes written to the journal file record a full path associated with any file objects that have been changed relative to the protected directory because the master host and the replica host may store the protected file objects under different directories. As a result, in order to replicate changes to a file object maintained on a master NFS host to another remote host, a replication component on the master host must translate the file handle associated with the changed file object into a full path associated with the file object.
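The gap between what a file handle carries and what a journal entry requires may be sketched as follows. The field names, the `JournalEntry` layout, and the helper functions are assumptions made for illustration and are not drawn from any particular replication product; the example paths are likewise hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class FileHandle:
    """Fields a handle commonly carries; note that none of them is a path."""
    inode: int
    generation: int
    device_id: int
    parent_inode: Optional[int] = None  # present only if the server is so configured


@dataclass(frozen=True)
class JournalEntry:
    """One tracked change, recorded relative to the protected directory."""
    operation: str
    relative_path: str


def make_entry(operation: str, full_path: str, master_root: str) -> JournalEntry:
    # relative_to raises ValueError if full_path is not under master_root.
    relative = PurePosixPath(full_path).relative_to(master_root)
    return JournalEntry(operation, str(relative))


def resolve_on_replica(entry: JournalEntry, replica_root: str) -> str:
    # The replica may keep the protected objects under a different directory.
    return str(PurePosixPath(replica_root) / entry.relative_path)
```

For instance, a change to `/export/data/docs/a.txt` under a master root of `/export/data` would be journaled as `docs/a.txt` and could then be applied under a replica root such as `/backup/mirror`. Building the entry requires the full path, which a `FileHandle` alone does not supply.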
However, as noted above, NFS servers only interpret NFS file handles to identify the inode number that indexes information about the file object, wherein the server uses the file inode number (without the file name) to perform all file system operations and pass results from the file system operations to the client computer. Thus, because the file handle typically only contains limited information about the file object (i.e., the inode number, inode generation number, and device identifier), the master NFS host cannot obtain the full path that the file object has within the file system. More particularly, directory entries in disk-based file systems essentially contain the name and inode number associated therewith, which enables a virtual file system kernel component to use the parent inode number associated with a current path component to perform a forward path lookup (i.e., the current path component and the parent inode number may be used to traverse the directory and locate the inode associated with the file object). The virtual file system may then cache a relationship between the inode and the name associated with the file object to accelerate subsequent disk lookups (i.e., once a particular path component has been resolved, ancestor information has already been cached, such that the kernel in a disk-based file system can link cached components together in order to obtain the full path associated with a file object). In contrast, because NFS has a stateless design in which the server only uses file handles to perform file system operations, NFS-based file systems lack a cached translation between the name and inode number associated with the file objects that are stored therein because the file handles alone fail to provide sufficient information to build the full file path. 
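The contrast drawn above may be sketched with a toy model of a forward path lookup that caches name-to-inode relationships as a side effect. The class `TinyVfs`, the root inode number, and the cache layout are hypothetical simplifications and do not reflect actual kernel data structures.

```python
from typing import Dict, Optional, Tuple

ROOT_INODE = 2  # assumed inode number of the file system root


class TinyVfs:
    """Toy model of forward lookup with a cache of (parent inode, name) links."""

    def __init__(self, directories: Dict[int, Dict[str, int]]) -> None:
        self.directories = directories  # directory inode -> {entry name: child inode}
        self.cache: Dict[int, Tuple[int, str]] = {}  # child inode -> (parent, name)

    def lookup(self, path: str) -> int:
        """Resolve a path one component at a time, caching each resolved link."""
        current = ROOT_INODE
        for name in (part for part in path.split("/") if part):
            child = self.directories[current][name]
            self.cache[child] = (current, name)
            current = child
        return current

    def full_path(self, inode: int) -> Optional[str]:
        """Link cached components back to the root, or return None if any is missing."""
        parts = []
        while inode != ROOT_INODE:
            if inode not in self.cache:
                return None  # the case of an operation that arrives with only an inode
            parent, name = self.cache[inode]
            parts.append(name)
            inode = parent
        return "/" + "/".join(reversed(parts))
```

In this sketch, `full_path` succeeds only for inodes that were previously reached by name through `lookup`. An operation identified solely by an inode number, as with a file handle, finds no cached ancestors and therefore yields no path.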
Further detail on the difficulties in replicating file systems that implement NFS, which arise because clients and servers may have different path name mappings and consequently inconsistent images of the file name space, is described in “NFS: Network File System Version 3 Protocol Specification,” the contents of which are hereby incorporated by reference in their entirety.
Moreover, another issue that interferes with suitably performing replication and backup on file systems that implement the NFS protocol relates to “hard links,” which generally refer to different file objects within the file system having identical inode numbers but potentially different parents or file names. In other words, file systems implemented on NFS may contain hard links that represent multiple entry points to the same data, whereby an operation that applies changes or modifications to one hard linked file object may cause the changes or modifications to be unintentionally imputed to the other hard linked file objects. In replication contexts, all hard links to the same file object must therefore be protected in one replication scenario because replicating the hard links in separate synchronization or replication scenarios would result in the hard links becoming normal file objects such that the multiple entry points to the same data would be lost. Furthermore, any subsequent changes to the previously hard linked data cannot be captured and automatically applied to any hard links that were not synchronized in the replication scenario (i.e., once the replication scenario begins, the master host would be unable to create the hard link between cross root directories). Moreover, NFS permits hard links on file objects that represent directories within the file system, which can potentially result in a hard link between different root directories in different file systems. Thus, the hard link problem interferes with suitable replication and backup because hard linked directories would lead to inconsistent parent directory entries, and because hard links undermine the operating system independence associated with the NFS protocol, as most operating systems lack support for the notion of a hard link, among other reasons.
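The loss of hard links under separate replication may be illustrated on a local file system. The sketch below assumes a platform and temporary directory that support hard links; the function name and file names are hypothetical, and the plain file copy merely stands in for a replication step that handles each name independently.

```python
import os
import shutil
import tempfile
from typing import Tuple


def demo_hard_link_loss() -> Tuple[bool, bool]:
    """Return (names share an inode on master, names share an inode on replica)."""
    with tempfile.TemporaryDirectory() as base:
        master = os.path.join(base, "master")
        replica = os.path.join(base, "replica")
        os.makedirs(master)
        os.makedirs(replica)

        first = os.path.join(master, "a.txt")
        second = os.path.join(master, "b.txt")
        with open(first, "w") as handle:
            handle.write("shared data")
        os.link(first, second)  # second name for the same inode

        linked_on_master = os.stat(first).st_ino == os.stat(second).st_ino

        # Copy each name on its own, as separate replication scenarios would.
        for name in ("a.txt", "b.txt"):
            shutil.copy(os.path.join(master, name), os.path.join(replica, name))

        replica_first = os.stat(os.path.join(replica, "a.txt"))
        replica_second = os.stat(os.path.join(replica, "b.txt"))
        linked_on_replica = replica_first.st_ino == replica_second.st_ino

        return linked_on_master, linked_on_replica
```

On the master side both names refer to one inode, whereas the independently copied files on the replica side are ordinary, unrelated file objects, so a later change made through one name would no longer appear under the other.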