1. Field of Invention
The present invention relates generally to the field of file sharing. More specifically, the present invention is related to remote file system caching.
2. Discussion of Prior Art
In a Storage Area Network (SAN), a SAN file system is used to provide high-performance access to large file systems. A SAN file system is a type of cluster file system in which both clients and servers are connected to a SAN. Sometimes client and server roles are combined symmetrically in a single machine, as in the case of GPFS, and sometimes these functional roles are partitioned onto distinct machines, as in the case of Storage Tank (ST). Two key characteristics of SANs affect systems in which all participants are required to have SAN access. First, a SAN has a restricted geographical extent, thus limiting it to a relatively small campus. Second, participating hosts must be trusted not to corrupt file system data.
However, users and applications are often distributed over a large area or are contained within administrative domains that do not fully trust each other. In these situations, a bridge is needed to allow file sharing across Wide Area Networks (WAN), administrative domains, or both. The two ends of the bridge, namely, the import and export sides, generally have different characteristics. The bridge can be constructed at the file level above a SAN file system or at the block level below it. Issues requiring consideration include security, performance, and administration. A WAN admits a much larger range of trust and has different threat models than a SAN. Therefore, strong security and a flexible approach to integration and interface with multiple domains are necessary to ensure data availability and authenticity. Additional provisions are necessary to account for performance parameters unique to a WAN, specifically, higher latencies and limited bandwidth.
Issues associated with exporting data from a SAN file system can be addressed by using an existing distributed file system protocol. For example, a node of a General Parallel File System (GPFS) could run a Network File System (NFS) or Samba server, which could, in turn, deliver SAN file system data to remote users. These protocols are widely implemented, but are not well adapted to a WAN environment. A more suitable protocol, such as Andrew File System (AFS) or Distributed File System (DFS), could be used. While a DFS server has been integrated, albeit in a limited way, with GPFS on AIX, it is not available for any other SAN file system, nor has AFS been ported to such an environment.
A complementary task is importing remote data into a SAN file system so that local applications have access to it. Machines acting as clients of a SAN file system (e.g., ST clients or GPFS nodes) can individually run distributed file system clients, such as NFS or AFS, but forgo many of the benefits of a SAN file system. Features missing from these discrete solutions include centralized administration and management, uniform namespace, shared caching, and reduced WAN usage.
Another approach to circumventing SAN distance limitations is to use an IP-compatible storage protocol such as Internet Small Computer System Interface (iSCSI). This gives a SAN file system the capability to expand geographically and include constituents distributed over a WAN. Issues associated with this approach are generally due to performance and scaling. Software for storage systems designed to benefit from low latencies of a SAN may require redesign when faced with much larger WAN latencies and an expanded set of more complex failure modes. A SAN file system that spans a large geographical area will generally have more network constituents, which can challenge the scaling abilities of a file system's architecture. Specifically, cluster algorithms controlling file system metadata consistency may behave poorly when the number of network constituents increases. Additionally, distributed network constituents may necessitate the use of a security model that is not as restricted as that of a SAN.
Therefore, there is a need in the art to address data importation from a remote source and data exportation to a remote source. There are at least two problems with performing these tasks within the context of a file server, specifically, within the same process. First, transferring file data imposes large network and disk loads on the server, which can severely impact its performance on other tasks. Second, supporting multiple protocols necessary to communicate with a variety of remote sources can lead to software compatibility issues, maintenance problems, and large executable footprints.
Existing strategies that address some aspects of this concern by decoupling metadata handling from the burden of transferring file contents include File Transfer Protocol (FTP), SAN third-party backup, and hierarchical storage management (HSM) using the Data Management API (DMAPI). FTP transfers data using distinct control and data channels. Thus, a separate agent operating at the file level handles file contents. However, because FTP operates at the file level, it is limited in that it cannot support decoupled metadata and file contents in a SAN file system.
SAN-based backup systems, such as Veritas™, Pathlight™, and Legato/Celestra™, alleviate server load by providing server-less backup. These systems utilize a SAN environment to copy data between storage devices, such as disk to tape, using a block-level interface such as Small Computer System Interface (SCSI). They are limited, however, in that they do not provide for file system interaction and thus do not provide file-level access to data residing on either a SAN or across a WAN.
Hierarchical storage management systems gained additional portability and interchangeability with the development of the Data Management API (DMAPI). This interface allows the kernel to intercept references to selected portions of a file system and pass them to a user space DM application. Because access to data is provided at the file level through the file system, DMAPI neither supports a remote DM application nor takes advantage of a SAN environment to offload file server machines.
The present invention has wide applicability since it makes limited assumptions about the architecture of a SAN file system and does not make extreme performance demands on a WAN.
Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.