Often, organizations with remote locations may need to replicate critical data, such as engineering applications and libraries, to those locations. To make such critical data available to users in the remote locations without incurring network delays, organizations may consume substantial resources (e.g., file systems executing on file servers) managing a complex replication infrastructure and process. Data replication is a known technique that enables distributed online access to generally read-only data sets. Traditional data replication may rely heavily on file system mirroring to create complete read-only copies of data sets on distributed servers.
The mirrors generated by file system mirroring typically require a large amount of administrative overhead. For example, an administrator must determine what data needs to be replicated, as well as manage physical resources (file systems, file servers, etc.) for each mirror. As data sets grow, this type of data replication becomes increasingly impractical. In addition, the replication infrastructure may require the presence of servers in remote locations to store the replicated data, thus preventing organizations from consolidating their server infrastructures in a central location. Therefore, there remains a need to eliminate this expensive replication infrastructure and process without losing the benefit of immediate access to critical data.
One alternative to data replication mirroring is proxy caching. Proxy caching systems are typically employed to transparently replicate data sets on demand. A typical proxy cache system includes a front-end storage system or “proxy device” having local storage, i.e., a “cache”, coupled to a back-end storage system or “origin server” having remote storage. When a client request cannot be satisfied by the cache, it is proxied to the origin server. The server response is, in turn, proxied back to the requesting client and all associated data is cached in the local storage. This type of transaction is called a “cache miss”. Cache misses typically result in the data, such as file system data, being “filled” into the cache. When the data required to satisfy a client request is available in the cache, the proxy device may construct and send a response without communicating with its associated server. Such a transaction is called a “cache hit”. Using cache miss transactions, a proxy device allows clients to modify the state of a file system on the device. In contrast to standard replicas, this enables automatic replication without constraining clients to read-only access.
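The cache hit, cache miss and fill behavior described above can be illustrated with a minimal sketch. It assumes whole files keyed by path, and the callables `origin_read` and `origin_write` are hypothetical stand-ins for requests proxied to the origin server; this is not the design of any particular proxy cache product.

```python
from typing import Callable, Dict, Tuple


class ProxyCache:
    """Illustrative read-through proxy cache; not the design of any cited system.

    origin_read and origin_write are hypothetical callables standing in for
    requests proxied to the origin server.
    """

    def __init__(self,
                 origin_read: Callable[[str], bytes],
                 origin_write: Callable[[str, bytes], None]) -> None:
        self._origin_read = origin_read
        self._origin_write = origin_write
        self._cache: Dict[str, bytes] = {}  # local storage on the proxy device
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> Tuple[bytes, str]:
        if path in self._cache:
            # Cache hit: respond without contacting the origin server.
            self.hits += 1
            return self._cache[path], "hit"
        # Cache miss: proxy the request to the origin server ...
        self.misses += 1
        data = self._origin_read(path)
        # ... and "fill" the cache with the response for later requests.
        self._cache[path] = data
        return data, "miss"

    def write(self, path: str, data: bytes) -> None:
        # Modifying requests are proxied to the origin server first, so clients
        # are not limited to read-only access; the local copy is then updated.
        self._origin_write(path, data)
        self._cache[path] = data
```

Two successive reads of the same path yield a miss followed by a hit, and only the first one reaches the origin server.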
A conventional proxy caching solution provides the ability to distribute data, e.g., files, to remote locations without the need for continuous hands-on administrative management. An example of such a proxy caching solution is described in U.S. patent application Ser. No. 10/245,798 titled Apparatus and Method for a Proxy Cache, by E. Ackaouy, now issued as U.S. Pat. No. 7,284,030 on Oct. 6, 2007 and assigned to Network Appliance, Inc., Sunnyvale, Calif. In that solution, a proxy storage system or appliance having a cache is coupled to a server storage system. A file system manages a set of files served by the proxy appliance; these files are accessed by clients using a file system protocol, such as the Network File System (NFS) and/or Common Internet File System (CIFS) protocol. In response to such client accesses, the proxy appliance serves the files using a file index hashing scheme based on file handles.
Broadly stated, the proxy appliance “listens” for an NFS/CIFS data access request issued by a client and determines whether it can serve that request locally using the hashing scheme. To that end, the proxy appliance converts the client request to a unique caching name before forwarding it to its file system for a caching decision. A hashing function performed on the file handle produces the caching name, which is used by the file system to obtain a cache file or object store identifier to determine if the file is resident in the cache. If the file is resident in the cache, a determination is made as to whether all of the data that is requested by the client is resident in the cache. If not, the appliance proxies the request to the server. When the server responds with the requested data or an acknowledgement, the appliance passes the server response to the client. The proxy appliance also “fills” its cache with the server response to ensure that subsequent client requests may be served by the appliance.
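The caching decision described above (hash the file handle into a caching name, check whether the file is resident, check whether all requested data is resident, otherwise proxy and fill) can be sketched as follows. This is an illustrative sketch only: SHA-256 as the hashing function, the fixed `BLOCK_SIZE`, and the `server_read` callable are all assumptions, not details taken from the cited solution.

```python
import hashlib
from typing import Callable, Dict

BLOCK_SIZE = 4096  # hypothetical cache block size, in bytes


def caching_name(file_handle: bytes) -> str:
    """Hash an opaque NFS/CIFS file handle into a fixed-length caching name.

    SHA-256 is an illustrative choice; the hash function is an assumption.
    """
    return hashlib.sha256(file_handle).hexdigest()


class HandleHashedCache:
    def __init__(self, server_read: Callable[[bytes, int, int], bytes]) -> None:
        # server_read(file_handle, offset, length) is a hypothetical stand-in
        # for a read request proxied to the origin server.
        self._server_read = server_read
        # caching name -> {block index -> cached block bytes}
        self._objects: Dict[str, Dict[int, bytes]] = {}
        self.server_requests = 0

    def read(self, file_handle: bytes, offset: int, length: int) -> bytes:
        if length <= 0:
            return b""
        name = caching_name(file_handle)
        blocks = self._objects.get(name)
        if blocks is None:
            # File not resident: create an empty cache object for it.
            blocks = self._objects[name] = {}
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        needed = range(first, last + 1)
        if not all(b in blocks for b in needed):
            # Some requested data is missing: proxy the whole block-aligned
            # range to the server, then fill the cache with the response.
            self.server_requests += 1
            start = first * BLOCK_SIZE
            data = self._server_read(file_handle, start, len(needed) * BLOCK_SIZE)
            for i, b in enumerate(needed):
                chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
                if chunk:  # a short server reply means end of file
                    blocks[b] = chunk
        # Serve the request from cached blocks.
        joined = b"".join(blocks.get(b, b"") for b in needed)
        rel = offset - first * BLOCK_SIZE
        return joined[rel:rel + length]
```

Refetching the whole block-aligned range when only part of it is missing keeps the sketch short; an actual appliance would likely request only the missing data.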
The present invention is directed, in part, to an improved caching system that enables multi-protocol access by clients to data served by the system. In addition, the present invention is directed, in part, to an improved caching system that enables efficient client access to data served by the system using file system data structures and names. Moreover, the present invention is directed, in part, to an improved caching system that provides storage virtualization of data served by the system in response to multi-protocol data access requests issued by clients. In this context, storage virtualization denotes presenting a transparent view of storage to a client that involves cooperating storage resources from multiple storage systems, typically across a network.