While workers can easily share gigabytes of project data on a local-area network (LAN) using standard file-server technology, such is not the case with workers in remote offices connected over wide-area networks (WANs). With respect to file sharing over WANs, standard file server protocols provide unacceptably slow response times when opening and writing files.
All major file-sharing protocols, including NFS (Network File System, used for Unix/Linux environments), CIFS (Common Internet File System, used for Windows environments), and NCP (NetWare Core Protocol, used for Novell environments and traditionally carried over IPX/SPX, Internetwork Packet Exchange/Sequenced Packet Exchange), were designed for LAN environments where clients and servers are located in the same building or campus. The assumption that the client and the server would be in close proximity led to a number of design decisions that do not scale across WANs. For example, these file-sharing protocols tend to be rather “chatty”, insofar as they send many remote procedure calls (RPCs) across the network to perform operations.
For certain operations on a file system using the NFS protocol (such as an rsync of a source code tree), almost 80% of the RPCs sent across the network can be access RPCs, while the actual read and write RPCs typically comprise only 8-10% of the RPCs. Thus almost 80% of the RPC traffic is spent simply determining whether the NFS client has the proper permissions to access a particular file on the NFS server, rather than actually moving data. In a LAN environment, these RPCs do not degrade performance significantly, given the low latency and usual abundance of bandwidth, but they do in WANs, where every round trip incurs high latency. Furthermore, because data movement RPCs make up such a small percentage of the communications, increasing network bandwidth will not alleviate the performance problem in WANs.
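By way of illustration only, the effect of latency on a chatty protocol can be approximated with a simple cost model. The sketch below assumes strictly serialized RPCs, and every figure in it (the RPC count, payload size, round-trip times, and link speeds) is a hypothetical value chosen for the example rather than a measurement.

```python
def transfer_time(num_rpcs: int, rtt_s: float,
                  payload_bytes: int, bandwidth_bps: float) -> float:
    """Estimate elapsed seconds for a strictly serialized RPC exchange.

    Each RPC costs one round trip; the payload is then charged at the
    link rate. This ignores pipelining, TCP effects, and server time.
    """
    latency_cost = num_rpcs * rtt_s
    bandwidth_cost = payload_bytes * 8 / bandwidth_bps
    return latency_cost + bandwidth_cost


# Hypothetical workload: 10,000 RPCs that together move 10 MiB of data.
RPCS = 10_000
PAYLOAD = 10 * 1024 * 1024

lan = transfer_time(RPCS, 0.0005, PAYLOAD, 1e9)        # 0.5 ms RTT, 1 Gbit/s
wan = transfer_time(RPCS, 0.080, PAYLOAD, 10e6)        # 80 ms RTT, 10 Mbit/s
wan_fast = transfer_time(RPCS, 0.080, PAYLOAD, 100e6)  # same RTT, 10x bandwidth

if __name__ == "__main__":
    print(f"LAN:             {lan:8.2f} s")
    print(f"WAN:             {wan:8.2f} s")
    print(f"WAN, 10x faster: {wan_fast:8.2f} s")
```

Under these assumed figures the LAN case takes about 5 seconds and the WAN case about 808 seconds, while a tenfold increase in WAN bandwidth reduces the total only to about 801 seconds, because the round trips dominate.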
Therefore, systems called wide area file services (WAFS) have been developed. These systems combine distributed file systems with caching technology to allow real-time, read-write access to shared file storage from any location, including locations connected across WANs, while also providing interoperability with standard file-sharing protocols such as NFS and CIFS.
WAFS systems typically include edge file gateway (EFG) appliances (or servers), which are placed at multiple remote offices, and one or more file server appliances, located at a central office or at a data center remote from the EFG appliances, that allow storage resources to be accessed by the EFG appliances. Each EFG appliance appears as a local file server to office users at the respective remote office. Together, the EFG appliances and file server appliance implement a distributed file system and communicate using a WAN-optimized protocol. This protocol is translated to and from NFS and CIFS at either end, to communicate with the user applications and the remote storage.
The WAN-optimized protocol typically includes file-aware differencing technology, data compression, streaming, and other technologies designed to enhance performance and efficiency in moving data across the WAN. File-aware differencing technology detects which parts of a file have changed and moves only those parts across the WAN. Furthermore, if pieces of a file have been rearranged, only offset information is sent, rather than the data itself.
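By way of illustration only, differencing of this kind can be sketched as a block-level delta. The sketch below is a simplified stand-in and not the algorithm of any particular WAFS product: it assumes fixed-size, aligned blocks matched by SHA-256 digest, whereas practical systems generally use rolling checksums to find matches at arbitrary offsets. The names `make_delta`, `apply_delta`, and `literal_bytes` are hypothetical.

```python
import hashlib

BLOCK = 4  # deliberately tiny so the example is easy to follow


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> list:
    """Describe `new` as copies from `old` plus literal data."""
    # Index every block of the old file by digest -> offset in old.
    index = {}
    for off in range(0, len(old), block):
        digest = hashlib.sha256(old[off:off + block]).digest()
        index.setdefault(digest, off)

    delta = []
    for off in range(0, len(new), block):
        chunk = new[off:off + block]
        src = index.get(hashlib.sha256(chunk).digest())
        if src is not None:
            # Block already exists on the receiver: send offset only.
            delta.append(("copy", src, len(chunk)))
        else:
            # Block is new or changed: send the bytes themselves.
            delta.append(("data", chunk))
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new file on the receiver from `old` and the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, src, length = op
            out += old[src:src + length]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(delta: list) -> int:
    """Count the file bytes that must actually cross the WAN."""
    return sum(len(op[1]) for op in delta if op[0] == "data")


# Blocks A and C swap places, block B is rewritten, block D is untouched.
old = b"AAAABBBBCCCCDDDD"
new = b"CCCCAAAAXXXXDDDD"
delta = make_delta(old, new)
```

In this example the rearranged and unchanged blocks are expressed as offsets into the old file, so only the four rewritten bytes are carried as data.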
In WAFS systems, performance during “read” operations is usually governed by the ability of the EFG appliance to cache files and the ability to serve cached data to users while minimizing the overhead of expensive kernel-user communication and context switches, in effect enabling the cache to act just like a high-performance file server. Typically, the cache attempts to mirror the remote data center, so that “read” requests will be satisfied from the local cache with only a few WAN round trips required to check credentials and availability of file updates.
Many software applications, such as word processing or spreadsheet applications, handle file save and close operations for documents in a common manner. In particular, many software applications never overwrite the original files that are being edited. Instead, they rename the original file as a backup copy and create a new file for the document that is being saved. For example, when a user opens a given file (here, A.doc) using a word processing application, a series of operations may result. The word processing application may first create a temporary file, such as ~$xxx.doc, where xxx is based on the file name. After the user edits and saves the file, a new temporary file (e.g., ~WRDxxx.tmp, where xxx is a random value) is created. The application writes the new contents of the file to this newly created temporary file, renames the original file (e.g., A.doc) to another temporary file name (e.g., ~WRLyyy.tmp, where yyy is another random value), and renames the temporary file with the updated data (e.g., ~WRDxxx.tmp) to the original file name (A.doc). The second temporary file, which contains the previous version of the file, is then deleted.
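By way of illustration only, the save sequence described above can be sketched as follows. This is a minimal approximation of the pattern, not the actual behavior of any specific application; the function name `safe_save` and the way the random suffixes are generated are assumptions made for the example.

```python
import os
import uuid


def safe_save(path: str, new_contents: bytes) -> None:
    """Save without ever overwriting the original file in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Hypothetical temporary names modeled on the ~WRDxxx.tmp and
    # ~WRLyyy.tmp examples; the suffixes are random.
    new_tmp = os.path.join(directory, f"~WRD{uuid.uuid4().hex[:4]}.tmp")
    backup_tmp = os.path.join(directory, f"~WRL{uuid.uuid4().hex[:4]}.tmp")

    # 1. Create a temporary file and write the new contents to it.
    with open(new_tmp, "wb") as f:
        f.write(new_contents)

    # 2. Rename the original file to a second temporary name.
    os.rename(path, backup_tmp)

    # 3. Rename the temporary file holding the new data to the original name.
    os.rename(new_tmp, path)

    # 4. Delete the temporary file holding the previous version.
    os.remove(backup_tmp)
```

From the file system's point of view, a single save therefore appears as one create, one write, two renames, and one delete, rather than as a write to A.doc.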
As discussed above, in a WAFS system, an edge appliance and a core appliance are disposed between a client hosting the software application and the file server that hosts the data file. The operation of the WAFS system changes the manner in which the file operations are executed. For example, the file system operations discussed above are performed on a version of the file cached at the edge appliance. In some WAFS systems, the edge appliance passes metadata operations (rename, delete, create, etc.), but not the actual data, through to the core appliance, which performs the operations on the remote file server. As a result, temporary files with no data are created on the remote file server. In the example discussed above, a save operation would create a ~WRDxxx.tmp file with zero bytes on the remote file server. In addition, the subsequent rename operation, which also passes through to the core appliance, causes A.doc, while open at the remote client, to appear on the remote file server as a file with no data.
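By way of illustration only, the resulting state can be reproduced with a toy model in which file stores are plain dictionaries. The classes `FileStore` and `EdgeAppliance` and the function `simulate_save` are hypothetical and model only the behavior described above: metadata operations reach the remote file server immediately, while written data stays in the edge cache.

```python
class FileStore:
    """A trivial in-memory file store: file name -> contents."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def create(self, name):
        self.files[name] = b""

    def write(self, name, data):
        self.files[name] = data

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)

    def delete(self, name):
        del self.files[name]


class EdgeAppliance:
    """Applies every operation to the cache, but forwards only
    metadata operations (create, rename, delete) to the remote store."""

    def __init__(self, cache: FileStore, remote: FileStore):
        self.cache, self.remote = cache, remote

    def create(self, name):
        self.cache.create(name)
        self.remote.create(name)

    def write(self, name, data):
        self.cache.write(name, data)  # data is NOT passed through

    def rename(self, src, dst):
        self.cache.rename(src, dst)
        self.remote.rename(src, dst)

    def delete(self, name):
        self.cache.delete(name)
        self.remote.delete(name)


def simulate_save():
    """Run the save sequence for A.doc and return both stores' contents."""
    initial = {"A.doc": b"old contents"}
    cache, remote = FileStore(initial), FileStore(initial)
    edge = EdgeAppliance(cache, remote)

    edge.create("~WRDxxx.tmp")
    edge.write("~WRDxxx.tmp", b"new contents")
    edge.rename("A.doc", "~WRLyyy.tmp")
    edge.rename("~WRDxxx.tmp", "A.doc")
    edge.delete("~WRLyyy.tmp")
    return cache.files, remote.files
```

In this model, the edge cache ends with A.doc holding the new contents, while the remote store ends with a zero-byte A.doc and no remaining copy of the previous version.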
While this condition is not a concern for remote users at network locations served by the edge appliance, it is a concern for other users, who are unable to access the contents of the file during this time. Furthermore, this condition results in certain inefficiencies during a file flush operation. When the file flush occurs, A.doc on the remote file server contains no data against which differencing algorithms could be applied to reduce the amount of data transmitted to the core appliance.
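By way of illustration only, the cost of flushing against an empty base can be estimated with Python's standard `difflib` module. `difflib.SequenceMatcher` is used here purely as a convenient stand-in for a differencing algorithm; it is not the method used by any WAFS system, and the function name `bytes_to_send` and the sample document are assumptions made for the example.

```python
import difflib


def bytes_to_send(base: bytes, new: bytes) -> int:
    """Count the bytes of `new` that cannot be copied from `base`."""
    matcher = difflib.SequenceMatcher(None, base, new, autojunk=False)
    return sum(
        j2 - j1
        for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
        if tag in ("replace", "insert")
    )


# Hypothetical document in which a single word is edited.
previous = b"The quick brown fox jumps over the lazy dog. " * 4
edited = previous.replace(b"lazy", b"sleepy", 1)

with_base = bytes_to_send(previous, edited)  # previous version available
without_base = bytes_to_send(b"", edited)    # zero-byte file on the server
```

With the previous version available as a base, only a few bytes differ; against a zero-byte base, the entire edited file must be transmitted.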