The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Geographically dispersed enterprises often deploy distributed computer systems to enable information sharing throughout the enterprise. Such distributed systems generally comprise an enterprise network, which includes a number of Local Area Networks (LANs) that are connected over one or more Wide Area Network (WAN) communication links. An enterprise network generally includes one or more servers that store enterprise information in one or more data resources. The servers supply the data from the resources upon receiving requests for the data from other enterprise servers or clients, which servers or clients may be established in the LAN that is local to the data resource or may be established in a LAN that is located across the WAN.
For example, the business structure of an enterprise may comprise a main office, and one or more branch offices. To support this business structure, the enterprise typically employs a local LAN for each of the main and branch offices, and one or more WAN communication links that connect the LANs at the branch offices with the LAN at the main office. This network infrastructure enables users at the branch offices, who run software applications locally on their workstations, to access files that are located at the main office.
While this network infrastructure allows for greater sharing of information for users throughout the enterprise, it also has a significant disadvantage because software applications that access data resources are primarily designed to access the data resources over a relatively high-speed LAN. Usually, significant latency and performance degradation are observed when a software application accesses a data resource that is located across the WAN in a remote LAN. In a typical example, an enterprise user in a branch office uses a word-processing application to access and modify files. Usually, operations on files that are in the LAN local to the user are relatively quick, while operations on files that are located across the WAN are relatively slow and sometimes unreliable.
One of the reasons for the above-described performance degradation is the limited or insufficient bandwidth of the WAN link across which the data resources are accessed. When a client application needs to access a data resource such as a file, the client application usually first reads the file over the WAN link from a server on the remote LAN, modifies the file according to changes initiated by a user, and then writes the file back over the WAN link to the server on the remote LAN. Thus, the client application effectively transfers the file over the WAN link twice: once when it first reads the file and once when the modified file is written back. Since the WAN link usually has low or limited bandwidth, response times for the users of the client applications are relatively long. Furthermore, since the WAN link is usually shared among client applications, those that are interactive and read/write intensive tend to monopolize the WAN link and to starve other, less aggressive network applications of bandwidth.
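The cost of the double transfer described above can be illustrated with a short calculation. The following is a minimal sketch only: the file size and link speeds are hypothetical figures chosen for illustration, and the model ignores propagation delay, protocol overhead, and contention from other applications.

```python
def transfer_seconds(file_bytes: int, link_bits_per_second: float) -> float:
    """Time to move the file once over the link (serialization time only)."""
    return file_bytes * 8 / link_bits_per_second


def read_modify_write_seconds(file_bytes: int, link_bits_per_second: float) -> float:
    """The file crosses the link twice: once on read, once on write-back."""
    return 2 * transfer_seconds(file_bytes, link_bits_per_second)


# Hypothetical figures, not taken from any measurement.
FILE_BYTES = 5_000_000        # a 5 MB document
WAN_BPS = 1_500_000           # a 1.5 Mbit/s WAN link
LAN_BPS = 100_000_000         # a 100 Mbit/s LAN

wan_seconds = read_modify_write_seconds(FILE_BYTES, WAN_BPS)
lan_seconds = read_modify_write_seconds(FILE_BYTES, LAN_BPS)

print(f"WAN: {wan_seconds:.1f} s, LAN: {lan_seconds:.1f} s")
```

Under these assumed figures, the same edit cycle takes under a second on the LAN and close to a minute over the WAN link, before any sharing of the link is taken into account.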
The above problems caused by insufficient bandwidth are not unique to WAN links. Similar problems of high latency and long response times are observed by client applications that transfer data over other low bandwidth communication links, such as dial-up connections, Digital Subscriber Line (DSL) connections, and Integrated Services Digital Network (ISDN) connections.
A general scheme to address the negative effects caused by insufficient bandwidth is to limit the amount of redundant data transferred over a low bandwidth communication link. One past approach to limit redundant data traffic over low bandwidth communication links is described by Muthitacharoen et al., in “A Low-Bandwidth Network File System”, published in October 2001. Muthitacharoen et al. describe a low bandwidth file system (LBFS) that is specifically designed to operate over low bandwidth communication links.
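The general scheme of limiting redundant data can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method of Muthitacharoen et al. and not the techniques of this application: the names `Receiver`, `send`, and `CHUNK_SIZE` are hypothetical, chunks are of fixed size for brevity, and the "link" is simulated by counting the bytes that would have to cross it.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen for illustration


class Receiver:
    """Far end of the link; remembers every chunk it has already received."""

    def __init__(self) -> None:
        self.chunks: Dict[bytes, bytes] = {}

    def has(self, digest: bytes) -> bool:
        return digest in self.chunks

    def store(self, digest: bytes, chunk: bytes) -> None:
        self.chunks[digest] = chunk

    def assemble(self, digests: List[bytes]) -> bytes:
        return b"".join(self.chunks[d] for d in digests)


def send(data: bytes, receiver: Receiver) -> Tuple[bytes, int]:
    """Send data; return (data rebuilt at the receiver, bytes put on the link)."""
    digests: List[bytes] = []
    bytes_on_link = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        bytes_on_link += len(digest)      # the digest is always sent
        if not receiver.has(digest):      # the chunk body only when unknown
            receiver.store(digest, chunk)
            bytes_on_link += len(chunk)
        digests.append(digest)
    return receiver.assemble(digests), bytes_on_link


receiver = Receiver()
original = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))
modified = original[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE  # last chunk changed

_, first_cost = send(original, receiver)
rebuilt, second_cost = send(modified, receiver)
print(first_cost, second_cost)
```

In this sketch the second transfer carries one chunk body and four digests rather than the whole file. Fixed-size chunks are used only to keep the example short; an insertion near the start of the data would shift every later chunk boundary and defeat the matching.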
However, the LBFS approach of Muthitacharoen et al. has numerous disadvantages. One disadvantage is that it can only reduce network traffic that pertains to accessing files managed through an LBFS file system. Thus, the LBFS approach is inapplicable as a general solution for reducing network traffic of any type over a communication link, whether or not that link has low bandwidth.
Another disadvantage is that the LBFS approach requires custom protocol semantics for accessing files managed through an LBFS server. For example, the LBFS approach provides its own READ and WRITE functions to read from and write to files that are accessed through an LBFS server. However, these custom protocol semantics are not compatible with any existing and widely deployed file systems, such as the File Allocation Table (FAT) file system, Windows NT File System (NTFS), eXtended File System (XFS), UNIX File System (UFS), Veritas File System (VxFS), and other Unix-type file systems. In particular, the LBFS approach requires special software drivers, or an LBFS client, to be installed on any computer system that needs to access files across a low bandwidth communication link, and also requires that an LBFS server be installed on the computer system that physically stores the files. Since the LBFS approach is not compatible with existing file systems, it is not suitable for large-scale deployment in a network environment that uses such existing file systems.
Yet another disadvantage of the LBFS approach is that it does not support on-line, or interactive, access to the files stored across a low bandwidth communication link. The LBFS approach transmits data across the communication link only on OPEN file and CLOSE file commands that are issued from an application using a file. For example, a file is transmitted to an LBFS client when an application utilizing the client requests to open the file. The file is then modified by the application, but changes to the file are transmitted to the LBFS server by the LBFS client only after the application has requested that the file be closed. This significantly limits the ability of applications to concurrently share files accessed through an LBFS server, and consequently the LBFS approach lacks the scalability for deployment in the networks of large and geographically dispersed enterprises.
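The consequence of transmitting data only on OPEN and CLOSE can be illustrated with a toy model. The sketch below is not LBFS code; the classes `Server` and `OpenCloseClient` and the file name are hypothetical, and the model captures only the one property described above, namely that a write stays local to the client until the file is closed.

```python
class Server:
    """Holds the authoritative copy of each file."""

    def __init__(self, files: dict) -> None:
        self.files = dict(files)


class OpenCloseClient:
    """Talks to the server only when a file is opened or closed."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.local = {}

    def open(self, name: str) -> None:
        # The file contents cross the link when the file is opened.
        self.local[name] = bytearray(self.server.files[name])

    def write(self, name: str, offset: int, data: bytes) -> None:
        # The write touches only the local copy; nothing is sent.
        self.local[name][offset:offset + len(data)] = data

    def close(self, name: str) -> None:
        # Changes reach the server only when the file is closed.
        self.server.files[name] = bytes(self.local.pop(name))


server = Server({"report.txt": b"draft v1"})
client = OpenCloseClient(server)

client.open("report.txt")
client.write("report.txt", 6, b"v2")
seen_before_close = server.files["report.txt"]  # what another user would read

client.close("report.txt")
seen_after_close = server.files["report.txt"]

print(seen_before_close, seen_after_close)
```

Between the write and the close, any other application reading the file from the server still sees the old contents, which is the limitation on concurrent sharing noted above.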
Based on the foregoing, there is a clear need for techniques for reducing network traffic, which techniques are scalable, applicable to any type of network traffic over any type of communication links, and which support the protocol semantics of existing file systems.