Local Area Network (LAN) communication is characterized by generous bandwidths, low latencies and considerable enterprise control over the network. By contrast, Wide Area Networks (WANs) often have lower bandwidths and higher latencies than LANs, and part of the control over a WAN often lies outside the enterprise that uses it. In large distributed enterprises, WANs thus pose a performance bottleneck, especially when users in distributed offices attempt to access data or applications that are run from centralized data centers. For example, retrieving electronic mail (“e-mail”) over the WAN from a mail server in a centralized data center can involve a lengthy data transfer that interferes with an end user's productivity. In contrast, retrieving e-mail from a local mail server across a LAN provides virtually instantaneous performance to an end user. Similarly, fetching Web pages from a Web server or files from a file server across a WAN performs much worse than fetching such data from a local server across a LAN.
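The WAN penalty described above can be made concrete with a first-order model: transfer time is roughly the number of protocol round trips multiplied by the round-trip time, plus the payload size divided by the bandwidth. The sketch below is illustrative only; the function name `transfer_time_s` and every figure in it (a 10 MB transfer, 100 round trips, 1 Gbit/s and 0.5 ms on the LAN, 2 Mbit/s and 80 ms on the WAN) are hypothetical assumptions, not measurements.

```python
def transfer_time_s(size_bytes: int, bandwidth_bps: float, rtt_s: float, round_trips: int) -> float:
    """First-order estimate: time spent on protocol round trips plus serialization time."""
    latency_term = round_trips * rtt_s           # waiting on request/response exchanges
    bandwidth_term = (size_bytes * 8) / bandwidth_bps  # pushing the bits through the link
    return latency_term + bandwidth_term


# Hypothetical figures for a 10 MB mailbox download that needs 100 protocol round trips.
size = 10_000_000
lan = transfer_time_s(size, bandwidth_bps=1e9, rtt_s=0.0005, round_trips=100)
wan = transfer_time_s(size, bandwidth_bps=2e6, rtt_s=0.08, round_trips=100)
print(f"LAN: {lan:.2f} s, WAN: {wan:.2f} s")  # LAN: 0.13 s, WAN: 48.00 s
```

Under these assumed figures, 8 s of the WAN time comes from round trips alone, so adding bandwidth by itself would not remove the bottleneck for a protocol that makes many round trips.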
To generalize, users often need to run applications that were designed to perform acceptably on a particular network configuration, but have to run those applications over a network configuration with much lower performance. A common example is a LAN-based application that must accommodate a WAN; that example is used in several places herein.
Several approaches have been tried for overcoming the limitations of a lower-performance network that must carry data for applications designed with higher-performance networks in mind. However, most of these solutions are unsatisfactory in one way or another.
One approach is to replicate servers and deploy systems that automatically mirror or replicate data from origin servers in data centers to replicated servers in distributed locations, in effect moving copies of the data closer to clients. Each replicated server would then hold a copy (a mirror) of the data from the origin server, but would be closer to the clients it serves than the origin server is. Clients would access data from their local replicated server to achieve better performance, since the data would be “closer” in a network sense. This approach suffers from the complexity and expense of deploying duplicate servers and of managing the flow and synchronization of data from the origin servers to the replicated servers. With this approach, it is also difficult to predict what data is needed where and when, so implementations often simply duplicate all available data in each location.
Another approach that has been used with Web content and streaming media is to deploy proxy cache devices at distributed locations to improve access performance for data that is retrieved at a given location more than once. In a LAN/WAN arrangement, caching proxies are situated on LANs near clients. A caching proxy acts as an intermediary between its set of clients and servers that are accessed across a WAN. A caching proxy stores previously transmitted data in the hope that the cached data will be requested again sometime in the future. When a client requests data from a Web server, for example, that client's Web connection is intercepted by the proxy cache. If the proxy cache has the requested data, it simply serves the data locally across the LAN. If it does not have the requested data, it retrieves the requested data from the server across the WAN, transmits the data to the requesting client, and stores the retrieved data in its cache, indexed by its uniform resource locator (URL), in the hope that it will be reused for a later request.
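The caching-proxy behavior described above amounts to a read-through cache keyed by URL. The following sketch assumes a hypothetical `fetch_from_origin` callable standing in for a request across the WAN; the class name `CachingProxy`, the `wan_fetches` counter, and the example URL are illustrative, and the sketch deliberately omits eviction, expiry, and consistency checks that a real proxy cache would need.

```python
from typing import Callable, Dict


class CachingProxy:
    """Hypothetical read-through cache that sits on the LAN near its clients."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin     # stands in for a request across the WAN
        self._cache: Dict[str, bytes] = {}  # previously transmitted data, indexed by URL
        self.wan_fetches = 0                # number of requests that crossed the WAN

    def get(self, url: str) -> bytes:
        if url in self._cache:
            return self._cache[url]         # hit: serve locally across the LAN
        data = self._fetch(url)             # miss: retrieve from the origin server
        self.wan_fetches += 1
        self._cache[url] = data             # keep it in the hope of a later request
        return data


# Usage with an in-memory stand-in for the origin Web server.
origin = {"http://example.com/index.html": b"<html>...</html>"}
proxy = CachingProxy(lambda url: origin[url])
proxy.get("http://example.com/index.html")  # first request crosses the WAN
proxy.get("http://example.com/index.html")  # second request is served from the cache
print(proxy.wan_fetches)  # 1
```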
In this fashion, data accessed multiple times suffers the performance bottleneck of the WAN only on the first client request, then enjoys the performance benefit of the LAN for all subsequent accesses. However, data that is accessed only once gains no performance benefit. Other techniques are used to improve performance for the first client request for a given piece of data, whether that data is later requested again or is requested only once. For example, network caching systems have been augmented with content delivery capabilities whereby operators can move desired content into the proxy caches before it is requested. In this model, a content publishing system usually interfaces with a content delivery system to allow an operator to publish content to the set of proxy caching servers. Thus, presuming a certain piece of data has been pre-loaded into a proxy cache in this fashion, the first client request for that data will experience high performance. However, such systems are generally complex to create and administer, and often require new business processes to be deployed to support this mode of information delivery. Also, relying upon user configuration to place content appropriately is generally expensive, sub-optimal, and prone to error.
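The first-request limitation, and the effect of pre-loading content into a cache, can be shown by counting which requests must cross the WAN. In this hypothetical sketch, `count_wan_fetches` replays a request sequence against a cache that optionally starts with pre-loaded items; the function name and the request paths are made up for illustration.

```python
from typing import Iterable


def count_wan_fetches(requests: Iterable[str], preloaded: Iterable[str] = ()) -> int:
    """Count how many requests must cross the WAN, given an initially pre-loaded cache."""
    cached = set(preloaded)
    wan_fetches = 0
    for item in requests:
        if item not in cached:
            wan_fetches += 1   # first access to uncached data pays the WAN cost
            cached.add(item)   # later accesses are served from the LAN-side cache
    return wan_fetches


requests = ["/report", "/report", "/report", "/memo"]
print(count_wan_fetches(requests))                         # 2
print(count_wan_fetches(requests, preloaded=["/report"]))  # 1
```

The item requested only once (`/memo`) costs a WAN fetch either way unless it too is pre-loaded, which is why pre-loading depends on someone predicting correctly what will be requested.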
Yet another approach to solving the WAN bottleneck is to distribute servers so that servers for a portion of an enterprise are located near clients for that portion of the enterprise. For example, an enterprise with several branch offices might locate an e-mail server, a file server, etc., in each branch office and store a given user's data on the servers in that user's branch office. For instance, when an e-mail message arrives at the enterprise's main mail gateway for a particular user, the mail gateway will identify the e-mail server for the branch office of the particular user and route the e-mail message to the identified server. When the user retrieves their e-mail, it is fetched from the local office's e-mail server and performance is high. Likewise, a user located in a particular office would store and retrieve files from that office's file server, thereby also achieving high performance.
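The mail-gateway routing step can be sketched as two lookups: the recipient's branch office, then that office's mail server. The tables `USER_BRANCH` and `BRANCH_MAIL_SERVER`, the user and host names, and the function `route_message` are all hypothetical; a real gateway would consult an enterprise directory rather than hard-coded dictionaries.

```python
from typing import Dict

# Hypothetical directory data: each user's branch office and each branch's mail server.
USER_BRANCH: Dict[str, str] = {"alice": "london", "bob": "tokyo"}
BRANCH_MAIL_SERVER: Dict[str, str] = {
    "london": "mail.london.example.com",
    "tokyo": "mail.tokyo.example.com",
}


def route_message(recipient: str) -> str:
    """Return the branch-office mail server that should receive mail for `recipient`."""
    branch = USER_BRANCH[recipient]      # identify the recipient's branch office
    return BRANCH_MAIL_SERVER[branch]    # route to that office's local e-mail server


print(route_message("alice"))  # mail.london.example.com
```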
This approach is not always desirable, however, because of the high cost of managing and distributing servers across many locations in a large enterprise. Each such device must be managed, backed up, serviced, and so forth. It is often far less expensive and more desirable to manage as many servers as possible within a centralized data center. Yet a centralized architecture requires that servers be accessed over the WAN, which, as described above, can cause serious performance problems.
Authentication and security mechanisms can further complicate many of these approaches. Agents that move content from an origin server to a replicated server, for example, must be completely trusted, because such an agent has access to all of the data. Entrusting third-party devices or software with “super user” access to everyone's data in an enterprise is a deployment barrier in many customer environments.
Therefore, improved techniques for handling data over networks are needed.