With the advent of globalization, networked services have a global audience in both the consumer and enterprise spaces. For example, a large corporation today may have branch offices in dozens of cities around the globe. In such a setting, the corporation's information technology (IT) administrators and network planners face a dilemma. On the one hand, they could centralize or concentrate the servers that power the corporation's IT services (such as e-mail and file servers) at one or a small number of locations. This would keep administration costs low but may drive up network costs and hurt performance, because what would normally have been local-area network (LAN) traffic becomes wide-area network (WAN) traffic. On the other hand, the servers and services could be distributed to be closer to clients. However, this would likely drive up the complexity and cost of developing and administering the services.
Ideally, a corporation would have both: the operational benefits of centralization along with the performance benefits of distribution. In recent years, protocol-independent redundancy elimination (RE) has emerged as a powerful technique to help bridge the gap by making WAN communication more efficient through elimination of redundancy in traffic. Such compression is typically applied at the Internet Protocol (IP) or Transmission Control Protocol (TCP) layer. For example, this compression can use a pair of middleboxes placed at either end of a WAN link connecting a corporation's data center and a branch office. Each box caches the payload of every flow traversing the link between them, irrespective of the application or protocol. When one box detects chunks of data that match entries in its cache (by computing “fingerprints” of incoming data and matching them against cached data), it replaces the matched data with tokens. The box at the far end reconstructs the original data using its own cache and the encoded tokens. Recently, this approach has seen increasing commercial deployment as part of a suite of optimizations in middleboxes called WAN optimizers. In fact, many enterprises today deploy WAN optimizers across WAN access links to eliminate redundancy in network traffic and reduce WAN access costs.
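The cache-and-token exchange described above can be illustrated with a minimal sketch. It is not the implementation used by any particular WAN optimizer. It assumes fixed-size chunks, a truncated SHA-1 digest as the fingerprint, and an unbounded in-memory cache, whereas deployed systems typically use content-defined chunk boundaries and bounded caches. The names REBox, CHUNK_SIZE, fingerprint, and encoded_size are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64  # hypothetical fixed chunk size in bytes (an assumption)


def fingerprint(chunk: bytes) -> bytes:
    """Return a short fingerprint of a chunk (truncated SHA-1, an assumption)."""
    return hashlib.sha1(chunk).digest()[:8]


class REBox:
    """One end of the WAN link; both ends keep a fingerprint-indexed cache."""

    def __init__(self):
        self.cache = {}  # fingerprint -> chunk bytes

    def encode(self, payload: bytes):
        """Replace chunks already in the cache with ("ref", fingerprint) tokens."""
        tokens = []
        for i in range(0, len(payload), CHUNK_SIZE):
            chunk = payload[i:i + CHUNK_SIZE]
            fp = fingerprint(chunk)
            if fp in self.cache:
                # Cache hit: send only the token. A real system would also
                # guard against fingerprint collisions; this sketch does not.
                tokens.append(("ref", fp))
            else:
                # Cache miss: remember the chunk and send it unmodified.
                self.cache[fp] = chunk
                tokens.append(("raw", chunk))
        return tokens

    def decode(self, tokens):
        """Rebuild the payload, updating the local cache with raw chunks."""
        out = bytearray()
        for kind, value in tokens:
            if kind == "raw":
                self.cache[fingerprint(value)] = value
                out += value
            else:
                out += self.cache[value]
        return bytes(out)


def encoded_size(tokens) -> int:
    """Bytes carried by the tokens (raw chunk bytes plus fingerprint bytes)."""
    return sum(len(value) for _, value in tokens)


# The same payload crosses the link twice; the second transfer is all tokens.
sender, receiver = REBox(), REBox()
payload = bytes(range(256)) * 4
first = sender.encode(payload)
second = sender.encode(payload)
restored_first = receiver.decode(first)
restored_second = receiver.decode(second)
```

In this sketch the two caches stay synchronized only because the receiver decodes every token stream in the order the sender produced it. Lost or reordered packets would require additional handling that is omitted here.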
These middlebox-based solutions, however, have two key drawbacks that limit their usefulness in the long term. First, with the standardization of secure transmission protocols, there is a growing shift toward end-to-end encryption of data. Unfortunately, middleboxes do not cope well with traffic encrypted end-to-end, and many leave such data uncompressed. A small fraction of middleboxes employ tricks (such as connection termination and sharing of encryption keys) to accommodate Secure Sockets Layer (SSL) and Secure Shell (SSH) traffic, but these considerably weaken the end-to-end semantics of enterprise transactions. Second, in-network middleboxes do nothing to improve performance over the last-hop links of mobile and wireless devices, and these devices are beginning to overrun the enterprise workplace. If end-to-end encryption does become ubiquitous, and the adoption of resource-constrained mobile and wireless devices continues its upward trend, then RE will eventually be forced out of middleboxes and directly into end-host stacks.