This invention generally relates to communication networks, and more specifically, to saving bandwidth on links in communication networks.
In the operation of a communications network, the network, or a part of the network, may become congested with data. This may happen for any one or more of a number of reasons. For example, congestion may occur when there is an increase in the use of the network or in the data traffic in the network. Congestion may also result from changes in the topology of the network or from changes in the equipment or devices within the network. Congestion may cause lost or dropped data packets or delays in the data traffic moving through the network, or may otherwise result in a significant Quality of Service (QoS) degradation.
A number of procedures and mechanisms may be used to prevent or to eliminate network congestion. For instance, object caching is a technique to save bandwidth on communication links in a network when similar content is transmitted multiple times on the link. In object caching, an intermediary network node caches content when it is first served; for subsequent accesses to the same content, the content is served from the intermediary network node instead of from the original content provider. Object caching reduces overall network load.
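As a rough illustration of the object caching behavior described above, the following minimal Python sketch models an intermediary node that stores whole objects by key. The class name `ObjectCache`, the `fetch_from_origin` callback, and the `origin_requests` counter are hypothetical names introduced only for this example; real object caches also handle expiry, validation, and eviction, which are omitted here.

```python
from typing import Callable, Dict


class ObjectCache:
    """Toy intermediary node that caches whole objects by key (e.g. a URL)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # fetch_from_origin is a hypothetical stand-in for the original provider.
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many times the origin was contacted

    def get(self, key: str) -> bytes:
        # First access: fetch from the origin and keep a copy.
        if key not in self._store:
            self.origin_requests += 1
            self._store[key] = self._fetch(key)
        # Later accesses: served from the intermediary node.
        return self._store[key]
```

Requesting the same key twice contacts the origin only once; the second request is answered from the intermediary's store.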
However, in many situations, a congested bottleneck link is somewhere in the middle of the network. While an object cache reduces overall network traffic, a congested link cannot rely solely on the presence of object caches as a solution to the congestion on the link, since: (a) object caches may not be deployed or may be out of service, and while an object cache that is out of service affects only a limited set of applications, a congested link affects all applications; (b) there may be no object cache in the path for many users due to network technology; and (c) an object cache in general does not provide bandwidth savings when mirrors are used or when similar or identical files are downloaded via different protocols.
In such a case, a set of synchronized caches at both ends of a congested link provides a better solution that is targeted specifically at the bottleneck link. Before content enters one end of the congested link, that content is passed through the cache at that end of the link to determine whether the content (or part thereof) already exists in the cache. If so, then a short index is sent to the other end of the link instead of the matched content; and at the other end of the link, the matched content is recovered from the synchronized cache at that end of the link. Synchronized caching is completely complementary to, and transparent to, object caching.
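The synchronized caching scheme described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the method of any particular system: content is split into fixed-size chunks (`CHUNK` is a hypothetical size), the "short index" is assumed to be a truncated SHA-256 digest of the chunk, and `LinkEnd`, `encode`, `decode`, and `wire_bytes` are names introduced only for this example. Cache eviction and digest collisions are ignored.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK = 64          # hypothetical fixed chunk size in bytes
INDEX_LEN = 8       # hypothetical length of the short index in bytes

Token = Tuple[str, bytes]  # ("ref", index) or ("raw", chunk data)


class LinkEnd:
    """Cache held at one end of the congested link."""

    def __init__(self) -> None:
        self.cache: Dict[bytes, bytes] = {}


def _index(chunk: bytes) -> bytes:
    # Assumption: a truncated digest serves as the short index.
    return hashlib.sha256(chunk).digest()[:INDEX_LEN]


def encode(sender: LinkEnd, content: bytes) -> List[Token]:
    """Replace chunks already in the sender's cache with short indexes."""
    tokens: List[Token] = []
    for i in range(0, len(content), CHUNK):
        chunk = content[i:i + CHUNK]
        key = _index(chunk)
        if key in sender.cache:
            tokens.append(("ref", key))      # matched: send index only
        else:
            sender.cache[key] = chunk        # unmatched: remember and send data
            tokens.append(("raw", chunk))
    return tokens


def decode(receiver: LinkEnd, tokens: List[Token]) -> bytes:
    """Recover the content, keeping the receiver's cache in step with the sender's."""
    out = bytearray()
    for kind, payload in tokens:
        if kind == "ref":
            out += receiver.cache[payload]   # recover matched content locally
        else:
            receiver.cache[_index(payload)] = payload
            out += payload
    return bytes(out)


def wire_bytes(tokens: List[Token]) -> int:
    """Payload bytes that actually cross the link (token framing ignored)."""
    return sum(len(payload) for _, payload in tokens)
```

Because both ends add a chunk to their caches at the same point in the token stream, the two caches stay synchronized without any extra signaling. Sending the same content a second time should then put only indexes on the link.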
Content is often transmitted in a compressed format. For example, a server may send compressed content to a browser that is capable of decompressing content (the HTTP header Content-Encoding: gzip is standard, and all standard browsers, such as Firefox, Chrome, and IE, support compressed content). Large software packages are often packaged and transmitted over computer networks in a compressed format (.cab, .zip). Many document formats, such as PDF, are compressed. After compression, even a small difference between two pieces of content C1 and C2 will result in vastly different compressed content C1.zip and C2.zip, because compressed output is largely random (if it were not, it could be further compressed). For example, for two Microsoft Word documents, while the difference between C1.doc and C2.doc is only 1 character (i.e., ~0.6 KB in binary form), the difference between the zipped versions of the documents, C1.zip and C2.zip, is around 170 KB.
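The effect described above can be explored with a small Python experiment using the standard `zlib` module. This is a hedged sketch, not a reproduction of the Word document figures: the sample text is synthetic, and `differing_bytes` is a hypothetical helper that counts position-by-position differences. The exact number of differing compressed bytes depends on the input and the compressor settings, so no particular value is claimed.

```python
import random
import zlib


def differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    common = min(len(a), len(b))
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + (max(len(a), len(b)) - common)


# Synthetic stand-in for a document: pseudo-random words from a fixed seed.
rng = random.Random(0)
words = [b"cache", b"link", b"packet", b"index", b"network", b"content"]
c1 = b" ".join(rng.choice(words) for _ in range(2000))

# C2 differs from C1 in exactly one byte (the first one).
c2 = b"X" + c1[1:]

z1 = zlib.compress(c1)
z2 = zlib.compress(c2)

print("uncompressed bytes that differ:", differing_bytes(c1, c2))
print("compressed bytes that differ:  ", differing_bytes(z1, z2))
print("compressed sizes:", len(z1), len(z2))
```

A byte-oriented cache that matches on the uncompressed form would see C1 and C2 as nearly identical, whereas the same comparison on the compressed streams typically finds far less overlap.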
As a result of compressed content, caching techniques may be ineffective for about 15%-25% of the total network traffic. This fraction may increase in the future as storage moves to remote sites in a cloud computing environment and more compressed formats are used to save storage and bandwidth.