Users of the Internet and other network information retrieval systems sometimes suffer from excessive latency (or response time). That is, the interval between making a request for information and the delivery of the requested information is often longer than desired. For a user of a Web browser, for example, such latency may require the user to wait for seconds, or even minutes, while the requested information (e.g., text, images, audio, or applications) is loaded. A typical Web page retrievable from the Internet contains several embedded images. Thus, the latency for presenting an entire page may be the sum of the latencies for numerous retrievals (e.g., the HTML file and several GIF or JPEG images).
The excessive latency problem is especially significant for slower network connections (e.g., dial-up via modem or wide-area wireless connections) and for heavily congested networks, because congestion tends to increase transmission time and therefore reduce effective bandwidth.
Several prior approaches attempt to reduce latency on the Internet and other networked systems. In one approach, data objects are cached at an ultimate client or at an intermediate (proxy) cache for subsequent re-use. Caching is not useful, however, if a requested data object is not already present in the cache, which happens more often than not.
In another approach, compression algorithms are used to compress the representation of data objects before transmission. Such compressed data objects are then decompressed without loss upon reception. Most image and audio formats on the Internet, however, are already compressed, so applying further compression to them yields little additional reduction in transmission time.
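The compress-then-decompress round trip, and the limited benefit for data that is already compressed, can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as a stand-in for whatever lossless algorithm a given system uses; the function names and sample payloads are hypothetical, and seeded pseudo-random bytes merely imitate the statistics of an already-compressed image or audio file.

```python
import random
import zlib


def compress_for_transfer(payload: bytes) -> bytes:
    """Losslessly compress a data object before transmission (hypothetical helper)."""
    return zlib.compress(payload, 9)


def restore_after_transfer(received: bytes) -> bytes:
    """Recover the original data object at the receiver (hypothetical helper)."""
    return zlib.decompress(received)


# Repetitive markup compresses well and is restored exactly.
markup = b"<li>item</li>\n" * 500
packed_markup = compress_for_transfer(markup)
restored_markup = restore_after_transfer(packed_markup)

# Assumption: pseudo-random bytes stand in for an already-compressed
# GIF, JPEG, or audio payload, which has little redundancy left.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(4096))
packed_noise = compress_for_transfer(noise)

print(len(markup), "->", len(packed_markup))
print(len(noise), "->", len(packed_noise))
```

In this sketch the markup shrinks substantially, while the high-entropy payload does not shrink at all once the container overhead is counted.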
In yet another approach, distillation algorithms are used to remove certain contents from requested data objects, so as to make the requested data objects transmissible in fewer bytes (e.g., converting a 400×400 pixel image to a 200×200 pixel image). Unlike compression, however, distillation is irreversible and may degrade the resulting data object or render it useless.
In yet another approach, a prediction algorithm is used to automatically request, in advance, data objects that are likely to be requested in the future. Unless the prediction algorithm is perfect, however, this approach is liable to result in false pre-fetches and hence wasted bandwidth, which can itself increase latency.
In another approach (a so-called delta encoding mechanism), when a modified data object is very similar to a previous instance already held in a cache, the sender can transmit the differences between the two instances, rather than the entire new instance. This approach saves significant amounts of transmission time in some cases, but not in the majority of cases.
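As a rough illustration of the delta encoding idea, the following sketch has the sender describe a new instance as a list of copy and insert operations against the previous instance that the receiver already holds. It is a simplified model and not any particular protocol: the operation format and the names `make_delta` and `apply_delta` are hypothetical, and Python's standard `difflib.SequenceMatcher` is used only as a convenient way to find matching regions.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal inserts (hypothetical format)."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # The receiver already has these bytes; send only the range.
            ops.append(("copy", i1, i2))
        elif j2 > j1:
            # 'replace' or 'insert': the new bytes must be transmitted.
            ops.append(("insert", new[j1:j2]))
        # 'delete' needs no operation: the old bytes are simply not copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new instance at the receiver from the cached instance."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


cached = b"<html><body>Price: 100 USD</body></html>"
current = b"<html><body>Price: 105 USD</body></html>"

delta = make_delta(cached, current)
rebuilt = apply_delta(cached, delta)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
```

Only the bytes carried by the insert operations have to cross the network. The saving depends entirely on how similar the two instances are, which is why the approach helps in some cases and not in others.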
Frequently, data objects on the Internet or other networked systems appear in multiple exact copies, but with different names. For example, the same Compaq logo image might appear under different URLs at different servers. Because the URL for each such data object is different, traditional caches do not recognize that a request for a data object at one URL is satisfiable by the same data object at another URL.
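The limitation can be seen in a minimal sketch of a cache keyed by URL. The class name, the URLs, and the byte string standing in for the logo image are all hypothetical; the sketch models only the lookup behavior described above, not any real cache implementation.

```python
from typing import Optional


class UrlKeyedCache:
    """A toy cache that, like a traditional cache, indexes entries by URL only."""

    def __init__(self) -> None:
        self._entries = {}

    def put(self, url: str, body: bytes) -> None:
        self._entries[url] = body

    def get(self, url: str) -> Optional[bytes]:
        # Returns None on a miss, forcing a full retrieval from the server.
        return self._entries.get(url)


# Assumption: the same image bytes are served under two different names.
logo_bytes = b"GIF89a hypothetical logo payload"
url_a = "http://server-a.example/images/logo.gif"
url_b = "http://server-b.example/img/company_logo.gif"

cache = UrlKeyedCache()
cache.put(url_a, logo_bytes)

first_lookup = cache.get(url_a)   # hit: same URL as before
second_lookup = cache.get(url_b)  # miss: different URL, identical content
body_from_server_b = logo_bytes   # what the second server would send anyway
```

The second lookup misses even though the cache already holds exactly the bytes that the second server would transmit.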
Thus, it is desirable to provide a system and method for reducing retrieval latency that overcome the problems associated with prior approaches.