The present invention relates generally to the transfer of information between an electronic mail server and an associated client application, and more particularly, to a technique that increases the speed at which information may be transferred using cache-based compaction.
Despite the phenomenal growth of the Internet, advances in the means and speed of access to the Internet have not kept pace with demand. This is particularly true at the "last hop" between (a) the Internet site to which the user is connected, such as the user's Internet Service Provider (ISP) and (b) the computer or workstation on which the user is running an Internet application. Today, the Internet site/user last hop is still mainly implemented as a connection over a telephone line using a traditional modem, with communication speeds up to only 56 kilobits per second (kbps).
Separately, the use of wireless communications links in the last hop is gaining popularity. Its growth is fueled by the confluence of three factors: (1) the development of digital air interface protocols that support data (e.g., CDPD, IS-95, IS-136, GSM/GPRS); (2) the availability of new classes of portable Internet-capable end devices (e.g., Palm Pilot, Handheld PC, Nokia 9000) and wireless modems (e.g., Novatel Wireless); and (3) the falling usage cost for wireless communications. Again, the raw bandwidth available on most wireless channels is low (e.g., 19.2 kbps for CDPD), and it can be further impaired by the multiple-access contention nature of these channels and by protocol overhead. For example, the effective application layer throughput of CDPD is about 8 kbps without contention.
In a nutshell, Internet traffic behind slow wireline access links will likely persist for years to come. Wireless Internet access, which is emerging only now, will present an even more severe problem.
A number of previous approaches have been suggested in order to reduce the delay incurred in the last hop. Most of these approaches involve increased usage of storage or computation to make up for limitations in the communication medium. This typically amounts to a trade-off, since storage and computational complexity each add overhead and cost to the system. The key to the success of any of these processing techniques is that the increase in processing delay should be more than offset by the decrease in transport delay, thus resulting in a decrease in the overall latency.
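By way of illustration only, the trade-off between processing delay and transport delay can be worked through numerically. The sketch below assumes a simple model in which overall latency is the processing time plus the object size divided by the link rate; the function name, the object sizes and the 0.5 second processing time are hypothetical values chosen for the example, while the 19.2 kbps rate is the raw CDPD figure mentioned above.

```python
def transfer_latency(size_bytes, link_bps, processing_s=0.0):
    """Overall latency in seconds: processing delay plus transport delay."""
    return processing_s + size_bytes * 8 / link_bps


# Hypothetical 20,000-byte object sent unprocessed over a 19.2 kbps link.
plain = transfer_latency(20_000, 19_200)

# Same object reduced to a hypothetical 4,000 bytes at a cost of 0.5 s of processing.
compacted = transfer_latency(4_000, 19_200, processing_s=0.5)

# The technique pays off only when the added processing delay is smaller
# than the transport delay it saves.
worthwhile = compacted < plain
```

Under these assumed figures the added half second of processing is more than offset by the shorter transmission, which is the condition described in the preceding paragraph.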
One technique, known as "caching", stores earlier responses, and reuses them to satisfy a repeated request. For example, an electronic mail server might cache both the set of received electronic mail messages and a list of mail destinations repeatedly used by a user for sending mail. Another technique, known as "prefetching", tries to predict, fetch and store information before it is needed. For example, an electronic mail client might selectively prefetch information describing newly received electronic mail messages (such as the sender's name, date, message length and subject).
Compression can be achieved by the use of differential transfer to transmit only changes between current and past information. Some of the differencing algorithms used are UNIX diff and vdelta, as described by J. Hunt, K. P. Vo, and W. Tichy, "An Empirical Study of Delta Algorithms", IEEE Software Config. and Maint. Workshop, March 1996. The benefits of delta coding were also studied by Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander Krishnamurthy, "Potential Benefits of Delta Encoding and Data Compression for HTTP", Proceedings of the ACM SIGCOMM, pages 181-194, 1997.
The amount of latency reduction obtainable from caching and prefetching components of electronic mail messages is limited. Accordingly, there is a significant need for an improved latency reduction technique.
In accordance with the present invention, the amount of information that must be transmitted, for example, between an electronic mail server and a client is advantageously reduced using a cache-based compaction technique in which the requested object is encoded in the server using information relating to similar objects that were previously supplied to the client by the mail server.
More specifically, when the client requests an object, and that object is not already in the client's local cache, similar objects in the local cache are identified, and a request is sent to the server to retrieve the desired object using a set of stored "reference" objects that are identified in the request and are similar to the requested object. Instead of sending the actual object from the server to the client, the object is encoded using some or all of the reference objects that are available in both the server and the client cache. The more similar the reference objects are to the requested object, and the more such similar reference objects are available to the server and the client, the smaller the resulting transfer. The encoded information received by the client is then decoded in conjunction with the set of reference objects previously identified in the local cache, and the decoded information is provided to a user application in the client.
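The request, encode and decode exchange described above can be sketched as follows. This is a minimal illustration and not the encoder of the invention: it substitutes zlib's preset-dictionary feature for the encoding step, with the shared reference objects serving as the dictionary. The function names and the use of plain dictionaries as the server store and the client cache are assumptions made for the example.

```python
import zlib


def build_dictionary(reference_ids, store):
    """Concatenate reference objects in the order named by the request."""
    return b"".join(store[rid] for rid in reference_ids)


def server_encode(requested, reference_ids, server_store):
    """Encode the requested object against references the server also holds."""
    # Use only those references named by the client that the server has too.
    usable = [rid for rid in reference_ids if rid in server_store]
    zdict = build_dictionary(usable, server_store)
    # Note: zlib consults at most the last 32 KB of a preset dictionary.
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    payload = comp.compress(requested) + comp.flush()
    # The server reports which references it actually used.
    return usable, payload


def client_decode(payload, used_ids, client_cache):
    """Decode using the same references, taken from the client's local cache."""
    zdict = build_dictionary(used_ids, client_cache)
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(payload) + decomp.flush()
```

In this sketch the server returns the list of references it actually used together with the payload, so that the client rebuilds exactly the same dictionary before decoding. How that list is conveyed is a design choice of the example, not something the description above prescribes.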
In accordance with one aspect of the present invention, the selection of reference objects is based upon the notion that objects that are similar in content tend to have similar descriptors (for example, electronic mail descriptors such as sender name, date, message length and subject). Less computational effort is required to determine similarity among descriptors than to determine similarity among complete objects.
Descriptors may be chosen according to object type (for example, electronic mail message header or body). By selecting one or more reference objects with similar descriptors for encoding the requested object, the probability of finding similar content in the set of reference objects increases, leading to a smaller encoded object.
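Descriptor-based selection of reference objects might be sketched as shown below. The descriptor fields follow the examples given above (sender name, date, message length and subject), but the scoring weights, the handling of reply prefixes, and the names Descriptor, similarity and select_references are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    sender: str
    date: str      # e.g. "1999-03-02"
    length: int    # message length in bytes
    subject: str


def subject_tokens(subject):
    """Lower-cased subject words, ignoring common reply/forward prefixes."""
    words = subject.lower().replace(":", " ").split()
    return {w for w in words if w not in {"re", "fwd", "fw"}}


def similarity(a, b):
    """Cheap similarity score computed from descriptors only (assumed weights)."""
    score = 0.0
    if a.sender.lower() == b.sender.lower():
        score += 1.0
    if a.date == b.date:
        score += 0.5
    ta, tb = subject_tokens(a.subject), subject_tokens(b.subject)
    if ta or tb:
        score += len(ta & tb) / len(ta | tb)   # Jaccard overlap of subject words
    longest = max(a.length, b.length)
    if longest:
        score += min(a.length, b.length) / longest
    return score


def select_references(target, cached, k=2):
    """Return the ids of the k cached objects whose descriptors best match."""
    ranked = sorted(cached.items(),
                    key=lambda item: similarity(target, item[1]),
                    reverse=True)
    return [obj_id for obj_id, _ in ranked[:k]]
```

Because only the short descriptors are compared, the cost of selection does not grow with the size of the message bodies held in the cache.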
In accordance with another aspect of the present invention, the encoding process uses a universal data compression algorithm that isolates data strings in each reference object that are held in common with the requested object. Common strings are encoded in a very concise way by encoding only the locations and sizes of the common strings in the reference objects. As a result, the encoded object is significantly smaller than the original object. After the encoded object is transmitted, decoding of the encoded object is enabled by reference to the reference objects.
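The idea of encoding common strings by their locations and sizes in the reference objects can be illustrated with a simple greedy encoder. This is a sketch under stated assumptions and not the universal data compression algorithm of the invention: the minimum match length, the tuple representation of the encoded operations, and the function names are all hypothetical.

```python
MIN_MATCH = 4  # shortest common string worth encoding as a copy (assumed value)


def build_index(references):
    """Map every MIN_MATCH-byte substring to its (reference, offset) positions."""
    index = {}
    for r, ref in enumerate(references):
        for i in range(len(ref) - MIN_MATCH + 1):
            index.setdefault(ref[i:i + MIN_MATCH], []).append((r, i))
    return index


def encode(target, references):
    """Encode target as copy operations (reference, offset, size) plus literals."""
    index = build_index(references)
    ops, literal, pos = [], bytearray(), 0
    while pos < len(target):
        best = None  # (size, reference index, offset) of the longest match
        for r, off in index.get(target[pos:pos + MIN_MATCH], []):
            ref, n = references[r], MIN_MATCH
            while (pos + n < len(target) and off + n < len(ref)
                   and target[pos + n] == ref[off + n]):
                n += 1
            if best is None or n > best[0]:
                best = (n, r, off)
        if best:
            if literal:  # flush pending bytes that had no match
                ops.append(("lit", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", best[1], best[2], best[0]))
            pos += best[0]
        else:
            literal.append(target[pos])
            pos += 1
    if literal:
        ops.append(("lit", bytes(literal)))
    return ops


def decode(ops, references):
    """Rebuild the object from the operations and the same reference objects."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, r, off, n = op
            out += references[r][off:off + n]
        else:
            out += op[1]
    return bytes(out)
```

Each copy operation carries only three small integers regardless of how long the common string is, which is why the encoded object shrinks as the reference objects become more similar to the requested object. A practical encoder would additionally serialize the operations compactly, a step omitted here.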
In accordance with yet another aspect of the present invention, the cache-based compaction technique is applied to the transmission of an electronic mail message from the client to the mail server.