1. Field of the Invention
The present invention relates to the field of data compression and more specifically to schemes for compression and decompression of archival mail files.
2. Art Background
Archival emails for a single user that are not frequently accessed (say, emails more than a year old) are often stored in reverse chronological order in a single large file, where each email message has a message header and a message body. This type of archival file also typically has a table of contents at the beginning of the file for direct access to individual mail messages.
Standard archival techniques include compressing such large mail files using standard compression algorithms such as bzip or lzma. Compression saves storage and does not adversely impact the end user experience if the emails are rarely accessed. However, if the user wants to access even a single email from this file, the entire file has to be decompressed in memory. This is computationally expensive and could result in high latency for the end user.
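The cost described above can be illustrated with a minimal sketch. It assumes a hypothetical three-message archive, a hypothetical table of contents of (offset, length) pairs, and Python's standard bz2 module standing in for the whole-file compressor; none of these names or layouts come from any particular mail system.

```python
import bz2

# Hypothetical archive: three messages concatenated newest-first.
messages = [
    b"Subject: third\n\nnewest body\n",
    b"Subject: second\n\nmiddle body\n",
    b"Subject: first\n\noldest body\n",
]

# Hypothetical table of contents: (offset, length) of each message
# in the uncompressed file.
toc = []
offset = 0
for m in messages:
    toc.append((offset, len(m)))
    offset += len(m)

# Whole-file compression, as in the standard archival technique.
compressed = bz2.compress(b"".join(messages))


def read_message(compressed_archive: bytes, entry: tuple) -> bytes:
    """Return one message; note that the entire archive is decompressed first."""
    start, length = entry
    whole = bz2.decompress(compressed_archive)  # cost grows with archive size
    return whole[start:start + length]
```

Even though the table of contents locates the message directly, the offsets refer to the uncompressed file, so read_message cannot avoid decompressing everything.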
Furthermore, standard compression techniques in isolation tend not to take advantage of both long-range and short-range similarities in the data being compressed. In email archives, and some other types of structured data, we expect to see many long-range similarities; however, standard short-range compression techniques are still effective. Thus a compression scheme that employs both short- and long-range similarities is desirable.
Bentley and McIlroy (Bentley, J. L., and McIlroy, M. D. Data compression using long common strings. In Data Compression Conference (1999), pp. 287-295) proposed one widely adopted method that effectively takes advantage of long-range similarities. However, the method of Bentley and McIlroy does not permit selective decompression, nor is it adapted for structured archival email files. The Karp-Rabin fingerprinting method discussed within Bentley and McIlroy and below can be found in standard texts such as Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein.
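For orientation, Karp-Rabin fingerprinting can be sketched as a rolling hash over fixed-size windows. The sketch below is a generic textbook-style version, not the specific procedure of the cited paper; the constants BASE and MOD and the function name rolling_fingerprints are illustrative assumptions.

```python
# Illustrative parameters (assumptions, not values from the cited works).
BASE = 256
MOD = 1_000_000_007


def rolling_fingerprints(data: bytes, window: int) -> list:
    """Return the fingerprint of every `window`-byte substring of `data`."""
    if window <= 0 or len(data) < window:
        return []
    # Weight of the byte that leaves the window: BASE^(window-1) mod MOD.
    high = pow(BASE, window - 1, MOD)
    fp = 0
    for b in data[:window]:
        fp = (fp * BASE + b) % MOD
    out = [fp]
    for i in range(window, len(data)):
        fp = (fp - data[i - window] * high) % MOD  # drop the oldest byte
        fp = (fp * BASE + data[i]) % MOD           # append the newest byte
        out.append(fp)
    return out
```

Each fingerprint is updated in constant time from the previous one. Equal substrings always yield equal fingerprints, which is what makes the method useful for spotting repeated strings that are far apart; differing substrings may occasionally collide, so a match would normally be verified against the actual bytes.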