As computers have become increasingly ubiquitous, the amount of computer-readable data has increased dramatically. For instance, businesses and individuals create countless word processing documents (e.g., documents created using Microsoft's Word™), spreadsheets (e.g., documents created using Microsoft's Excel™), slide presentations (e.g., documents created using Microsoft's PowerPoint™), and other files on a daily basis. The volume of computer-accessible files has grown further with the proliferation of electronic mail as a communication vehicle of choice in both business and personal contexts.
Individuals and businesses frequently want to store the data and files they create, either so that the data can be reused or repurposed, or to create a historical record of activities and communications. Given the prolific creation of electronic data and the desire to retain it for possible future use, the demand for electronic storage space has steadily increased. Various types of storage media, ranging from floppy and hard disk drives to flash memory devices and optical storage devices such as DVD (digital versatile disc) and compact disc devices, have been developed to meet this demand.
At the same time that the types and volumes of storage media have multiplied, file compression solutions that store datasets more efficiently have been developed. File compression techniques share the same basic goal, namely, reducing the size of a dataset to reduce storage space or transmission time. Compression techniques may achieve this goal by, for example, replacing a series of repeating characters in the dataset to be compressed with a shorter code representing that series, using codes to represent frequently recurring objects or strings in the dataset, and/or removing unnecessary text, such as extra spaces, from the dataset. Compression techniques can be lossy or lossless. As the name suggests, lossy compression techniques discard some of the original data, so that the dataset reconstructed from the compressed dataset is not exactly the same as the original dataset before compression. By contrast, lossless compression techniques restore all of the data originally present in a dataset when the compressed dataset is reconstructed.
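The first technique mentioned above, replacing a run of repeating characters with a shorter code, is commonly known as run-length encoding. The following is a minimal Python sketch of that idea, offered only as an illustration; the function names `rle_encode` and `rle_decode` are hypothetical, and each run is represented as a (character, run length) pair rather than packed into a compact binary format. Because decoding reproduces the input exactly, it is also a small example of a lossless technique.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Replace each run of a repeated character with a (character, run length) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand each (character, run length) pair back into the original run."""
    return "".join(char * count for char, count in pairs)


original = "AAAAAABBBCCCCCCCCD"
encoded = rle_encode(original)
print(encoded)  # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
assert rle_decode(encoded) == original  # lossless: the exact input is restored
```

Run-length encoding only pays off when the data actually contains long runs; for text with few repeats, each single character still costs a full pair, which is why practical compressors combine it with, or replace it by, dictionary-based codes such as those described below.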
Many applications include functionality for compressing datasets. For example, Microsoft's Outlook™ product includes an archiving tool that compresses email messages into a compressed archive file. Other known data compressors include RAR and WinZip™.
The LZW compression technique, named for its developers Lempel, Ziv and Welch, is one well-known algorithm for compressing a file. The LZW technique breaks a dataset to be compressed into non-overlapping sequential blocks of bits. These blocks of bits are used to sequentially populate a table in which a unique code is assigned to each block of bits entered in the table. Each code is shorter than the block of bits to which it is assigned. When the LZW algorithm reaches a block of bits in the file, it compares that block to the blocks of bits already entered in the table, and thus already appearing in the compressed dataset being created. If the block does not already exist in the table, it is added to the table, assigned a unique code, and written into the compressed dataset. If, on the other hand, the block already exists in the table, the duplicate block of bits is not written to the compressed dataset. Instead, the corresponding code from the table is written to the compressed dataset in place of the duplicate block, thereby shortening the dataset. After the entire dataset has been processed in this fashion, the table is appended to the compressed dataset.
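The passage above describes the table in simplified terms. In the standard LZW algorithm as usually presented, the table is seeded with all 256 single-byte strings, each new entry is the longest string already in the table extended by one more byte, and the decoder rebuilds the identical table from the code stream itself, so the table does not need to be stored alongside the output. The following is a minimal Python sketch of that standard form, for illustration only; the function names are hypothetical, and codes are returned as a plain list of integers rather than packed into variable-width bit fields as production implementations do.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes, growing the table as matches are found."""
    table = {bytes([i]): i for i in range(256)}  # codes 0-255: every single byte
    next_code = 256
    current = b""
    codes: list[int] = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate  # keep extending the longest known match
        else:
            codes.append(table[current])  # emit code for the longest known prefix
            table[candidate] = next_code  # remember the new, longer string
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes, reconstructing the same table from the codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The encoder used this code in the same step that defined it,
            # so it must be the previous string plus its own first byte.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        table[next_code] = previous + entry[:1]  # mirror the encoder's new entry
        next_code += 1
        previous = entry
    return b"".join(output)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
print(len(sample), len(codes))  # 24 16
assert lzw_decompress(codes) == sample
```

In this sketch, the 24 input bytes become 16 codes because repeated substrings such as "TO", "BE" and "OR" are replaced by single table codes after their first appearance.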