The present invention relates to computer systems and more particularly to a method and system for data compression which provides the ability to use compressed data without completely decompressing the file.
Data compression is useful in a wide variety of computing applications and computer systems. In general, data compression is useful any time more information is desired to be stored in a given amount of memory. For example, many applications are run on systems having a reduced amount of memory. Personal digital assistants (PDAs) and processors in automobiles are systems having limited memory. It is also desirable to provide more functionality on systems having limited memory. In some cases, the amount of memory in such systems could limit the systems' utility. In order to store more information in a given amount of memory, the data is compressed. Thus, data compression allows additional data to be stored on systems having limited memory.
Other systems or applications may store a large amount of data. For example, a large corporation may store data relating to customers and vendors. The computer system may be used for a variety of other applications. Thus, there may be a large amount of important information residing on the computer system. Simulations resulting in a large amount of stored data may also be run on a computer system. Storage of large data files or a large number of data files reduces the amount of space available for other information. Data compression allows more of the information important to the users of the system to be stored. Similarly, individual users may wish to retain more information on the storage of their home computer. Data compression allows the user to store more information on a storage medium. Furthermore, many individuals today access files via private networks or public networks, such as the Internet. Large files take a longer time to download. If the data in a file is compressed, the time to download the file is reduced. Consequently, data compression is useful in applications that are as pervasive as they are varied.
Conventional data compression utilizes a dictionary. As a source data file is read, unique patterns of bits are searched for and a dictionary is generated. The dictionary associates each unique pattern of bits with a code word. If the current bits being read do not match a pattern, then the bits are saved unchanged, in the order in which they appear in the source data file. However, if a pattern recurs, then a code word replaces the pattern. Typically, the code word may point to a previous occurrence of the pattern. However, the code word could also point to the entry in the dictionary corresponding to the code word. Thus, a conventional compressed file is generated. Where there are recurring patterns of bits in the source data file, the conventional compressed file and the dictionary together occupy less space than the source data file.
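The dictionary approach described above can be illustrated with a minimal sketch. The source does not name a particular algorithm; the sketch assumes LZ78, a well-known dictionary scheme in which each code word points to a dictionary entry, and it operates on bytes rather than individual bits for simplicity. The function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: LZ78 is assumed here as one example of
# "conventional" dictionary compression; the source names no algorithm.

def lz78_compress(data: bytes):
    """Return a list of (dictionary index, next byte) pairs."""
    dictionary = {b"": 0}   # pattern -> code word (entry index)
    output = []
    phrase = b""            # longest known pattern matched so far
    for value in data:
        candidate = phrase + bytes([value])
        if candidate in dictionary:
            # The pattern has been seen before; keep extending it.
            phrase = candidate
        else:
            # Emit the code word of the known prefix plus the new byte,
            # then register the longer pattern in the dictionary.
            output.append((dictionary[phrase], value))
            dictionary[candidate] = len(dictionary)
            phrase = b""
    if phrase:
        # Input ended in the middle of a known pattern.
        output.append((dictionary[phrase], None))
    return output


def lz78_decompress(pairs):
    """Rebuild the source bytes; the dictionary is regrown in order."""
    phrases = [b""]
    out = bytearray()
    for index, value in pairs:
        suffix = bytes([value]) if value is not None else b""
        phrase = phrases[index] + suffix
        phrases.append(phrase)
        out += phrase
    return bytes(out)


pairs = lz78_compress(b"ABABABA")
restored = lz78_decompress(pairs)
```

In this sketch the decoder rebuilds its dictionary as it goes, so a pair near the end of the output cannot be expanded until all earlier pairs have been processed.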
Although conventional data compression allows the conventional compressed file to occupy less memory than the source data file, this is only true while the conventional compressed file is stored. Many applications randomly access the data being used. Thus, if the conventional compressed file is to be used, the conventional compressed file must be completely uncompressed. The uncompressed file occupies the same amount of space as the source data file. If the entire conventional compressed file is uncompressed and stored in memory, then the memory that compression was intended to save is consumed. Alternatively, the conventional compressed file may be uncompressed on the fly. However, the uncompressed data can then only be accessed sequentially. The application using the uncompressed data must therefore either use the data sequentially or uncompress the conventional compressed file multiple times. Consequently, either the applications which can be used are limited or processor resources are consumed.
Accordingly, what is needed is a system and method for providing data compression which allows the data to be utilized without full uncompression of the entire compressed file. The present invention addresses such a need.
The present invention provides a method and system for compressing data on a computer system. The method and system comprise separating the data into a plurality of segments. The plurality of segments include a plurality of unique segments. The method and system also comprise providing a plurality of code words. Each of the plurality of code words corresponds to a unique segment of the plurality of unique segments. The method and system also comprise providing a representation of the data. The representation includes the plurality of code words. The plurality of code words in the representation replaces the plurality of segments.
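The steps summarized above can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes fixed-length byte segments and uses integer indices into a table of unique segments as the code words, neither of which is specified in the summary. The names `compress`, `read_byte`, and `segment_size` are hypothetical.

```python
# Illustrative sketch only. Assumptions not taken from the source:
# segments have a fixed length, and a code word is an integer index
# into the list of unique segments.

def compress(data: bytes, segment_size: int = 4):
    """Separate data into segments and replace each with a code word."""
    unique_segments = []   # code word -> unique segment
    code_for = {}          # unique segment -> code word
    representation = []    # one code word per segment, in source order
    for start in range(0, len(data), segment_size):
        segment = data[start:start + segment_size]
        if segment not in code_for:
            code_for[segment] = len(unique_segments)
            unique_segments.append(segment)
        representation.append(code_for[segment])
    return unique_segments, representation


def read_byte(unique_segments, representation, offset, segment_size=4):
    """Return the byte at `offset`, expanding only the segment holding it."""
    code_word = representation[offset // segment_size]
    return unique_segments[code_word][offset % segment_size]


data = b"ABCDABCDWXYZABCD"
table, codes = compress(data)
value = read_byte(table, codes, 9)   # byte at offset 9 of the source data
```

Because every code word in this sketch stands for a segment of known length, the position of any byte in the representation can be computed directly, and a single segment can be looked up without expanding the rest of the data.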
According to the system and method disclosed herein, the present invention allows compressed data to be used without full uncompression, thereby increasing the ability of the system to use the data with a given amount of memory.