1. Field of the Invention
The invention relates to the storage and retrieval of large amounts of textual information, including the application of textual queries against a compressed textual database.
2. Description of the Prior Art
Large textual databases that store vast amounts of information are known. For example, databases exist that store the full text of U.S. patents issued over the past forty to fifty years. Queries, such as searches, can be made against such databases to retrieve particular information, such as all of the patents that contain a user-supplied search key. Such a search key can be an English-language phrase or a sequence of English words or phrases separated by relational operators such as AND, OR, NOT, WITHIN (N) and the like. As another example, telephone companies maintain large databases of customer information used, for example, in call validation and billing procedures.
Dictionary-based data compression/decompression systems are known that compress input text into compressed code and recover the input text by decompressing the compressed code. Such systems are character-oriented, in that strings of input characters are absorbed into the compressor and translated into corresponding compressed code symbols. For example, the compression might be performed over an alphabet comprising the 256 ASCII characters. Such data compression/decompression techniques are exemplified by the well-known LZW procedure of U.S. Pat. No. 4,558,302 by Welch, issued Dec. 10, 1985. Another character-oriented data compression/decompression algorithm, known as LZ2, is described in a paper entitled "Compression Of Individual Sequences Via Variable-Rate Coding" by J. Ziv and A. Lempel, published in the IEEE Transactions On Information Theory, Vol. IT-24, No. 5, September 1978, pages 530-536. Further character-oriented compression and decompression techniques are described in U.S. Pat. No. 4,876,541 by Storer, issued Oct. 24, 1989; U.S. Pat. No. 4,464,650 by Eastman et al., issued Aug. 7, 1984; U.S. Pat. No. 4,814,746 by Miller et al., issued Mar. 21, 1989; U.S. Pat. No. 5,087,913 by Eastman, issued Feb. 11, 1992; U.S. Pat. No. 5,153,591 by Clark, issued Oct. 6, 1992; and U.S. Pat. No. 5,373,290 by Lempel et al., issued Dec. 13, 1994.
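To make the character-oriented, dictionary-based approach concrete, the following is a minimal Python sketch of the basic LZW idea: the dictionary starts with all 256 single-byte strings, the compressor absorbs the longest input string already in the dictionary, emits its code, and adds that string extended by one character as a new entry. This is an illustrative simplification, not the procedure as claimed in the Welch patent: codes are returned as unbounded Python integers rather than packed into fixed-width binary codes, and the dictionary grows without any size limit or reset. The function names `lzw_compress` and `lzw_decompress` are hypothetical.

```python
def lzw_compress(text: bytes) -> list[int]:
    """Sketch of LZW compression over a 256-symbol (byte) alphabet."""
    # Initial dictionary: every single byte maps to its own code (0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    output = []
    for byte in text:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the longest string already in the dictionary.
            current = candidate
        else:
            # Emit the code for the longest match, then learn the new string.
            output.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly and recover the original bytes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        # Mirror the compressor: previous string plus first char of the current one.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
print(len(codes))  # 16 codes for 24 input characters
print(lzw_decompress(codes))  # b'TOBEORNOTTOBEORTOBEORNOT'
```

In this sketch the compressed output is a sequence of dictionary codes whose meaning depends on the dictionary built up from all preceding text, so a query term cannot simply be matched against the codes; the text is first recovered with `lzw_decompress` before it is searched.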
The above-described character-oriented data compression/decompression procedures may be applied to a large textual database so as to generate a compressed version of the database for more efficient storage or transmission. Heretofore, however, a textual query could be applied against a database only when the database was in uncompressed form. Thus, a large database might be compressed for archival or transmission purposes, but the entire database had to be decompressed, or maintained in its original uncompressed form, for active querying. Additionally, the dictionaries constructed during the compression/decompression procedures are used only in the archival or transmission compression/decompression activities and are not used on-line in active database usage.
The efficient storage and retrieval of large amounts of textual information is a problem encountered by on-line services, CD-ROM-based information products and document delivery systems, particularly where textual information must be stored and retrieved in response to a user request or query.