1. Field of the Invention
This invention relates to the field of data processing and, more specifically, to the indexing of data.
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever. Sun, Sun Microsystems, the Sun logo, Solaris, “Write Once, Run Anywhere”, Java, JavaOS, JavaStation, HotJava Views and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
2. Background Art
Computers have enabled the storage and transmission of large amounts of information. Large numbers of documents, images, multimedia works, and other information are available in an electronically-readable form. However, often only a particular piece of information is needed at a particular time. Finding the desired information among all available information can be a very difficult task.
Fortunately, computers can be programmed to search for the desired information. Computer software designed for searching, also known as “search engines,” can help locate the desired information. Because of the large amount of information available, search engines generally do not search through all available information in response to each search query. Rather, search engines generally utilize a database, referred to as an index, that summarizes the content and location of a large amount of information. An index is a compilation of specific pieces of information, for example words or phrases, each with references to the locations at which it occurs within the document or documents that are within the scope of the index.
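By way of illustration only, the general notion of an index described above can be sketched in the Java programming language. The sketch is not the indexing technique of the invention; it merely shows a compilation of words, each with references to the locations at which the word occurs. The names SimpleIndex, Location, add, and lookup are hypothetical and are chosen solely for this example, as is the choice to identify a location by a document name and a word position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of a generic (uncompressed) index:
// each word maps to the list of locations at which it occurs.
final class SimpleIndex {

    // A location is assumed here to be a document name plus a
    // zero-based word position within that document.
    static final class Location {
        final String document;
        final int position;

        Location(String document, int position) {
            this.document = document;
            this.position = position;
        }

        @Override
        public String toString() {
            return document + "@" + position;
        }
    }

    private final Map<String, List<Location>> postings = new TreeMap<>();

    // Adds every word of the given text to the index, recording where it occurs.
    void add(String document, String text) {
        int position = 0;
        for (String word : text.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split yields an empty first token if the text starts with punctuation
            }
            postings.computeIfAbsent(word, k -> new ArrayList<>())
                    .add(new Location(document, position));
            position++;
        }
    }

    // Returns the recorded locations of a word, or an empty list if the word is absent.
    List<Location> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.emptyList());
    }

    public static void main(String[] args) {
        SimpleIndex index = new SimpleIndex();
        index.add("a.txt", "Search engines use an index");
        index.add("b.txt", "An index lists words");
        // A query consults the index rather than rescanning the documents.
        System.out.println(index.lookup("index"));
    }
}
```

In such a sketch a query is answered from the map alone, without rescanning the documents. Every location is held in memory in uncompressed form, which is the kind of memory requirement discussed in the following paragraph.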
Nonetheless, often such search engines and indexes require large amounts of memory to store their data and program code. Such large memory requirements prevent the search engines and indexes from being used on smaller computing platforms, for example, handheld computers or appliances enhanced with data processing capability, or from being transmitted easily between computing environments, for example, over an internet connection.
The invention provides a technique for indexing data. The invention provides for compressing an index to obtain a compressed index that is easily stored and transmitted. The invention also provides for the decompression of such a compressed index. The invention further provides for the maintenance and use of a plurality of files that contain indexing information.
The invention avoids the disadvantages of previous indexing schemes. The invention does not require large amounts of memory to store or transmit an index. The invention may be practiced using a platform-independent programming language, for example, the Java™ programming language, to allow compatibility with almost any computing platform, including very small or limited computing platforms. Moreover, the invention also allows very fast searching capability.
The invention provides high modularity of indexing information, thereby allowing easy distribution of indexing information, as well as easy incremental indexing and updating of indexing information. Also, the invention simplifies proximity-based searching techniques. Further, the invention provides fast searching capability by not requiring that all indexing information be decompressed to perform a search. Additionally, the invention supports multiple simultaneous queries by allowing decompression of multiple indexes simultaneously or in one procedure.