The present invention relates generally to managing data in computer systems, and will be specifically disclosed as a method and apparatus for organizing and using indexes.
The virtual explosion of technical advances in microelectronics, digital computers and software has changed the face of modern society. In fact, these technological advances have become so important and pervasive that this explosion is sometimes referred to as "the information revolution." Through telephone lines, cables, satellite communications and the like, information and resources are increasingly being accessed and shared.
The introduction and wide usage of computers and networks, including the Internet, have made information increasingly accessible. A vast array of informational resources is available, including textual information (e.g. books, articles, papers, letters, e-mail, etc.), graphical information (e.g. photographs, videos, drawings, images, etc.), audio information (e.g. voices, music, audio, etc.), interactive information (e.g. Internet web sites, hypertext markup language ("HTML"), Java, ActiveX, executable programs, etc.), and the like. Informational resources can include a single type of information or a combination of two or more types of information.
As the amount of information increases, management and retrieval of that information have become an increasingly important and complex problem. One preferred way to manage and retrieve information is through indexing. Indexing is the process of cataloging informational resources in an efficient and coherent manner so that they can be easily accessed. While indexes can be used for any kind of information, indexes are often used for textual informational resources. Text refers to typographic characters, both alphanumeric and specialty characters, such as those of the ASCII standard, and can also include semantic and formatting information, such as bold, underline, italics, colors, size, subscript or superscript, titles, headings, abstracts, and the like.
For a given informational resource, the ability to identify the resource and retrieve data is directly related to the amount and quality of information in the index. For example, a text index may contain only the titles of the textual informational resource, or it may contain only certain key terms. In many instances, the recommended solution is to provide indexing and searching on substantially every word in a collection of texts (e.g. a full text index). A full text index is essentially an inversion of the document or data (e.g. an inverted word list), and also may contain additional semantic information about the document from the format, context or from linguistics. While full text indexes can take a variety of forms and be created using many different techniques, U.S. Pat. Nos. 5,701,459 and 5,717,912 illustrate examples of creating and using full text indexes.
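By way of illustration only, the notion of a full text index as an inverted word list can be sketched as follows. This is a minimal sketch and not the technique of the cited patents; the function name `build_inverted_index`, the tokenizing rule and the sample documents are all assumptions introduced for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the sorted ids of the documents containing it.

    `documents` is a dict of {document id: text}. The tokenizer (runs of
    letters and digits) is an assumption made for this sketch.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample collection.
docs = {
    "doc1": "Indexes speed up retrieval",
    "doc2": "Full text indexes invert the document",
}
inverted = build_inverted_index(docs)
```

Looking up a word in `inverted` yields the documents that contain it, which is the inversion of the original document-to-words relationship.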
One challenge in indexing is how to merge indexes with other indexes as information changes and is added. This challenge is often encountered with informational resources that change rapidly. Updating the indexes very often creates many small indexes, resulting in high costs in merging the indexes together or in querying many indexes at once to get a result. Waiting longer before updating the index leaves the information out of date until the next indexing and merging interval. Querying a multitude of indexes to get an answer is very costly and begins to approach the slow scan searching of the original documents that was used before indexing became feasible. While the whole collection of indexes can be merged into a combined index, the cost of such a merging can be substantial. Moreover, such a merger might not be possible or desirable if one collection of indexes is for one site, another collection is for another site, and so on.
Accordingly, an object of the invention is to provide an improved method and apparatus for organizing and using indexes. Additional objectives, advantages and novel features of the invention will be set forth in the description that follows and, in part, will become apparent to those skilled in the art upon examining or practicing the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
One aspect of the present invention is a search decision table on a computer readable medium. The search decision table comprises a plurality of references to indexes, such as full text indexes, where each index corresponds to one or more informational resources. The search decision table has a plurality of references to ranges of text, wherein each range of text is bounded by a lower text limit and an upper text limit. The references to the indexes and ranges of text are arranged in a matrix. Preferably, the references to ranges of text are sequentially arranged in alphanumeric order, and each reference to a range of text comprises the lower text limit. A plurality of cross-referencing data in the matrix correlate the references to the ranges of text and the references to the indexes, wherein each item of cross-referencing data corresponds to a range of text and an index. Preferably, the matrix includes attribute data and/or index data for each range of text.
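By way of illustration only, such a matrix can be sketched as rows of text ranges and columns of indexes. This is a minimal sketch under stated assumptions, not a definitive implementation: the class name `SearchDecisionTable`, its methods, the Boolean cell type, and the convention that each range's upper limit is implied by the next range's lower limit are all introduced for the example.

```python
from bisect import bisect_right


class SearchDecisionTable:
    """Matrix of text ranges (rows) by indexes (columns).

    Assumptions of this sketch: each row is identified by its lower text
    limit, the upper limit is the next row's lower limit, and the first
    lower limit is the empty string so that every term falls in some row.
    """

    def __init__(self, lower_limits, index_names):
        self.lower_limits = sorted(lower_limits)
        if not self.lower_limits or self.lower_limits[0] != "":
            raise ValueError("first lower limit must be the empty string")
        self.index_names = list(index_names)
        # One Boolean cross-reference cell per (range, index) pair.
        self.matrix = [[False] * len(self.index_names) for _ in self.lower_limits]

    def range_for(self, term):
        """Return the row number of the range that contains `term`."""
        return bisect_right(self.lower_limits, term.lower()) - 1

    def mark(self, term, index_name):
        """Record that `index_name` contains a word in the range of `term`."""
        col = self.index_names.index(index_name)
        self.matrix[self.range_for(term)][col] = True

    def indexes_for(self, term):
        """List the indexes whose cell is set for the range of `term`."""
        row = self.matrix[self.range_for(term)]
        return [name for name, hit in zip(self.index_names, row) if hit]


# Hypothetical ranges and index names.
table = SearchDecisionTable(["", "g", "n", "t"], ["idx_a", "idx_b"])
table.mark("apple", "idx_a")
table.mark("banana", "idx_b")
table.mark("zebra", "idx_b")
```

Because the lower limits are kept in sorted order, locating the range for a term is a binary search rather than a scan of every row.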
Another aspect of the present invention is a method of searching indexes on a computer system. A query is prepared comprising one or more text strings. A search decision table is accessed which cross-references ranges of text to a plurality of full text indexes. The ranges of text in the search decision table which correspond to each text string in the query are identified. The full text indexes that correlate to the identified ranges of text are then determined from the search decision table in accordance with any Boolean qualifiers in the query, preferably by reading cross-referencing data corresponding to the identified ranges of text. The identified full text indexes are then searched in accordance with the query.
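By way of illustration only, the step of determining which indexes to search can be sketched as below. This is a minimal sketch and not the claimed method: the table contents, the names `LOWER_LIMITS`, `CROSS_REFS`, `candidate_indexes` and `indexes_to_search`, and the restriction to a single AND or OR qualifier per query are assumptions introduced for the example.

```python
from bisect import bisect_right

# Hypothetical table: lower text limit of each range, in sorted order.
# The empty string as the first limit ensures every term falls in a range.
LOWER_LIMITS = ["", "g", "n", "t"]

# Row i holds the names of the indexes that contain at least one word
# falling in range i (the cross-referencing data of this sketch).
CROSS_REFS = [
    {"site_a", "site_b"},  # "" up to "g"
    {"site_b"},            # "g" up to "n"
    set(),                 # "n" up to "t"
    {"site_a", "site_c"},  # "t" and above
]


def candidate_indexes(term):
    """Return the indexes cross-referenced to the range containing `term`."""
    row = bisect_right(LOWER_LIMITS, term.lower()) - 1
    return CROSS_REFS[row]


def indexes_to_search(terms, qualifier="AND"):
    """Combine the per-term candidates according to a Boolean qualifier."""
    if not terms:
        return []
    candidates = [candidate_indexes(term) for term in terms]
    if qualifier == "AND":
        result = set.intersection(*candidates)
    elif qualifier == "OR":
        result = set.union(*candidates)
    else:
        raise ValueError("unsupported qualifier: " + qualifier)
    return sorted(result)
```

In this sketch the table acts only as a filter. An index that survives the filter may or may not contain the exact term, so each returned index would still be searched in accordance with the query, while the indexes that are filtered out need not be opened at all.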
Still another aspect of the present invention is a computer system having a plurality of indexes, where each index corresponds to one or more informational resources. In one embodiment, indexes are organized in a hierarchical directory, such as a distributed directory. A search decision table has means for correlating the indexes with searchable criteria, such as ranges of words, categories, phrases, and topics. Preferably, the search decision table is a matrix cross-referencing the searchable criteria to the indexes. The computer system also has a means for receiving a query and indexing into the search decision table to determine indexes responsive to the query, and a means for searching the responsive indexes in accordance with the query.
Still other aspects of the present invention will become apparent to those skilled in the art from the following description of a preferred embodiment, which is, by way of illustration, one of the best modes contemplated for carrying out the invention. As will be realized, the invention is capable of other different and obvious aspects, all without departing from the invention. Accordingly, the drawings and descriptions are illustrative in nature and not restrictive.