The field of the invention relates to document retrieval and more particularly to search engines operating within the context of a database.
Automated methods of searching databases are generally known. For example, P. G. Ossorio developed a technique for automatically measuring the subject matter relevance of documents (Ossorio, 1964, 1966, 1968, 1969). The Ossorio technique produced a quantitative measure of the relevance of a text with regard to each of a set of distinct subject matter fields. The numbers provided by the quantitative measure are the profile, or information spectrum, of the text. H. J. Jeffrey produced a working automatic document retrieval system using Ossorio's technique (Jeffrey, 1975, 1991). The work by Ossorio and Jeffrey showed that the technique can be used to calculate the information spectra of documents and of requests for information, and that the spectra can be effective in retrieving documents.
However, Ossorio's technique was designed to solve a particular kind of document retrieval problem (i.e., fully automatic retrieval with complete cross-indexing). As a result, the technique has certain characteristics that make it unusable for information retrieval in cases in which there is a very wide range of subject matter fields, such as the Internet.
A method and apparatus are provided for searching for information. The method includes the step of segmenting a judgement matrix into a plurality of information sub-matrices, where each sub-matrix has a plurality of classifications and a plurality of terms relevant to each classification. The method further includes the steps of evaluating a relevance of each term of the plurality of terms with respect to each classification of each information sub-matrix and calculating an information spectrum for each of a plurality of documents based upon at least some of the plurality of terms. The method further includes the steps of receiving a search request, calculating an information spectrum of the search request based upon at least some of the plurality of terms, and identifying at least some documents of the plurality of documents as relevant to the request based upon a comparison of the calculated information spectra.
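By way of illustration only, the steps summarized above may be sketched in Python as follows. The sketch is not the claimed implementation. The term ratings, the grouping of classifications, the averaging rule used to form a spectrum, and the Euclidean distance used to compare spectra are all assumptions chosen for the example, and every name in it (`JUDGEMENT`, `segment`, `spectrum`, `index`, `search`) is hypothetical.

```python
import math

# Hypothetical judgement matrix: term -> {classification: relevance rating}.
# The ratings are invented for illustration.
JUDGEMENT = {
    "engine": {"automotive": 5.0, "computing": 4.0},
    "query": {"computing": 6.0},
    "oil": {"automotive": 5.0, "cooking": 3.0},
    "recipe": {"cooking": 6.0},
    "soil": {"gardening": 6.0},
}

# Assumed grouping of classifications, one tuple per information sub-matrix.
GROUPS = [("automotive", "computing"), ("cooking", "gardening")]


def segment(judgement, groups):
    """Split the judgement matrix into sub-matrices, keeping in each one
    only the terms rated above zero for at least one of its classifications."""
    submatrices = []
    for fields in groups:
        sub = {}
        for term, ratings in judgement.items():
            row = {f: ratings.get(f, 0.0) for f in fields}
            if any(value > 0 for value in row.values()):
                sub[term] = row
        submatrices.append((fields, sub))
    return submatrices


def spectrum(text, submatrices):
    """Information spectrum of a text: for each classification, the mean
    rating of the recognized terms (an assumed combining rule)."""
    words = text.lower().split()
    spec = {}
    for fields, sub in submatrices:
        hits = [sub[w] for w in words if w in sub]
        for f in fields:
            spec[f] = sum(row[f] for row in hits) / len(hits) if hits else 0.0
    return spec


def index(documents, submatrices):
    """Calculate and store the spectrum of every document ahead of time."""
    return {name: spectrum(text, submatrices) for name, text in documents.items()}


def distance(a, b):
    """Euclidean distance between two spectra (an assumed comparison)."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in a))


def search(request, doc_spectra, submatrices, top_n=2):
    """Return the names of the documents whose spectra lie closest to the
    spectrum of the request."""
    req = spectrum(request, submatrices)
    ranked = sorted(doc_spectra, key=lambda name: distance(req, doc_spectra[name]))
    return ranked[:top_n]


SUBMATRICES = segment(JUDGEMENT, GROUPS)
DOCUMENTS = {
    "db": "the search engine answers each query",
    "car": "the engine needs oil",
    "food": "a recipe for soup",
    "garden": "good soil for roses",
}
DOC_SPECTRA = index(DOCUMENTS, SUBMATRICES)
print(search("query engine", DOC_SPECTRA, SUBMATRICES))
```

In this sketch each sub-matrix carries only the terms relevant to its own classifications, so a term such as "recipe" need not be rated against unrelated fields such as automotive subject matter. The request is given a spectrum in the same manner as the documents, and the documents are ranked by how closely their spectra match that of the request.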