The field of the invention relates to document retrieval and more particularly to search engines operating within the context of a database.
Automated methods of searching databases are generally known. For example, P. G. Ossorio developed a technique for automatically measuring the subject matter relevance of documents (Ossorio, 1964, 1966, 1968, 1969). The Ossorio technique produced a quantitative measure of the relevance of the text with regard to each of a set of distinct subject matter fields. These numbers provided by the quantitative measure are the profile or information spectrum of the text. H. J. Jeffrey produced a working automatic document retrieval system using Ossorio's technique (Jeffrey, 1975, 1991). The work by Ossorio and Jeffrey showed that the technique can be used to calculate the information spectra of documents, and of requests for information, and that the spectra can be effective in retrieving documents.
However, Ossorio's technique was designed to solve a particular kind of document retrieval problem (i.e., fully automatic retrieval with complete cross-indexing). As a result, the technique has certain characteristics that make it unusable for information retrieval in cases in which there is a very wide range of subject matter fields, such as the Internet.
In general, in one aspect, the invention features a method for processing information. The method includes receiving a segmented judgment matrix and using the segmented judgment matrix to calculate an information spectrum. The segmented judgment matrix is a numerical matrix pairing each of a set of terms to each of a set of classifications, where each term is a word or phrase. The segmented judgment matrix includes information submatrices, with each element of each information submatrix representing a rating of the relevance of the term of the element to the classification of the element. Each information submatrix is a numerical matrix representing the relevance of each of a subset of the set of terms to each of a subset of the set of classifications.
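As a rough illustration of the data structure described above, the following Python sketch represents each information submatrix as a small term-by-classification table and assembles the submatrices into a single matrix. The names `InfoSubmatrix` and `assemble`, the sample terms and ratings, and the choice to fill term/classification pairs not covered by any submatrix with zero are assumptions made for illustration only; they are not taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class InfoSubmatrix:
    """Hypothetical container for one information submatrix."""
    terms: List[str]              # subset of the full set of terms
    classifications: List[str]    # subset of the full set of classifications
    ratings: List[List[float]]    # ratings[i][j]: relevance of terms[i] to classifications[j]


def assemble(
    submatrices: List[InfoSubmatrix],
) -> Tuple[List[str], List[str], Dict[str, Dict[str, float]]]:
    """Combine submatrices into one term-by-classification matrix.

    Pairs that no submatrix rates are set to 0.0 (an assumption of this sketch).
    """
    terms: List[str] = []
    classes: List[str] = []
    for sub in submatrices:
        for t in sub.terms:
            if t not in terms:
                terms.append(t)
        for c in sub.classifications:
            if c not in classes:
                classes.append(c)

    # Rows are terms, columns are classifications.
    matrix = {t: {c: 0.0 for c in classes} for t in terms}
    for sub in submatrices:
        for i, t in enumerate(sub.terms):
            for j, c in enumerate(sub.classifications):
                matrix[t][c] = sub.ratings[i][j]
    return terms, classes, matrix


# Illustrative data only: two submatrices covering different subject areas.
computing = InfoSubmatrix(
    terms=["compiler", "kernel"],
    classifications=["software", "hardware"],
    ratings=[[6.0, 1.0], [5.0, 4.0]],
)
medicine = InfoSubmatrix(
    terms=["artery", "kernel"],
    classifications=["cardiology"],
    ratings=[[6.0], [0.0]],
)
all_terms, all_classes, judgment = assemble([computing, medicine])
```

Keeping the submatrices separate means that each rater only has to judge terms against the classifications of one subject area, while the assembled matrix still pairs every term with every classification.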
In some implementations, at least some of the elements of the information submatrices represent ratings of relevance made by a human being. The segmented judgment matrix may include rows and columns, with each column of the segmented judgment matrix representing a classification and each row of the segmented judgment matrix representing a term.
The method for processing information may further include receiving a search request, using the segmented judgment matrix to calculate an information spectrum of the search request, and using the segmented judgment matrix to calculate an information spectrum for each of a plurality of documents. The calculated information spectrums then may be compared to identify at least some documents of the plurality of documents as relevant to the search request. In some implementations, each information submatrix includes a plurality of classifications and a plurality of terms relevant to each classification. In such implementations, the information spectrums are calculated based upon at least some of the plurality of terms. The plurality of terms may be selected based upon a relevance of each term of the plurality of terms to at least some of the classifications of the information submatrices.
The step of calculating an information spectrum for each document and for the search request may include determining a log average among the ratings of relevance of the terms for each classification. The information spectrum of each of the at least some documents may be compared with the information spectrum of the search request by determining a distance between the two.
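The calculation and comparison described above can be sketched in Python as follows. This summary does not define the log average or the distance measure, so the sketch makes two labeled assumptions: the log average is taken as exp(mean(log(1 + r))) - 1, where the offset of 1 keeps zero ratings finite, and the distance is Euclidean. The function names and the dictionary layout of the judgment matrix are likewise hypothetical.

```python
import math
from typing import Dict, List


def log_average(ratings: List[float]) -> float:
    """Assumed form of the log average: exp(mean(log(1 + r))) - 1."""
    if not ratings:
        return 0.0
    mean_log = sum(math.log1p(r) for r in ratings) / len(ratings)
    return math.expm1(mean_log)


def information_spectrum(
    terms_in_text: List[str],
    judgment: Dict[str, Dict[str, float]],
    classifications: List[str],
) -> List[float]:
    """One value per classification, computed from the rated terms found in the text."""
    rated = [t for t in terms_in_text if t in judgment]
    return [log_average([judgment[t][c] for t in rated]) for c in classifications]


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two spectra (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_documents(
    request_terms: List[str],
    documents: Dict[str, List[str]],
    judgment: Dict[str, Dict[str, float]],
    classifications: List[str],
) -> List[str]:
    """Return document names ordered from nearest to farthest from the request."""
    request = information_spectrum(request_terms, judgment, classifications)
    scored = [
        (distance(information_spectrum(terms, judgment, classifications), request), name)
        for name, terms in documents.items()
    ]
    return [name for _, name in sorted(scored)]
```

Under these assumptions, a document whose spectrum lies closest to the spectrum of the request is ranked first, and a cutoff on the distance could be used to decide which documents are reported as relevant.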
In some implementations, the method for processing information further includes selecting a document of the identified documents as definitely relevant to the search request. The method for processing information may use the calculated information spectrum for the selected document to form a new search request. Some implementations also may allow zooming in on a portion of a document information spectrum. The method may include determining that a document and a request each have a wide spectrum with significant content in a field F of a term, and then measuring the request and the document using a subengine for field F.
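The zooming step can be pictured with the following Python sketch. It is only one possible reading: this summary does not say how a "wide" spectrum or "significant content" is measured, so the `threshold` and `min_width` parameters, the function names, and the representation of a subengine as a callable that maps text to a spectrum are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, Set, Tuple

Spectrum = Dict[str, float]
Subengine = Callable[[str], Spectrum]


def significant_fields(spectrum: Spectrum, threshold: float) -> Set[str]:
    """Fields whose value reaches the (assumed) significance threshold."""
    return {field for field, value in spectrum.items() if value >= threshold}


def should_zoom(
    doc_spectrum: Spectrum,
    request_spectrum: Spectrum,
    field: str,
    threshold: float = 2.0,
    min_width: int = 3,
) -> bool:
    """True when both spectra are wide and both have significant content in `field`.

    "Wide" is taken here to mean at least `min_width` significant fields.
    """
    doc_sig = significant_fields(doc_spectrum, threshold)
    req_sig = significant_fields(request_spectrum, threshold)
    wide = len(doc_sig) >= min_width and len(req_sig) >= min_width
    return wide and field in doc_sig and field in req_sig


def zoom(
    doc_text: str,
    request_text: str,
    doc_spectrum: Spectrum,
    request_spectrum: Spectrum,
    field: str,
    subengines: Dict[str, Subengine],
) -> Tuple[Spectrum, Spectrum]:
    """Re-measure document and request with the subengine for `field` when warranted."""
    if field in subengines and should_zoom(doc_spectrum, request_spectrum, field):
        measure = subengines[field]
        return measure(doc_text), measure(request_text)
    # Otherwise keep the original, coarse spectra.
    return doc_spectrum, request_spectrum
```

In this sketch, the spectra returned by the subengine cover only the finer-grained classifications within field F, so the document and the request can then be compared at that finer level.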
In another general aspect, a computer program product includes instructions operable to cause data processing apparatus to receive a segmented judgment matrix and use the segmented judgment matrix to calculate an information spectrum.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features will be apparent from the description and drawings, and from the claims.