U.S. patent application Ser. No. 09/512,963, titled “CONSTRUCTION, MANIPULATION, AND COMPARISON OF A MULTI-DIMENSIONAL SEMANTIC SPACE,” filed Feb. 25, 2000, describes a method and apparatus for mapping terms in a document into a topological vector space. Determining what a document is about requires interpreting the terms in the document through their context. For example, whether a document that includes the word “hero” refers to sandwiches or to a person of exceptional courage or strength is determined by context. Although a single term taken in isolation will generally not give the reader much information about the content of a document, taking several important terms together will usually be helpful in determining content.
The content of a document is commonly characterized by an abstract, which provides a high-level description of the document and gives the reader some expectation of what may be found within it. (In fact, a single document can be summarized by multiple different abstracts, depending on the context in which the document is read.) Patents are a good example of this commonly used mechanism: each patent is accompanied by an abstract that describes what is contained within the patent document. However, each abstract must be read and compared by a cognitive process (usually a person) to determine whether various abstracts describe content that is semantically close to the subject the searcher is researching.
Accordingly, a need remains for a way to associate semantic meaning with documents using dictionaries and bases, and for a way to search for documents with content similar to a given document, both generally without requiring user involvement.