U.S. Pat. No. 5,794,050 to Dahlgren et al. provides for a Natural Language Understanding System. Disclosed is a naive semantic system that incorporates modules for text processing based upon parsing, formal semantics, and discourse coherence, and that relies on a naive semantic lexicon storing word meanings in terms of a hierarchical semantic network. Naive semantics is used to reduce the decision spaces of the other components of the natural language understanding system. According to Dahlgren, naive semantics is used at every structure-building step to avoid combinatorial explosion.
For example, the sentence “face places with arms down” has many available syntactic parses. The word “face” could be either a noun or a verb, as could the word “places”. However, by determining that “with arms down” is statistically most likely to be a prepositional phrase that attaches to a verb, the possibility that both words are nouns can be eliminated. Furthermore, the noun sense of “face” is eliminated because “with arms down” includes the concepts of position and body, and one sense of the verb “face” matches that conception. In addition to the naive semantic lexicon, a formal semantics module is incorporated, which permits sentences to be evaluated for truth conditions with respect to a model built by the coherence module. The coherence module permits the resolution of causality, exemplification, goal, and enablement relationships. This is similar to the functionality of conventional knowledge bases.
Natural language retrieval is performed by Dahlgren's system using a two-stage process referred to as digestion and search. In the digestion process, textual information is input into the natural language understanding (NLU) module, and the NLU module generates a cognitive model of the input text. In other words, a query in natural language is parsed into the representation format of first-order logic and the previously described naive semantics. The cognitive model is then passed to a search engine that uses two passes: a high-recall statistical retrieval module that uses unspecified statistical techniques to produce a long list of candidate documents, and a relevance reasoning module that uses first-order theorem proving and human-like reasoning to determine which documents should be presented to the user. Generally, Dahlgren analyzes text based on sentence structure. Each sentence is analyzed using both a word-by-word analysis and a whole-sentence analysis. Disclosed is a method for interpreting natural language input, wherein parsing and a naive semantic lexicon are used in conjunction to determine the plausibility of interpretative decisions, and wherein at least one entry identifying at least one sense of a word may be related to an ontological classification network, syntactic information, and a plurality of semantic properties.
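The two-pass search can be illustrated with a minimal sketch. It is not Dahlgren's implementation: term overlap is an assumed stand-in for the unspecified statistical techniques of the recall pass, and set containment over hypothetical (predicate, argument) facts is a crude stand-in for first-order theorem proving in the relevance pass. The function names `recall_stage` and `relevance_stage` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical fact format: (predicate, argument), standing in for the
# first-order / naive semantic cognitive model.
Fact = Tuple[str, str]

def recall_stage(query_terms: Set[str], docs: Dict[str, Set[str]], k: int) -> List[str]:
    """Pass 1 (high recall): rank documents by term overlap with the query.
    Term overlap is an assumed stand-in for unspecified statistical retrieval."""
    scored = [(len(query_terms & terms), doc_id) for doc_id, terms in docs.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]

def relevance_stage(query_facts: Set[Fact], candidates: List[str],
                    doc_facts: Dict[str, Set[Fact]]) -> List[str]:
    """Pass 2 (precision): keep candidates whose facts include every query fact.
    Set containment is a crude stand-in for first-order theorem proving."""
    return [d for d in candidates if query_facts <= doc_facts.get(d, set())]

# Hypothetical sample data.
docs = {"d1": {"face", "arms", "down"}, "d2": {"face", "clock"}, "d3": {"weather"}}
doc_facts = {
    "d1": {("face", "verb"), ("position", "arms_down")},
    "d2": {("face", "noun")},
    "d3": set(),
}

candidates = recall_stage({"face", "arms"}, docs, k=10)
selected = relevance_stage({("face", "verb")}, candidates, doc_facts)
print(candidates, selected)
```

The recall pass deliberately admits "d2", which merely mentions "face"; only the reasoning pass, which sees that "d2" uses the noun sense, removes it.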
Dahlgren's system uses a semantic network similar to the ontologies employed in the system of the present invention. However, it relies on a complicated grammatical system for the generation of formal structures, in which detailed grammatical information is needed to eliminate possible choices in the parser. The concept-based search engine system of the present invention provides an advantage in that it uses a simple grammatical system in which rule probabilities and conflicting ontological descriptions are used to resolve the possible syntactic parses of sentences. This greatly reduces the processing power required to index documents.
U.S. Pat. No. 6,675,159 to Lin et al. provides for a Concept-Based Search and Retrieval System. Disclosed is a concept-based method for searching text documents that involves transforming a natural language query into predicate structures representing logical relationships between words in the natural language query; an ontology containing lexical semantic information about words; and means for ranking a set of matching natural language query predicate structures and equivalent text document predicate structures.
Lin's system imposes a logical structure on text, and a semantic representation is the form used for storage. The system provides logical representations for all of the content in a document and a semantic representation of comparable utility, with significantly reduced processing requirements and no need to train the system to produce semantic representations of text content. Although training is needed to enable document categorization in the system, generation of the semantic representation is independent of the categorization algorithm.
U.S. Pat. No. 6,766,316 to Caudill et al., assigned to Science Applications International Corporation, provides for a Method and System of Ranking and Clustering for Document Indexing and Retrieval. Disclosed is a relevancy ranking/clustering method and system for an information retrieval system, which ranks documents based on relevance to a query and in accordance with user feedback. Additionally, a question-answering system further provides an answer formulation unit that produces a natural language response to the input query.
U.S. Pat. No. 6,910,003 to Arnold, assigned to Discern Communications, Inc., discloses a system and method for searching. Raw text is retrieved or input into the system. The raw text is parsed into components such as date, place, location, actors, and the like. The raw text is stored in topic-specific information caches based on the individual components. In operation, a user enters a query. The system parses the user query and compares the parsed query to the topic-specific information caches. Matching topic-specific information caches are displayed to the user.
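The ingest-then-match flow can be sketched as follows, assuming that texts and queries are parsed by the same component parser. Arnold's parser is not specified here, so the gazetteers `PLACES` and `ACTORS`, the ISO date pattern, and the `TopicCaches` class are hypothetical illustrations rather than the patented system.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical gazetteers and date pattern standing in for an unspecified parser.
PLACES = {"paris", "london"}
ACTORS = {"acme", "globex"}
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

Component = Tuple[str, str]  # (component type, value), e.g. ("place", "paris")

def parse_components(text: str) -> Set[Component]:
    """Split raw text into typed components: dates, places, and actors."""
    comps: Set[Component] = {("date", d) for d in DATE_RE.findall(text)}
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in PLACES:
            comps.add(("place", token))
        elif token in ACTORS:
            comps.add(("actor", token))
    return comps

class TopicCaches:
    """One cache per component; each cache lists the ids of texts mentioning it."""

    def __init__(self) -> None:
        self.caches: Dict[Component, List[str]] = defaultdict(list)

    def ingest(self, doc_id: str, text: str) -> None:
        """Store the text under every component the parser finds in it."""
        for comp in parse_components(text):
            self.caches[comp].append(doc_id)

    def search(self, query: str) -> Dict[Component, List[str]]:
        """Parse the query the same way and return every matching cache."""
        return {c: self.caches[c] for c in parse_components(query) if c in self.caches}

caches = TopicCaches()
caches.ingest("d1", "Acme opened an office in Paris on 2004-05-01.")
caches.ingest("d2", "Globex reported results in London.")
hits = caches.search("What did Acme do in Paris?")
print(hits)
```

Because the query passes through the same parser as the stored text, matching reduces to looking up the query's components among the cache keys.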
U.S. Patent Publication No. 2002/0059289 to Wenegrat et al. provides for Methods and Systems for Generating and Searching a Cross-Linked Keyphrase Ontology Database. Disclosed is a method of generating a cross-linked keyphrase ontology database, wherein the cross-linked keyphrase ontology database may be searched by parsing a natural language statement into a structured representation. A cross-linked keyphrase ontology database is created by: (a) defining at least one keyphrase; (b) representing the keyphrase by a keyphrase node in an ontology; (c) cross-linking the keyphrase node to at least one second keyphrase node, where the second keyphrase node represents a second keyphrase in a second ontology; and (d) repeating steps (b)-(c) for each keyphrase defined in step (a). The keyphrase in step (a) may be generated by parsing a text and can be selected from a group consisting of nouns, adjectives, verbs, and adverbs. In one embodiment, the keyphrase in step (a) and the second keyphrase have at least one word in common. The parsed text may be in English or in any other written or spoken language.
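Steps (a) through (d) can be sketched as a loop over the defined keyphrases. This is a minimal illustration, not Wenegrat's implementation: it takes the step (a) keyphrases as a given list, uses the shared-word criterion of the embodiment above as the sole cross-linking rule, and stores links in both directions. The `KeyphraseNode` class, the `"ontology:keyphrase"` id format, and the ontology names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class KeyphraseNode:
    keyphrase: str
    ontology: str
    links: Set[str] = field(default_factory=set)  # ids of cross-linked nodes

def node_id(ontology: str, keyphrase: str) -> str:
    """Hypothetical node id format: 'ontology:keyphrase'."""
    return f"{ontology}:{keyphrase}"

def build_cross_linked_db(keyphrases: List[str], ontology: str,
                          second: Dict[str, KeyphraseNode]) -> Dict[str, KeyphraseNode]:
    """Build nodes for `keyphrases` and cross-link them into `second` (mutated in place)."""
    db: Dict[str, KeyphraseNode] = {}
    for phrase in keyphrases:                    # (d) repeat for each keyphrase from (a)
        node = KeyphraseNode(phrase, ontology)   # (b) represent keyphrase by a node
        words = set(phrase.lower().split())
        for other_id, other in second.items():   # (c) cross-link into second ontology
            if words & set(other.keyphrase.lower().split()):  # assumed: shared word
                node.links.add(other_id)
                other.links.add(node_id(ontology, phrase))
        db[node_id(ontology, phrase)] = node
    return db

second = {node_id("medical", p): KeyphraseNode(p, "medical")
          for p in ["heart attack", "blood pressure"]}
db = build_cross_linked_db(["heart disease", "tax law"], "general", second)
print(db["general:heart disease"].links)
```

Real systems would likely use richer linking criteria than a shared word; the shared-word test is used here only because the embodiment names it.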
U.S. Patent Publication No. 2004/0103090 to Dogl et al. provides for a Document Search and Analyzing Method and Apparatus. Disclosed is a document search system having an ontology indexing function (ontology indexer 113), wherein search engine sub-system 125, in conjunction with indexer 113 and concept search engine 126, processes and parses search queries, creates a new entry for each word in a word index, and associates each entry with a unique word ID, thereby allowing result builder 222 to create a two-dimensional or three-dimensional graphical representation (visualization model) of the query data set or ontology.
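The word-index step can be illustrated with a short sketch, assuming whitespace tokenization and sequential integer IDs. Neither assumption comes from the Dogl publication, and the `WordIndex` class is hypothetical. Each previously unseen word receives the next unused ID, and repeated words reuse their existing entry.

```python
from typing import Dict, List

class WordIndex:
    """Hypothetical word index assigning each new word a unique integer ID."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}

    def add_query(self, query: str) -> List[int]:
        """Tokenize a query and return its word IDs, creating entries for new words."""
        out: List[int] = []
        for word in query.lower().split():
            if word not in self.ids:
                self.ids[word] = len(self.ids)  # next unused ID
            out.append(self.ids[word])
        return out

index = WordIndex()
first = index.add_query("ontology search engine")
second = index.add_query("search ontology graph")
print(first, second)
```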
U.S. Patent Publication No. 2005/0125400 to Mori et al. provides for an Information Search System, Information Search Supporting System, and Method and Program for Information Search. Disclosed is an information search system having conversion means that generates an inquiry sentence by decomposing a natural language sentence according to the dependence relationships between its individual words and a corresponding ontology.
Foreign patent WO/0235376 to Busch et al. provides for an Ontology-Based Parser for Natural Language Processing. Disclosed is a system and method for converting natural language text into predicate-argument format by using an ontology-based parser, wherein a sentence lexer may be used to convert a sentence into ontological entities tagged with part-of-speech information.
U.S. Patent Publication No. 2002/0147578 to O'Neil et al., assigned to LingoMotors, Inc., provides for a Method and System for Query Reformulation for Searching of Information. Disclosed is a method for searching information using a reformulated query expression. A user inputs a query, generally in natural language form, which is designated the input query. The input query is provided to an engine 103, which converts the natural language form into a logical form. The logical form preferably includes semantic information. The logical form also includes the key terms of the query, among other information, and is used to carry out the search.