1. Field of the Invention
This invention generally relates to information retrieval, and more particularly, to an inference-driven multi-source semantic search.
2. Background Art
Information retrieval from a database of information is an increasingly challenging problem, particularly on the World Wide Web (WWW), as increased computing power and networking infrastructure allow the aggregation of large amounts of information and widespread access to that information. A goal of the information retrieval process is to allow the identification of materials of interest to users.
As the number of materials that users may search increases, identifying materials relevant to the search becomes increasingly important, but also increasingly difficult. Challenges posed by the information retrieval process include providing an intuitive, flexible user interface and completely and accurately identifying materials relevant to the user's needs within a reasonable amount of time. The information retrieval process comprises two interrelated technical aspects, namely, information organization and information access.
One fundamental search technique is the keyword-index search, which revolves around an index of keywords drawn from eligible target items. In this method, a user's query is parsed into individual words (optionally stripped of some inflected endings), whereupon the words are looked up in the index, which, in turn, points to the documents or items indexed by those words. Thus, the potentially intended search targets are retrieved. This sort of search service, in one form or another, is accessed countless times each day by many millions of computer and Internet users.
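By way of illustration only, the keyword-index search described above can be sketched as a small inverted index. The sketch is not taken from any particular search service. The function names (normalize, build_index, search), the crude suffix stripping, and the choice to return any document matching at least one query word are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(word):
    """Lowercase a word and strip a few inflected endings (a crude stand-in
    for real stemming; the suffix list is an assumption for this sketch)."""
    word = word.lower().strip(".,;:!?\"'()")
    for suffix in ("ing", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(documents):
    """Map each normalized keyword to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            key = normalize(word)
            if key:
                index[key].add(doc_id)
    return index


def search(index, query):
    """Parse the query into words, look each one up in the index, and return
    the ids of documents indexed by at least one of them (assumed OR semantics)."""
    hits = set()
    for word in query.split():
        hits |= index.get(normalize(word), set())
    return sorted(hits)


# Hypothetical three-document corpus used only for this example.
documents = {
    "d1": "Engines for searching large databases",
    "d2": "Automobile repair manuals",
    "d3": "Car engine maintenance",
}
index = build_index(documents)
results = search(index, "car engines")
```

In this sketch, the query "car engines" retrieves d1 and d3 because each contains an indexed form of a query word. Document d2 is not retrieved, since it describes the same subject using the word "Automobile", which the query does not contain.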
Two main problems of keyword searches are (1) missing relevant documents, and (2) retrieving irrelevant ones. Most keyword searches do plenty of both. In particular, with respect to the first problem, the primary limitation of keyword searches is that, when viewed semantically, they can skip about 80% of the eligible documents because, in many instances, at least 80% of the relevant information will be indexed under entirely different words from those entered in the original query. For simple searches with very popular words, and where relevant information is plentiful, this is not much of a problem. But for longer queries, and for searches where the relevant phrasing is hard to predict, results can be disappointing.
Semantic searching is an improvement over keyword searching. Semantic search systems index and retrieve information based upon the ascertained meaning of information passages contained in a corpus of information. In the case of written language, words are analyzed in context, with attention given to accepted meaning and grammar. This semantic analysis is performed by natural language understanding programs that create complex and often copious data structures that set forth the semantic relationships found in the analyzed data. At search time, natural language queries are translated into similar data structures. Relevant data is retrieved from the corpus of information by comparing the data structures generated for the query against the data structures generated for the information passages.
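By way of illustration only, the comparison of query and passage data structures can be sketched with a deliberately tiny stand-in for a natural language understanding program. Everything here is an assumption made for the example: the Relation structure, the hand-written CONCEPTS lexicon, and the restriction to sentences of the form "agent action target". A real semantic analyzer would be far more elaborate.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    """Toy semantic structure: who does what to what."""
    agent: str
    action: str
    target: str


# Hypothetical lexicon mapping surface words to shared concepts.
CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "mechanic": "mechanic",
    "technician": "mechanic",
    "repairs": "repair",
    "fixes": "repair",
    "hits": "hit",
}


def analyze(text: str) -> Optional[Relation]:
    """Translate text into a Relation, or None if it does not fit the toy
    'agent action target' pattern. Words outside the lexicon are ignored."""
    concepts = [CONCEPTS[w] for w in text.lower().split() if w in CONCEPTS]
    if len(concepts) != 3:
        return None
    return Relation(*concepts)


def semantic_search(corpus, query):
    """Return ids of passages whose structure equals the query's structure."""
    wanted = analyze(query)
    if wanted is None:
        return []
    return sorted(pid for pid, text in corpus.items() if analyze(text) == wanted)


# Hypothetical two-passage corpus used only for this example.
corpus = {
    "p1": "The technician fixes the automobile",
    "p2": "The automobile hits the technician",
}
matches = semantic_search(corpus, "mechanic repairs car")
```

In this sketch, the query shares no keyword with passage p1, yet p1 is retrieved because both translate to the same structure. Passage p2 uses the same nouns as p1 but yields a different structure, so it is not retrieved.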
Current state-of-the-art information retrieval and question answering systems attempt to satisfy a user's information need by identifying the single source (e.g., document, passage, or phrase) that is most likely to contain relevant information. Many information needs, however, cannot be satisfied by a single source. Rather, the information retrieval system must identify a number of relevant sources and further analyze or synthesize the information contained in those sources to satisfy the user's information need.