1. Field of the Invention
The present invention relates to a system whereby a direct answer may be given to a specific natural language query through a convenient search of structured and unstructured data sources.
2. Discussion of the Related Art
There are two types of digital data gathering commonly in use. One, information retrieval, is concerned with the retrieval of information from unstructured data sources, such as text documents, where each element of the data is not individually defined. The user enters “search terms” as a data query and the unstructured data are searched for occurrences of these terms. Such a search may return the text, i.e., the data, or may, e.g., in a World Wide Web search, return only the location, or site, of the data. The user then needs to read the text or go to each site and locate the occurrence of the search term, which may or may not be relevant to the actual question the user wants answered. This time-consuming practice is commonly known as “surfing”.
Information retrieval is thus not geared to efficiently provide a specific answer to a specific question. An attempt to alleviate this problem was the subject of U.S. Pat. No. 6,167,370 to Tsourikov et al., which suggests giving a summary of text findings as a response to a user query. But when, for example, a user wants to know “What are the three best Sushi restaurants in Chicago?” the user does not necessarily care to browse through text summaries, or restaurant guide web sites, which are the likely results of a known information retrieval search. The surfing in this context may be particularly tedious if the query is submitted to the data sources as an equally weighted string of tokens. For example, where “Chicago” is weighted equally with “Sushi” in computing the search results, a user may have to wade through scores of restaurant web sites having nothing to do with Sushi eateries. Avoiding this problem may require the user to know Boolean logic or other specific search strategy formats, and to structure each search individually. The user would most often prefer to receive just a list of three Sushi restaurants in Chicago in response to this natural language question.
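By way of illustration only, the effect of equal token weighting described above may be sketched as follows. The sketch is a hypothetical example and is not drawn from any cited patent; the names `score` and `rank`, the sample documents, and the weight values are all assumptions chosen for clarity.

```python
def score(document: str, weights: dict[str, float]) -> float:
    """Sum the weights of the query tokens that occur in the document."""
    words = set(document.lower().split())
    return sum(weight for token, weight in weights.items() if token in words)


def rank(documents: dict[str, str], weights: dict[str, float]) -> list[str]:
    """Order document names by descending score, breaking ties by name."""
    return sorted(documents, key=lambda name: (-score(documents[name], weights), name))


# Hypothetical stand-ins for the contents of three web sites.
documents = {
    "steakhouse": "a chicago steakhouse and restaurant in downtown chicago",
    "sushi_bar": "a sushi restaurant in chicago",
    "sushi_blog": "how to make sushi at home",
}

# Every token counts the same, as in an equally weighted string of tokens.
equal = {"sushi": 1.0, "restaurant": 1.0, "chicago": 1.0}

# Assumed weights that emphasize the token most relevant to the question.
weighted = {"sushi": 3.0, "restaurant": 1.0, "chicago": 1.0}

print(rank(documents, equal))
print(rank(documents, weighted))
```

Under equal weighting, the steakhouse site outranks a site that actually concerns sushi, because it matches two of the three tokens. Once “sushi” is given a greater weight, the steakhouse site falls to the bottom of the list.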
The second type of digital data gathering commonly in use is the structured data source search, where highly structured data within one specific data source, usually privately owned and accessed, are searched to return a specific answer. In the past, data sources had to be searched one at a time. Private businesses generally integrate their individual data sources to enable answers to queries that require more than one factual component. This integration is expensive and can remain underutilized for reasons such as the arcane nature of query formulation or the extensive data source knowledge that may be required of the user to make a rational search selection. That is, the user may need to know where to look and how to look in order to expect a relevant answer. Concurrent searching of unintegrated structured data sources, and merging of their results, to solve some of these problems, was the subject of U.S. Pat. No. 5,995,961 to Levy, et al.
Further, information beyond the specific factual components of a query cannot be provided from the results of a data source search. For example, assume that the data source user, or searcher, wishes to know which building on the I.I.T. campus has the largest number of rooms. The user cannot expect a picture of the building, or a link to a picture of the building, to be returned with the search results, even though the user might wish to see such a picture.
U.S. Pat. No. 6,078,924 to Ainsbury et al. illustrates a technique of digital data gathering. According to this patent, the user is allowed to aggregate data found in the user's previous searches on a specific topic into a central file. This central file can then be controlled from a commercial desktop computer application to facilitate searching of the data.
What is needed in the art is a system whereby the user can take advantage of both information retrieval and structured data types of digital data gathering concurrently to provide a direct answer to a specific question, and preferably provide further context for that answer. It is also desirable that the query be accepted in a natural language format whereby the user needs no special skills in query formulation. It is further desirable that the query be intelligently parsed so as to weight the relevant parts of the query and that synonyms of the natural language query be provided to give a more thorough search and accurate answer. It is further desirable that the answers, and any related information, be limited in number to only that required or most relevant to the query.
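By way of illustration only, two of the desired features stated above, namely supplying synonyms for the terms of a natural language query and limiting the answers to the number required, may be sketched as follows. The synonym table and the function names `expand_query` and `limit_answers` are hypothetical and are assumed solely for this example; an actual system might draw synonyms from a thesaurus or other lexical resource.

```python
# Hypothetical synonym table, assumed for illustration only.
SYNONYMS = {"restaurants": ["eateries"], "best": ["top"]}


def expand_query(tokens, synonyms=SYNONYMS):
    """Return each token followed by its synonyms, in order and without duplicates."""
    expanded = []
    for token in tokens:
        for term in [token, *synonyms.get(token, [])]:
            if term not in expanded:
                expanded.append(term)
    return expanded


def limit_answers(ranked_answers, requested):
    """Keep only as many answers as the query asked for."""
    return ranked_answers[:requested]


print(expand_query(["best", "sushi", "restaurants"]))
print(limit_answers(["A", "B", "C", "D"], 3))
```

In this sketch the expanded term list would be submitted to the data sources in place of the original tokens, and the ranked results would be truncated to the three answers requested by the query.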