1. Technical Field:
An “Iterative Query Reformulator”, as described herein, uses a computational engine to process and reformulate an initial query through one or more iterations so that results returned from a search engine or recommendation system using a final reformulated query have improved relevance relative to results that would have been returned using only the initial query.
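By way of illustration, the iterative reformulation loop described above may be sketched as follows. The `search` and `reformulate` functions shown here are toy stand-ins introduced purely for illustration (simple term overlap and a crude form of pseudo-relevance feedback); they are not components of the engine described herein.

```python
# Minimal sketch of an iterative query-reformulation loop.
# search() and reformulate() are illustrative stand-ins only.

def search(query, corpus):
    """Rank documents by the number of query terms they share (toy ranking)."""
    terms = set(query.lower().split())
    return sorted(corpus, key=lambda doc: -len(terms & set(doc.lower().split())))

def reformulate(query, top_doc):
    """Expand the query with the first unseen term from the top result
    (a crude stand-in for pseudo-relevance feedback)."""
    seen = set(query.lower().split())
    for term in top_doc.lower().split():
        if term not in seen:
            return query + " " + term
    return query

def iterative_search(query, corpus, max_iters=3):
    """Reformulate over several iterations, then search with the final query."""
    for _ in range(max_iters):
        results = search(query, corpus)
        candidate = reformulate(query, results[0])
        if candidate == query:  # no further rewrite available; stop iterating
            break
        query = candidate
    return query, search(query, corpus)
```

In an actual system, the rewrite step would be driven by learned relevance signals rather than raw term overlap, but the loop structure — search, rewrite, repeat, then answer with the final reformulated query — is the same.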
2. Related Art:
Typical search engines rely on linguistic matches to find documents that are relevant to a user's query. For example, if a user enters the simple search query {Barack Obama}, the search engine will generally return a group of sorted links or responses that are the most popular for that query. Further, given the specificity and simplicity of that initial query, most of the links returned are likely to be highly relevant to that query. However, if the user enters a slightly more complex query, such as, for example, {wife of Barack Obama}, typical search engines will generally return a number of links or responses that might not be relevant. In particular, many of the links or responses returned by the search engine from the second query will generally still refer to or include information relating to Barack Obama, but may not be the most relevant links for Michelle Obama, who is the intended target of the second query.
More specifically, typical search services and question-answering systems generally depend on techniques for analyzing free-text queries or questions, and also depend on techniques for composing or identifying relevant information or explicit answers from one or more data sets or databases of information. Providing relevant information or explicit answers to freely worded queries or questions is generally a challenging problem because a structured or unstructured data set being searched may not contain explicit matching information or answers. In addition, a data set may contain multiple variants of relevant answers or answer components.
Various approaches to information retrieval and question answering have relied on the application of several key concepts from information retrieval, information extraction, machine learning, and natural language processing (NLP). Automatic question answering from a single, constrained information source is extremely challenging. Consider the difficulty of gleaning an answer to the question “Who killed Abraham Lincoln?” from a source which contains only the text “John Wilkes Booth altered history with a bullet. He will forever be known as the man who ended Abraham Lincoln's life.” However, answering a question is easier when the vast resources of the internet are used, since hundreds of web pages contain the literal string “killed Abraham Lincoln,” providing multiple opportunities for matching and composition.
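The redundancy argument above can be made concrete: when many passages contain the literal string “killed Abraham Lincoln,” even a naive pattern match combined with simple voting recovers the answer. The snippets and the capitalized-name heuristic below are illustrative stand-ins for real web pages and real answer extraction.

```python
import re
from collections import Counter

# Illustrative only: toy snippets standing in for redundant web pages.
snippets = [
    "John Wilkes Booth killed Abraham Lincoln at Ford's Theatre.",
    "It is well known that John Wilkes Booth killed Abraham Lincoln.",
    "Booth altered history with a bullet.",
]

# Capture a run of capitalized words immediately preceding the literal string.
pattern = re.compile(r"((?:[A-Z][a-z]+\s)+)killed Abraham Lincoln")

# Combine matches from multiple snippets by simple voting.
votes = Counter(m.group(1).strip() for s in snippets for m in pattern.finditer(s))
answer, count = votes.most_common(1)[0]
# answer == "John Wilkes Booth", count == 2
```

Note that the third snippet — the constrained single-source case discussed above — contributes no match at all; only the redundancy of the first two makes the extraction trivial.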
Many efforts in question answering have focused on fact-based, short-answer questions such as “Who killed Abraham Lincoln?”, “What was the length of the Wright brothers' first flight?”, “When did CNN begin broadcasting?” or “What two US biochemists won the Nobel Prize in medicine in 1992?” Some question-answering systems have used NLP analyses to augment standard information retrieval techniques. These systems may identify candidate passages using information retrieval (IR) techniques, and then perform more detailed linguistic analyses of both the question and matching passages to find specific answers. A variety of linguistic resources (part-of-speech tagging, parsing, named entity extraction, semantic relations, dictionaries, etc.) may be used to support question answering. Other approaches may use general information retrieval techniques that employ methods for rewriting questions or reformulating queries to match the format of answers and then combine multiple results to generate answers.
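The question-rewriting approach mentioned last can be sketched as follows. The two rewrite rules, the regular expressions, and the passages below are hypothetical simplifications; real systems of this kind use much larger rule sets and more robust linguistic analysis.

```python
import re
from collections import Counter

def rewrite_question(question):
    """Rewrite 'Who <verb> <object>?' into answer-side literal patterns.
    These two rules are illustrative; real systems use many more."""
    m = re.match(r"Who (\w+) (.+)\?", question)
    if not m:
        return []
    verb, obj = m.groups()
    return [
        # e.g. "<Name> killed Abraham Lincoln"
        re.compile(rf"((?:[A-Z][a-z]+ )+){verb} {re.escape(obj)}"),
        # e.g. "Abraham Lincoln was killed by <Name>"
        re.compile(rf"{re.escape(obj)} was {verb} by ((?:[A-Z][a-z]+ ?)+)"),
    ]

def answer(question, passages):
    """Match every rewritten pattern against every passage, then combine
    the multiple results by voting."""
    votes = Counter()
    for pat in rewrite_question(question):
        for passage in passages:
            for m in pat.finditer(passage):
                votes[m.group(1).strip()] += 1
    return votes.most_common(1)[0][0] if votes else None
```

The key idea is that each rewritten pattern matches a different surface form of the answer, so results drawn from differently worded passages can be combined into a single response.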
Other techniques, such as, for example, the well-known “Wolfram|Alpha” search platform, provide a computational engine for search. In general, such techniques begin by performing a data curation process on a domain-by-domain basis, relying on human domain experts who use a variety of sophisticated tools to perform targeted curation on large data sets as well as to provide linguistic or grammatical support. Using those same tools, the domain expert can also specify what types of computations (in predefined formats) are possible within the domain, using an existing ontology to ensure consistency and to allow computations via user-entered queries. Once the domain expert has curated the data, that data is then added to a dedicated computational pod that operates on both the data and various expert-defined rules in order to return one or more answers based on queries sent to it by a language parser. Unfortunately, one potential weakness of typical computational-engine-based platforms is the absence of search query logs, which, among other things, limits the ability of such platforms to determine user intent, relevance, and ranking, and to identify the appropriate domains and data sources to curate.