The present disclosure relates to query expansion in information retrieval, and more specifically, to query expansion using a graph of question and answer vocabulary.
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing, including semantic labeling or other metadata not found directly in the text.
An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example, search strings in search engines. In information retrieval, a query does not uniquely identify a single object in the collection. Instead, several objects may match the query with different degrees of relevancy.
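The idea that several objects may match one query with different degrees of relevancy can be illustrated with a minimal sketch. The scoring rule (the fraction of query terms found in a document) and the function names `tokenize`, `score`, and `rank` are assumptions chosen for illustration. They are not taken from the present disclosure, and practical systems generally use more elaborate weighting.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace (illustrative only)."""
    return text.lower().split()


def score(query: str, document: str) -> float:
    """Return the fraction of distinct query terms that occur in the document."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    document_terms = set(tokenize(document))
    return len(query_terms & document_terms) / len(query_terms)


def rank(query: str, documents: List[str]) -> List[Tuple[float, str]]:
    """Return matching documents with their scores, most relevant first."""
    scored = [(score(query, doc), doc) for doc in documents]
    matches = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    collection = [
        "graph databases",
        "query expansion with a graph",
        "cooking recipes",
    ]
    for relevance, doc in rank("graph query expansion", collection):
        print(f"{relevance:.2f}  {doc}")
```

In this sketch, two of the three documents match the query, and they are returned with different scores rather than as a single uniquely identified object.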
Query expansion is essential in information retrieval systems. A user should not be expected to know the exact content of documents they hope to retrieve via search. By augmenting a query with additional related terms, the likelihood that a relevant document is retrieved will be increased.
Query expansion is the process of reformulating a given query to improve retrieval performance in information retrieval operations, particularly in the context of query understanding. In search engines, query expansion involves evaluating a user's input (the words typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents.
Known query expansion techniques include finding synonyms of words and including them in the search query, finding different morphological forms of the words in the search query, and correcting spelling errors. Additional metadata, such as related topics of interest or semantic labeling, can be used to improve the likelihood that relevant documents are returned, even when a direct lexical or word match is not possible. For example, if words describing cats and dogs are all labeled with the tag “Animal”, and a user query contains such vocabulary and is tagged similarly, it may help to filter the set of search results in system output to those that are tagged as containing an “Animal” reference, rather than depending on a direct match to the specific word in the query.
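The synonym and semantic-label techniques described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `SYNONYMS` and `TAGS` tables are small hand-built, hypothetical dictionaries, and the function names are invented for this example. A practical system would draw such data from a thesaurus or an annotation pipeline.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical synonym table (an assumption for illustration).
SYNONYMS: Dict[str, Set[str]] = {
    "cat": {"feline", "kitten"},
    "dog": {"canine", "puppy"},
}

# Hypothetical semantic labels for individual words (also an assumption).
TAGS: Dict[str, str] = {
    "cat": "Animal", "feline": "Animal", "kitten": "Animal",
    "dog": "Animal", "canine": "Animal", "puppy": "Animal",
}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (word.strip(".,?!").lower() for word in text.split())
    return [token for token in tokens if token]


def expand_query(query: str) -> Set[str]:
    """Return the query terms together with any known synonyms."""
    terms = set(tokenize(query))
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def tags_for(terms: Iterable[str]) -> Set[str]:
    """Collect the semantic labels attached to the given terms."""
    return {TAGS[term] for term in terms if term in TAGS}


def search(query: str, documents: List[str]) -> List[str]:
    """Return documents sharing an expanded term or a semantic label with the query."""
    expanded = expand_query(query)
    query_tags = tags_for(expanded)
    results = []
    for doc in documents:
        doc_terms = set(tokenize(doc))
        lexical_match = bool(expanded & doc_terms)
        tag_match = bool(query_tags & tags_for(doc_terms))
        if lexical_match or tag_match:
            results.append(doc)
    return results


if __name__ == "__main__":
    collection = [
        "The puppy chased a ball.",
        "My kitten sleeps all day.",
        "Stock prices fell sharply.",
    ]
    for hit in search("dog", collection):
        print(hit)
```

In this sketch, the query "dog" retrieves the first document through the synonym "puppy" and the second through the shared “Animal” label alone, even though neither document contains the word "dog".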
Predicting the content of a document that a user's query should point to is far from trivial. Simply adding more terms to the query may make retrieval worse rather than better.
In some situations, the text of a user's query does not match the text of the expected answer. This mismatch is particularly pronounced in domain-specific or task-specific searches, such as legal, financial, or biomedical question answering, or in any domain with a rich domain-specific vocabulary that does not readily overlap with the common vocabulary a typical user may search with.