The present disclosure relates generally to a question answering computer system, and more specifically, to a context-based approach to passage retrieval and scoring in a question answering computer system.
An information retrieval computer system typically receives a query, identifies keywords in the query, searches documents for the keywords, and ranks the search results to identify the best matches. Some information retrieval computer systems output a list of the best-matching results to a user, who must then determine whether the desired information can be found in those results. Keyword searching often uses frequency-based scoring of words or their synonyms, but such searches typically fail to consider the context in which particular words appear. More advanced question answering computer systems typically employ natural language processing (NLP) to return the highest-scoring answer to a question in natural language form. NLP techniques, which are also referred to as text analytics, infer the meaning of terms and phrases by analyzing their syntax, context, and usage patterns.
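By way of illustration only, the keyword-based retrieval described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the present disclosure: the stopword list, the regular-expression tokenizer, and the helper names `extract_keywords`, `keyword_score`, and `rank` are all hypothetical, and each document is scored simply by counting occurrences of the query keywords (no synonym expansion or stemming).

```python
import re
from collections import Counter

# Hypothetical minimal stopword list (an assumption for this sketch only).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "at", "from",
             "what", "which", "to", "and"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(query: str) -> set[str]:
    """Identify keywords by dropping stopwords from the query."""
    return {tok for tok in tokenize(query) if tok not in STOPWORDS}


def keyword_score(keywords: set[str], document: str) -> int:
    """Frequency-based score: total occurrences of query keywords in the document."""
    counts = Counter(tokenize(document))
    return sum(counts[k] for k in keywords)


def rank(query: str, documents: list[str], top_k: int = 3) -> list[tuple[int, str]]:
    """Score each document and return the top_k matches, highest score first."""
    keywords = extract_keywords(query)
    scored = [(keyword_score(keywords, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


query = "Which bank offers the best savings rates?"
docs = [
    "Savings accounts with high rates attract depositors.",
    "The bank of the river is steep; fishing from the bank is best at dawn.",
    "Weather today is sunny.",
]
for score, doc in rank(query, docs):
    print(score, doc)
```

In this example the passage about a river bank (score 3) outranks the passage about savings rates (score 2), because the counts ignore the sense in which "bank" and "best" are used. This is the lack of context sensitivity noted above.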
Human language is so complex, variable (there are many different ways to express the same meaning), and polysemous (the same word or phrase may mean many things in different contexts) that NLP presents an enormous technical challenge. Decades of research have led to many specialized techniques, each operating on language at different levels and on different isolated aspects of the language understanding task. These techniques include, for example, shallow parsing, deep parsing, information extraction, word-sense disambiguation, latent semantic analysis, textual entailment, and co-reference resolution. None of these techniques is perfect or complete in its ability to decipher the intended meaning. Unlike programming languages, human languages are not formal mathematical constructs. Given the highly contextual and implicit nature of language, humans themselves often disagree about the intended meaning of any given expression.
A question answering computer system can use a primary search, with respect to a query formulated from a given question, to retrieve documents, passages, and other types of information from both structured sources (e.g., a knowledge base) and unstructured sources; the retrieved information is later used for candidate answer generation. Candidate answers can then be evaluated with respect to candidate passage evidence that supports or refutes each candidate answer. Contemporary passage scorers use various techniques to judge candidate passages independently of each other, including methods based on surface similarity (i.e., textual alignment) with the question, logical form alignment, structural similarity based on syntactic-semantic graphs, various linguistic features, etc.
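By way of illustration only, a passage scorer based on surface similarity that judges each passage independently might be sketched as below. This is a hypothetical sketch, not the scorer of any particular system: it assumes that textual alignment can be approximated by the fraction of the question's content terms that also appear in the passage, and the stopword list and the names `content_terms`, `surface_similarity`, and `score_passages` are illustrative assumptions.

```python
import re

# Hypothetical stopword list (an assumption for this sketch only).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "was", "who",
             "what", "which", "by", "and", "to"}


def content_terms(text: str) -> set[str]:
    """Return the lowercase alphanumeric tokens of the text, minus stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def surface_similarity(question: str, passage: str) -> float:
    """Approximate textual alignment as the share of question terms found in the passage."""
    q_terms = content_terms(question)
    if not q_terms:
        return 0.0
    return len(q_terms & content_terms(passage)) / len(q_terms)


def score_passages(question: str, passages: list[str]) -> list[tuple[float, str]]:
    """Score each passage on its own; no passage's score depends on any other passage."""
    return [(surface_similarity(question, p), p) for p in passages]


question = "Who wrote the novel Moby Dick?"
passages = [
    "Herman Melville wrote Moby Dick, a novel published in 1851.",
    "Melville published Moby Dick in 1851.",
    "The whale in the novel is named Moby Dick.",
]
for score, passage in score_passages(question, passages):
    print(f"{score:.2f}  {passage}")
```

Here the second passage names the author yet scores 0.50, while the third passage does not answer the question and scores 0.75, because each score reflects only the word overlap between one passage and the question.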