The present disclosure relates generally to question answering systems used to generate candidate answers, and more specifically, to candidate answer generation that utilizes a heterogeneous collection of structured, semi-structured, and unstructured information resources.
Most question answering (QA) systems suffer from two significant deficiencies. First, the systems rely on the question analysis component correctly identifying the semantic type of the answer and on the named entity recognizer correctly recognizing the answer as an instance of that semantic type. A failure at either stage produces an error from which the system cannot recover.
Second, most QA systems are not amenable to questions that have no answer type, such as “What was the Parthenon converted into in 1460?” For such questions, all noun phrases are often extracted from the search output, yielding a large number of candidate answers that are extraneous and, at times, nonsensical in the context of the question.