The rapid development of science and technology has produced an ever-growing volume of published information. Question answering (QA) systems have been designed to access and search through such information to automatically answer questions posed by humans in natural language.
One of the major challenges in such QA systems is to provide relevant answers amid a large and varied set of search results. Search engines often return many results that are irrelevant to the question, leaving the user confused and lost among them. Even the top-ranked search result may not be related to the question itself. This is especially prevalent where the question is short and includes common words whose spellings closely resemble names or content from unrelated topics, such as the name of a film or the lyrics of a popular song.
There are two common causes of irrelevant answers. First, irrelevancy may be caused by low accuracy in question parsing and text analysis. In other words, the QA system may not correctly interpret the meaning of the input question. Second, irrelevancy may be caused by low accuracy in the answer-finding capability of the QA system.
In addition, answers tend to be limited in domain-specific QA systems. Most QA systems support only a particular domain and do not support access to a wide collection of knowledge bases. Frequently, no answer is provided for questions whose answers lie in an unsupported domain. Even if answers can be found, they may not be comprehensible to the user. In most cases, the answers may be very long, with many definitions, introductions to principles, references, related topics, etc., which makes them difficult for the user, particularly a school-age child, to understand quickly.
Furthermore, the performance of conventional QA systems is typically unsatisfactory. In order to correctly interpret human language and extract answers from large knowledge bases, QA systems often employ wide data searches and deep data mining, which are computationally expensive and often result in slow retrieval times and low accuracy.
Therefore, there is a need for an improved framework that addresses the above-mentioned challenges.