Question generation and question answering are disciplines within computer science focused on building electronic data systems capable of providing natural language answers to natural language questions. For example, a data system may be configured to automatically answer the natural language question, “How many pints are in a gallon?” with the natural language answer, “There are eight pints in one gallon.” In this example, both the question and the answer are presented in a format that a human speaker would use to ask and to answer the question, thereby making the answer easily understandable by the person asking the question.
In preparing a data system for natural language question answering, a database of natural language questions and answers is generated. This process is referred to as data collection. Data collection typically involves machine learning methods and requires a certain amount of task-relevant data for training and testing purposes. A common data collection solution is to collect the data manually. For example, crowdsourcing is a typical way to collect data manually via the online collaboration of many people. However, crowdsourcing is time-consuming, and it is sometimes hard to obtain high-quality data if the people who collect the data are not experts in the pertinent subject matter. Moreover, each time a data system directed to a different subject matter (i.e., a different domain) is desired, additional data must be collected and the questions and answers must be generated again. Furthermore, the questions and answers of the data system are typically limited to a specific format, syntax, and organization.
Question answering data systems have the potential to simplify human interaction with electronic machines. However, known methods and systems for building question answering data systems are labor-intensive and time-consuming. For at least these reasons, further developments in the area of question answering data systems are desired.