The use of network-based social networks, for example, Facebook, Twitter, FourSquare, and Google+, has steadily increased along with the use of smartphones equipped with sensors and Internet connectivity. The marriage of these technologies, smartphones and social networks, will likely yield applications, such as crowdsourcing applications, that leverage the data collection capabilities of large numbers of smartphones. For example, real-time traffic monitoring for Google Maps is enabled by individuals sharing location and speed information from their smartphones. This integration also enables the use of social networking applications for disaster management. For example, an oil spill or other environmental disaster can be monitored by individuals sharing pictures or other relevant information across a social networking site. A chemical spill, or the air quality around a given disaster, can be monitored similarly using, for example, air sampling equipment associated with the mobile devices.
The information obtained through the use of these technologies can be aggregated, processed, and then consumed by individuals or by decision makers and public agencies. Consuming this information includes retrieving it and answering queries or questions posed over it. In information retrieval and natural language processing (NLP), question answering (QA) is the task of automatically answering a question posed in natural language. To find the answer to a question, a QA computer program uses either a pre-structured database or a collection of natural language documents, e.g., a text corpus such as the World Wide Web or some local collection. Search collections range from small local document collections, through internal organization documents and compiled newswire reports, to the World Wide Web.
QA research attempts to deal with a wide range of question types including, for example, fact, list, definition, How, Why, hypothetical, semantically constrained, and cross-lingual questions. In general, QA depends on having a good search corpus, i.e., on the existence of documents containing the desired answer. Therefore, larger collection sizes correlate with better QA performance, unless the question domain is orthogonal to the collection. Data redundancy in massive collections, such as the Web, means that nuggets of information are phrased in many different ways in differing contexts and documents. This yields two benefits. First, because the right information appears in many forms, the QA system has less need to apply complex NLP techniques to understand the text. Second, correct answers can be filtered from false positives by relying on the correct answer appearing more often in the documents than incorrect answers do.
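The second benefit, filtering by redundancy, can be illustrated with a minimal sketch. It assumes that candidate answer strings have already been extracted from the retrieved documents by some earlier step that is not shown; the function names `normalize` and `vote`, and the choice of lowercasing and whitespace collapsing as the normalization, are illustrative assumptions rather than part of any particular QA system.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so surface variants count together.

    This normalization is an illustrative assumption; a real system would
    likely need something more robust.
    """
    return " ".join(answer.lower().split())


def vote(candidates: Iterable[str]) -> Tuple[Optional[str], float]:
    """Return the most frequent normalized candidate and its vote share.

    Candidates are hypothetical answer strings extracted from documents.
    Returns (None, 0.0) when there are no non-empty candidates.
    """
    counts = Counter(normalize(c) for c in candidates if c.strip())
    if not counts:
        return None, 0.0
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())


if __name__ == "__main__":
    # Three documents agree on one answer; two contain different answers.
    extracted = ["Paris", "paris ", "Lyon", "PARIS", "Marseille"]
    answer, share = vote(extracted)
    print(answer, share)
```

The vote share gives a rough confidence signal: a low share suggests that the collection does not contain enough redundant evidence to separate the correct answer from false positives.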
Closed-domain question answering deals with questions under a specific domain, e.g., medicine or automotive maintenance, and presents an easier task because NLP systems can exploit domain-specific knowledge, which is frequently formalized in ontologies. Alternatively, closed-domain might refer to a situation where only limited types of questions are accepted, such as questions asking for descriptive rather than procedural information. Open-domain question answering deals with questions about nearly anything and can rely only on general ontologies and world knowledge. On the other hand, open-domain systems usually have much more data available from which to extract the answer.