Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (“natural”) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, while others involve natural language generation. NLP is used in question answering (QA) systems, such as IBM's Watson™ computer system, an artificially intelligent computer system capable of answering questions posed in natural language. Because QA systems such as Watson™ rely on NLP, correctly processing natural language is important: mistakes in performing NLP can lead to incorrect answers being output by the QA system.
QA systems often employ NLP concept detection annotators. These NLP concept detection annotators are never perfect. When customers discover a problem with the annotators, there is often a significant lag between spotting and reporting the problem and ultimately receiving a correction, which the NLP developers deliver in a deployment update. The problem is exacerbated when an NLP model has already been deemed “acceptable” and the NLP development team has moved on to delivering or supporting other NLP functions. Given the high rate of defects in NLP-derived data, combined with the critical importance of data accuracy to the QA system, such defects pose a significant challenge to scaling the QA system to new ventures. This is because it is difficult for NLP developers to focus on such new ventures if they need to expend significant time and resources on correcting problems in existing NLP models.