In the field of artificially intelligent computer systems capable of answering questions posed in natural language, cognitive question answering (QA) systems (such as the IBM Watson™ artificially intelligent computer system and other natural language question answering systems) process questions posed in natural language to determine answers and associated confidence scores based on knowledge acquired by the QA system. To train such QA systems, a subject matter expert (SME) presents ground truth data in the form of question-answer-passage (QAP) triplets or answer keys to a machine learning algorithm. Typically derived from fact statement submissions to the QA system, such ground truth data is expensive and difficult to collect. Conventional approaches for collecting ground truth data may require a user to be trained on a specific ground truth collection application in which documents are pre-loaded before a question is presented, or the flow may be reversed so that a question is created first, followed by document loading in the application. Thus, while a variety of ground truth tools exist, each operates in a different way, often requiring dedicated software and separate training, which imposes costs on the ground truth collection process. As a result, existing solutions for efficiently generating ground truth data are difficult to apply at a practical level.
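By way of illustration, ground truth in QAP form can be represented as a simple record pairing a question with its correct answer and the supporting passage. The following sketch is a hypothetical representation; the class and field names are illustrative assumptions, not a schema defined by the IBM Watson™ system or any particular QA tool.

```python
from dataclasses import dataclass

@dataclass
class QAPTriplet:
    """One ground truth record: a question, its correct answer,
    and the passage from which the answer is drawn.
    Field names are illustrative, not a standardized schema."""
    question: str
    answer: str
    passage: str

# Example record an SME might contribute to a training set.
triplet = QAPTriplet(
    question="What year was the transistor invented?",
    answer="1947",
    passage="The first working transistor was demonstrated at "
            "Bell Labs in 1947.",
)
```

A collection of such records, reviewed by an SME, could then be supplied to a machine learning algorithm as the training set described above.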