Unstructured text is an important source of information in question and answer systems because information extracted from it is commonly used to answer a posed question. Because the quality of a question and answer system depends directly on the quality of its answers, understanding unstructured data and extracting as much information as possible from it is crucial to system performance. Central to this process is the corpus that contains the unstructured text. While higher-quality corpora yield higher-quality answers, determining the quality of a corpus before run-time is difficult, because manually annotating a corpus is labor-intensive.