Questions (e.g., essay prompts) provided to examinees in an examination seek to measure the examinee's ability in a certain area of interest. For example, a question may seek to evaluate an examinee's knowledge level or may look to measure an examinee's ability to perform a certain skill, such as arguing persuasively. A score attributed to an examinee purports to give an indication of the examinee's ability level in the area of interest. But that score is only helpful if the question required the examinee to use that ability in preparing a response.
Systems and methods as described herein automatically evaluate examination questions to determine whether they actually test the abilities that they were designed to test. These systems and methods evaluate millions of strings of English words (e.g., strings up to seven words long) across a number of initial responses to a number of candidate questions being evaluated. The sheer volume of comparisons and evaluations required makes this process impossible for a human to perform effectively by hand.
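As an illustration of the scale involved, the following is a minimal sketch of enumerating word strings of up to seven words across a set of responses. The function names `word_ngrams` and `count_ngrams`, the lowercase regular-expression tokenization, and the use of simple frequency counts are assumptions made for this example; the text above does not specify how the word strings are extracted or scored.

```python
import re
from collections import Counter


def word_ngrams(text, max_n=7):
    """Yield every contiguous word string of 1 to max_n words in text.

    Tokenization here (lowercased runs of letters and apostrophes) is an
    assumption for illustration only.
    """
    words = re.findall(r"[a-z']+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def count_ngrams(responses, max_n=7):
    """Count how often each word string occurs across all responses."""
    counts = Counter()
    for response in responses:
        counts.update(word_ngrams(response, max_n))
    return counts


if __name__ == "__main__":
    # Hypothetical responses, used only to exercise the functions.
    sample = [
        "The author argues that testing is flawed.",
        "The author argues that the evidence is weak.",
    ]
    counts = count_ngrams(sample)
    print(len(counts), "distinct word strings")
    print(counts[("the", "author", "argues", "that")])
```

Under these assumptions, a single response of a few hundred words already yields thousands of word strings, so a pool of many responses to many candidate questions quickly reaches the millions of strings mentioned above.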