In spite of the drawbacks of the multiple-choice testing format, which are well recognized in the educational testing industry, multiple-choice questions remain a common way of testing students in a variety of subject areas, particularly in examinations taken by large numbers of students. The words “student,” “examinee” and “test-taker” are used synonymously hereinafter in the context of educational testing.
In its most commonly used form, a multiple-choice question comprises three identifiable sections: a section containing a set of facts to be presumed (for instance, a narrative, a short story, a poem, an expression, an equation, or a geometric figure), an interrogative sentence (sometimes known as the “call of the question”), and a set of answer choices.
A multiple-choice question may conveniently be divided into two parts: a first part, comprising a set of facts to be presumed and an interrogative sentence, and a second part, comprising a set of answer choices. The first part may also be termed a “query.” (The term “query” can alternatively refer to the interrogative sentence alone, but as used herein, the term “query” refers to the entire first part of the question (i.e., both the set of facts to be presumed and the interrogative sentence), unless otherwise noted.) In the second part, between three and five answer choices are typically presented, although the number of answer choices may fall below three or rise above five under appropriate circumstances. (For instance, in a so-called “true/false question,” there are typically two answers: “true” and “false.”)
The set of facts to be presumed may be expressed using words or phrases, or a set of objects or symbols, or a combination of words, objects and symbols. Alternatively, the set of facts to be presumed may be expressed in any other appropriate way, such as with a figure, a picture, or another graphical representation. (For instance, in an art history exam, the set of facts to be presumed may constitute a piece of art or a picture thereof.) The interrogative sentence typically asks the student or examinee to pick the “correct,” or the “best,” answer, and to indicate the selected answer choice either on the exam paper, for instance, by circling the selected answer choice, or on a separate answer sheet. The separate answer sheet may include a small shape, such as a small circle, oval or rectangle, corresponding to each answer choice of each question, which shape may be filled in by the examinee, for instance with a pencil. Typically, the examinee is asked to fill in the shape corresponding to the selected answer, while leaving blank the shapes corresponding to the question's other answer choices. Other answer sheets may have any other appropriate configuration now known or later developed. In most cases, the examinee may leave a question unanswered, but may not select more than one answer choice per question. Thus, a multiple-choice question generally has no more than one valid answer. In other cases, where multiple valid answers exist, various answers may yield either full credit or varying amounts of partial credit, and methods disclosed herein may be extended in a recursive, analogous manner.
Indication of answers on a separate answer sheet is popular, being suitable for automatic or machine grading of the answer sheets; the automatic or machine grader compares a given answer sheet with the template of “correct” answers and counts the number of questions where the filled-in shapes (for instance) match the template. The examinee's score, also sometimes called the “grade,” may then be computed based on a formula that may depend on the number of questions answered correctly versus the number of questions answered incorrectly. Undoubtedly, the low cost, high speed, convenience and uniformity with which multiple-choice tests can be graded contribute to their popularity.
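The template comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, not a description of any particular grading machine: the function name `grade_sheet`, the use of `None` for an unanswered question, and the `wrong_penalty` parameter are all assumptions introduced here, and the scoring formula shown (correct answers minus a penalty per incorrect answer) is just one example of a formula that depends on the numbers of correct and incorrect answers.

```python
from typing import Dict, Optional, Sequence


def grade_sheet(
    responses: Sequence[Optional[str]],
    key: Sequence[str],
    wrong_penalty: float = 0.0,
) -> Dict[str, float]:
    """Compare one answer sheet with the template of correct answers.

    responses     -- the examinee's selected choice per question,
                     or None where the question was left blank
    key           -- the template of "correct" answers, one per question
    wrong_penalty -- hypothetical deduction per incorrect answer
                     (0.0 means only correct answers are counted)
    """
    if len(responses) != len(key):
        raise ValueError("answer sheet and template differ in length")

    # Count questions where the filled-in choice matches the template.
    correct = sum(
        1 for r, k in zip(responses, key) if r is not None and r == k
    )
    # Count questions answered with a choice that does not match.
    incorrect = sum(
        1 for r, k in zip(responses, key) if r is not None and r != k
    )
    blank = len(key) - correct - incorrect

    # Illustrative formula: depends on correct versus incorrect answers.
    score = correct - wrong_penalty * incorrect
    return {
        "correct": correct,
        "incorrect": incorrect,
        "blank": blank,
        "score": score,
    }


if __name__ == "__main__":
    template = ["B", "D", "A", "C"]
    sheet = ["B", "A", None, "C"]  # one wrong answer, one blank
    print(grade_sheet(template and sheet, template, wrong_penalty=0.25))
```

In this sketch an unanswered question neither earns nor loses credit, and every incorrect selection loses the full credit for its question regardless of how close the examinee's reasoning was.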
Implicit in the typical scoring formulas for multiple-choice tests is an assumption that examinees with a mastery of the subject matter will work efficiently to select the correct answers, whereas those who depend largely on guesswork will not do much better than the statistical odds of hitting the correct answers at random. However, in practice, the distinctions between the examinees' scores are rarely as clear-cut, due to quirks of the multiple-choice format. Specifically, an examinee unfamiliar with the tested material can beat the statistical obstacles to a high score by relying, at least in part, on guesswork—if the examinee is able to eliminate one or more incorrect answer choices and “guess” from a smaller pool of possible answers. On the other hand, an examinee who understands the tested material well may inadvertently choose an incorrect answer, despite that understanding, because of a minor error in analysis or computation. Such an error, no matter how minor, will often lead to a complete loss of credit for the question, if it results in selection of an incorrect answer choice. In short, the typical multiple-choice testing format provides little room for demonstrating the soundness of the underlying analysis or the accuracy of computation except in the final answer.
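The effect of eliminating answer choices before guessing can be made concrete with a short calculation. The sketch below is illustrative only and rests on two simplifying assumptions that are not part of the text above: the examinee guesses uniformly at random among the remaining choices, and the eliminated choices are always incorrect ones. The function names are hypothetical.

```python
from fractions import Fraction


def guess_success_probability(num_choices: int, num_eliminated: int = 0) -> Fraction:
    """Probability of guessing correctly among the choices not eliminated.

    Assumes a uniform random guess and that only incorrect
    choices have been eliminated.
    """
    remaining = num_choices - num_eliminated
    if num_choices < 2 or num_eliminated < 0 or remaining < 1:
        raise ValueError("invalid number of choices or eliminations")
    return Fraction(1, remaining)


def expected_guess_score(
    num_questions: int, num_choices: int, num_eliminated: int = 0
) -> Fraction:
    """Expected number of correct answers when every question is guessed."""
    return num_questions * guess_success_probability(num_choices, num_eliminated)


if __name__ == "__main__":
    for eliminated in range(4):
        print(eliminated, expected_guess_score(60, 5, eliminated))
```

Under these assumptions, on a 60-question test with five choices per question, blind guessing yields an expected 12 correct answers, while eliminating two incorrect choices on every question raises the expectation to 20.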
In a common scenario, total loss of credit may result from, for example, an incorrect answer choice being selected in lieu of the correct answer choice, based on the distinction of a single word, phrase, value or fact in the question (even though the examinee understood the question). For example, if a multiple-choice question asks an examinee to calculate the volume of a box having sides of 1 foot, 2 feet and 3 feet, the correct answer is the product of the three lengths, or 6 cubic feet. An examinee who understands the question may inadvertently choose the incorrect answer “9 cubic feet” if he or she misreads the “2” as a “3.” Thus, the very ease and simplicity of selecting and recording an answer to a multiple-choice question may obscure the difference between knowledge and ignorance on the examination.
The alternatives to multiple-choice tests that allow an examinee to demonstrate grasp and knowledge of the subject matter generally require “open format” answers to the questions, commonly in the form of short or long paragraphs or essays. Answers in this format may give the examinee scope to include technical, subject-matter-specific language and depictions, such as chemical or mathematical formulas or equations, graphics, charts, tables and other symbolic constructs or representations, as evidence of knowledge and understanding.
This type of “open format” testing provides ample room for test-takers to demonstrate their command of the subject matter, but it is time-consuming to take and to grade. Because of the time constraints, this format typically allows only a smaller number of questions to be asked on an examination and, therefore, generally allows only a smaller subset of the subject matter to be tested. Furthermore, the grading of open format tests is often more expensive, subjective and non-uniform than the grading of multiple-choice tests. These disadvantages severely limit the use of such existing alternatives to “standardized” multiple-choice format examinations, especially in the context of large-scale, national examinations.
Thus, there exists a need for a test generation methodology that combines the breadth of scope, the uniformity, the efficiency and ease of grading of “objective” or “standardized” multiple-choice tests with the confidence and reliability of the “measures of knowledge and understanding” generally associated with the open format tests.