Many tests require examinees to provide answers, or constructed responses, that include written words and essays or figural responses which can be scanned in as images. Other tests may require that examinees enter their responses in electronic format, using a computer application directly, such as the Computer Based Testing System disclosed in U.S. Pat. No. 5,565,316, assigned to Educational Testing Service and incorporated herein by reference. Automated computer-based systems have been developed to permit human evaluation of textual or figural responses on-line. However, other tests require review of responses in other, more complicated forms. For example, a test question, or prompt, could require an examinee to provide an oral response (Test of Spoken English, foreign language examinations, etc.) or to videotape a performance. Other test questions may require that an examinee create a diagram or drawing which is too complex for scanning to provide an appropriate representation for evaluation. The National Council of Architectural Registration Boards (NCARB) administers a licensing exam for architects in which an examinee's response is created through a specially designed computer application and may have multiple overlapping layers. The analysis of the responses to the NCARB exam requires human evaluators to precisely measure each line and angle to determine the appropriate score for the examinee. Therefore, a drawing application is a more appropriate environment for presentation of the constructed response to the human evaluator.
At present, a separate dedicated computer-based assessment system is required for each of these constructed response types to permit human evaluation on-line. Thus, there exists a need for one assessment system to dynamically determine which computer application will provide the optimum presentation capabilities for constructed responses in a variety of forms. It is further desired for a single assessment system to automatically initialize the chosen computer application and to present the constructed response to the human evaluators through the chosen computer application.
Furthermore, the need to monitor human evaluators to assure accuracy of assessment has been recognized. To date, this has been accomplished only through presentation of monitoring papers, which have a predetermined score associated with them, or through repeated presentation of the same constructed responses to ensure consistency. This approach is inefficient since it requires that the human evaluators take time to review and assess constructed responses that do not actually require scores. Furthermore, repeated presentation of the same constructed responses is frustrating to the evaluators and does not provide for accurate assessment. Thus, there further exists a need for an assessment system capable of evaluating and monitoring the human evaluators to guarantee consistency and accuracy of grading without using constructed responses that do not need assessment and thereby wasting time and other resources.
Finally, the need to minimize the influence of extraneous factors on a human evaluator's assessment has been well documented. For example, the time of day that a constructed response is presented to a human evaluator may influence the score awarded. Thus, safeguards are required to ensure consistency and fairness when human evaluators are assessing constructed responses.
Test developers are also concerned with assessing the difficulty of test questions. To promote fairness, test questions presented to different examinees that are intended to be of the same difficulty should have highly consistent difficulty levels to prevent variations in difficulty of the test questions from affecting scores of the examinees.
Complex manual grading designs and methods have been used in the past to investigate the difficulty of test questions and the effect of outside influences on human evaluators. However, there exists a need for a computer-based assessment system which can be used as a tool in test and scoring development. There further exists a need for methods of presenting constructed responses to various human evaluators in a controlled manner so that the influence of extraneous factors may be minimized. Finally, there exists a need for presenting constructed responses to human evaluators so that test question difficulty and human evaluator scoring may be assessed without the need for excessive repetition.