A recent trend in testing emphasizes a move beyond traditional multiple-choice tests in favor of tests that require open-ended responses, such as essay responses. These open-ended responses are often referred to as constructed responses (CRs). CRs are not limited to written text, however, and may include graphics, videotaped performances, audio responses, and other forms of responses. In order to improve the efficiency of scoring large-scale standardized tests, both those offered at periodic administrations and those offered essentially on a daily basis, computer systems have been developed to automatically score multiple-choice responses and other simple response types. While some automatic scoring systems have been designed to score particular types of CRs (see, e.g., U.S. patent application Ser. No. 08/794,498, entitled "Automatic Scoring for Categorized Figural Responses," assigned to the same assignee hereunder), the evaluation of CRs is particularly well-suited to human raters. For this reason, certain computer scoring systems have been developed to facilitate and automate the electronic transmission of CRs to human raters for evaluation and scoring. However, these conventional computer scoring systems currently have many disadvantages.
Conventional computer scoring systems generally include a centralized processor such as a server or mainframe computer and terminals or workstations (hereinafter "rater stations") interfaced with the centralized processor. The centralized processor is also interfaced with a data storage device in which CRs in electronic form are stored. The CRs may be scanned images or ASCII text, for example. The centralized processor transmits one CR at a time to a rater station for scoring by a rater or scorer operating at the rater station. The rater enters a score via the rater station. The score is typically transmitted back to the centralized processor for storage in the data storage device.
However, distributing CRs one at a time often results in a noticeable delay between the request for a CR and its receipt at the rater station for scoring, and therefore does not make the best use of the raters' time for scoring responses. Moreover, once a score is entered by the rater and committed, i.e., transmitted back to the server and recorded in the storage device, the rater has no opportunity thereafter to review the CR or modify the score awarded. This disadvantage may be significant when a rater begins a scoring session and, after scoring a number of CRs, decides that the earlier scores awarded were either too lenient or too harsh. Thus, there exists a need for a scoring system that efficiently transmits a plurality of CRs to a rater station such that the rater may review both the CRs and the scores awarded, in any order, before committing the scores to the scoring system.
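The need described above, a batch of CRs held at the rater station with scores that remain revisable until they are committed, can be illustrated with a minimal sketch. It is not taken from any actual scoring system; the class name `RaterBatch`, its methods, and the CR identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RaterBatch:
    """Hypothetical batch of CRs held at a rater station until commit."""
    cr_ids: List[str]
    scores: Dict[str, int] = field(default_factory=dict)
    committed: bool = False

    def set_score(self, cr_id: str, score: int) -> None:
        # Scores may be entered or revised in any order before commit.
        if self.committed:
            raise RuntimeError("batch already committed; scores are final")
        if cr_id not in self.cr_ids:
            raise KeyError(f"{cr_id} is not part of this batch")
        self.scores[cr_id] = score

    def commit(self) -> Dict[str, int]:
        # Refuse to commit until every CR in the batch has a score.
        missing = [c for c in self.cr_ids if c not in self.scores]
        if missing:
            raise ValueError(f"unscored CRs: {missing}")
        self.committed = True
        # The returned copy stands in for what would be sent to the server.
        return dict(self.scores)


batch = RaterBatch(cr_ids=["cr-1", "cr-2", "cr-3"])
batch.set_score("cr-1", 5)
batch.set_score("cr-2", 3)
batch.set_score("cr-3", 4)
batch.set_score("cr-1", 4)  # rater decides the first score was too lenient
committed_scores = batch.commit()
```

In this sketch the revision of `cr-1` is possible only because nothing is transmitted until `commit()` is called, which is the behavior a one-at-a-time system cannot offer.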
Furthermore, there exists a recognized need to properly train and monitor human raters to ensure that the scoring is accurate and reliable. In computer scoring systems, monitoring performance has been accomplished through the presentation of monitoring CRs, i.e., CRs which are associated with predetermined or qualified scores. Such CRs may also be used for training purposes to help new raters learn how to assess CRs in accordance with certain scoring criteria. CRs having known scores associated with them may also be useful in helping more experienced raters calibrate their scoring prior to scoring CRs during a scoring session. Other CRs with known scores may be useful in certifying new raters to qualify them prior to scoring.
In addition, benchmark CRs and rangefinder CRs, which exemplify CRs having a particular known score, have been used in manual scoring assessments to guide raters in identifying CRs warranting particular scores. Benchmark CRs may specifically be used across test administrations to define the score points. Rangefinder CRs are typically shown to raters as a set so that the raters may practice scoring CRs having a given score.
Conventional computer scoring systems do not transmit benchmark CRs or rangefinder CRs to rater stations for use by raters during scoring. Moreover, conventional scoring systems do not have the capability to select CRs for different uses, e.g., training, calibration, or certification. Thus, there further exists a need for computer scoring systems that provide an efficient means for selecting and distributing various types of CRs to raters based on each rater's experience.
Furthermore, conventional computer scoring systems have limited data gathering and reporting capabilities. Typically, in prior art scoring systems, the only type of data that is available is that collected for an individual rater. Further, the statistical information on rater performance is collected only for specified scoring sessions, as opposed to continuously from the time the rater becomes certified. Also, because most prior art systems are designed around one centralized processor, there is often a delay between when a rater takes an action and when the action is included in a report. Thus, statistics are not available in "real time." Furthermore, prior art scoring systems do not gather and report statistics on the wide variety of transactions for which it would be useful to have such statistics. For example, it would be useful and a great advancement in the art to be able to generate a report regarding scoring on a particular test or topic area, statistics for a particular scoring center, or system-wide statistics involving all scoring centers. Thus, there exists a need for a scoring system that provides immediate and in-depth data reporting on rater, site, and system-wide transactions. The present invention addresses these many needs in the art.
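The reporting need described above, statistics available per rater, per scoring center, and system wide, can be illustrated by grouping one stream of scoring events under different keys. The following is a minimal sketch under stated assumptions: the `ScoreEvent` record, its fields, the `summarize` helper, and the sample data are all hypothetical and do not describe any particular scoring system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Iterable


@dataclass(frozen=True)
class ScoreEvent:
    """Hypothetical record of one committed score."""
    rater_id: str
    site_id: str
    topic: str
    score: int


def summarize(
    events: Iterable[ScoreEvent],
    key: Callable[[ScoreEvent], Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    # Group events by the chosen key and report count and mean score.
    counts: Dict[Hashable, int] = defaultdict(int)
    sums: Dict[Hashable, int] = defaultdict(int)
    for ev in events:
        k = key(ev)
        counts[k] += 1
        sums[k] += ev.score
    return {k: {"count": counts[k], "mean": sums[k] / counts[k]} for k in counts}


# Illustrative data only.
EVENTS = [
    ScoreEvent("r1", "siteA", "essay", 4),
    ScoreEvent("r1", "siteA", "essay", 2),
    ScoreEvent("r2", "siteB", "essay", 5),
]

by_rater = summarize(EVENTS, lambda e: e.rater_id)
by_site = summarize(EVENTS, lambda e: e.site_id)
system_wide = summarize(EVENTS, lambda e: "all")
```

The same events feed all three reports; only the grouping key changes, which is one simple way a report could cover an individual rater, a scoring center, or the whole system.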