The present invention is directed to a system and a method for selecting, delivering, conducting, editing, scoring, and reporting on standardized achievement tests.
The automation of test scoring is a complex problem that has been of interest for many years. There is considerable pressure to optimize the efficiency, accuracy, speed, and repeatability, and therefore the reliability, of such test scoring, data accumulation, and reporting. Of primary interest has been the scoring of multiple choice answer sheets. Of further interest has been the recording and reporting of test results.
Beginning in about the late 1960s, technology has been developed to machine score optically scanned answer documents. Hardware has improved throughout the years; however, the basic testing approach has remained reasonably constant. Students/examinees respond to multiple choice questions by completely filling in “bubbles” on a machine-scannable answer sheet using a pencil or a pen. A “bubble” is a predetermined outlined round, square, or oval location on the answer sheet designated for an answer selection. When the answer sheet is scanned, the hardware (scanner) identifies each dark pencil or pen mark as an answer for each question and electronically stores the student's responses.
Scanners and computer hardware have become more affordable over the years. Optical mark reading (OMR) systems are well known in the art, including those used for scanning forms having pencil marks within a preprinted target area, such as circles, squares, or ovals. OMR systems sense data recorded within the printed areas by detecting light absorption, usually in the near infrared (NIR) range. This NIR scanning permits the differentiation of darkened pencil/pen marks from preprinted material on an answer form, as the preprinted material generally is provided in a pigmented color which does not absorb the NIR light. Such OMR scanners therefore permit the gathering of answer data that can be converted into digital form, scored against an answer database, and have the scores saved in storage associated with the test taker's personal identification data. The scanning and scoring of answers is conducted under the direction of specialized software. In the past, two of the most commonly used software packages were SCANTOOLS, provided by National Computer Systems (Minneapolis, Minn.), and BASIC SCRIPT, provided by Scantron Corp. (Tustin, Calif.).
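The scoring stage just described, in which sensed marks are compared against an answer database and the result is associated with the examinee's identification data, can be sketched as follows. This is an illustrative sketch only; all names and data are hypothetical and are not taken from any of the named software packages.

```python
# Hypothetical final stage of an OMR pipeline: each bubble row has
# already been reduced to a single sensed choice (or None if no mark
# was read), and the responses are scored against an answer key.

def score_answers(responses, answer_key):
    """Return (raw_score, per_item_results) for one answer sheet."""
    results = []
    for item, correct in answer_key.items():
        marked = responses.get(item)          # None if no mark was sensed
        results.append((item, marked, marked == correct))
    raw_score = sum(1 for _, _, ok in results if ok)
    return raw_score, results

# One examinee's sensed marks, keyed by item number
responses = {1: "B", 2: "C", 3: "A", 4: None}
answer_key = {1: "B", 2: "D", 3: "A", 4: "C"}

score, detail = score_answers(responses, answer_key)
print(score)   # 2 items correct
```

In a full system the returned per-item results would be stored against the examinee's identification record for later statistical processing and reporting.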
Testing for the evaluation of general achievement, and of specific achievement in one or more targeted (special) areas of knowledge, has utilized multiple choice testing where answers are recorded on bubble sheets. The automated scoring of bubble sheets, and thereafter the statistical and other processing of test results, has become the focus of much research and development.
A test is typically divided into sections of questions. The test specification generally defines the number of items to be presented in a test, the number of test sections, the number of questions in each section, the allotted time for responding to all items in each test section, and the time for taking the test.
Under the stress of such conditions, certain “irregularities” can arise on a test answer sheet. Among these are the failure of a student/examinee to enter identification data properly, or to leave out identification data, such as a full name, school identification, class identification, teacher name, date, and other such data, and/or to misspell any of these. Moreover, with large numbers of rows and columns of “bubbles” for answer selection, a student/examinee may misapply an answer, fill in more than one choice, or fill in a bubble so incompletely that the OMR/NIR equipment misses an answer the student intended to make. Or a student may erase an answer and choose another, where the erasure is insufficient to leave a single choice on a multiple choice line. In the mechanized scoring of test sheets, not only is competent scoring desired, but full and proper student/examinee identification data is also required for the post-grading statistical manipulation and analysis of scores and for reporting.
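The answer-row irregularities described above can be characterized mechanically. A minimal sketch follows, under the assumption (purely illustrative) that the scanner reports a darkness value in [0.0, 1.0] for each bubble in a row; the threshold values and category names are hypothetical.

```python
# Assumed thresholds: a bubble at or above MARK counts as an answer;
# one between FAINT and MARK is suspicious (a light mark or an
# incomplete erasure) but is not read as an answer.
MARK = 0.60
FAINT = 0.25

def classify_row(darkness):
    """darkness: dict of choice -> measured darkness for one question."""
    marked = [c for c, d in darkness.items() if d >= MARK]
    faint = [c for c, d in darkness.items() if FAINT <= d < MARK]
    if len(marked) == 1 and not faint:
        return ("ok", marked[0])
    if len(marked) > 1:
        return ("multiple_marks", marked)         # two choices filled in
    if len(marked) == 1 and faint:
        return ("incomplete_erasure", marked[0])  # old answer not fully erased
    if faint:
        return ("insufficient_mark", faint)       # mark too light to read
    return ("blank", None)

print(classify_row({"A": 0.05, "B": 0.90, "C": 0.10, "D": 0.02}))  # clean answer
print(classify_row({"A": 0.70, "B": 0.80, "C": 0.10, "D": 0.02}))  # double mark
print(classify_row({"A": 0.40, "B": 0.10, "C": 0.10, "D": 0.02}))  # too faint
```

Rows classified as anything other than “ok” are exactly the cases that would require human resolution in a purely mechanized scoring process.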
The speedy resolution of these factors becomes more important where standardized tests are used frequently throughout a school year as a feedback tool for both the teacher/administrator and the student in order to provide student achievement assessment. In such instances the scoring and reporting functions must be carried out in a reasonable time period in relation to the student's continuing lesson plan. Therefore, a rapid test turnaround time is desirable.
In the mechanized scoring of standardized tests in the past, non-academic errors (i.e., those other than errors in answering questions) would have rendered the test sheet unreadable and would have either voided the student's performance, required the scoring organization to hand search for the student's test paper and then hand grade the test, or forced the student to retake the exam.
Similar problems usually do not arise with the on-line administration of tests, either through a local area network (LAN) or via the internet. However, this on-line testing requires instructional/testing systems available at workstations for each student or examinee. Some examples of instructional programs which included multiple choice achievement testing have included those available from Computer Curriculum Corp., from Computer Networking Specialists, Inc., from New Century Education, from Unisys-ICOPN System, from Wasatch Education System, and from Wicat Systems. Educational Testing Service has also developed a computer-based testing system, comprising a test document creation system and an administrative system for initiating and terminating the delivery of a computerized test to an examinee. Systems like the Educational Testing Service system have focused on the prevention of student cheating, which may, by way of example, be implemented by randomizing the test question order for each workstation.
A latent problem with machine testing is the unavailability of sufficient numbers of workstations so that each student/examinee has a workstation available at the same time. In the educational environment, where a school district administers standardized tests to large numbers of students at the same time, on-line testing becomes reasonably impractical. Where the groups are small, such as governmental and corporate testing, or very specialized small classes, on-line workstation testing is feasible and even desirable.
Several institutions and corporations have developed various methods for administering tests, for automating the scoring process, and/or for automating the administration of the human scoring process in an effort to achieve human standardization. Among these is National Computer Systems, Inc., Eden Prairie, Minn. (“NCS”). NCS has developed a computerized administration system for monitoring the performance of a group of individuals (resolvers) grading open-ended (non-multiple choice) portions of the same test. The NCS system scans student tests and then presents the tests to scoring individuals over a LAN system which monitors the work performance of each scorer. The NCS system, in real time, compares the production, decision making, and work flow of the scoring individuals, provides feedback and on-line scoring guides to the individual scorers, and adjusts their work volume and work breaks. The NCS system, even while encompassing real-time prompting of its scoring individuals, does not provide fast, or even quasi-fast, turnaround scoring of the students' tests. The NCS system operates with delayed turnaround because it utilizes humans to examine, analyze, make decisions on, and then score each test.
NCS has also developed a computerized distribution system for optically scanned essay answers, storing “batches” of test answers for off-line scoring. A “batch” is a grouping of tests for storage location and identification purposes. The NCS system is also used for training and qualifying human scorers. Real or “live” test answers are distributed to scorer workstations through a LAN system. The production operation of a plurality of human scorers, each scoring an assigned batch of test answers, is managed by monitoring work volume and work flow and by allocating work load. Computer security is provided for all test score data and for file access.
Educational Testing Service, Princeton, N.J. (“ETS”), which is well known for generating and scoring academic skills tests (e.g., SATs, LSATs, GMATs, etc.), has developed a LAN-based workstation system for human evaluators which controls the presentation of constructed responses (open-ended essay portions of a test) to minimize the influence of psychometric factors on the accuracy of the human evaluators. The performance of the human evaluators in scoring answers to test questions is monitored and evaluated against a performance guideline database to assure consistency of performance from each evaluator. Further, ETS has developed a system for on-line essay evaluation. The system manages the work distribution and work flow to human evaluators, including during the real-time on-line testing period.
Along with this, ETS has developed a computerized test development tool for monitoring and evaluating both its human evaluators and the proposed essay test questions which are to be presented to examinees. Responses to proposed questions are constructed by research scientists and are categorized based on descriptive characteristics indicating the subject matter of interest. The constructed answers are presented to the human evaluators working at individual workstations, and their scores are assembled into a database for later evaluation by the test developers as to the appropriateness of the test questions and the ability of the human evaluators to score answers.
In its development of the questions for standardized tests, ETS has also developed development tools, i.e., systems, to assist in developing rubrics for use in computerized machine scoring of essay answers. The user of the development system is usually a test analyst working at a workstation. The test analyst or researcher selects from a list a plurality of questions with answers to be scored. Four scoring modes are provided: interactive, continuous, alarm, and sample scoring. In the interactive mode, the researcher checks the machine's performance on an item-by-item basis, where an item is an answer scored. The researcher can accept the score, change one or more feature scores, change the overall item score, or change the rubrics for the item.
In the continuous scoring mode, the computer scores all of the selected items and stores the scores in a database. The continuous mode is used after the researcher is satisfied from the interactive mode that the scoring rubrics are correct for all items (all answers) scored. In the alarm mode, the computer signals an irregular condition, whereupon the researcher may perform any of the activities of the interactive mode, i.e., accept the score, change one or more feature scores, change the overall item score, or change the scoring rubrics.
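The essential difference between these scoring modes is when the researcher is consulted. The following is an illustrative sketch only, not the actual ETS software; `machine_score`, `review`, and `is_irregular` are hypothetical stand-ins for the rubric-based scorer, the researcher's interactive check, and the alarm condition, and the sample scoring mode (which draws only a subset of items) is omitted.

```python
# Sketch of dispatch among three of the four scoring modes described above.
def run_scoring(items, mode, machine_score, review, is_irregular=None):
    scores = {}
    for item in items:
        score = machine_score(item)
        if mode == "interactive":
            score = review(item, score)              # check every item
        elif mode == "alarm" and is_irregular(item, score):
            score = review(item, score)              # check flagged items only
        # "continuous" mode stores every machine score unreviewed
        scores[item] = score
    return scores

items = ["essay-1", "essay-2", "essay-3"]
machine = lambda item: len(item)        # placeholder scorer
accept = lambda item, score: score      # researcher accepts as-is
print(run_scoring(items, "continuous", machine, accept))
```

The design point is that the machine scorer is identical in every mode; only the amount of human review layered on top of it varies.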
In order to avoid examinee identification errors, ETS has developed a bar code to be assigned to each examinee for each test. The bar code label appears on the face of the bubble sheet.
ETS has also developed a system for producing a computerized test, delivering it to an examinee at a workstation, and recording examinee responses to questions presented during the delivery of the test. The system provides for operator input to create a digital record of each question for a test and then assembles the test package into a predetermined examinee screen presentation. This ETS system cannot be interfaced with the internet to operate in another mode of testing. An administration portion of the system controls the initiating and terminating of the delivery of the test (the time for the test) to the examinee workstation. Interactive software responds to examinee key prompts to present, or re-present, examinee-desired portions (pages or specific questions) of the test on the examinee's screen. The examinee's responses to questions are stored locally at the workstation. The examinee's performance is evaluated after the testing period ends. A data portion holds the examinee performance files, security log files, and error log files which have been generated by the ETS system. From the data retrieved from the data portion, a report is generated including any of the following system administrative information: activity, audit trail, daily processing control, exception, security/event log, and essay. Test score data is stored against the examinee log-on data and is reportable for each examinee. The system also automatically checks for viruses.
Another developer in this field has been Uniscore, Inc., formerly Meadowbrook Industries, Delran, N.J., which has developed a computerized teaching and performance testing tool for the human scorers of essay test answers.
Harcourt Assessment, Inc., formerly The Psychological Corporation, San Antonio, Tex., has developed a computerized scanning and storing system for human scorers scoring essay answers. This system also scores multiple choice bubble answers against a reference database. Timing marks, i.e., OMR (optical mark recognition) marks, are used to align each test answer sheet scanned. Sheets improperly aligned are rejected and rescanned. OCR (optical character recognition) scanning of each essay or short answer is performed, and each answer is distributed to a human reader, i.e., a scorer, for scoring.
Bookette Software Company, Monterey, Calif., has developed a computerized optical scanning system for scanning bubble sheets of multiple choice test answers, electronically scoring them, and then reporting test results. The system employs templates, which contain computer readable images of test questions (question identification), and overlay records, which contain the coordinates of icons representing the possible answers and the identification of the correct answers. The reporting takes the form of presenting the test document template at a workstation screen, along with the scanned and scored responses from a selected student, and with an overlay of circles around the correct answers to the questions the student got wrong. A paper printout is also available.
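The overlay-record idea above can be sketched as follows. This is a hypothetical illustration of the general technique, not Bookette's actual data format: an overlay record maps each question to the page coordinates of its answer icons and identifies the correct choice, so that scanned marks can be scored and the correct answers to missed questions circled on the on-screen template.

```python
# Hypothetical overlay records: per-question icon coordinates (x, y)
# and the identity of the correct answer.
overlay = {
    1: {"coords": {"A": (100, 50), "B": (130, 50)}, "correct": "A"},
    2: {"coords": {"A": (100, 80), "B": (130, 80)}, "correct": "B"},
}

def score_and_annotate(overlay, marked):
    """marked: question -> choice sensed by the scanner."""
    circles = []   # (x, y) positions where a correction circle is drawn
    correct = 0
    for q, rec in overlay.items():
        if marked.get(q) == rec["correct"]:
            correct += 1
        else:
            # circle the correct answer the student missed
            circles.append(rec["coords"][rec["correct"]])
    return correct, circles

print(score_and_annotate(overlay, {1: "A", 2: "A"}))
```

The returned circle coordinates would then be drawn over the template image at the workstation screen or on the paper printout.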
While these prior developments have advanced the art of automated test construction, automated test administration, automated test question development, computerized bubble answer scoring, and computer-aided human essay scorer performance, these prior developments have not, in whole or in combination, addressed a unified web-based system for the delivery, scoring, and on-line reporting of both on-line and paper-based assessments. Such a new system is multi-functional and multi-modal and would permit the processing of large masses of assessment tests.
With the exception of the previously used on-line, real-time testing and scoring, the prior developments have not addressed significantly increasing the speed of scoring and reporting test results, whereby very large numbers of tests can be scored and reported on in very short periods of time with minimal human intervention.
Contrary to the direction of the present invention, the speed enhancements achieved in the prior developments have arisen out of faster scanning machines, better training of human evaluators, and work volume management of human evaluators.
What is desired is a new development which would eliminate discrepancies in the scoring of assessment tests which previously have arisen because of human factors, such as non-standardization, human errors, deviations in judgment, fatigue, and boredom, and which would also reduce the editing time of each human editor.
What is also desired is an automated system for on-line reporting of test results from plural types of sources and for plural types of test media.
What is further desired is an automated, human-interfaced system where the throughput time in scanning, validating, scoring, and reporting each test is greatly and significantly reduced, whereby the turnaround time of scoring and reporting on a test is minimal, thereby providing the educator and the student almost immediate, useful test results and thereby feedback on a student's achievement and test performance.
What is even further desired is such a new development which is web-based, wherein each scanned test record is computer analyzed and a human editor of scanned records is computer prompted to make editing corrections to a record.