1. Field of the Invention
The invention relates to a method for grading free response tests which is easy to use, operates at very high levels of accuracy, is cost-effective, reliable, and easy to teach to temporary readers/graders, and which permits each reader/grader to act independently of any fixed facilities during the grading process. The key operational elements in the method comprise the use of highly accurate machine readable data codes to uniquely associate test-taker, test and reader/grader, and a portable sensing device having a memory for storing codes read by the device for subsequent entry into a host computer. The method permits multiple readers/graders to evaluate the same test without one reader/grader influencing another, while reducing the paper handling and key entry of data inherent in large volume paper and pencil testing techniques.
2. Description of the Prior Art
The task of properly administering the taking of tests by large groups of test-takers in the United States presents many logistical and procedural problems. With respect to free response examinations alone, several times a year, hundreds of thousands of students are tested, requiring, on a temporary basis: locating, hiring, training and supervising large numbers of clerical and grading personnel; obtaining suitable temporary test sites and preparing the test sites for testing and grading requirements; distributing and collecting massive quantities of testing and grading materials; ensuring proper identification and association of examinee, test, grade and grade report; analyzing the grades for particular questions to ensure that the grading is proper; and reporting the grade to the examinee and/or others within a limited amount of time.
By way of example, Educational Testing Service ("ETS"), the inventor's assignee, administers the taking, grading and reporting of essay examinations taken each Spring by approximately 450,000 students in order to qualify for college credit based on high school advanced placement work. In administering these examinations, ETS hires and trains hundreds of clerical aides and readers/graders, obtains suitable test sites across the country and installs large quantities of equipment at these test sites. Additionally, ETS must move and keep track of literally millions of pieces of paper, before, during and after the examination.
Importantly, effective grading requires that the reader/grader be allowed to perform his/her reading/grading with minimal distraction, either from clerical duties related to grading or from prior grading of the test. It is also important that the grading be objective: an essay containing one or more questions and/or graded by more than one reader should be graded in a manner that prevents a subsequent reader from knowing the grade(s) awarded to the essay by prior readers and from being consciously or unconsciously influenced by those grades, a phenomenon known as the "halo effect".
Prior art methods generally attempted to ensure such objectivity by concealing the previous grader's written scores, for example through "band-aid" like shields, grade encoding and "invisible ink". However, these methods added to the complexity and labor-intensive nature of the testing process, thereby causing new and/or additional problems and increasing process costs. Importantly, any increase in the complexity of the grading process for the grader will generally have a negative impact on the grader's efficiency and objectivity.
Kaney, U.S. Pat. No. 4,478,584, discloses a method for maintaining the independence of ratings by multiple evaluators using an attached multicomponent rating shield, where one component of the shield is removed by the reader/grader to mask the grade, another component is removed in order to unmask the grade and the remaining component has the ability to permit the grade to be machine scanned through it.
Another method uses "band-aid" like paper strips which are pasted over each score written on an essay's grading sheet before it is distributed to each subsequent reader. After an essay has been completely scored, all the paper strips are cut away to reveal the scores for key entry operators. Though this process has been used for many years, it requires that each grade but the last be concealed clerically, thereby introducing an additional process step with respect to every score but the last on each essay.
A representative ETS advanced placement test, the Biology examination, is presently administered to about 35,000 test-takers during the Spring. A single Biology essay may contain 4 questions. Employing the "band-aid" technique to cover the grades for the first 3 questions requires 105,000 paper strips and the clerical manpower to apply them. In the largest essay-based program currently administered by ETS, approximately 1,500,000 (1.5 million) paper "band-aids" are applied and removed in this manner during a few weeks in June each year. While the "band-aid" procedure addresses the problem of grader independence when properly used, it adds a process step and increases the cost and time involved in completing the examination process.
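The strip count quoted for the Biology examination can be reproduced with a short calculation. The following is only an illustrative sketch: the function name `strips_needed` is hypothetical, and it assumes one strip per score with the last score on each essay left uncovered, as the text describes.

```python
def strips_needed(test_takers: int, scores_per_essay: int) -> int:
    """Paper strips required when every score except the last is covered."""
    if scores_per_essay < 1:
        raise ValueError("an essay must carry at least one score")
    return test_takers * (scores_per_essay - 1)


# Figures taken from the Biology example in the text:
# about 35,000 test-takers, 4 questions per essay.
biology_strips = strips_needed(test_takers=35_000, scores_per_essay=4)
print(biology_strips)  # 105000
```

The count grows linearly with both the number of test-takers and the number of scores per essay, which is why the clerical burden scales with program size.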
In order to ensure independence without the need to cover the scores, grade encoding is sometimes employed. In one common embodiment of this method, each table (working unit) of readers/graders is assigned a different set of alphabetic codes to substitute for the range of numeric scores to be awarded. For readers/graders at one table a score of 4 might be encoded as "R"; for readers/graders working at another table, as "B". These encoded scores are subsequently computer processed and reconverted to numerics using a conversion table containing each reader's identification number, the range of encoded scores and their numeric equivalents. This method increases the complexity of the grading process for each question, for both the administrators and the readers/graders, and introduces the potential for error. Moreover, even though the grades are encoded, and the readers/graders are instructed to maintain the confidentiality of the codes, there is nothing to actually prevent disclosure of the codes, for example during conversations among readers, which would defeat the independence of the reading/grading process.
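The reconversion step of grade encoding can be sketched as a lookup keyed first by reader and then by code. This is a hypothetical illustration, not the prior art implementation: the table names, reader identification numbers and all code letters other than "R" and "B" for a score of 4 are assumptions made for the example.

```python
# Hypothetical code sets. Only 4 -> "R" at one table and 4 -> "B" at
# another come from the text; every other letter is invented here.
TABLE_CODES = {
    "table_A": {"M": 1, "K": 2, "T": 3, "R": 4, "W": 5},
    "table_B": {"Q": 1, "D": 2, "H": 3, "B": 4, "Z": 5},
}

# Hypothetical mapping of reader identification numbers to tables.
READER_TABLE = {101: "table_A", 102: "table_A", 201: "table_B"}


def decode_score(reader_id: int, encoded: str) -> int:
    """Convert an encoded score back to its numeric equivalent."""
    table = READER_TABLE[reader_id]
    codes = TABLE_CODES[table]
    if encoded not in codes:
        # A code from another table's set is one source of the
        # errors the text mentions.
        raise ValueError(f"code {encoded!r} is not valid for {table}")
    return codes[encoded]


print(decode_score(101, "R"))  # 4
print(decode_score(201, "B"))  # 4
```

Because the same numeric score maps to a different letter at each table, a reader who writes a letter belonging to another table's set produces a rejection at decoding time rather than a silent mis-score, at least in this sketch.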
Another method of concealment is the use of ultra-violet light-sensitive "invisible" ink to write the scores. The "invisible" scores are revealed later to data entry operators by illuminating the documents with ultra-violet lamps. This method has raised questions not only as to its safety, but also as to whether the "invisible" scores are really invisible. It appears that the "invisible ink" employed in the readers'/graders' felt-tip pens can be faintly discerned in certain light conditions. In any case, as with the "band-aid" and grade encoding methods, this method also employs the clerically-intensive methodology of key data entry and grading sheets.
While it has been suggested that optical scanners could be used to collect student grades after the grades have been set, the suggestions are directed to the mere entry of raw data and not to obtaining a better quality of grade. The process of the present invention enhances the grade quality by using, in part, the sensing device to eliminate the influence of a prior reading/grading on a subsequent reading/grading.
The inherent centralization of document scanning and/or key entry sites in the prior art methods requires the organization and transportation of source documents to and from reading sites. This requirement increases costs while introducing frustration and delay at critical points in the process for both the grading staff and data entry personnel. One additional expense arises out of the requirement that each essay answer book must also be designed and printed with an essay answer grading sheet as an appendage to its back. The requirement is necessary in order to reduce the potential for error as well as the amount of clerical work associated with keeping two separate documents involving one test-taker together through an extensive hand grading process. Another additional expense and a particular obstacle to the establishment of multiple remote reading sites is the quality control accountability requirement that these grading sheets, which are transported to a central data entry location for key entry of data, undergo counting and batching operations at the central location.
The time required by the data entry operations also increases the delay in providing management information to administrative personnel at the reading sites. Information, such as how long it is taking readers/graders to grade a particular question and how consistent the readers/graders are in grading a particular question, is needed to measure the progress of the reading in order to make any necessary adjustment in resources. For example, if the grading of one question is taking longer than another question, the number of readers/graders assigned to grade each question can be adjusted so that the entire grading procedure is completed within the time limitations and before the readers/graders are scheduled to depart. Generally, management information is contained in the documents in transit and typically only becomes available in summary form the following morning. Elaborate predictive measures are utilized to gain an understanding of progress, but there remains a need for factual, current information during the reading in time to influence the reading itself.
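The kind of management information described above might be summarized per question as follows. This is a rough sketch under stated assumptions: the record layout, the sample values, and the use of the population standard deviation of scores as a crude consistency indicator are all hypothetical and are not drawn from any described system.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (question, reader_id, seconds_to_grade, score)
records = [
    (1, 101, 120, 4),
    (1, 102, 150, 5),
    (2, 201, 300, 3),
    (2, 202, 340, 7),
]


def summarize(rows):
    """Group readings by question and report pace and score spread."""
    by_question = defaultdict(list)
    for question, _reader_id, seconds, score in rows:
        by_question[question].append((seconds, score))

    summary = {}
    for question, readings in by_question.items():
        times = [seconds for seconds, _ in readings]
        scores = [score for _, score in readings]
        summary[question] = {
            "readings": len(readings),
            "mean_seconds": mean(times),
            # Assumed proxy for consistency; a real measure would
            # compare scores given to the same essay.
            "score_spread": pstdev(scores),
        }
    return summary


summary = summarize(records)
for question, stats in sorted(summary.items()):
    print(question, stats)
```

In this sample, question 2 takes more than twice as long per reading as question 1, which is the sort of signal that would prompt moving readers/graders from one question to the other.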
Many essay testing programs have instituted a requirement that each question must be read twice and those two scores are compared to assure that if they differ, the difference is within a predefined range. Scores that differ by too broad a range are graded again to resolve the "score discrepancy". There are special processing problems inherent in this requirement, most relating to speed of the identification of the discrepancies. Discrepancies should be identified as soon as possible, so that any necessary third grading may be performed while the temporary reading site is still in operation and the temporary staff of high school teachers and college professors hired for the reading task is still available.
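The two-reading comparison described above amounts to a threshold test on the difference between the scores. The sketch below is illustrative only: the function names, the essay identifiers and the threshold of 2 points are assumptions, since the text says only that the permitted range is predefined.

```python
def is_discrepant(first: int, second: int, max_difference: int) -> bool:
    """True when two scores differ by more than the permitted range."""
    return abs(first - second) > max_difference


def essays_needing_third_reading(readings, max_difference):
    """readings maps (essay_id, question) -> (first_score, second_score)."""
    return sorted(
        key
        for key, (first, second) in readings.items()
        if is_discrepant(first, second, max_difference)
    )


# Hypothetical sample data.
readings = {
    ("E001", 1): (4, 5),
    ("E001", 2): (2, 6),
    ("E002", 1): (7, 7),
}

flagged = essays_needing_third_reading(readings, max_difference=2)
print(flagged)  # [('E001', 2)]
```

Keying the result by essay and question, rather than by grading sheet, reflects the point made in the text that a discrepancy is only useful if the essay itself can be located for regrading.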
When discrepancies are identified by a method that collects scores on grading sheets separate from the essays, the discrepant grading sheet may be available while the original essay is not, so the essay cannot be immediately regraded. Most methods provide no way to locate individual essays within the thousands moving through the pipeline until the reading is completed and a clerical essay sort can be performed, providing a basis for retrieval.
One recently developed procedure to address this problem employs several key-entry personnel and on-site personal computers to more quickly identify discrepancies. This procedure requires the additional process step of keying each essay's reader identification numbers and respective encoded scores into a digital computer programmed to perform score decoding and to compare the two resulting scores. This process also provides somewhat more timely management information by generating summaries of the essays so processed. However, because the representative data which is key entered on-site is not the actual data used in score reporting, the discrepancy identification process is not conclusive. The handwritten data which is key entered on site is not the actual data which is encoded as machine readable pencil-darkened ovals on the grading sheet and is used in the eventual grade report. Therefore, these two versions of a reader's score can differ, creating another rejection and requiring additional clerical resolution and possible regrading of the question.
Many low volume testing programs have concluded it is simply easier to conduct a make-up reading a week or two later, in which all the discrepancies are resolved. Of course, this adds to the expense of the process and introduces weeks of delay into the reporting of scores to test-takers. The ideal situation would provide for rapid identification of all discrepancies while readers were still available and would utilize real data with immediate location of the discrepant essay.
There is an ongoing need for the leader (table leader) of reading groups to monitor each participant's consistency of adherence to predefined scoring standards. However, it is difficult to establish methods for collecting data on the consistency and reading rate of individual readers through paper and pencil techniques. When such information is collected it is typically via special studies based on timed observations of the reading, employing sampling techniques. Prior to the present invention, there have been no techniques available which conveniently provide more equitable information by routinely collecting performance information on every reader's score.
Additionally, paper and pencil techniques rely on the physical movement and control of grading sheets, requiring the creation of ancillary documents such as transmittal forms, control forms, paper "band-aids", work orders, etc. Each of these documents must be batched, moved, counted, controlled, and filed, requiring staffed mini-systems to assure their proper handling and thus increasing project overhead.
However, there has been a growing shortage of qualified personnel available to work on the kinds of part-time assignments typified by essay readings. ETS' largest essay grading program currently requires the hiring of approximately 325 such personnel for several weeks in June each year and it is becoming increasingly difficult to locate them. This program has been growing at a steady rate and the need for clerical aides will continue to grow in a parallel fashion if clerically-intensive paper-based methods are unchanged.
Any solution proposed for the problems associated with free response grading must offer ease of use, objectivity, very high levels of accuracy and consistency, affordability for large and small programs alike, reliability, and cost-effectiveness. Additionally, the "solution" should not increase score reporting turnaround time, the number of steps required to process the test, or the workload and training time for readers/graders. In the past, these requirements have created serious obstacles because of the cost and complexity of the numerous individual method components generally required by the solution.
Other methods have not been effective in operating in temporary situations, where important considerations include site location, equipment portability, ease of setup and takedown (minimal cabling), telecommunications, and equipment security.