This invention generally relates to the field of computer-based test scoring systems, and more particularly, to automatic essay scoring systems.
For many years, standardized tests have been administered to examinees for various reasons such as for educational testing or for evaluating particular skills. For instance, academic skills tests, e.g., SATs, LSATs, GMATs, etc., are typically administered to a large number of students. Results of these tests are used by colleges, universities and other educational institutions as a factor in determining whether an examinee should be admitted to study at that particular institution. Other standardized testing is carried out to determine whether or not an individual has attained a specified level of knowledge, or mastery, of a given subject. Such testing is referred to as mastery testing, e.g., achievement tests offered to students in a variety of subjects, and the results are used for college credit in such subjects.
Many of these standardized tests have essay sections. These essay portions of an exam typically require human graders to read the wholly unique essay answers. As one might expect, essay grading requires a significant number of work-hours, especially compared to machine-graded multiple choice questions. Essay questions, however, often provide a more well-rounded assessment of a particular test taker's abilities. It is, therefore, desirable to provide a computer-based automatic scoring system.
Typically, graders grade essays based on scoring rubrics, i.e., descriptions of essay quality or writing competency at each score level. For example, the scoring guide for a scoring range from 0 to 6 specifically states that a "6" essay "develops ideas cogently, organizes them logically, and connects them with clear transitions." A human grader simply tries to evaluate the essay based on descriptions in the scoring rubric. This technique, however, is subjective and can lead to inconsistent results. It is, therefore, desirable to provide an automatic scoring system that is accurate, reliable and yields consistent results.
Literature in the field of discourse analysis points out that lexical (word) and structural (syntactic) features of discourse can be identified (Mann, William C. and Sandra A. Thompson (1988): Rhetorical Structure Theory: Toward a functional theory of text organization, Text 8(3), 243-281) and represented in a machine, for computer-based analysis (Cohen, Robin: A computational theory of the function of clue words in argument understanding, in "Proceedings of 1984 International Computational Linguistics Conference." California, 251-255 (1984); Hovy, Eduard, Julia Lavid, Elisabeth Maier, Vibhu Mittal and Cecile Paris: Employing Knowledge Resources in a New Text Planner Architecture, in "Aspects of Automated NL Generation," Dale, Hovy, Rosner and Stock (Eds), Springer-Verlag Lecture Notes in AI no. 587, 57-72 (1992); Hirschberg, Julia and Diane Litman: Empirical Studies on the Disambiguation of Cue Phrases, in "Computational Linguistics" (1993), 501-530 (1993); and Vander Linden, Keith and James H. Martin: Expressing Rhetorical Relations in Instructional Text: A Case Study in Purpose Relation, in "Computational Linguistics" 21(1), 29-57 (1995)).
Previous work in automated essay scoring, such as by Page, E. B. and N. Petersen: The computer moves into essay grading: updating the ancient test. Phi Delta Kappan; March, 561-565 (1995), reports that predicting essay scores using surface feature variables, e.g., the fourth root of the length of an essay, shows correlations as high as 0.78 between a single human rater (grader) score and machine-based scores for a set of PRAXIS essays. Using grammar checker variables in addition to word counts based on essay length yields up to 99% agreement, where agreement means that the machine-based score is within 1 point of the human rater score on a 6-point holistic rubric. These results using grammar checker variables have added value since grammar checker variables may carry substantive information about writing competency that might reflect rubric criteria such as "essay is free from errors in mechanics, usage and sentence structure."
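The two quantities mentioned in the preceding paragraph, a surface feature based on the fourth root of essay length and agreement within 1 point of a human score, can be illustrated with a minimal Python sketch. The function names are hypothetical, and measuring length as a whitespace-delimited word count is an assumption made only for illustration; the cited work is not reproduced here.

```python
from typing import Sequence


def fourth_root_length(essay: str) -> float:
    """Surface feature: fourth root of the essay length.

    Assumption: length is the number of whitespace-delimited words.
    """
    return len(essay.split()) ** 0.25


def adjacent_agreement(machine: Sequence[int], human: Sequence[int],
                       tolerance: int = 1) -> float:
    """Fraction of essays whose machine score lies within `tolerance`
    points of the human rater score."""
    if len(machine) != len(human) or not machine:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for m, h in zip(machine, human) if abs(m - h) <= tolerance)
    return hits / len(machine)
```

For example, with hypothetical machine scores [4, 5, 2, 6] and human scores [4, 4, 4, 6], three of the four pairs differ by at most 1 point, so the agreement is 0.75.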
A method of grading an essay using an automated essay scoring system is provided. The method comprises the steps of (a) parsing the essay to produce parsed text, wherein the parsed text is a syntactic representation of the essay, (b) using the parsed text and discourse-based heuristics to create a vector of syntactic features derived from the essay, (c) using the parsed text to create a vector of rhetorical features derived from the essay, (d) creating a first score feature derived from the essay, (e) creating a second score feature derived from the essay, and (f) processing the vector of syntactic features, the vector of rhetorical features, the first score feature, and the second score feature to generate a score for the essay.
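The data flow of steps (b) through (f) can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the class and function names are hypothetical, and combining the features in step (f) as a weighted sum with an intercept is an assumption. The parsing and feature-extraction steps themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EssayFeatures:
    """Features derived from one essay (hypothetical container)."""
    syntactic: List[float]        # vector of syntactic features, step (b)
    rhetorical: List[float]       # vector of rhetorical features, step (c)
    first_score_feature: float    # step (d)
    second_score_feature: float   # step (e)

    def as_vector(self) -> List[float]:
        """Concatenate all features into a single flat vector."""
        return [*self.syntactic, *self.rhetorical,
                self.first_score_feature, self.second_score_feature]


def generate_score(features: EssayFeatures, weights: Sequence[float],
                   intercept: float) -> float:
    """Step (f), assumed here to be a weighted sum of the features."""
    vector = features.as_vector()
    if len(vector) != len(weights):
        raise ValueError("one weight is required per feature")
    return intercept + sum(w * x for w, x in zip(weights, vector))
```

Keeping the four feature groups in separate fields until the final concatenation mirrors the way the method derives each group in its own step.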
In a preferred embodiment, the method further comprises the step of (g) creating a predictive feature set for the test question, where the predictive feature set represents a model feature set for the test question covering a complete range of scores of a scoring guide for the test question, wherein in step (f), a scoring formula may be derived from the predictive feature set and the score for the essay may be assigned based on the scoring guide. Preferably, a batch of original essays, which are essays of a known score to a test question, is used in accordance with the model feature of the invention to create the predictive feature set. Creating the predictive feature set in this manner comprises the steps of repeating steps (a) through (f) for the batch of original essays and processing the vector of syntactic features, the vector of rhetorical features, the first score feature, and the second score feature for each original essay using a linear regression to generate the predictive feature set for the test question.
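Deriving a scoring formula by linear regression from essays of known score can be sketched as follows. This is a minimal illustration under stated assumptions: it uses plain ordinary least squares solved through the normal equations, does not perform any feature selection, and clamps the rounded prediction to a 0 to 6 scoring guide. The function names and the toy data in the usage note are hypothetical.

```python
from typing import List, Sequence


def fit_scoring_formula(feature_rows: Sequence[Sequence[float]],
                        scores: Sequence[float]) -> List[float]:
    """Ordinary least squares; returns [intercept, w1, ..., wk]."""
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    X = [[1.0, *row] for row in feature_rows]
    n = len(X[0])
    # Augmented normal equations: [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(X, scores))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot_row = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def score_essay(coefficients: Sequence[float], features: Sequence[float],
                low: int = 0, high: int = 6) -> int:
    """Apply the fitted formula, round, and clamp to the scoring guide."""
    raw = coefficients[0] + sum(
        w * x for w, x in zip(coefficients[1:], features))
    return max(low, min(high, round(raw)))
```

As a usage example, fitting on the hypothetical feature rows (0, 0), (1, 0), (0, 2), (1, 2), (2, 4) with known scores 1, 3, 2, 4, 7 recovers coefficients close to an intercept of 1 and weights of 2 and 0.5. A new essay with features (1, 2) then receives a score of 4, and one with features (2, 4) is clamped to the top score of 6.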
Preferably, each essay is already in the form of electronic essay text, as is the case with on-line essay testing. If this is not the case, however, then the method of the present invention further comprises the step of converting the essay into the form of electronic essay text.
A computer-based automated essay scoring system for grading an essay also is provided. The essay scoring system comprises a Syntactic Feature Analysis program which creates a vector of syntactic features of the electronic essay text, a Rhetorical Feature Analysis program which creates a vector of rhetorical features of the electronic essay text, an EssayContent program which creates a first Essay Score Feature, an ArgContent program which creates a second Essay Score Feature, and a score generator which generates a final score for the essay from the vector of syntactic features, the vector of rhetorical features, the first score feature, and the second score feature.
In a preferred embodiment, the essay scoring system further comprises a parser for producing a syntactic representation of each essay for use by the Syntactic Feature Analysis program and the Rhetorical Feature Analysis program. In another preferred embodiment, the essay scoring system further comprises a Stepwise Linear Regression program which generates a predictive feature set representing a model feature set that is predictive of a range of scores for the test question, and which is provided to the scoring engine for use in assessing the final score for the essay.