1. Field of the Invention
The present invention relates generally to artificial intelligence software. In particular, taught herein is an automated grading program for student assessment that grades homework and test problems in which students show their own work in detail, and that performs the grading job, in statistical terms, as well as or better than a human teacher in realistic situations. The program was validated by directly comparing its grading against that of actual teachers on a database of authentic student work. The artificial intelligence (AI) program achieved excellent inter-rater agreement with the teachers while eliminating inconsistencies in grading due to human error.
2. Description of the Related Art
Sophisticated, high-quality tools for assessment, for example in chemistry, are as important as tools for interactive tutoring.
The majority of current assessment tools are based on multiple-choice (MC) tests or similar, very basic techniques. For example, U.S. Pub. 2003/0180703 describes an older system and method for educational assessment of at least one student in which, using a computer network, a test for subject matter is provided and an answer sheet for the test is dynamically generated. A completed answer sheet is scanned with an image scanner. Answers are graded from the scanned image of the answer sheet, and the grading results are automatically stored in a central repository at the central server for at least one student. U.S. Pub. 2005/0255439 shows a method and system for generating and processing an assessment examination answer sheet. The answer sheet is formatted by question, column or section using a layout tool and includes calibration marks, and examinee and examination identifiers. The format is stored in a computer-readable form, such as an XML file. Upon completion, a scanned image of the answer sheet is produced using a conventional scanner.
While such MC-based systems are sufficient for simple purposes such as easy grading, MC is a relatively blunt instrument for diagnosis and assessment and is inadequate as a foundation for developing the more sophisticated individualized assessment capabilities needed by teachers. Grading by a human teacher, when students show all their work in detail, provides a more sophisticated and meaningful assessment of learning than an MC test.
The problem is that in practice teachers seldom, if ever, have time to perform such in-depth analysis for each and every student. At the same time, teachers are increasingly required to compile, evaluate and provide more detailed reports of student achievement, which places further demands on their already-overburdened time. Robust, dependable AI-based software assessment tools therefore have considerable potential to help teachers increase their effectiveness and achieve the greatest return on their time and resources.
Intelligent tutoring systems (ITSs) are currently being developed. A major advantage of these systems, and one relevant to this work, is that they can create a worked-out solution with detailed explanations for any problem entered by the student or teacher from any type of source, whether it be a textbook, a software program (including the ITS itself), or any arbitrarily entered external problem. See for example U.S. Pat. Nos. 6,540,520 and 7,351,064. Unlike a conventional tutorial, this is done dynamically, without the problem being stored ahead of time. In controlled testing, the ITS cited above was shown to improve student performance significantly, which was encouraging for the prospect of building assessment technology on the same foundation.
A major drawback to “step-oriented” computer assessment, and a major challenge solved by the present invention, is the reliable, consistent assignment of partial credit for student attempts at solving problems having a large number and variety of non-equivalent multi-step solution paths. This is often the case, for example, in scientific and mathematical problem domains. When attempting to solve such problems, a student may follow any one of the possible solution paths, making (and possibly then also propagating) any number of errors along the way, or the student's attempted solution path may not correspond to any legitimate path. The solution attempt may also be complete or incomplete. Furthermore, the method must work with problems that are input to the system dynamically by an external party, such as from an instructor's assignment or an electronic homework system, instead of from a supplied rubric for a fixed problem or set of preprogrammed problems. Taken together, these factors make consistent and fair assignment of partial credit by the AI system across all student attempts to solve a problem very difficult.
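By way of illustration only, the difficulty described above can be seen in a deliberately naive sketch, written here in Python. It is not the method of the present invention. The function names, the representation of each step as a text string, and the example problem (solving 2x + 4 = 10 along two non-equivalent paths) are all assumptions introduced for this illustration. The sketch awards the best fraction of any one reference path's steps that appear, in order, in the student's attempt.

```python
from typing import List, Sequence

def longest_common_subsequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest in-order overlap between two step sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, step_a in enumerate(a, start=1):
        for j, step_b in enumerate(b, start=1):
            if step_a == step_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def naive_partial_credit(attempt: Sequence[str],
                         reference_paths: List[List[str]]) -> float:
    """Best fraction of any single reference path matched by the attempt.

    Hypothetical scoring rule: exact string matching of steps, with no
    handling of equivalent expressions or propagated errors.
    """
    best = 0.0
    for path in reference_paths:
        if path:  # skip empty paths to avoid division by zero
            matched = longest_common_subsequence(attempt, path)
            best = max(best, matched / len(path))
    return best

# Two non-equivalent solution paths for the assumed problem 2x + 4 = 10.
PATH_SUBTRACT_FIRST = ["2x=6", "x=3"]   # subtract 4, then divide by 2
PATH_DIVIDE_FIRST = ["x+2=5", "x=3"]    # divide by 2, then subtract 2
REFERENCE_PATHS = [PATH_SUBTRACT_FIRST, PATH_DIVIDE_FIRST]

complete_attempt = ["x+2=5", "x=3"]     # follows the second path fully
incomplete_attempt = ["x+2=5"]          # stops after one correct step
erroneous_attempt = ["2x=6", "x=4"]     # correct first step, then an error
off_path_attempt = ["x=10"]             # matches no legitimate path

for name, attempt in [("complete", complete_attempt),
                      ("incomplete", incomplete_attempt),
                      ("erroneous", erroneous_attempt),
                      ("off-path", off_path_attempt)]:
    print(name, naive_partial_credit(attempt, REFERENCE_PATHS))
```

Even in this small case the naive rule shows its limits: it gives the incomplete attempt and the erroneous attempt the same credit, it cannot recognize a step written in an equivalent but differently formatted way, and it depends on the reference paths being enumerated in advance, which is not possible when problems are entered dynamically.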