The present invention relates to a computerized, or other machine based test preparation system, and more particularly, to a method and apparatus for enhancing learning and improving examinee scores on standardized exams through the use of individually tailored diagnostics and remediation.
1. The Proliferation of "High-stakes" Examinations and Conventional Test Preparation Methods
"High-stakes" examinations are very common today. Typically they are time-based exams testing a set of predetermined subject areas. A number of these examinations, such as the Scholastic Aptitude Test ("SAT") series of examinations and others like it (e.g., MCAT, LSAT, ACT, GRE, GED, CLEP, BAR exam, DMV exams), have been labeled as "high-stakes" testing. In such "high-stakes" tests, the primary objective is the placement of an examinee on a latent trait or ability dimension, for a variety of purposes such as selection and placement (e.g., SAT, MCAT or LSAT), or certification (e.g., GED and DMV exams). Most of these tests include items from a variety of scholastic domains (e.g., SAT: verbal, mathematics; LSAT: logical reasoning, reading comprehension, verbal) that are arranged in a formal structure. The test items are chosen and developed by the test makers so as to "reliably" place examinees on the latent dimension of interest to the examiner and consumers of the standardized scores from such exams.
One factor which is thought to be capable of influencing examinee performance on these tests is coaching, or formal test preparation efforts. Because of the proliferation of these "high-stakes" examinations, an entire test preparation industry has arisen to help prepare examinees and improve their scores on these exams. Offerings include classroom-based tutoring, stand-alone printed publications, and computer-based materials (e.g., disk, CD-ROM, internet). All of these offerings claim to be able to increase an examinee's score on the particular standardized exams to which they are directed.
The most conventional test preparation offerings have traditionally been represented by such organizations as Kaplan Learning and The Princeton Review study centers, or by self-study methods based on printed test preparation texts such as 10 Real SATS, and Gruber's Complete Preparation for the New SAT: Eighth Edition. Through the use of such methods, examinee score increases have been modest, generally on the order of one-fifth of a standard deviation.
More recently, computer-based exam preparation materials have been developed and offered, including The College Board's One-on-One with the SAT, The Princeton Review's Inside the SAT, ACT and PSAT, published by The Learning Company, and The Crash Course for SAT, PSAT and ACT, published by ARCO Publishing. Additionally, some of the testing centers mentioned above have begun offering computerized training materials generally corresponding to their traditional classroom-based approach.
Common characteristics of these computerized offerings include: (i) presentation of timed "sample exams" and practice exams, (ii) scoring of responses from these exams, (iii) some question-specific feedback (e.g., response chosen, correct answer, brief explanation), and (iv) general test-taking tips (e.g., pacing, skipping questions). Features which differentiate these offerings include: (i) the use or non-use of audio and/or graphics, (ii) the ability to mark items to be skipped and returned to, (iii) feedback of a study plan based upon the results of a "sample exam", and (iv) the provision of explanations for each of the response alternatives for each item.
Several of these computerized offerings have been distributed over the internet. Some of the web sites offering exam preparation and review include: (i) Score.Kaplan.com (based on materials offered by Kaplan Learning), (ii) Review.com, (iii) Testprep.com, (iv) ACTive Prep at Act.org, (v) powerprep.com, and (vi) Novanet.com (based on materials offered by The Princeton Review). A review of these web sites as they existed in November 2000 revealed variations in complexity from "page-turners" to relatively complete implementations of the printed volumes on which some of them are based. In general though, they reflect the same range of complexity and operation as found in the other computerized and CD-ROM offerings discussed.
Several of the web-based offerings also provide "sample exams" which can be taken by the user. Information is generally fed back to the user of such offerings in the form of raw and scaled scores. In some cases, the feedback may also include a re-presentation of the exam items, an indication of the user's response and the correct choice, and an explanation of why the correct answer is correct and why each of the alternatives is wrong. While responses to the "sample exams" in some cases provide the basis for "diagnostic" feedback, the diagnosis in this context is defined from a conventional testing perspective and is determined merely by the number of incorrect answers rather than the types of incorrect answers. Thus, a study plan, or diagnosis, if provided, is usually based upon the user's distribution of scores across the various sections of the examination and results in a simplistic recommendation of remediation, such as the need to review geometric principles or increase vocabulary.
2. Recent Developments in Cognitive Diagnostic Assessments
Educators and researchers, influenced by recent developments in cognitive psychology and societal concerns regarding the influence of testing on equality of education, have sought testing instruments that would reveal the mechanisms, structures and processes that are activated when an examinee takes a test, and thus, would inform the instructional process. Conventional tests, while adequately serving as selection and/or placement instruments, are not well suited for determining a course of instruction or for identifying the source of problem-solving errors.
A category of testing called cognitively diagnostic assessment ("CDA") or dynamic testing has been developed which may provide a basis for individualized instruction for each examinee in a domain of interest. Such tests are based upon cognitive theories of learning, and as such, are not concerned with the representative sampling of items from a content domain (such as algebraic equations), but rather, with the examinee's knowledge and application of cognitive attributes which are thought to be required or not required to adequately solve a given problem. CDA testing provides information regarding the strategies that examinees use to attack problems, relationships they perceive among concepts, and principles they understand in a domain. The goal of these testing methods is to determine, on the basis of a simple test, what the strengths and weaknesses of an examinee are, relative to a specified list of cognitive attributes of interest to the teacher and the tester.
CDA-type tests are typically built around an attribute by item matrix (i.e., a Q-matrix). Thus, for an examinee to solve a given problem, it is assumed that they have knowledge of, and the ability to apply, one or more cognitive attributes related to the item or problem. The failure of an examinee to solve a problem is then attributed to the absence of a requisite cognitive attribute or to a lack of skill in its application.
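By way of illustration only, the attribute by item matrix described above can be sketched as a small lookup structure. The item names, attribute names, and the rule for narrowing down suspected deficits are hypothetical assumptions made for this sketch and are not taken from any particular CDA instrument.

```python
# Hypothetical Q-matrix: one row per item, one column per cognitive attribute.
# A 1 means the item is assumed to require that attribute.
ATTRIBUTES = ["fractions", "linear_equations", "unit_conversion"]
Q_MATRIX = {
    "item_1": [1, 0, 0],
    "item_2": [1, 1, 0],
    "item_3": [0, 1, 1],
}


def required_attributes(item):
    """Return the set of attributes the Q-matrix marks as required for an item."""
    return {name for name, flag in zip(ATTRIBUTES, Q_MATRIX[item]) if flag}


def candidate_deficits(missed_items, solved_items):
    """Attributes required by missed items, minus those already demonstrated
    on solved items (a simplifying assumption made for this sketch)."""
    suspected = set()
    for item in missed_items:
        suspected |= required_attributes(item)
    demonstrated = set()
    for item in solved_items:
        demonstrated |= required_attributes(item)
    return suspected - demonstrated
```

For an examinee who solves item_1 and item_2 but misses item_3, this sketch would single out the unit_conversion attribute as the candidate deficit.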
The major difficulty experienced with most CDA tests is one of numerosity: the number of possible sources of error grows exponentially as the number of attributes and the number of items increase. For example, some attempts by researchers to form a Q-matrix for 60 items on the SAT math test yielded more than 3,000 prototypical error patterns. Other researchers developed models containing only 4 attributes (strategy, completeness, positivity and slips) which were proposed to be evocative of properties that could be used in developing and interpreting diagnostic assessment tests. An evaluation of all of these models revealed that such a small universe of attributes could not adequately capture the test takers' cognitive deficiencies, while large attribute approaches were unlikely to provide a practical means of cognitive diagnostic assessment based upon simple testing.
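The scale of the numerosity problem can be illustrated with a short calculation. The sketch below assumes, for illustration only, that each attribute is either mastered or not mastered and that each multiple-choice item has a fixed number of response options; the function names are hypothetical.

```python
def attribute_profiles(num_attributes):
    """Number of distinct mastered/not-mastered profiles over the attributes."""
    return 2 ** num_attributes


def response_patterns(num_items, options_per_item):
    """Number of distinct ways to answer a test when every item offers the
    same number of response options."""
    return options_per_item ** num_items
```

Under these assumptions, 4 attributes give 16 possible profiles, while 20 attributes already give more than one million.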
Improvements in testing have been made possible by advancements in computer technology as well as advancements in cognitive theory. However, because there are always more ways to get an item wrong than right on a multiple-choice exam (i.e., on a typical multiple-choice question, there is only one correct answer and 3 or 4 distractors or incorrect options), or even more so, with regard to open-response, "fill-in-the-blank" questions, the specification of the cognitive model space remains a difficult task. Currently available cognitive diagnostic assessment programs are not able to handle the complexity of SAT-type examination questions.
3. Scoring of Multiple Choice and Constructed Response Examination Items
Multiple choice ("MC") tests are composed of items having two sub-parts: a stem representing the question and a series of response alternatives, one of which is the correct response. It is the presence of the response alternatives which differentiates MC test items from constructed response items which contain no response alternatives and require an examinee to self-generate a response. MC tests are typically scored by comparing the examinee's response to an item against a key that contains the correct answers. This is dichotomous scoring, 0 or 1, the answer being either correct or incorrect. Polychotomous scoring methods assign weights to each of the response options, with the correct response being given the largest weight. In practice, the two scoring methods yield highly correlated sets of test scores. Polychotomous scoring methods utilize more of the information available in a set of incorrect answers or distractors, although solely in service of the conventional testing purpose of rank ordering examinees, rather than for instruction or remediation.
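The difference between the two scoring methods can be sketched as follows. The answer key, the option weights, and the function names are hypothetical values chosen for illustration; actual polychotomous weights would be derived by the test developer.

```python
# Hypothetical answer key and per-option weights for two items.
ANSWER_KEY = {"q1": "B", "q2": "D"}
OPTION_WEIGHTS = {
    "q1": {"A": 0.0, "B": 1.0, "C": 0.5, "D": 0.25},
    "q2": {"A": 0.25, "B": 0.0, "C": 0.5, "D": 1.0},
}


def dichotomous_score(responses):
    """Score 1 for each response matching the key and 0 otherwise."""
    return sum(1 for item, choice in responses.items()
               if ANSWER_KEY.get(item) == choice)


def polychotomous_score(responses):
    """Sum the weight assigned to each chosen option; the correct option
    carries the largest weight for its item."""
    return sum(OPTION_WEIGHTS[item].get(choice, 0.0)
               for item, choice in responses.items())
```

An examinee choosing C on q1 and D on q2 would receive 1 under dichotomous scoring and 1.5 under this polychotomous weighting. Both functions still yield a single rank-ordering score; neither records which type of error the chosen distractor represents.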
Under either method, the set of response alternatives (the incorrect alternatives being known as distractors or foils) assumes considerable importance. If the distractors do not work, the test item becomes unreliable, and the interpretation of the scores becomes meaningless. Traditional methods of test construction have focused on the selection of distractors that are thought to yield some information about the latent trait being evaluated and on the elimination of non-working distractors. Conventionally, information derived from incorrect responses to an exam question is solely used by test developers to indicate that the question needs improvement, either in the wording of the stem or in the specification of the response alternatives. Nonetheless, additional useful information about the examinee can be captured from these incorrect answers. Researchers have observed that classification of response option choice according to type of error could be utilized for diagnostic purposes. Nevertheless, significant attention has not been directed toward developing MC tests in which the response alternatives are scored diagnostically for the benefit of the examinee and examiner.
Constructed response items, such as short answer or essay questions, typically require a person knowledgeable in the domain being tested to score them. Constrained constructed response items, such as the grid-in items on SAT-type examinations, may now be computer scored because software programs are capable of accepting a range of responses as being correct. The scoring routines employed for a majority of high-stakes examinations are still designed to yield scores based on a binary correct/incorrect coding of responses. Programs for the scoring of responses to extended essay questions are still in the investigatory stage.
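As a minimal sketch of how a constrained constructed response might be computer scored, the following accepts any numeric entry that falls within a stated range. The function names and the accepted range are assumptions made for illustration and do not describe the scoring routine of any actual examination.

```python
from fractions import Fraction


def parse_grid_in(text):
    """Convert a gridded entry such as '2/3' or '0.666' to an exact value.
    Returns None when the entry is not a valid number."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def score_grid_in(text, low, high):
    """Binary score: 1 if the parsed entry lies within [low, high], else 0."""
    value = parse_grid_in(text)
    if value is None:
        return 0
    return 1 if low <= value <= high else 0
```

With a hypothetical accepted range of 0.66 to 0.67, the entries "2/3" and "0.666" would both be scored as correct, while "0.6" would not. The result is still a binary correct/incorrect code.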
4. Disadvantages of the Prior Art
Known methods of preparing examinees for "high-stakes" exams are costly, fail to hold the interest of the examinee, and are inefficient and inconvenient. Furthermore, these current methods generally provide a low return on an examinee's investment, both financially and mentally. Reviews of research on admissions test coaching indicate that score increases are on the order of one-fifth of a standard deviation.
More importantly, since the current methods of test preparation remain wedded to the traditional concept of ranking each examinee against another on a latent dimension using scaled scores, the failure of an examinee to achieve a "satisfactory score" (as either defined by a school or other agency, or self-defined) results in a course of remediation limited to simple recommendations of more practice in a particular area; a method of remediation which is only weakly, if at all, informed by the test-taking experience. Feedback to the user which is based on such conventional test considerations does little to facilitate learning or improvement in knowledge in the domain of study.
Known test preparation systems also do not provide for the cognitive diagnosis of test-taking and/or content-related problems. Recommendations for remediation are based on the overall frequency of wrong answers in specific domains of a test, rather than on the frequency of specific types of wrong answers.
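The distinction drawn above between counting wrong answers by test domain and counting them by type of error can be sketched as follows. The record layout and the error-type labels are hypothetical and serve only to illustrate the contrast.

```python
from collections import Counter


def errors_by_domain(records):
    """Conventional tally: number of wrong answers in each test domain."""
    return Counter(domain for domain, error in records if error is not None)


def errors_by_type(records):
    """Diagnostic tally: number of wrong answers of each error type,
    regardless of the domain in which they occurred."""
    return Counter(error for domain, error in records if error is not None)
```

Each record is assumed to be a pair of (domain, error type), with None as the error type for a correct answer. An examinee with errors spread across math and verbal sections might show no single weak domain, while the second tally could reveal that most of the wrong answers share one error type.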
Furthermore, current systems do not permit the user to adapt the study program so that it is maximally effective for the particular user. Users who are visually oriented and learn most effectively from graphical presentations are provided no different manner of instruction than those users who are aurally oriented and who would benefit more from spoken explanations.
Finally, while mentoring, or one-on-one tutoring, remedies many of the shortcomings in these conventional test preparation methods, and while use of these methods is perhaps the most effective manner of diagnosing learning difficulties and effecting remedial action, individual mentoring is very costly and qualified mentors are limited in numbers and availability. Thus, once again, neither of these options is a viable solution for facilitating improvement in examinee scores on standardized tests in a commercially reasonable manner.
The invention provides an apparatus and method for enhancing learning and improving examinee test scores on standardized tests using cognitive diagnostic principles of diagnosis and remediation. More specifically, the invention provides a comprehensive, self-contained system for assessing and preliminarily diagnosing patterns of examinee errors through the use of data from the incorrect response alternatives (distractors) presented in each multiple-choice exam question or presented in response to constrained open-ended exam questions, confirming the preliminary diagnosis, if necessary, through the use of subsequent examination, offering remediation based upon the diagnosed error patterns, and reinforcing this remediation through skill development exercises, in order to increase an examinee's learning and level of performance on standardized tests. According to one embodiment, the invention utilizes information inherent in the distractors in standardized multiple-choice tests. According to another embodiment, the invention utilizes information provided as responses to constrained open-ended exam questions, in which the stem of such questions mirrors those employed in standardized multiple-choice tests. A system incorporating either of these embodiments does not require the creation of new test questions or responses. According to another embodiment, specific distractors are included in evaluation examinations that provide additional insight and information in identifying problem solving deficiencies. According to another embodiment of the invention, coded categories of responses that correspond to user generated response items are employed to provide information analogous to that provided by the distractors in standardized multiple-choice tests.
A system designed according to one embodiment of the invention incorporates several different program components. Those components may include a user interface, a test generator, a diagnostic scoring component, and a remediation component. The user interface manages a user's interaction with the system, requests and stores various personal information with respect to the user, and allows the system to be specifically tailored to the individual user. The test generation component compiles and formats various types of examinations for provision to the user, such as diagnostic sample tests, non-diagnostic test-taking strategy tests, and basic skill tests, and presents the examinations to the user for completion, storing a variety of information with regard to the user's responses to the exam. The diagnostic component assesses and diagnoses (both preliminarily and through a more informed manner) a user's error patterns in connection with the tests generated from the test generation component. The remediation component employs diagnoses from the diagnostic component to recommend remedial activities for improving examinee test performance and scores. The remediation component additionally contains a number of features that, in connection with the user interface component, allow the system to be specifically tailored to an individual user. Such features include the designation of materials for specific types of presentation, scrolling and bookmarking of materials, the presentation of difficulty meter levels, and the use of various multi-media features for presentation of remediation materials.
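The cooperation of the test generation, diagnostic, and remediation components can be sketched in simplified form. Everything below is a hypothetical illustration: the class, the function names, the error codes, the selection threshold, and the remedial activities are assumptions made for this sketch and are not the implementation of the invention. The user interface component is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    stem: str
    correct: str            # label of the correct option
    distractor_codes: dict  # wrong option label -> hypothetical error code


def generate_test(item_bank, count):
    """Test generator stand-in: take the first `count` items of the bank."""
    return item_bank[:count]


def diagnose(items, answers, threshold=2):
    """Diagnostic stand-in: return, in sorted order, the error codes whose
    distractors were chosen at least `threshold` times."""
    counts = Counter(
        item.distractor_codes[answer]
        for item, answer in zip(items, answers)
        if answer in item.distractor_codes
    )
    return sorted(code for code, n in counts.items() if n >= threshold)


# Hypothetical mapping from error code to remedial activity.
REMEDIATION = {
    "misread_stem": "practice restating the question before answering",
    "partial_solution": "practice checking that every step is completed",
}


def recommend(diagnoses):
    """Remediation stand-in: look up one activity per diagnosed error code."""
    return [REMEDIATION.get(code, "general review") for code in diagnoses]


# Hypothetical three-item bank with coded distractors.
BANK = [
    Item("q1", "A", {"B": "misread_stem", "C": "partial_solution"}),
    Item("q2", "C", {"A": "misread_stem", "B": "partial_solution"}),
    Item("q3", "B", {"A": "partial_solution", "C": "misread_stem"}),
]
```

In this sketch, an examinee answering B, A, B on the three items chooses two distractors coded misread_stem, so the diagnostic step reports that code and the remediation step returns the corresponding activity.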
The systems and methods according to one aspect of the invention identify patterns of errors in a user's choice of distractors contained within current standardized tests, and provide individually tailored remedial activities selected and based on such patterns. According to another aspect of the invention, test questions are developed which have stems and correct answers that are parallel to current standardized tests, but which have distractors that are designed to identify specific problem solving errors. According to yet another aspect of the invention, questions are developed independent of current standardized tests, and which have distractors designed to identify specific cognitive errors. Analysis of the selection of incorrect and correct answers is used to develop an individualized program of remediation.
According to an embodiment of the invention wherein current published standardized tests are used to compile the exam questions by the test generation component, the content and format of a particular test determines the overall number of distractor codes that are assigned. Since many standardized tests have been developed using a variety of item analysis techniques, the items and associated distractors which comprise the final versions of these tests are considered to be effective at assessing the examinee's knowledge of the content domain. Thus, for tests of this type, a system according to the present invention needs only to determine the information value of the incorrect responses and to assign category codes that reflect the probable error type made by an examinee who chose the incorrect alternative. According to another embodiment wherein exam items are generated which are specifically tailored to assess a user's response to specified distractor codes, a more detailed range of codes can be assigned.
Known test preparation systems do not provide for the diagnosis of error patterns that exist in a user's choice of incorrect response alternatives, and thus do not have the capability of recommending a course of remediation and/or skill development on the basis of a user's having responded to a sample standardized test in which the response options were not only scored as correct or incorrect, but also in terms of the types of errors they represent.
In light of the limitations of known test preparation systems and methods, it is an object of the invention to provide a more efficient, convenient, and effective manner of enhancing learning and improving test scores for a variety of "high-stakes" examinations. It is another object of the present invention to provide a test preparation system and method wherein an examinee's error patterns with respect to incorrect responses to exam questions are assessed and the examinee's cognitive deficiencies diagnosed, and, using this information, recommendations are made for remedial activities targeted to the individual examinee.
Another object of the present invention is to provide a system and method for teaching "testwiseness" skills, or skills which incorporate the use of cues provided by the test itself, or which are obtained by knowledge of the propensities of the test maker, to arrive at the correct answer to exam questions without possessing an underlying knowledge of how and why a particular answer is correct.
The various aspects of the invention discussed above may also be combined in various ways to produce additional advantages of the invention over known systems and methods. For example, the present invention may be used to provide for remedial training and/or skill development informed by the assessment of an individual examinee's error patterns, provided at a customer's site. The present invention may also provide for a flexible presentation of test contents and materials, in both visual and audio form, tailored to the unique needs of a particular examinee as chosen by the examinee himself. In addition, further objects and advantages afforded by the present invention will be apparent from the detailed description hereinbelow.