The traditional outcome of an educational test is a set of test scores reflecting the numbers of correct and incorrect responses provided by each student. While such scores may provide reliable and stable information about students' standing relative to a group, they fall short of indicating the specific patterns of skill mastery underlying students' observed item responses. Such additional information may help students and teachers better understand the meaning of test scores and the kinds of learning which might help to improve those scores.
Procedures for translating observed test results into instructionally-relevant statements about students' underlying patterns of skill mastery may be designed to provide student-level diagnostic information or group-level diagnostic information. Student-level diagnoses characterize the strengths and weaknesses of individual students. Group-level diagnoses characterize the strengths and weaknesses expected for students scoring at specified points on a test's reported score scale. A collection of group-level diagnoses designed to span a test's reported score range is termed a proficiency scale.
Both group- and student-level diagnoses can provide useful feedback. The detailed information available from a student-level diagnosis can help human or computerized tutors design highly individualized instructional intervention. The cross-sectional view provided by a set of group-level diagnoses can be used to: (a) demonstrate that the skills tapped by a particular measurement instrument are in fact those deemed important to measure, and (b) suggest likely areas of improvement for individual students. Both types of diagnoses can also be used to inform course placement decisions.
Procedures for generating group-level and/or student-level diagnoses have been proposed by a number of researchers. Beaton and Allen proposed a procedure called Scale Anchoring which involved (a) identifying subsets of test items which provided superior discrimination at successive points on a test's reported score scale; and (b) asking subject-area experts to review the items and provide detailed descriptions of the specific cognitive skills that groups of students at or close to the selected score points would be expected to have mastered. (Beaton, A. E. & N. L. Allen, Interpreting scales through scale anchoring, Journal of Educational Statistics, vol. 17, pp. 191-204, 1992.) This procedure provides a small number of group-level diagnoses, but no student-level diagnoses. The estimated group-level diagnoses are specified in terms of the combinations of skills needed to solve items located at increasingly higher levels on a test's reported score scale.
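Step (a) of such an anchoring procedure can be illustrated with a minimal sketch. It assumes that the proportion of correct responses to each item is already known for groups of students at each selected score point. The function name, the data layout, and the two threshold values are hypothetical choices made for illustration only and are not taken from the cited reference.

```python
def anchor_items(p_correct, level, lower_level, min_at_level=0.65, max_below=0.50):
    """Return items that discriminate between two adjacent score points.

    p_correct maps item id -> {score point: proportion answering correctly}.
    An item is selected when most students at `level` answer it correctly
    while students at `lower_level` mostly do not. Both thresholds are
    illustrative assumptions.
    """
    selected = []
    for item, by_level in p_correct.items():
        if by_level[level] >= min_at_level and by_level[lower_level] < max_below:
            selected.append(item)
    return sorted(selected)


# Hypothetical proportions correct at two score points, 400 and 500.
proportions = {
    "item1": {400: 0.30, 500: 0.80},  # hard at 400, mostly solved at 500
    "item2": {400: 0.70, 500: 0.90},  # already easy at 400
    "item3": {400: 0.20, 500: 0.40},  # still hard at 500
}
items_for_500 = anchor_items(proportions, level=500, lower_level=400)
```

The selected items would then be passed to subject-area experts for step (b), which is a matter of human judgment and is not modeled here.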
Tatsuoka, Birenbaum, Lewis, and Sheehan outlined an approach which provides both student- and group-level diagnoses. (Tatsuoka, K.K., Architecture of knowledge structures and cognitive diagnosis, P. Nichols, S. Chipman & R. Brennan, Eds., Cognitively diagnostic assessment. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1995. Tatsuoka, K., M. Birenbaum, C. Lewis, & K. Sheehan, Proficiency scaling based on conditional probability functions for attributes, ETS Research Report No. RR-93-50-ONR, Princeton, N.J.: Educational Testing Service, 1993.) Student-level diagnoses are generated by first hypothesizing a large number of latent skill mastery states and then using a Mahalanobis distance test (i.e., the Rule Space procedure) to classify as many examinees as possible into one or another of the hypothesized states. The classified examinees' hypothesized skill mastery patterns (i.e., master/nonmaster status on each of k skills) are then summarized to provide group-level descriptions of the skill mastery status expected for students scoring at successive points on a test's reported score scale. For example, in an analysis of 180 mathematics items selected from the Scholastic Assessment Test (SAT 1), 94% of 6,000 examinees were classified into one of 2,850 hypothesized skill mastery states (Tatsuoka, 1995, p. 348).
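The classification step can be illustrated with a minimal sketch of nearest-state assignment by Mahalanobis distance. It assumes that each examinee and each hypothesized state has been reduced to a two-dimensional point, that all states share one covariance matrix, and that an examinee is left unclassified when even the nearest state is farther away than a cutoff. The function names, the state labels, and all numeric values are hypothetical; this is not the Rule Space procedure itself.

```python
def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance between two 2-D points."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def classify(point, states, cov, cutoff):
    """Return the label of the nearest state, or None if none is close enough."""
    best_label, best_dist = None, None
    for label, center in states.items():
        dist = mahalanobis_sq(point, center, cov)
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist is not None and best_dist <= cutoff:
        return best_label
    return None


# Hypothetical state centers and an identity covariance matrix.
states = {"state_A": (0.0, 0.0), "state_B": (3.0, 0.0)}
identity = ((1.0, 0.0), (0.0, 1.0))
near_a = classify((0.5, 0.0), states, identity, cutoff=4.0)
unclassified = classify((10.0, 10.0), states, identity, cutoff=4.0)
```

The cutoff is what leaves some examinees unclassified, which mirrors the observation above that only a percentage of examinees are assigned to a hypothesized state.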
Gitomer and Yamamoto generate student-level diagnoses using the Hybrid Model. (Gitomer, D. H. & K. Yamamoto, Performance modeling that integrates latent trait and latent class theory, Journal of Educational Measurement, vol. 28, pp. 173-189, 1991.) In this approach, likelihood-based inference techniques are used to classify as many examinees as possible into a small number of hypothesized skill mastery states. For example, in an analysis of 288 logic gate items, 30% of 255 examinees were classified into one of five hypothesized skill mastery states (Gitomer & Yamamoto at 183). For each of the remaining examinees, Gitomer and Yamamoto provided an Item Response Theory (IRT) ability estimate which indicated standing relative to other examinees but provided no additional information about skill mastery.
Mislevy, Gitomer, and Steinberg generate student-level diagnoses using a Bayesian inference network. (Mislevy, R. J., Probability-based inference in cognitive diagnosis, P. Nichols, S. Chipman, & R. Brennan, Eds., Cognitively diagnostic assessment, Hillsdale, N.J.: Lawrence Erlbaum Associates, 1995. Gitomer, D. H., L. S. Steinberg, & R. J. Mislevy, Diagnostic assessment of troubleshooting skill in an intelligent tutoring system, P. Nichols, S. Chipman, & R. Brennan, Eds., Cognitively diagnostic assessment, Hillsdale, N.J.: Lawrence Erlbaum Associates, 1995.) This approach differs from the approaches described previously in two important respects: (1) students' observed item responses are modeled conditional on a multivariate vector of latent student-level proficiencies, and (2) multiple sources of information are considered when diagnosing mastery status on each of the hypothesized proficiencies. For example, in an analysis of fifteen fraction subtraction problems, nine student-level variables were hypothesized and information about individual skill mastery probabilities was derived from two sources: population-level skill mastery base rates and examinees' observed item response vectors (Mislevy, 1995).
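The way a population-level base rate and an observed response vector combine into a mastery probability can be illustrated with a minimal sketch of Bayes' rule for a single binary skill. It assumes that item responses are conditionally independent given mastery status and that every item has the same probability of a correct response within each status. This is a deliberate simplification rather than a full inference network, and the function name and parameter values are hypothetical.

```python
def posterior_mastery(base_rate, responses, p_correct_master, p_correct_nonmaster):
    """Posterior probability of mastery given 0/1 item responses.

    base_rate is the prior probability of mastery in the population.
    """
    weight_master = base_rate
    weight_nonmaster = 1.0 - base_rate
    for correct in responses:
        if correct:
            weight_master *= p_correct_master
            weight_nonmaster *= p_correct_nonmaster
        else:
            weight_master *= 1.0 - p_correct_master
            weight_nonmaster *= 1.0 - p_correct_nonmaster
    # Normalize so the two posterior probabilities sum to one.
    return weight_master / (weight_master + weight_nonmaster)


# Hypothetical values: half the population has mastered the skill.
after_one_correct = posterior_mastery(0.5, [1], 0.8, 0.2)
after_two_correct = posterior_mastery(0.5, [1, 1], 0.8, 0.2)
```

With no responses the function returns the base rate unchanged, and each additional response moves the estimate toward whichever status better explains it.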
In each of the diagnostic approaches described above, it is assumed that the test under consideration is a broad-based proficiency test such as those that are typically used in educational settings. Lewis and Sheehan consider the problem of generating student-level diagnoses when the item response data is collected via a mastery test, that is, a test designed to provide accurate measurement at a single underlying proficiency level, such as a pass/fail point. (Lewis, C. & K. M. Sheehan, Using Bayesian decision theory to design a computerized mastery test, Applied Psychological Measurement, vol. 14, pp. 367-386, 1990. Sheehan, K. M. & C. Lewis, Computerized mastery testing with nonequivalent testlets, Applied Psychological Measurement, vol. 16, pp. 65-76, 1992.) In this approach, decisions regarding the mastery status of individual students are obtained by first specifying a loss function and then using Bayesian decision theory to define a decision rule that minimizes posterior expected loss.
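The decision rule can be illustrated with a minimal sketch that chooses the action with the smallest posterior expected loss. It assumes two actions (pass, fail), two states (master, nonmaster), and a posterior mastery probability that has already been computed. The loss values and all names are hypothetical, and a sequential rule that also allows testing to continue is not modeled.

```python
def expected_loss(action, p_master, loss):
    """Posterior expected loss of one action over the two mastery states."""
    return (p_master * loss[action]["master"]
            + (1.0 - p_master) * loss[action]["nonmaster"])


def best_action(p_master, loss):
    """Return the action that minimizes posterior expected loss."""
    return min(loss, key=lambda action: expected_loss(action, p_master, loss))


# Hypothetical loss function: passing a nonmaster costs twice as much
# as failing a master, and correct decisions cost nothing.
loss = {
    "pass": {"master": 0.0, "nonmaster": 2.0},
    "fail": {"master": 1.0, "nonmaster": 0.0},
}
decision_confident = best_action(0.8, loss)
decision_borderline = best_action(0.6, loss)
```

Because the two kinds of error are weighted unequally in this example, a posterior mastery probability above one half is not by itself enough to justify a pass decision.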
The prior art methods are known to be computationally intensive and do not take observed data into account when the set of candidate knowledge states is generated. Moreover, these approaches are form-dependent. That is, the set of knowledge states obtained excludes all states that could not have been observed with the current form, even if they might have been observed with a different form. Finally, the prior art methods cannot capture states involving significant interaction effects if those effects are not specified in advance.
Thus, there is a need in the art for a less computationally intensive method designed to search for, and incorporate, all significant skill-mastery patterns that can be determined from the available item difficulty data. There is a further need in the art for a form-independent approach that provides all of the knowledge states which could have been observed, given the collection of forms considered in the analysis. There is a further need in the art for an approach that automatically incorporates all identified interaction states so that the success of the procedure is not critically dependent on detailed prior knowledge of the precise nature of the true underlying proficiency model.