Testing of individuals is carried out for a wide variety of purposes. One of these purposes is achieved through mastery testing. Mastery testing is used in educational and certification contexts to decide, on the basis of test performance, whether or not an individual has attained a specified level of knowledge, or mastery, of a given subject.
2. Description of the Prior Art
A central problem in designing a mastery test is that of maximizing the probability of making a correct mastery decision while simultaneously minimizing test length. A similar problem is frequently encountered in the field of quality control: acceptance sampling plans must be designed to maximize the probability of correctly classifying the quality of a lot of manufactured material while simultaneously minimizing the number of items inspected. The solution to the acceptance sampling problem proposed by Wald in 1947, called the sequential probability ratio test, exploited the fact that a lot of very poor quality can be expected to reveal its character in a very small sample, whereas lots of medium quality will always require more extensive testing. The test proceeds by inspecting one randomly selected unit at a time, while allowing for the possibility of a decision on the quality of the lot as a whole after each drawing.
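The unit-at-a-time procedure described above can be illustrated with a minimal Python sketch of a sequential probability ratio test for a lot whose units are either defective or not. The function name `sprt`, the hypothesized defect rates `p0` and `p1`, the error rates `alpha` and `beta`, and the decision labels are assumptions introduced for illustration only. The stopping bounds are Wald's standard approximations and are not taken from any particular sampling plan.

```python
import math

def sprt(observations, p0, p1, alpha=0.05, beta=0.05):
    """Sequential probability ratio test on 0/1 inspection results.

    H0: defect rate is p0 (acceptable lot); H1: defect rate is p1 > p0.
    Returns (decision, number_of_units_inspected).
    All names and default values here are illustrative assumptions.
    """
    upper = math.log((1 - beta) / alpha)   # crossing this favors H1
    lower = math.log(beta / (1 - alpha))   # crossing this favors H0
    llr = 0.0                              # running log-likelihood ratio
    n = 0
    for n, defective in enumerate(observations, start=1):
        if defective:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        # A decision is possible after every single drawing.
        if llr >= upper:
            return "reject_lot", n
        if llr <= lower:
            return "accept_lot", n
    return "undecided", n                  # ran out of units before a bound was crossed

# A very poor lot reveals itself after only two inspected units.
print(sprt([1] * 20, p0=0.05, p1=0.30))
# A clean run of good units takes longer to cross the acceptance bound.
print(sprt([0] * 20, p0=0.05, p1=0.30))
```

With these assumed rates, each defective unit moves the log-likelihood ratio much further than each good unit does, which is why a very poor lot is classified after a very small sample.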
In an early application of the sequential testing approach to the mastery testing problem, disclosed in Ferguson (1969a, b) (full citations for this and other references are given in the reference section below), a sequential mastery test was designed which treated examinees' responses to items as a sequence of independent Bernoulli trials. This design requires a pool of calibrated items which can be sampled randomly. The test is conducted by presenting items to examinees, one at a time. After each item has been presented, a decision is made either to classify the examinee (as a master or a nonmaster) or to present another item. Ferguson also specified a maximum test length for those individuals for whom the mastery classification is very difficult to make. A major advantage of this approach is that it allows for shorter tests for individuals who have clearly mastered (or clearly not mastered) the subject matter, and longer tests for those for whom the mastery decision is not as clear-cut. A limitation is that, in order to maintain efficiency, the item pool must be restricted to equivalent items.
An alternative mastery testing procedure has been proposed by Lord (1980) and implemented at Educational Testing Service by Martha L. Stocking. In this alternative approach, all examinees receive the same fixed-length test, but the test is designed, constructed and scored using methods derived from Item Response Theory ("IRT"). An optimal test length is determined by specifying a maximum value for the length of the asymptotic confidence interval for estimating ability from test score in the region of the cutscore. This approach places no restrictions on the variability of items in the pool but it does require that all examinees take the same fixed-length test.
Wainer (1983) discloses IRT and computerized adaptive testing. IRT is a family of related models, differing somewhat in their assumptions, that express the relationship between an individual's ability (or skill or knowledge) and the probability that the individual will be able to answer a given test question or item correctly. Some models differ in their assumptions regarding the number of characteristics that are to be used to describe a test question. One popular IRT model, the Rasch model, assumes that items differ in a single characteristic, their difficulty. Another, the three-parameter model, which is used in a preferred embodiment of the present invention, considers variation in three characteristics--difficulty, the sharpness with which the question differentiates between high and low ability examinees, and the probability that an examinee with very low ability can answer the question correctly (roughly, the probability of guessing the answer).
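The two models named above can be sketched in Python using the commonly published logistic form of the three-parameter model. The function names, the parameter names `a` (discrimination), `b` (difficulty) and `c` (lower asymptote, roughly the guessing probability), and the optional `scale` constant are assumptions made for illustration; the text above does not state which parameterization or scaling constant is used.

```python
import math

def p_correct_3pl(theta, a, b, c, scale=1.0):
    """Probability of a correct answer under a three-parameter logistic model.

    theta: examinee ability
    a: discrimination, b: difficulty, c: lower asymptote (guessing)
    scale: optional scaling constant (assumed 1.0 here; 1.7 is also common)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-scale * a * (theta - b)))

def p_correct_rasch(theta, b):
    """Rasch model as the special case in which items differ only in difficulty."""
    return p_correct_3pl(theta, a=1.0, b=b, c=0.0)

# At theta == b the probability lies halfway between c and 1.
print(p_correct_3pl(theta=0.0, a=1.2, b=0.0, c=0.2))
print(p_correct_rasch(theta=0.0, b=0.0))
```

Setting `a` to 1 and `c` to 0 leaves difficulty as the only item characteristic, which is the sense in which the Rasch model is the simpler of the two.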
IRT describes measurement characteristics of test items, including difficulty and accuracy of measurement, in a way that is independent of the particular sample tested and of the other questions administered. It is this independence that allows creation of a test in which different individuals receive different questions, yet can be scored on a common scale. Similarly, this independence permits determination in advance of test administration of the level of ability and the accuracy with which ability has been measured, represented by the individual's performance on a set of test questions. Knowing the IRT characteristics of the questions in a test and which of these an individual answered correctly, a score representing the individual's ability may be derived in a nonarbitrary manner.
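As one illustration of how a score can be derived from known item characteristics and a record of which items were answered correctly, the following Python sketch finds the ability value that maximizes the likelihood of the observed responses over a grid. This is only one possible scoring approach and is not stated to be the method used in any system discussed here. The three-parameter logistic form, the function names, the item tuples `(a, b, c)`, and the grid limits are all assumptions for illustration.

```python
import math

def p_correct(theta, a, b, c):
    """Three-parameter logistic item response function (assumed form)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a 0/1 response pattern at ability theta."""
    total = 0.0
    for (a, b, c), correct in zip(items, responses):
        p = p_correct(theta, a, b, c)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_ability(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of ability on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical calibrated items: (discrimination, difficulty, guessing).
items = [(1.0, -1.0, 0.2), (1.0, 0.0, 0.2), (1.0, 1.0, 0.2)]
print(estimate_ability(items, [1, 0, 0]))
print(estimate_ability(items, [1, 1, 0]))
```

Because the estimate depends only on the item parameters and the response pattern, two examinees who saw different items can still be placed on the same ability scale.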
The present invention is a new method of mastery testing which uses IRT and a sequential testing approach to provide a test with an adaptive stopping rule. While several others have also proposed IRT-based sequential testing procedures, including Kingsbury and Weiss (1983), Reckase (1983) and Frick (1986), they did not disclose or suggest a computerized mastery testing system or important features thereof. At least one, Reckase (1983), seemed to suggest that an item-at-a-time sequential testing procedure based on an IRT model was so involved as to make any testing procedure based on it, for practical purposes, impossible.
The present invention differs from those presented previously in that, inter alia, the sequential testing process operates on ordered collections of items, called testlets, rather than on individual items; the relationship between observed test performance and true mastery status is modeled using IRT; and the decision rule is determined using Bayesian decision theory. The inventors call the present invention a Computerized Mastery Testing ("CMT") system, because it is designed to be administered and scored using personal computers or the equivalent.
The CMT system also solves some problems generally associated with mastery testing.
First, it provides greater control over item order and context effects, that is, effects which arise when the administration of a particular item has an effect on the difficulty of a subsequent item.
Second, it can be adapted for use in a mastery testing program requiring diagnostic feedback, by explicitly including the losses associated with alternative feedback strategies in the loss function specification.
Third, test security is an important consideration in any mastery testing program. The CMT system described herein enhances test security since not all candidates will receive the same combination of testlets (depending on the size of the testlet pool) and candidates who have clearly not mastered the subject matter of the test will receive shorter tests, and thus will be less likely to receive the same subset of testlets should they decide to retake the test at a later date.
Fourth, the record keeping burden associated with administering tests to repeat examinees is reduced because the system need only keep track of the identifications associated with the subset of testlets administered to each examinee (for example, five or six testlets), as opposed to the identifications of, possibly, hundreds of individual test items.