Standardized testing is prevalent in the United States today. Such testing is often used for higher education entrance examinations and achievement testing at the primary and secondary school levels. The prevalence of standardized testing in the United States has been further bolstered by the No Child Left Behind Act of 2001, which emphasizes nationwide test-based assessment of student achievement.
The typical focus of research in the field of assessment measurement and evaluation has been on methods of item response theory (IRT). A goal of IRT is to optimally order examinees along a low-dimensional latent space (typically unidimensional) based on the examinees' responses and the characteristics of the test items. The ordering of examinees is done via a set of latent variables presupposed to measure ability. The item responses are generally assumed to be conditionally independent of each other given the latent ability.
The typical IRT application uses a test to estimate an examinee's set of abilities (such as verbal ability or mathematical ability) on a continuous scale. An examinee receives a scaled score (a latent trait scaled to some easily understood metric) and/or a percentile rank. The final score (an ordering of examinees along a latent dimension) is used as the standardized measure of competency for an area-specific ability.
Although achieving a partial ordering of examinees remains an important goal in some settings of educational measurement, the practicality of such methods is questionable in common testing applications. It seems unlikely that each examinee acquires the knowledge a test purports to measure along the same low-dimensional scale of broadly defined general abilities. This is, at least in part, because such testing can only assess a student's abilities generally and cannot adequately determine whether a student has mastered a particular ability.
Because of this limitation, cognitive modeling methods, also known as skills assessment or skills profiling, have been developed for assessing students' abilities. Cognitive diagnosis is the statistical process of evaluating each examinee on the basis of his or her level of competence on an array of skills and using this evaluation to make relatively fine-grained categorical teaching and learning decisions about each examinee. Traditional educational testing, such as the use of an SAT score to determine overall ability, performs summative assessment. In contrast, cognitive diagnosis performs formative assessment, which partitions answers for an assessment examination into fine-grained (often discrete or dichotomous) cognitive skills or abilities in order to evaluate an examinee with respect to his or her level of competence for each skill or ability. For example, if a designer of an algebra test is interested in evaluating a standard set of algebra attributes, such as factoring, laws of exponents, quadratic equations and the like, cognitive diagnosis attempts to evaluate each examinee with respect to each such attribute. In contrast, summative assessment simply evaluates each examinee with respect to an overall score on the algebra test.
Numerous cognitive diagnosis models have been developed to attempt to estimate examinee attributes. In cognitive diagnosis models, the atomic components of ability (the specific, finely grained skills, e.g., the ability to multiply fractions or factor polynomials, that together comprise the latent space of general ability) are referred to as attributes. Due to the high level of specificity in defining attributes, an examinee in a dichotomous model is regarded as either a master or a non-master of each attribute. The space of all attributes relevant to an examination is represented by the set {α1, . . . , αK}. Given a test with items i=1, . . . , I, the attributes necessary for each item can be represented in a matrix of size I×K. This matrix is referred to as a Q-matrix having values Q={qik}, where qik=1 when attribute k is required by item i and qik=0 when attribute k is not required by item i. Typically, the Q-matrix is constructed by experts and is pre-specified at the time of the examination analysis.
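As a small illustration of the Q-matrix, the Python sketch below encodes a hypothetical three-item algebra test over three of the attributes mentioned earlier (factoring, laws of exponents, quadratic equations) and lists the attributes each item requires. The item contents and the helper name `required_attributes` are illustrative assumptions, not part of any standard tool.

```python
from typing import List

# Hypothetical Q-matrix for a 3-item test (rows = items i, columns = attributes k).
# Attribute columns: 0 = factoring, 1 = laws of exponents, 2 = quadratic equations.
Q: List[List[int]] = [
    [1, 0, 0],  # item 1 requires factoring only
    [0, 1, 0],  # item 2 requires laws of exponents only
    [1, 0, 1],  # item 3 requires factoring and quadratic equations
]


def required_attributes(q_matrix: List[List[int]], item: int) -> List[int]:
    """Return the 0-based attribute indices k with q_ik = 1 for a 0-based item index."""
    return [k for k, q_ik in enumerate(q_matrix[item]) if q_ik == 1]


num_items, num_attributes = len(Q), len(Q[0])  # the I x K shape of the Q-matrix
print(num_items, num_attributes)  # 3 3
print(required_attributes(Q, 2))  # [0, 2]
```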
Cognitive diagnosis models can be sub-divided into two classifications: compensatory models and conjunctive models. Compensatory models allow for examinees who are non-masters of one or more attributes to compensate by being masters of other attributes. An exemplary compensatory model is the common factor model. High scores on some factors can compensate for low scores on other factors.
Numerous compensatory cognitive diagnosis models have been proposed including: (1) the Linear Logistic Test Model (LLTM) which models cognitive facets of each item, but does not provide information regarding the attribute mastery of each examinee; (2) the Multicomponent Latent Trait Model (MLTM) which determines the attribute features for each examinee, but does not provide information regarding items; (3) the Multiple Strategy MLTM which can be used to estimate examinee performance for items having multiple solution strategies; and (4) the General Latent Trait Model (GLTM) which estimates characteristics of the attribute space with respect to examinees and item difficulty.
Conjunctive models, on the other hand, do not allow for compensation when critical attributes are not mastered. Such models more naturally apply to cognitive diagnosis due to the cognitive structure defined in the Q-matrix and will be considered herein. Such conjunctive cognitive diagnosis models include: (1) the DINA (deterministic inputs, noisy “AND” gate) model which requires the mastery of all attributes by the examinee for a given examination item; (2) the NIDA (noisy inputs, deterministic “AND” gate) model which decreases the probability of answering an item correctly for each required attribute that is not mastered; (3) the Disjunctive Multiple Classification Latent Class Model (DMCLCM) which models the application of non-mastered attributes to incorrectly answered items; (4) the Partially Ordered Subset Models (POSET) which include a component relating the set of Q-matrix defined attributes to the items by a response model and a component relating the Q-matrix defined attributes to a partially ordered set of knowledge states; and (5) the Unified Model which combines the Q-matrix with terms intended to capture the influence of incorrectly specified Q-matrix entries.
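The difference between the DINA and NIDA gates can be sketched in a few lines of Python. The slip and guess parameters below are not defined in this section; they are the conventional per-item (DINA) and per-attribute (NIDA) error parameters from the cognitive diagnosis literature and are included here as an assumption to make the two gates concrete.

```python
from typing import Sequence


def dina_prob(alpha: Sequence[int], q: Sequence[int], slip: float, guess: float) -> float:
    """DINA: eta = 1 only if every attribute required by the item is mastered."""
    eta = all(a == 1 for a, q_k in zip(alpha, q) if q_k == 1)
    # A master of all required attributes succeeds unless a slip occurs; anyone else must guess.
    return 1.0 - slip if eta else guess


def nida_prob(alpha: Sequence[int], q: Sequence[int],
              slips: Sequence[float], guesses: Sequence[float]) -> float:
    """NIDA: each required attribute is a noisy input; the inputs are combined by AND."""
    p = 1.0
    for a, q_k, s_k, g_k in zip(alpha, q, slips, guesses):
        if q_k == 1:
            # Each required attribute that is not mastered lowers the probability.
            p *= (1.0 - s_k) if a == 1 else g_k
    return p


q_item = [1, 0, 1]   # item requires attributes 1 and 3
partial = [1, 1, 0]  # masters attribute 1 but not attribute 3
full = [1, 0, 1]     # masters both required attributes
print(dina_prob(partial, q_item, slip=0.1, guess=0.2))
print(nida_prob(partial, q_item, [0.1] * 3, [0.2] * 3))
print(nida_prob(full, q_item, [0.1] * 3, [0.2] * 3))
```

Under DINA, the partial master is treated like any non-master (only the guess probability remains), whereas NIDA reduces the probability attribute by attribute.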
The Unified Model specifies the probability of correctly answering an item Xij for a given examinee j, item i, and set of attributes k=1, . . . , K as:
$$P(X_{ij}=1 \mid \boldsymbol{\alpha}_j, \theta_j) = (1-p)\left[d_i \prod_{k=1}^{K} \pi_{ik}^{\alpha_{jk} q_{ik}}\, r_{ik}^{(1-\alpha_{jk}) q_{ik}}\, P_i(\theta_j + \Delta c_i) + (1-d_i)\, P_i(\theta_j)\right],$$

where θj is the latent trait of examinee j; p is the probability of an erroneous response by an examinee that is a master; di is the probability of selecting the pre-defined Q-matrix strategy for item i; πik is the probability of correctly applying attribute k to item i given mastery of attribute k; rik is the probability of correctly applying attribute k to item i given non-mastery of attribute k; αjk is an examinee attribute mastery level; and ci is a value indicating the extent to which the Q-matrix entry for item i spans the latent attribute space.
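To make the structure of the Unified Model concrete, the Python sketch below evaluates the formula term by term. The item response function Pi is not specified in this section, so the sketch takes it as an argument and uses a logistic curve purely as an assumed example; likewise, the product Δci is passed in as a single hypothetical number `delta_c`.

```python
import math
from typing import Callable, Sequence


def unified_model_prob(alpha: Sequence[int], q: Sequence[int], theta: float,
                       p: float, d_i: float, pi_i: Sequence[float], r_i: Sequence[float],
                       delta_c: float, P_i: Callable[[float], float]) -> float:
    """P(X_ij = 1 | alpha_j, theta_j) under the Unified Model, term by term."""
    # Attribute product: pi_ik if required and mastered, r_ik if required but not mastered.
    diagnostic = 1.0
    for a, q_k, pi_k, r_k in zip(alpha, q, pi_i, r_i):
        diagnostic *= pi_k ** (a * q_k) * r_k ** ((1 - a) * q_k)
    # With probability d_i the Q-matrix strategy is used; otherwise only P_i(theta) applies.
    mixture = d_i * diagnostic * P_i(theta + delta_c) + (1.0 - d_i) * P_i(theta)
    # The factor (1 - p) accounts for an erroneous response by a master.
    return (1.0 - p) * mixture


def logistic(x: float) -> float:
    """Assumed item response function, for illustration only."""
    return 1.0 / (1.0 + math.exp(-x))


prob = unified_model_prob(alpha=[1, 0], q=[1, 1], theta=0.0, p=0.05, d_i=0.8,
                          pi_i=[0.9, 0.85], r_i=[0.3, 0.4], delta_c=0.5, P_i=logistic)
print(round(prob, 4))
```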
One problem with the Unified Model is that its item parameters are not identifiable: the model has too many parameters per item. The Reparameterized Unified Model (RUM) attempted to reparameterize the Unified Model in a manner consistent with the original interpretation of the model parameters. For a given examinee j, item i, and Q-matrix defined set of attributes k=1, . . . , K, the RUM specifies the probability of correctly answering item i (Xij=1) as:
$$P(X_{ij}=1 \mid \boldsymbol{\alpha}_j, \theta_j) = \pi_i^{*} \prod_{k=1}^{K} r_{ik}^{*\,(1-\alpha_{jk}) q_{ik}}\, P_{c_i}(\theta_j),$$

where

$$\pi_i^{*} = \prod_{k=1}^{K} \pi_{ik}^{q_{ik}}$$

is the probability of correctly applying all K Q-matrix specified attributes for item i,

$$r_{ik}^{*} = \frac{r_{ik}}{\pi_{ik}}$$

is the penalty imposed for not mastering attribute k, and

$$P_{c_i}(\theta_j) = \frac{e^{(\theta_j + c_i)}}{1 + e^{(\theta_j + c_i)}}$$

is a measure of the completeness of the model.
The RUM is a compromise among the Unified Model parameters that allows the estimation of both latent examinee attribute patterns and test item parameters.
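The reparameterization can be checked numerically. Substituting π*i and r*ik into the RUM gives π*i ∏k (r*ik)^{(1-αjk)qik} = ∏k πik^{αjk qik} rik^{(1-αjk)qik}, which is exactly the attribute product in the Unified Model. The Python sketch below, using made-up parameter values, computes π*i and r*ik from πik and rik and confirms that the two products agree.

```python
import math
from typing import List, Sequence, Tuple


def rum_item_params(pi_i: Sequence[float], r_i: Sequence[float],
                    q: Sequence[int]) -> Tuple[float, List[float]]:
    """pi*_i = prod_k pi_ik^q_ik and r*_ik = r_ik / pi_ik."""
    pi_star = math.prod(pi_k ** q_k for pi_k, q_k in zip(pi_i, q))
    r_star = [r_k / pi_k for r_k, pi_k in zip(r_i, pi_i)]
    return pi_star, r_star


def p_c(theta: float, c: float) -> float:
    """Completeness term P_ci(theta_j) = e^(theta + c) / (1 + e^(theta + c))."""
    return math.exp(theta + c) / (1.0 + math.exp(theta + c))


def rum_diagnostic(alpha: Sequence[int], q: Sequence[int],
                   pi_star: float, r_star: Sequence[float]) -> float:
    """pi*_i times r*_ik raised to (1 - alpha_jk) * q_ik for each attribute."""
    return pi_star * math.prod(r_k ** ((1 - a) * q_k) for r_k, a, q_k in zip(r_star, alpha, q))


def rum_prob(alpha: Sequence[int], q: Sequence[int], theta: float,
             pi_star: float, r_star: Sequence[float], c: float) -> float:
    """P(X_ij = 1 | alpha_j, theta_j) under the RUM."""
    return rum_diagnostic(alpha, q, pi_star, r_star) * p_c(theta, c)


# Made-up parameters for one item and one examinee.
pi_i, r_i, q = [0.9, 0.8, 0.95], [0.3, 0.4, 0.5], [1, 0, 1]
alpha = [1, 1, 0]  # masters attribute 1, not attribute 3 (attribute 2 is not required)

pi_star, r_star = rum_item_params(pi_i, r_i, q)
rum_diag = rum_diagnostic(alpha, q, pi_star, r_star)
unified_diag = math.prod(p ** (a * q_k) * r ** ((1 - a) * q_k)
                         for p, r, a, q_k in zip(pi_i, r_i, alpha, q))
print(round(rum_diag, 6), round(unified_diag, 6))
print(round(rum_prob(alpha, q, theta=0.0, pi_star=pi_star, r_star=r_star, c=1.5), 4))
```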
Another cognitive diagnosis model derived from the Unified Model is the Fusion Model. In the Fusion Model, the examinee parameters are defined as αj, a K-element vector representing examinee j's mastery/non-mastery status on each of the attributes specified in the Q-matrix. For example, if a test measures five skill attributes, an examinee's αj vector might be ‘11010’, implying mastery of skill attributes 1, 2 and 4, and non-mastery of attributes 3 and 5. The examinee variable θj is normalized as in traditional IRT applications (mean of 0, variance of 1). The probability that examinee j answers item i correctly is expressed as:
$$P(X_{ij}=1 \mid \boldsymbol{\alpha}_j, \theta_j) = \pi_i^{*} \prod_{k=1}^{K} r_{ik}^{*\,(1-\alpha_{jk}) q_{ik}}\, P_{c_i}(\theta_j),$$

where π*i is the probability of correctly applying all K Q-matrix specified attributes for item i, given that an examinee is a master of all of the attributes required for the item; r*ik is the ratio of (1) the probability of successfully applying attribute k on item i given that an examinee is a non-master of attribute k to (2) the probability of successfully applying attribute k on item i given that an examinee is a master of attribute k; and

$$P_{c_i}(\theta_j) = \frac{1}{1 + e^{-(\theta_j + c_i)}}$$

is the Rasch model with easiness parameter ci (0 ≤ ci ≤ 3) for item i.
Based on this equation, it is common to distinguish two components of the Fusion Model: (1) the diagnostic component,

$$\pi_i^{*} \prod_{k=1}^{K} r_{ik}^{*\,(1-\alpha_{jk}) q_{ik}},$$

which is concerned with the influence of the skill attributes on item performance, and (2) the residual component, Pci(θj), which is concerned with the influence of the residual ability. These components interact conjunctively in determining the probability of a correct response. That is, successful execution of both the diagnostic and residual components of the model is needed to achieve a correct response on the item.
The r*ik parameter assumes values between 0 and 1 and functions as a discrimination parameter in describing the power of the ith item in distinguishing masters from non-masters on the kth attribute. The r*ik parameter functions as a penalty by imposing a proportional reduction in the probability of correct response (for the diagnostic part of the model) for a non-master of the attribute, assuming the attribute is needed to solve the item. The ci parameters are completeness indices, indicating the degree to which the attributes specified in the Q-matrix are “complete” in describing the skills needed to successfully execute the item. Values of ci close to 3 represent items with high levels of completeness; values close to 0 represent items with low completeness.
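The roles of the diagnostic and residual components, and of r*ik as a proportional penalty, can be illustrated with the hypothetical Python sketch below. It evaluates the two components separately for a single item; all parameter values are invented for illustration.

```python
import math
from typing import Sequence


def diagnostic_component(alpha: Sequence[int], q: Sequence[int],
                         pi_star: float, r_star: Sequence[float]) -> float:
    """pi*_i times r*_ik for every attribute that is required (q_ik = 1) but not mastered."""
    value = pi_star
    for a, q_k, r_k in zip(alpha, q, r_star):
        if q_k == 1 and a == 0:
            value *= r_k  # proportional penalty for a non-master
    return value


def residual_component(theta: float, c: float) -> float:
    """Rasch model P_ci(theta_j) = 1 / (1 + e^-(theta_j + c_i)), easiness c_i in [0, 3]."""
    return 1.0 / (1.0 + math.exp(-(theta + c)))


def fusion_prob(alpha: Sequence[int], q: Sequence[int], theta: float,
                pi_star: float, r_star: Sequence[float], c: float) -> float:
    # Conjunctive combination: both components enter as a product.
    return diagnostic_component(alpha, q, pi_star, r_star) * residual_component(theta, c)


q = [1, 1, 0]
r_star = [0.4, 0.7, 0.5]
master = fusion_prob([1, 1, 0], q, theta=0.0, pi_star=0.9, r_star=r_star, c=2.0)
non_master_2 = fusion_prob([1, 0, 0], q, theta=0.0, pi_star=0.9, r_star=r_star, c=2.0)
print(round(non_master_2 / master, 6))  # equals r*_ik for the second attribute (0.7)
# Larger c_i (higher completeness) pushes the residual term toward 1 at a given theta,
# so the Q-matrix attributes account for more of the response probability.
print(round(residual_component(0.0, 0.0), 3), round(residual_component(0.0, 3.0), 3))
```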
Each item parameter in the Fusion Model has a Beta prior distribution, β(a, b), where a separate pair of hyperparameters (a, b) is defined for each set of item parameters, π*, r*, and c/3. Each set of hyperparameters is then estimated within the Markov chain Monte Carlo (MCMC) estimation to determine the shape of the prior distribution.
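A minimal sketch of drawing one item's parameters from these priors, using Python's standard `random.Random.betavariate`. The hyperparameter values are hypothetical placeholders; in the Fusion Model they would themselves be estimated within the chain. Because c is drawn on the c/3 scale and then multiplied by 3, it stays in the interval [0, 3].

```python
import random
from typing import Dict, Tuple

# Hypothetical (a, b) hyperparameters for each set of item parameters.
HYPERPARAMS: Dict[str, Tuple[float, float]] = {
    "pi_star": (8.0, 2.0),   # prior mass near 1
    "r_star": (2.0, 5.0),    # prior mass toward small r*, i.e., strong penalties
    "c_over_3": (3.0, 2.0),
}


def sample_item_from_prior(rng: random.Random) -> Tuple[float, float, float]:
    """Draw (pi*, r*, c) for one item from the Beta priors."""
    pi_star = rng.betavariate(*HYPERPARAMS["pi_star"])
    r_star = rng.betavariate(*HYPERPARAMS["r_star"])
    c = 3.0 * rng.betavariate(*HYPERPARAMS["c_over_3"])  # c/3 ~ Beta, so 0 <= c <= 3
    return pi_star, r_star, c


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


rng = random.Random(42)
draws = [sample_item_from_prior(rng) for _ in range(5000)]
mean_c = sum(d[2] for d in draws) / len(draws)
print(round(mean_c, 2), round(3.0 * beta_mean(*HYPERPARAMS["c_over_3"]), 2))
```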
One difference between the RUM and the Fusion Model is that the αjk term is replaced in the Fusion Model with a binary indicator function, I(ᾱjk>κk), where ᾱjk is the underlying continuous variable of examinee j for attribute k (i.e., an examinee attribute value), and κk is the mastery threshold value that ᾱjk must exceed for αjk=1.
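The indicator function simply thresholds each continuous attribute value. In the short Python sketch below, the values of ᾱjk and κk are invented for illustration.

```python
from typing import List, Sequence


def mastery_from_continuous(alpha_bar: Sequence[float], kappa: Sequence[float]) -> List[int]:
    """alpha_jk = I(alpha_bar_jk > kappa_k); a value equal to the threshold is non-mastery."""
    return [1 if a_bar > k_thr else 0 for a_bar, k_thr in zip(alpha_bar, kappa)]


alpha_bar_j = [0.7, -0.2, 1.5, 0.0]  # hypothetical continuous attribute values
kappa = [0.0, 0.0, 1.0, 0.0]         # hypothetical mastery thresholds
print(mastery_from_continuous(alpha_bar_j, kappa))  # [1, 0, 1, 0]
```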
MCMC algorithms estimate the set of item (b) and latent examinee (θ) parameters by using a stationary Markov chain, (A0, A1, A2, . . . ), with At=(bt, θt). The individual steps of the chain are determined according to the transition kernel, which is the probability of a transition from state t to state t+1, P[(bt+1, θt+1)|(bt, θt)]. The goal of the MCMC algorithm is to use a transition kernel that allows sampling from the posterior distribution of interest. Sampling from the joint posterior distribution can be carried out by sampling from the distribution of each type of parameter separately. Furthermore, each individual element of a parameter vector can be sampled separately. Accordingly, the posterior distribution to be sampled for the item parameters is P(bi|X, θ) (across all i), and the posterior distribution to be sampled for the examinee parameters is P(θj|X, b) (across all j).
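The componentwise scheme can be sketched in Python for a deliberately simple model. To keep the example self-contained, it assumes a Rasch likelihood, P(Xij=1) = 1/(1 + e^-(θj + bi)), with standard normal priors on every θj and bi, rather than the full Fusion Model; each scalar is updated in turn by a random-walk Metropolis step targeting P(θj|X, b) or P(bi|X, θ). The model, priors, step size, and function names are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bernoulli_logit_ll(x: int, eta: float) -> float:
    """Log-likelihood of a 0/1 response x when P(x = 1) = 1 / (1 + e^-eta)."""
    return x * eta - softplus(eta)


def metropolis_step(current: float, log_post: Callable[[float], float],
                    rng: random.Random, step: float) -> float:
    """One random-walk Metropolis update of a single scalar parameter."""
    proposal = current + rng.gauss(0.0, step)
    log_ratio = log_post(proposal) - log_post(current)
    # Accept with probability min(1, posterior ratio); the normalizing constant cancels.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return current


def mh_within_gibbs(X: List[List[int]], n_iter: int, step: float = 0.8,
                    seed: int = 0) -> List[List[float]]:
    """Alternate updates of theta_j | X, b and b_i | X, theta; return the theta draws."""
    rng = random.Random(seed)
    n_examinees, n_items = len(X), len(X[0])
    theta = [0.0] * n_examinees
    b = [0.0] * n_items
    theta_draws: List[List[float]] = []
    for _ in range(n_iter):
        for j in range(n_examinees):
            def log_post_theta(t: float, j: int = j) -> float:
                # N(0, 1) prior plus the likelihood of examinee j's responses
                return -0.5 * t * t + sum(bernoulli_logit_ll(X[j][i], t + b[i])
                                          for i in range(n_items))
            theta[j] = metropolis_step(theta[j], log_post_theta, rng, step)
        for i in range(n_items):
            def log_post_b(v: float, i: int = i) -> float:
                # N(0, 1) prior plus the likelihood of all responses to item i
                return -0.5 * v * v + sum(bernoulli_logit_ll(X[j][i], theta[j] + v)
                                          for j in range(n_examinees))
            b[i] = metropolis_step(b[i], log_post_b, rng, step)
        theta_draws.append(list(theta))
    return theta_draws


X = [[1, 1, 1, 1, 1],  # examinee 1: all correct
     [1, 1, 0, 1, 0],  # examinee 2: mixed
     [0, 0, 0, 0, 0]]  # examinee 3: all incorrect
draws = mh_within_gibbs(X, n_iter=3000)
burn_in = 500
post_means = [sum(d[j] for d in draws[burn_in:]) / (len(draws) - burn_in) for j in range(len(X))]
print([round(m, 2) for m in post_means])
```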
One problem with MCMC algorithms is that the choice of a proposal distribution is critical to the number of iterations required for convergence of the Markov chain. A key measure of the effectiveness of a proposal distribution is the proportion of proposals that are accepted within the chain. If the proportion is low, then many unreasonable values are proposed, and the chain moves very slowly towards convergence. Conversely, if the proportion is very high, the proposed values are too close to the values of the current state, and the chain again converges very slowly.
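The effect of the proposal width on the acceptance proportion can be seen with a toy random-walk Metropolis sampler. The Python sketch below targets a standard normal distribution (an assumed stand-in for a posterior) and reports the acceptance proportion for a very narrow, a moderate, and a very wide proposal. The narrow proposal accepts almost everything but moves in tiny steps, while the wide proposal rejects almost everything.

```python
import math
import random


def acceptance_rate(step: float, n: int = 20000, seed: int = 1) -> float:
    """Fraction of accepted random-walk proposals when sampling a N(0, 1) target."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = -0.5 * (proposal * proposal - x * x)  # log N(proposal) - log N(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x, accepted = proposal, accepted + 1
    return accepted / n


rates = {step: acceptance_rate(step) for step in (0.05, 2.4, 50.0)}
for step, rate in rates.items():
    print(f"proposal sd {step:>5}: acceptance {rate:.2f}")
```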
While MCMC algorithms suffer from the same pitfalls as joint maximum likelihood (JML) optimization algorithms, such as no guarantee of consistent parameter estimates, a potential strength of MCMC approaches is that they report examinee (binary) attribute estimates as posterior probabilities. Thus, MCMC algorithms can provide a more practical way of investigating cognitive diagnosis models.
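Reporting attribute estimates as posterior probabilities amounts to averaging the sampled 0/1 mastery indicators over the retained draws. In the hypothetical Python example below, the draws are invented, and the 0.5 cutoff used to turn probabilities into master/non-master classifications is an assumed convention.

```python
from typing import List, Sequence


def mastery_posteriors(draws: Sequence[Sequence[int]]) -> List[float]:
    """Posterior P(alpha_jk = 1 | X) for each attribute k, estimated from post-burn-in draws."""
    n_draws, n_attributes = len(draws), len(draws[0])
    return [sum(d[k] for d in draws) / n_draws for k in range(n_attributes)]


# Invented post-burn-in draws of one examinee's attribute vector alpha_j.
alpha_draws = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
probs = mastery_posteriors(alpha_draws)
classification = [1 if p > 0.5 else 0 for p in probs]  # assumed 0.5 cutoff
print(probs)           # [1.0, 0.25, 0.75]
print(classification)  # [1, 0, 1]
```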
Different methods of sampling values from the complete conditional distributions of the model parameters include the Gibbs sampling algorithm and the Metropolis-Hastings within Gibbs (MHG) algorithm. Each of the cognitive diagnosis models fit with MCMC has used the MHG algorithm to evaluate the set of examinee variables, because the Gibbs sampling algorithm requires the computation of a normalizing constant. A disadvantage of the MHG algorithm is that the set of examinee parameters is considered within a single block (i.e., the examinee parameters are varied together as one unit while the other variables are held fixed). While the use of blocking speeds up the convergence of the MCMC chain, efficiency may be reduced. For example, attributes with large influences on the likelihood may overshadow individual attributes whose influences are smaller.
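For a single binary attribute, the normalizing constant needed for a Gibbs draw is only a two-term sum, so each attribute can be drawn separately from its full conditional distribution. The Python sketch below illustrates such an attribute-by-attribute Gibbs draw under the Fusion Model likelihood, with θj and all item parameters held fixed; it is a generic illustration under stated assumptions, and the prior mastery probability, parameter values, and function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Each item: (Q-matrix row q_i, pi*_i, r*_i vector, c_i); values below are hypothetical.
Item = Tuple[Sequence[int], float, Sequence[float], float]


def fusion_prob(alpha: Sequence[int], item: Item, theta: float) -> float:
    """pi*_i * prod(r*_ik over required, non-mastered k) * P_ci(theta_j)."""
    q, pi_star, r_star, c = item
    value = pi_star
    for a, q_k, r_k in zip(alpha, q, r_star):
        if q_k == 1 and a == 0:
            value *= r_k
    return value / (1.0 + math.exp(-(theta + c)))


def log_lik(alpha: Sequence[int], responses: Sequence[int],
            items: Sequence[Item], theta: float) -> float:
    """Bernoulli log-likelihood of one examinee's responses."""
    total = 0.0
    for x, item in zip(responses, items):
        p = fusion_prob(alpha, item, theta)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def gibbs_update_attribute(alpha: List[int], k: int, responses: Sequence[int],
                           items: Sequence[Item], theta: float,
                           prior_mastery: float, rng: random.Random) -> Tuple[List[int], float]:
    """Draw alpha_jk from its full conditional, holding all other parameters fixed."""
    log_w = []
    for state in (0, 1):
        candidate = list(alpha)
        candidate[k] = state
        prior = prior_mastery if state == 1 else 1.0 - prior_mastery
        log_w.append(math.log(prior) + log_lik(candidate, responses, items, theta))
    # Normalizing constant: the sum of the two unnormalized weights (computed stably).
    m = max(log_w)
    w0, w1 = math.exp(log_w[0] - m), math.exp(log_w[1] - m)
    p_master = w1 / (w0 + w1)
    updated = list(alpha)
    updated[k] = 1 if rng.random() < p_master else 0
    return updated, p_master


items: List[Item] = [([1, 0], 0.9, [0.2, 0.5], 2.0),
                     ([1, 1], 0.85, [0.3, 0.4], 1.5),
                     ([0, 1], 0.9, [0.6, 0.5], 1.0)]
responses = [1, 1, 0]
rng = random.Random(7)
alpha_j = [0, 0]
for k in range(len(alpha_j)):  # one Gibbs sweep, one attribute at a time
    alpha_j, p_k = gibbs_update_attribute(alpha_j, k, responses, items, theta=0.0,
                                          prior_mastery=0.5, rng=rng)
    print(k, round(p_k, 3), alpha_j)
```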
One problem with current cognitive diagnosis models is that they do not adequately evaluate examinees on more than two skill levels (i.e., beyond master and non-master). While some cognitive diagnosis models do attempt to evaluate examinees on three or more skill levels, the number of variables used by such models is excessive.
Accordingly, what is needed is a method for performing cognitive diagnosis using a model that evaluates examinees on individual skills using polytomous attribute skill levels.
A further need exists for a method that considers each attribute separately when assessing examinees.
A still further need exists for a method of classifying examinees using a reduced variable set for polytomous attribute skill levels.
The present disclosure is directed to solving one or more of the above-listed problems.