Item Response Theory (IRT) is a body of theory used in the field of psychometrics. In IRT, mathematical models are applied to analyze data from tests or questionnaires in order to measure abilities and attitudes studied in psychometrics. One branch of IRT is diagnostic models. Diagnostic models may be used to provide skill profiles, thereby offering additional information about the examinees. One central tenet behind diagnostic models is that different items tap into different sets of skills or examinee attributes and that experts can generate a matrix of relations between items and skills required to solve these items. A diagnostic model according to the prior art, the General Diagnostic Model (GDM), will hereinafter be described.
In order to define the GDM, several assumptions must first be presented. Assume an I-dimensional categorical random variable $\vec{x} = (x_1, \ldots, x_I)$ with $x_i \in \{0, \ldots, m_i\}$ for $i \in \{1, \ldots, I\}$, which may be referred to as a response vector. Further assume that there are N independent and identically distributed (i.i.d.) realizations $\vec{x}_1, \ldots, \vec{x}_N$ of this random variable $\vec{x}$, so that $x_{ni}$ denotes the i-th component of the n-th realization $\vec{x}_n$. In addition, assume that there are N unobserved realizations of a K-dimensional categorical variable, $\vec{a} = (a_1, \ldots, a_K)$, so that the vector $(\vec{x}_n, \vec{a}_n) = (x_{n1}, \ldots, x_{nI}, a_{n1}, \ldots, a_{nK})$ exists for all $n \in \{1, \ldots, N\}$. The data structure $(X, A) = ((\vec{x}_n, \vec{a}_n))_{n=1,\ldots,N}$ may be referred to as the complete data, and $(\vec{x}_n)_{n=1,\ldots,N}$ is referred to as the observed data matrix. Denote $(\vec{a}_n)_{n=1,\ldots,N}$ as the latent skill or attribute patterns, which are the unobserved target of inference.
Let $P(\vec{a}) = P(\vec{A} = (a_1, \ldots, a_K)) > 0$ for all $\vec{a}$ denote the nonvanishing discrete count density of $\vec{a}$. Assume that the conditional discrete count density $P(x_1, \ldots, x_I \mid \vec{a})$ exists for all $\vec{a}$. Then the probability of a response vector $\vec{x}$ can be written as
\[ P(\vec{x}) = \sum_{\vec{a}} P(\vec{a}) \, P(x_1, \ldots, x_I \mid \vec{a}) \]
Thus far, no assumptions have been made about the specific form of the conditional distribution of $\vec{x}$ given $\vec{a}$, other than that $P(x_1, \ldots, x_I \mid \vec{a})$ exists. For the GDM, local independence (LI) of the components of $\vec{x}$ given $\vec{a}$ may be assumed, which yields
\[ P(x_1, \ldots, x_I \mid \vec{a}) = \prod_{i=1}^{I} p_i(x = x_i \mid \vec{a}) \]
so that the probability $p_i(x = x_i \mid \vec{a})$ is the one component left to be specified to arrive at a model for $P(\vec{x})$.
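As an illustration only, the marginal probability of a response vector can be sketched in Python by combining the sum over attribute patterns with the factorization into a product over items that local independence implies. The function name `marginal_prob`, the dictionary representation of $P(\vec{a})$, and the toy single-skill item model are all assumptions introduced here for the example; they are not part of the GDM as described.

```python
import itertools

def marginal_prob(x, prior, item_prob):
    """Return P(x) = sum over a of P(a) * prod over i of p_i(x_i | a).

    x         : tuple of observed responses (x_1, ..., x_I)
    prior     : dict mapping each skill pattern a (a tuple) to P(a) > 0
    item_prob : callable (i, x_i, a) -> p_i(x_i | a); hypothetical interface
    """
    total = 0.0
    for a, p_a in prior.items():
        cond = 1.0
        for i, x_i in enumerate(x):
            cond *= item_prob(i, x_i, a)  # local independence: multiply over items
        total += p_a * cond               # mix over the latent skill patterns
    return total

# Toy setup (assumed for illustration): one binary skill, two binary items.
prior = {(0,): 0.5, (1,): 0.5}

def item_prob(i, x_i, a):
    p_correct = 0.8 if a[0] == 1 else 0.2  # same toy model for every item
    return p_correct if x_i == 1 else 1.0 - p_correct

p_11 = marginal_prob((1, 1), prior, item_prob)

# Sanity check: the probabilities of all possible response vectors sum to one.
total_mass = sum(marginal_prob(x, prior, item_prob)
                 for x in itertools.product((0, 1), repeat=2))
```

In this toy setting, `p_11` equals 0.5·0.8·0.8 + 0.5·0.2·0.2 = 0.34, and `total_mass` is 1 up to floating-point rounding.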
Logistic models have secured a prominent position among models for categorical data. The GDM may also be specified as a model with a logistic link between an argument, which depends on the random variables involved and some real valued parameters, and the probability of the observed response.
Using the above definitions, the GDM may be defined as follows. Let $Q = (q_{ik})$, $i = 1, \ldots, I$, $k = 1, \ldots, K$ be a binary $I \times K$ matrix, that is, $q_{ik} \in \{0, 1\}$. Let $(\gamma_{ikx})$, $i = 1, \ldots, I$, $k = 1, \ldots, K$, $x = 1, \ldots, m_i$ be a cube of real valued parameters, and let $\beta_{ix}$ for $i = 1, \ldots, I$ and $x \in \{0, \ldots, m_i\}$ be real valued parameters. Then define
\[ p_i(x \mid \vec{a}) = \frac{\exp\left(\beta_{ix} + \sum_{k} \gamma_{ikx} \, h(q_{ik}, a_k)\right)}{1 + \sum_{y=1}^{m_i} \exp\left(\beta_{iy} + \sum_{k} \gamma_{iky} \, h(q_{ik}, a_k)\right)} \]
It may be convenient to constrain the $\gamma_{ikx}$ somewhat and to specify the real valued function $h(q_{ik}, a_k)$ and the $a_k$ in a way that allows emulation of models frequently used in educational measurement and psychometrics. For example, one may choose $h(q_{ik}, a_k) = q_{ik} a_k$ and $\gamma_{ikx} = x \gamma_{ik}$.
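The following Python sketch evaluates the category probabilities of a single item under the choices $h(q_{ik}, a_k) = q_{ik} a_k$ and $\gamma_{ikx} = x \gamma_{ik}$. It is a minimal illustration, not a reference implementation: the function name `gdm_item_probs` and its argument layout are hypothetical, and the code assumes that the exponent for category $x = 0$ is zero (that is, $\beta_{i0} = 0$), which is what makes the constant 1 in the denominator correspond to the zero category.

```python
import numpy as np

def gdm_item_probs(beta_i, gamma_i, q_i, a):
    """Return the vector (p_i(0 | a), ..., p_i(m_i | a)) for one item i.

    beta_i  : length m_i, parameters beta_{ix} for x = 1..m_i
              (beta_{i0} is taken to be 0; an assumption of this sketch)
    gamma_i : length K, slope parameters gamma_{ik}
    q_i     : length K, row i of the binary Q-matrix
    a       : length K, skill pattern (a_1, ..., a_K)
    """
    beta_i = np.asarray(beta_i, dtype=float)
    gamma_i = np.asarray(gamma_i, dtype=float)
    q_i = np.asarray(q_i, dtype=float)
    a = np.asarray(a, dtype=float)

    m_i = beta_i.shape[0]
    x = np.arange(1, m_i + 1)

    # sum_k gamma_ik * q_ik * a_k, using h(q, a) = q * a
    s = np.sum(gamma_i * q_i * a)

    # Exponents for x = 0..m_i; category 0 has exponent 0, so exp(0) = 1
    # reproduces the leading 1 in the denominator of the GDM formula.
    logits = np.concatenate(([0.0], beta_i + x * s))

    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Example: a binary item (m_i = 1) that requires only the first of two skills.
probs = gdm_item_probs(beta_i=[0.0], gamma_i=[1.0, 2.0], q_i=[1, 0], a=[1, 1])
```

Because $q_{i2} = 0$ in the example, the second skill does not contribute to the exponent, so `probs[1]` reduces to the logistic function evaluated at $\gamma_{i1} a_1 = 1$.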
The GDM has some unfortunate limitations. Primarily, it is not equipped to handle unobserved partitions, or subpopulations, in the examinees.
Thus, there is a need for a diagnostic model that may be extended to handle unobserved subpopulations.