Item Response Theory (IRT) is a body of theory used in the field of psychometrics. In IRT, mathematical models are applied to data from tests or questionnaires in order to measure the abilities and attitudes studied in psychometrics. An IRT model is a mathematical function that specifies the probability of a discrete outcome, such as a correct response to an item, in terms of person and item parameters. Person parameters may, for example, represent the ability of a student. Items may be questions that have incorrect and correct responses, statements on questionnaires that allow respondents to indicate their level of agreement, or many other kinds of questions. IRT may be used to evaluate how well an assessment as a whole, or an individual question on it, measures the characteristics the test or questionnaire is designed to evaluate. In education and testing, IRT may be used to develop and refine exams, maintain banks of items for exams, and compare the difficulty of different versions of exams.
Mixture distribution models are a set of IRT models that assume that the observed item response data are sampled from a composite population, that is, a population that consists of a number of components or subpopulations. In contrast to multigroup models, which assume a known partition of the population and use an observed grouping variable to model the item response data, mixture distribution models do not usually assume that the mixing variable is observed; instead, they collect evidence about this variable by means of model assumptions and the observed heterogeneity in the data.
The components of a mixture distribution can be distinguished by the differences between the parameters of the assumed distribution model that governs the conditional distribution of the observed data. In the case of item response data, it may be assumed that different parameter sets hold in different subpopulations. Or, in even more heterogeneous mixtures, different item response models may hold in different subpopulations.
Several IRT models will hereinafter be discussed. Latent class analysis (LCA) is an older IRT model. The model equation of LCA is:
$$P(x_1,\ldots,x_I)=\sum_{c=1}^{C}\pi(c)\prod_{i=1}^{I}p_{ci}(x_i) \tag{1}$$
In Equation 1, x1, …, xI represent the observed variables, and c represents the (unobserved) mixing variable, which may be interpreted as an indicator of population membership. The variable θ, which does not appear in Equation 1, is an (unobserved) latent variable that often represents ability, proficiency, or other constructs in educational and psychological applications of latent variable models such as item response theory. Let π(c) be the count density of c, and let φ(θ|c) be the conditional density of θ given c. Finally, let pci(x|θ) denote the conditional probability of response x to item i given θ in population c. LCA is a discrete mixture model that contains no continuous θ; it assumes that the response variables x1, …, xI are independent within each class c (local independence), so that:
$$P(x_1,\ldots,x_I\mid c)=\prod_{i=1}^{I}p_{ci}(x_i) \tag{2}$$
In LCA, the conditional response probabilities are constrained to be the same for all members of a given class (subpopulation) c, but are allowed to differ across subpopulations.
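Equations 1 and 2 can be illustrated with a minimal sketch, assuming NumPy and response categories coded 0, 1, …; the function name `lca_pattern_probability` and the toy parameter values are hypothetical illustrations, not part of the described method. Within each class the item probabilities are multiplied (local independence), and the class-conditional results are then mixed with the weights π(c).

```python
import numpy as np

def lca_pattern_probability(x, class_weights, item_probs):
    """P(x_1, ..., x_I) = sum_c pi(c) * prod_i p_ci(x_i)  (Equations 1 and 2).

    x             : observed categories x_i (0-based ints), length I
    class_weights : pi(c) for c = 1..C, summing to 1
    item_probs    : list of I arrays with item_probs[i][c, k] = p_ci(k)
    """
    class_weights = np.asarray(class_weights, dtype=float)
    # Local independence: within class c the joint probability is a product over items.
    within_class = np.ones_like(class_weights)
    for i, xi in enumerate(x):
        within_class *= item_probs[i][:, xi]
    # Mix the class-conditional probabilities with the weights pi(c).
    return float(class_weights @ within_class)

# Hypothetical toy model: two classes, three identical dichotomous items.
pi = np.array([0.6, 0.4])
item = np.array([[0.2, 0.8],   # class 1: p(x = 0), p(x = 1)
                 [0.7, 0.3]])  # class 2: p(x = 0), p(x = 1)
item_probs = [item, item, item]

print(lca_pattern_probability((1, 1, 0), pi, item_probs))  # ≈ 0.102
```

Here 0.6 · (0.8 · 0.8 · 0.2) + 0.4 · (0.3 · 0.3 · 0.7) = 0.0768 + 0.0252 = 0.102, and the probabilities of all eight response patterns sum to one.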
It is often convenient to reparameterize the model equation in order to avoid estimating parameters that are bounded probabilities. A useful approach is to express the conditional probabilities as:
$$p_{ci}(x)=\frac{\exp(\beta_{cix})}{1+\sum_{y=1}^{M_i}\exp(\beta_{ciy})}=\exp(\beta_{cix}-\delta_{ci}) \tag{3}$$
with
$$\delta_{ci}=\ln\left[1+\sum_{y=1}^{M_i}\exp(\beta_{ciy})\right],$$
where the reference category x = 0 has βci0 = 0. This reparameterization does not change the model assumptions, but it makes parameter estimation easier when additional constraints are to be met. One constrained model derived from the unconstrained LCA in Equation 1 is of particular importance for the derivation of mixture IRT models. The central idea is to disentangle item effects and group effects, which are confounded in the βcix parameters and therefore require a relatively large set of parameters for each latent class. Because many parameters have to be estimated in each class, the accuracy of these estimates deteriorates as more classes are added to an existing model. A more parsimonious approach is to use a linear decomposition of βcix into item and class parameters, such as βcix = xθc − aix, which yields one class-level parameter and an item-category parameter, or βcix = x(biθc − ai), which yields a model that shares features with a 2PL IRT model (i.e., a two-parameter logistic IRT model). These linear decompositions greatly reduce the number of necessary parameters. A practical advantage of the linearly decomposed version of the LCA is that each latent class is assigned an “ability level” θc, and each item has one parameter or set of parameters (i.e., aix or (ai, bi) in the examples above) that stays the same across latent classes.
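The reparameterization in Equation 3 and the two linear decompositions of βcix can be sketched as follows, assuming NumPy and the convention βci0 = 0 for the reference category. The helper names (`beta_rasch_like`, `beta_2pl_like`, `parameter_counts`) are hypothetical, and the parameter counts ignore identification constraints, so they are only rough comparisons.

```python
import numpy as np

def category_probs_from_beta(beta_ci):
    """Equation 3: p_ci(x) = exp(beta_cix - delta_ci) for x = 0, ..., M_i.

    beta_ci holds beta_ci1, ..., beta_ciM_i; category 0 is the reference
    category with beta_ci0 = 0 (assumed convention).
    """
    logits = np.concatenate(([0.0], np.asarray(beta_ci, dtype=float)))
    # delta_ci = ln[1 + sum_y exp(beta_ciy)]; the "1" equals exp(beta_ci0) = exp(0),
    # so delta_ci is a log-sum-exp over all categories, computed stably here.
    delta_ci = np.logaddexp.reduce(logits)
    return np.exp(logits - delta_ci)

def beta_rasch_like(theta_c, a_i):
    """Hypothetical helper: beta_cix = x * theta_c - a_ix for x = 1, ..., M_i."""
    a_i = np.asarray(a_i, dtype=float)
    x = np.arange(1, a_i.size + 1)
    return x * theta_c - a_i

def beta_2pl_like(theta_c, a_i, b_i, n_categories):
    """Hypothetical helper: beta_cix = x * (b_i * theta_c - a_i) for x = 1, ..., M_i."""
    x = np.arange(1, n_categories + 1)
    return x * (b_i * theta_c - a_i)

def parameter_counts(n_classes, max_scores):
    """Rough counts of free parameters, before identification constraints."""
    total_categories = sum(max_scores)              # sum_i M_i
    unconstrained = n_classes * total_categories     # one beta_cix per class, item, category
    rasch_like = n_classes + total_categories        # theta_c plus a_ix
    two_pl_like = n_classes + 2 * len(max_scores)    # theta_c plus (a_i, b_i)
    return unconstrained, rasch_like, two_pl_like

# A dichotomous item in a class with theta_c = 1.0 and a_i1 = 0.0 gives a logistic curve.
print(category_probs_from_beta(beta_rasch_like(1.0, [0.0])))  # approx. [0.269 0.731]
print(parameter_counts(4, [1] * 20))                           # (80, 24, 44)
```

With four classes and twenty dichotomous items, the unconstrained LCA needs one βcix per class and item (80), whereas the two decompositions need 24 and 44 parameters, which is the parsimony argument made above.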
The mixed Rasch model (MRM) is another IRT model. The MRM was developed after different response styles and test-taker strategies were observed, with the goal of modeling student populations by integrating quantitative differences in ability with qualitative differences (e.g., in response style or strategy). The Rasch model and LCA share the local independence assumption, but condition on quite different variables. In the Rasch model, the count of correct responses (the raw score) is a sufficient statistic for the ability parameter, whereas LCA assumes that an unobserved nominal variable, defined more or less implicitly by the class profiles, explains all observed dependencies in the data. The mixed Rasch model builds on the conditional form of the Rasch model:
$$P(x_1,\ldots,x_I)=\pi_r\,\frac{\exp\left[\sum_{i=1}^{I}\beta_{ix_i}\right]}{\gamma(r)} \tag{4}$$
where r = x1 + … + xI is the raw score, πr is the probability of raw score r, and γ(r) is the symmetric function of order r. This function is defined as:
$$\gamma(r)=\sum_{\{(x_1,\ldots,x_I):\ \sum_i x_i=r\}}\exp\left[\sum_{i=1}^{I}\beta_{ix_i}\right] \tag{5}$$
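Evaluating Equation 5 by enumerating every response pattern quickly becomes infeasible, since there are ∏(mi + 1) patterns. The sketch below, assuming NumPy and hypothetical toy values of βix, uses the fact that γ(r) is the coefficient of t^r in the product over items of Σx exp(βix)·t^x, so successive polynomial multiplication (`np.convolve`) yields all γ(r) at once. The function names are illustrative, not taken from the source.

```python
import numpy as np

def symmetric_functions(beta):
    """gamma(r) for r = 0, ..., R_max (Equation 5).

    beta[i][x] = beta_ix for x = 0, ..., m_i.  gamma(r) is the coefficient of
    t**r in prod_i sum_x exp(beta_ix) * t**x, so multiplying these item
    polynomials one at a time gives every gamma(r) without enumerating patterns.
    """
    gamma = np.array([1.0])
    for beta_i in beta:
        gamma = np.convolve(gamma, np.exp(np.asarray(beta_i, dtype=float)))
    return gamma

def conditional_rasch_probability(x, beta, score_probs):
    """Equation 4: P(x) = pi_r * exp(sum_i beta_ix_i) / gamma(r), with r = sum_i x_i."""
    r = int(sum(x))
    numerator = np.exp(sum(beta[i][xi] for i, xi in enumerate(x)))
    return float(score_probs[r] * numerator / symmetric_functions(beta)[r])

# Hypothetical toy example: two dichotomous items and one item scored 0, 1, 2,
# so R_max = 4; beta_i0 = 0 is used for the reference category.
beta = [np.array([0.0, 0.3]), np.array([0.0, -0.5]), np.array([0.0, 0.2, 0.9])]
score_probs = np.full(5, 0.2)  # pi_r for r = 0, ..., 4

print(symmetric_functions(beta))
print(conditional_rasch_probability((1, 0, 2), beta, score_probs))
```

Because each pattern with score r is divided by γ(r), the conditional probabilities within a score group sum to one, and weighting them by πr makes the probabilities of all patterns sum to one.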
The Rasch model in a conditional form may be augmented by including provisions to estimate the model parameters at subpopulation level and to combine these subpopulations in a discrete mixture distribution. Consistent with the above definition of a discrete mixture distribution, we may define:
$$P(x_1,\ldots,x_I\mid c)=\pi_{r\mid c}\,\frac{\exp\left[\sum_{i=1}^{I}\beta_{ix_ic}\right]}{\gamma_c(r)} \tag{6}$$
Then the marginal probability of a response vector is:
$$P(x_1,\ldots,x_I)=\sum_{c=1}^{C}\pi_c\,\pi_{r\mid c}\,\frac{\exp\left[\sum_{i=1}^{I}\beta_{ix_ic}\right]}{\gamma_c(r)} \tag{7}$$
where πc denotes the mixing proportions for c = 1, …, C, and πr|c denotes the class-specific (latent) score distribution for r = 0, …, Rmax. For dichotomous items, the maximum score is Rmax = I, where I denotes the number of items. In the general case, let xi ∈ {0, …, mi} be the i-th response variable, so that the maximum score for this item is mi. The maximum raw score is then
$$R_{\max}=\sum_{i=1}^{I}m_i,$$
and, if the score distribution is estimated without restrictions, the number of score parameters is also Rmax per latent class c (the Rmax + 1 probabilities πr|c sum to one).
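Equations 6 and 7 combine the conditional Rasch form with a discrete mixture. The following is a minimal, self-contained sketch assuming NumPy; the helper names, the two-class setup, and all parameter values are hypothetical. Each class has its own item parameters βixc, its own symmetric functions γc(r), and its own score distribution πr|c over r = 0, …, Rmax with Rmax = Σ mi.

```python
import numpy as np

def gamma_c(beta_c):
    """Symmetric functions gamma_c(r), r = 0, ..., R_max, for one latent class."""
    gamma = np.array([1.0])
    for beta_ic in beta_c:
        # Polynomial multiplication: coefficient of t**r accumulates gamma_c(r).
        gamma = np.convolve(gamma, np.exp(np.asarray(beta_ic, dtype=float)))
    return gamma

def mixed_rasch_probability(x, class_weights, score_probs, beta):
    """Equation 7: P(x) = sum_c pi_c * pi_{r|c} * exp(sum_i beta_ix_ic) / gamma_c(r).

    class_weights : pi_c for c = 1, ..., C
    score_probs   : score_probs[c][r] = pi_{r|c}, r = 0, ..., R_max
    beta          : beta[c][i][k] = beta_ikc for categories k = 0, ..., m_i
    """
    r = int(sum(x))
    total = 0.0
    for c, pi_c in enumerate(class_weights):
        # Equation 6: class-conditional probability of the response vector.
        numerator = np.exp(sum(beta[c][i][xi] for i, xi in enumerate(x)))
        total += pi_c * score_probs[c][r] * numerator / gamma_c(beta[c])[r]
    return float(total)

# Hypothetical example: items with m_i = 1, 1, 2, so R_max = 4, and two classes.
m = [1, 1, 2]
R_max = sum(m)
mixing = np.array([0.7, 0.3])
score_probs = np.array([[0.1, 0.2, 0.4, 0.2, 0.1],   # pi_{r|1}
                        [0.3, 0.3, 0.2, 0.1, 0.1]])  # pi_{r|2}
beta = [
    [np.array([0.0, 0.5]), np.array([0.0, -0.4]), np.array([0.0, 0.3, 0.8])],   # class 1
    [np.array([0.0, -1.0]), np.array([0.0, 0.9]), np.array([0.0, -0.2, -0.6])],  # class 2
]

print(mixed_rasch_probability((1, 0, 2), mixing, score_probs, beta))
```

Each row of `score_probs` holds Rmax + 1 = 5 probabilities summing to one, which is why only Rmax of them per class are free parameters.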
A large number of different polytomous mixed Rasch models are known in the art, including Andrich's (1988) rating scale model, the equidistance model (Andrich, 1982), Masters' partial credit model (1982), and a model suggested by Rost that combines features of the rating scale model and the equidistance model, using conditional maximum likelihood techniques. There is also a log-linear Rasch model (Kelderman, 1984), as well as various HYBRID models, such as Yamamoto's (1987, 1989). The HYBRID models expand the parameter space differently from other models. In contrast to the common practice of assuming that the same type of model should be used in all mixture components, Yamamoto's HYBRID model allows different models in different components of the mixture. This is useful for, e.g., studying speededness.
However, existing diagnostic models are not suitable for analyzing large-scale survey data, such as data obtained from the National Assessment of Educational Progress (NAEP), International Adult Literacy and Life Skills Survey (IASL), Trends in International Mathematics and Science Study (TIMSS), and Programme for International Student Assessment (PISA) surveys. These large-scale surveys do not report results at the level of individual examinees. They are primarily concerned with groups (e.g., individuals with limited English proficiency (LEP), sometimes referred to as English language learners (ELLs), or groups defined by gender). The assessments use many different test forms with only partial overlap, so a large proportion of the data is missing by design. Data may also go missing inadvertently.
One factor assisting an IRT theorist is that large scale surveys are typically accompanied by observable background data on the makeup of the respondents.
Thus, there is a need for a system and method for effectively recovering item parameters and population parameters from sparsely populated data sets without unacceptable levels of estimation error.
It should be understood that the present invention is not limited to the preferred embodiments illustrated.