1. Field of the Invention
This invention relates to the field of psychometrics, the branch of psychology relating to the design, administration, and interpretation of test instruments (e.g., questionnaires) for assessing psychological variables (e.g., latent traits). Specifically, the invention relates to the dynamic application of probability models (e.g., probabilistic scoring) to responses to test items (e.g., questions) as a basis for assessing latent traits in a test subject.
2. Description of Related Art
Psychometrics relates to the theory and technique of psychological measurement, which can include the measurement of latent traits such as intelligence, abilities, attitudes, personality traits or psychiatric disorders. Psychometrics is primarily concerned with the construction and validation of measurement instruments, such as questionnaires and other types of tests that elicit responses from a test subject, upon which the measurement of the latent trait is based.
In psychometrics, item response theory (IRT) is a paradigm for the design, analysis and scoring of psychometric measurement instruments. The term “item” is used because while many test items may be questions that have incorrect and correct responses (e.g., multiple choice questions), test items may also include statements that allow test subjects to indicate a level of agreement or disagreement (e.g., on a Likert scale), or to indicate symptoms that are scored as present or absent. IRT is based on the concept that the probability of a particular response to a test item is a function of person parameters and test item parameters. The person parameter may be a latent trait; it may, for example, represent a person's intelligence, the strength of an attitude, or the presence and/or severity of a psychiatric disorder. Test item parameters can include, e.g., item difficulty and item discrimination.
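By way of illustration only, the dependence of a response probability on a person parameter and test item parameters can be sketched with a two-parameter logistic (2PL) function. The 2PL form is an assumption made for this sketch, since the passage above does not commit to a particular IRT model, and the function and parameter names (`response_probability`, `theta`, `difficulty`, `discrimination`) are hypothetical.

```python
import math


def response_probability(theta: float, difficulty: float,
                         discrimination: float = 1.0) -> float:
    """Probability of a keyed response (correct, agree, symptom present).

    Assumes a two-parameter logistic (2PL) model, chosen for illustration:
      theta          -- person parameter (location on the latent trait)
      difficulty     -- item parameter: trait level at which the probability is 0.5
      discrimination -- item parameter: how sharply the probability rises with theta
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# A test subject located exactly at the item's difficulty has an even chance
# of giving the keyed response.
print(response_probability(theta=0.0, difficulty=0.0))  # 0.5
```

Under this assumed form, raising the discrimination parameter makes the item separate test subjects just above and just below the difficulty level more sharply.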
IRT models the relationship between latent traits and responses to test items. Among other advantages, IRT can provide a basis for obtaining an estimate of the location of a test subject on a given latent trait (e.g., the severity of a psychiatric disorder), as well as the standard error of measurement of that location. A common way to represent the location of a test subject on a given latent trait is to compute an estimated severity score, which is useful for measuring change in the test subject over time (e.g., during treatment), and/or for categorizing the test subject into sub-regions of the latent trait.
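One way to obtain such a location estimate and its standard error can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular instrument: it assumes dichotomous (0/1) responses, a two-parameter logistic model, a simple grid search for the maximum-likelihood location, and a standard error taken from the test information at that location. The names `keyed_probability` and `estimate_location` and the item parameters in the usage lines are hypothetical.

```python
import math


def keyed_probability(theta, discrimination, difficulty):
    """Assumed 2PL probability of a keyed (scored 1) response."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def estimate_location(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximum-likelihood estimate of the latent trait location.

    responses -- list of 0/1 item scores
    items     -- list of (discrimination, difficulty) pairs, one per response
    Returns (estimated location, standard error of measurement).
    """
    best_theta, best_loglik = lo, -math.inf
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        theta = lo + i * step
        loglik = 0.0
        for score, (a, b) in zip(responses, items):
            p = keyed_probability(theta, a, b)
            loglik += math.log(p) if score == 1 else math.log(1.0 - p)
        if loglik > best_loglik:
            best_theta, best_loglik = theta, loglik
    # Test information for the 2PL model: sum of a^2 * p * (1 - p).
    information = 0.0
    for a, b in items:
        p = keyed_probability(best_theta, a, b)
        information += a * a * p * (1.0 - p)
    return best_theta, 1.0 / math.sqrt(information)


# Hypothetical three-item instrument; each tuple is (discrimination, difficulty).
example_items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
location, std_error = estimate_location([1, 1, 0], example_items)
print(f"estimated location: {location:.2f}, standard error: {std_error:.2f}")
```

With only three items the standard error is large, which reflects how little information so short an instrument carries about the test subject's location. Response patterns that are all 0 or all 1 have no finite maximum-likelihood estimate, so in this sketch they simply return the boundary of the search grid.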
For example, a common psychometric test instrument used to measure depression in test subjects is the PHQ-9 self-report assessment questionnaire, which computes a severity score ranging from zero for “no depression” to 27 for “extreme depression”. Alternatively, the PHQ-9 can categorize depression into sub-regions of “Not Clinically Depressed” (a severity score of 0-6), “Sub-Threshold Depression” (a severity score of 7-9), or “Major Depression” (a severity score of 10 or more). See FIG. 1.
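The mapping from a severity score to a sub-region can be sketched as a simple threshold lookup. The cut points below are the ones stated in the preceding paragraph; the function name `phq9_sub_region` is hypothetical and the sketch is illustrative rather than a clinical scoring tool.

```python
def phq9_sub_region(severity_score: int) -> str:
    """Map a PHQ-9 severity score (0 to 27) to the sub-regions described above."""
    if not 0 <= severity_score <= 27:
        raise ValueError("PHQ-9 severity score must be between 0 and 27")
    if severity_score <= 6:
        return "Not Clinically Depressed"
    if severity_score <= 9:
        return "Sub-Threshold Depression"
    return "Major Depression"


for score in (3, 8, 15):
    print(score, phq9_sub_region(score))
```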
Validation of the results from a depression test instrument like the PHQ-9 would normally use a trained professional (e.g., a psychologist or psychiatrist) to independently assess a criterion measure of depression as an estimate of “ground truth” about depression in the test subject. If the criterion measure is collected at the time of the test instrument assessment, the validation is concurrent. If the criterion measure is collected at a later time, the validation is predictive. How accurately the instrument's computed result (e.g., major depression) matches the ground truth criterion is statistically expressed as the instrument's sensitivity and specificity.
An instrument's sensitivity is the percentage of cases in which, when the criterion measure (e.g., a psychologist's assessment) finds the test subject located in a sub-region, the measurement instrument also finds the test subject located in that same sub-region. Stated another way, the complement of the sensitivity (100% minus the sensitivity) expresses the likelihood of the occurrence of a “false-negative” where the condition goes undetected by the test instrument. For example, the PHQ-9 sensitivity for Major Depression is estimated to be 88%, meaning that on average out of 100 times that a psychologist would find a patient to have Major Depression, the PHQ-9 would detect 88 of those cases. The remaining 12 cases (false-negatives) would go undetected by the PHQ-9.
Conversely, a test instrument's specificity is the percentage of cases in which, when the criterion measure finds the test subject is not located in a sub-region, the measurement instrument also finds the test subject is not located in that same sub-region. Stated another way, the complement of the specificity (100% minus the specificity) expresses the likelihood of the occurrence of a “false-positive” where the measurement instrument reports a condition that the criterion measure does not find. For example, the PHQ-9 specificity for Major Depression is also estimated to be 88%, meaning that on average out of 100 times when a psychologist would find a patient to not have Major Depression, the PHQ-9 would agree for 88 of those cases. The remaining 12 cases (false-positives) would be detected as having Major Depression by the PHQ-9 even though a psychologist would not agree.
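The two definitions above reduce to simple ratios over the four possible outcomes of comparing the instrument against the criterion measure. The following sketch uses the 88-out-of-100 figures from the examples above; the function names are hypothetical.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of criterion-positive cases that the instrument also detects."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of criterion-negative cases that the instrument also clears."""
    return true_negatives / (true_negatives + false_positives)


# Out of 100 patients a psychologist finds to have Major Depression,
# the instrument detects 88 and misses 12.
print(sensitivity(true_positives=88, false_negatives=12))  # 0.88

# Out of 100 patients a psychologist finds not to have Major Depression,
# the instrument agrees for 88 and flags 12.
print(specificity(true_negatives=88, false_positives=12))  # 0.88
```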
Many patients experiencing symptoms that might be associated with a psychiatric disorder initially seek treatment from a primary care physician. The PHQ-9 is commonly used to screen patients for depression in a primary care setting, and its false-positive and false-negative findings can significantly impact the costs associated with patient care. For example, assume that a primary care physician screens 120 patients a week, of whom 18 patients (15%) have Major Depression and the remaining 102 patients do not have Major Depression. When these patients are screened with the PHQ-9, which has 88% sensitivity and 88% specificity, about 2 of the 18 patients with Major Depression will go undetected (i.e., false-negatives) and about 12 of the 102 patients without Major Depression will be detected as having Major Depression (i.e., false-positives). The 12 false-positive patients will require additional and unnecessary diagnosis and treatment, while the 2 false-negative patients will go untreated, consuming additional medical resources especially when there are co-morbid chronic conditions, such as diabetes or cardiac risk. As another example, the Quick PsychoDiagnostic (QPD) Panel, another self-report measurement instrument, has 81% sensitivity and 96% specificity for Major Depression, and in the foregoing scenario would result in about 3 to 4 patients with Major Depression going undetected and about 4 patients without Major Depression being falsely diagnosed. Measurement accuracy therefore has a significant impact on cost.
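The arithmetic behind the foregoing scenario can be reproduced with a short calculation. The patient counts and the sensitivity and specificity figures are those given above; the function name `expected_errors` is hypothetical, and the results are expected counts, which the text rounds to whole patients.

```python
def expected_errors(with_condition: int, without_condition: int,
                    sensitivity: float, specificity: float):
    """Expected false-negative and false-positive counts for one screening week."""
    false_negatives = with_condition * (1.0 - sensitivity)
    false_positives = without_condition * (1.0 - specificity)
    return false_negatives, false_positives


# 120 patients screened per week: 18 with Major Depression, 102 without.
instruments = [("PHQ-9", 0.88, 0.88), ("QPD Panel", 0.81, 0.96)]
for name, sens, spec in instruments:
    fn, fp = expected_errors(18, 102, sens, spec)
    print(f"{name}: {fn:.2f} false-negatives, {fp:.2f} false-positives")

# PHQ-9: 2.16 false-negatives, 12.24 false-positives
# QPD Panel: 3.42 false-negatives, 4.08 false-positives
```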
In addition to concerns with measurement accuracy, primary care physicians find many instruments too cumbersome and time-consuming for routine use. The instruments take a significant amount of time to administer and score, and can therefore disrupt office routines and patient flow. These problems also arise when such instruments are used in an emergency room setting, e.g., for triage. Further, many instruments provide only numeric scores, not specific assessments that can better inform treatment decisions. Also, many such instruments test for only one latent trait (e.g., depression) and do not test for other psychiatric disorders that often coexist with depression and have implications for treatment of the patient.