The present invention relates to methods for constructing multivariate predictive models to diagnose diseases for which current test methods are considered inadequate in either sensitivity or specificity. In particular, the present invention relates to predictive models for diagnosing diseases with a combination of laboratory tests, generating specificities of at least 80%.
More particularly, the present invention relates to the construction of a multivariate predictive model for diagnosing Lyme disease (LD) by choosing the best tests from among those currently available, utilizing the raw data produced by these tests instead of the manufacturers' binary test results, combining the test values into a single score through a special statistical function, weighting the importance of each component of the function when producing the score, generating a likelihood ratio from each patient's score, determining the pretest probability of disease through a special algorithm utilizing individual clinical signs and symptoms, combining the likelihood ratio with the pretest probability of disease through Bayes' Theorem to produce a posttest probability of disease, and determining a posttest probability cutoff point through a prospective validation study of the multivariate predictive model, against which individual patients' test results can be interpreted as indicative of Lyme disease or not. The present invention also relates to component laboratory tests identified by the predictive model as critical for diagnosis in the form of test kits with the test panel components incorporated into a microtiter plate to be analyzed by a commercial laboratory.
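The last steps of the chain described above (likelihood ratio, pretest probability, Bayes' Theorem, cutoff) can be illustrated with a minimal sketch. It is an illustration under stated assumptions rather than the claimed method: the function names and the example numbers are hypothetical, and the odds form of Bayes' Theorem is assumed as the way the two quantities are combined.

```python
def posttest_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Combine a pretest probability with a likelihood ratio (odds form of Bayes' Theorem).

    Both inputs are assumed to have been produced elsewhere: the pretest
    probability from clinical signs and symptoms, the likelihood ratio from
    the patient's multivariate test score.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


def is_indicative_of_disease(posttest: float, cutoff: float) -> bool:
    """Interpret a posttest probability against a cutoff.

    The cutoff is a hypothetical input here; in the text it is determined
    through a prospective validation study.
    """
    return posttest >= cutoff
```

For example, a hypothetical pretest probability of 0.2 and a likelihood ratio of 4 correspond to pretest odds of 0.25 and posttest odds of 1, that is, a posttest probability of about 0.5.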
Since the discovery over 25 years ago that the spirochete Borrelia burgdorferi is the cause of LD, numerous tests have been developed to detect this organism. Direct cultures of tissue or body fluids are possible, but suffer from low sensitivity. Direct detection methods involve assays for a component of B. burgdorferi or for its DNA. Most PCR tests for B. burgdorferi DNA in body fluids, such as plasma, serum, whole blood, urine, and spinal fluid, are insensitive. Although arthrocentesis and skin biopsy are invasive, PCR performed on the resulting specimens often detects DNA in acute cases, aiding diagnosis. Performing skin biopsies is unnecessary under most circumstances because a well-trained physician can usually diagnose the characteristic rash, erythema migrans, by visual inspection alone.
Patients presenting with neurological symptoms or chronic arthritic symptoms will usually not benefit from PCR tests for B. burgdorferi DNA. In the latter cases, serological tests for antibody to B. burgdorferi are commonly used. Numerous methods have been employed, including whole-cell EIA, capture-EIA, peptide-antigen EIA, recombinant protein EIA, immunofluorescent antibody, immunodot, and immunoblots to detect IgG, IgM, and IgA antibodies. All serological methods may lead to false-positive results; however, the most common test for B. burgdorferi antibody, the whole-cell EIA, is particularly susceptible to false-positive results. Therefore the CDC has advised a two-step process to confirm antibody: first test serum by whole-cell EIA or an equivalent method, then use a highly specific immunoblot to confirm those results that are positive or indeterminate by the first step.
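The decision logic of the two-step process just described can be sketched as follows. This is a minimal illustration only: the names `FirstStep` and `two_step_positive` are hypothetical, and the sketch encodes nothing beyond what the paragraph states (an immunoblot is consulted only when the first step is positive or indeterminate).

```python
from enum import Enum
from typing import Optional


class FirstStep(Enum):
    """Possible outcomes of the first-step assay (whole-cell EIA or equivalent)."""
    NEGATIVE = "negative"
    INDETERMINATE = "indeterminate"
    POSITIVE = "positive"


def two_step_positive(first_step: FirstStep,
                      immunoblot_positive: Optional[bool] = None) -> bool:
    """Return True only when the immunoblot confirms a non-negative first step."""
    if first_step is FirstStep.NEGATIVE:
        # A negative first step ends the process; no immunoblot is performed.
        return False
    if immunoblot_positive is None:
        raise ValueError("an immunoblot result is required for a positive "
                         "or indeterminate first step")
    return immunoblot_positive
```

Written this way, the rule makes the sensitivity limit explicit: a patient can be reported positive only if both steps react, so the combination can be no more sensitive than the less sensitive of the two tests.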
Most antibody methods are insensitive early in the disease (<4 weeks), but become more sensitive after the first few weeks have passed. This lack of sensitivity for early disease and a high rate of false-positive serology have undermined public confidence in the two-step process. The CDC and NIH have conducted active research programs for better diagnostic tests. The most promising of these new tests have been the recombinant and peptide-antigen EIAs; these tests exhibit sensitivity and specificity similar to the prior two-step process, but embodied in a single test.
The concept of a single test is the most appealing and some experts have advocated using C6 IgG as an alternative to the two-step method. The lack of sensitivity in early disease persists (at least 40% false-negative rate) with this new generation of tests (including C6 IgG), leading to recommendations for alternative interpretive algorithms by some physicians and Lyme advocacy groups. Western immunoblots using alternative interpretive algorithms (Donta, Clin. Infect. Dis., 25 (Suppl. 1), S52-56 (1997)) have demonstrated better sensitivity, but much worse specificity (up to 40% false-positives). This trade-off between sensitivity and specificity is a well recognized limitation in diagnostic testing.
The use of multiple tests in combination is not new. The two-step algorithm is borrowed from the literature on syphilis and HIV testing: a sensitive but non-specific screening test is confirmed by a more specific test. Implicit in this paradigm is the knowledge that the second, confirmatory test is at least as sensitive as the screening test. This analogy breaks down for LD (Trevejo et al., J. Infect. Dis., 179(4), 931-8 (1999)). The Western blot, though specific, is not as sensitive for early disease as the EIA test. The improved specificity of the two-step method is offset by limited sensitivity.
Tests are used in combination to gain either sensitivity or specificity; interpretive rules are usually generated through Boolean operators. If the “OR” operator is used, then a combination test is positive if either component is positive. If each component detects a different antigenic epitope of B. burgdorferi, then a test fashioned using the “OR” operator will likely be more sensitive than any individual component. However, each new component also has its own intrinsic rate of false-positive reactions. Overall false-positive rates increase linearly for combinations formed with the “OR” operator (Porwancher, J. Clin. Microbiol. 41(6), 2791 (2003)). If the “AND” operator is used, then a test is positive only when both components are positive; this operator is used to improve the specificity of a given combination of tests, often at the expense of sensitivity.
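The effect of the two operators on combined performance can be sketched numerically. The sketch below rests on a strong assumption that is not made in the text: the component tests are treated as conditionally independent given disease status. The function names and the example figures are hypothetical.

```python
from math import prod
from typing import Sequence, Tuple


def or_rule(sensitivities: Sequence[float],
            specificities: Sequence[float]) -> Tuple[float, float]:
    """Combined (sensitivity, specificity) when ANY positive component counts.

    Assumes conditional independence of the components (an assumption of
    this sketch, not a claim of the text).
    """
    combined_sensitivity = 1.0 - prod(1.0 - s for s in sensitivities)
    combined_specificity = prod(specificities)
    return combined_sensitivity, combined_specificity


def and_rule(sensitivities: Sequence[float],
             specificities: Sequence[float]) -> Tuple[float, float]:
    """Combined (sensitivity, specificity) when ALL components must be positive.

    Same independence assumption as or_rule.
    """
    combined_sensitivity = prod(sensitivities)
    combined_specificity = 1.0 - prod(1.0 - sp for sp in specificities)
    return combined_sensitivity, combined_specificity
```

With two hypothetical components, each 50% sensitive and 90% specific, `or_rule` gives roughly 75% sensitivity and 81% specificity, while `and_rule` gives roughly 25% sensitivity and 99% specificity. Under the independence assumption, the false-positive rate of the “OR” rule is approximately the sum of the component false-positive rates when those rates are small. Real antibody responses are correlated, so measured performance may differ from these figures.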
When using the “AND” operator, a counterintuitive event may occur: additional antigens can be used to improve specificity without loss of sensitivity. This effect has been demonstrated for ElpB1 and OspE: when FlaB and OspC were added to the mix, requiring multiple antibody responses actually improved specificity from 89% to 98% while maintaining sensitivity (Porwancher, J. Clin. Microbiol. (2003)). Sensitivity was maintained because there were 15 new ways for antibody combinations to form when two new antigens were added; patients with disease tend to have multiple positive antibody combinations. Specificity improved because false-positive combinations are rare, even though there are more ways for these to form.
Bacon et al., J. Infect. Dis., 187, 1187-1199 (2003) evaluated using two peptide or recombinant antigens together in binary form and assigned equal importance to antibodies generated by either antigen. The authors used the Boolean “OR” operator, evaluated several different antibody combinations, and settled on two pairs of antibodies for diagnosis: either C6 IgG and pepC10 IgM, or VlsE1 IgG and pepC10 IgM. While the 2-tier method using a VIDAS whole-cell EIA was included, no other recombinant antigens were evaluated. By limiting the choice of antigens and not weighting the ones that are included, this method compromises test performance.
Western blots are basically multiple binary test observations: a band is formed when antibody and antigen mix together in a clear electrophoretic gel, creating a visible line. Antibody is either observed or not. Of the 10 key antibodies detected by IgG Western blot, we do not know which antibody results contribute independent information to diagnosis. Nor is the information weighted according to its level of importance; all positive components are weighted the same. Failing to weight the importance of individual bands might have led to requiring an excessive number of bands to confirm disease, thus limiting sensitivity.
Honegr et al., Epidemiol. Mikrobiol. Immunol., 50(4), 147-156 (2001), interpreted Western blots using logistic regression analysis. While directed toward human diagnosis, the study sought to determine which species of B. burgdorferi are best used in European tests, as well as to determine interpretive criteria. Band results reported in binary fashion were used to create a quantitative rule; however, no likelihood ratios were reported from this regression technique, no partial ROC areas were maximized using the logistic method [as in McIntosh and Pepe (2002)], there were no specificity goals for ROC areas, and there was no attempt to utilize clinical information. While key Western blot bands were identified and weighted, the failure to use clinical information, to set specificity goals, or to maximize likelihood ratios (and therefore partial ROC areas) raises a question about the validity of the rules that were derived (according to the Neyman-Pearson Lemma).
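The difference between counting bands and weighting them through a logistic function can be illustrated with a short sketch. Everything specific in it is hypothetical: the band names, the weights, and the intercept are placeholders chosen for illustration and are not taken from Honegr et al. or from any other cited study.

```python
import math
from typing import Mapping

# Hypothetical log-odds weights for three placeholder bands (illustration only).
ILLUSTRATIVE_WEIGHTS = {"band_a": 2.0, "band_b": 1.0, "band_c": 0.5}
ILLUSTRATIVE_INTERCEPT = -3.0


def band_count(bands: Mapping[str, bool]) -> int:
    """Unweighted rule: every positive band contributes equally."""
    return sum(1 for present in bands.values() if present)


def logistic_score(bands: Mapping[str, bool],
                   weights: Mapping[str, float] = ILLUSTRATIVE_WEIGHTS,
                   intercept: float = ILLUSTRATIVE_INTERCEPT) -> float:
    """Weighted rule: a logistic function of the weighted sum of positive bands."""
    linear = intercept + sum(weight for name, weight in weights.items()
                             if bands.get(name, False))
    return 1.0 / (1.0 + math.exp(-linear))
```

Two hypothetical patients, one positive only for `band_a` and one positive only for `band_c`, have the same band count of 1 but different logistic scores, which is the information an unweighted rule discards.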
Robertson et al., J. Clin. Microbiol., 38(6), 2097-2102 (2000), performed a study whose purpose was similar to that of Honegr et al. However, Robertson et al. did not produce a quantitative rule as a consequence of utilizing multiple Western blot bands. While significant bands were identified through logistic regression, they utilized this information in a binary fashion and generated interpretive rules using either two or three of the bands so identified. There was no attempt to weight the importance of individual bands. In the end, the purported rules developed by logistic regression were no better than pre-existing interpretive criteria. No likelihood ratios or ROC curves were generated, and no clinical information was utilized. There was no attempt to use the Western blot with other tests. The failure to quantify the results severely limited the usefulness of the method.
Guerra et al., J. Clin. Microbiol., 38(7), 2628-2632 (2000), studied the use of log-likelihood analysis of Western blot data in dogs. The emphasis of her study was to develop a rule to diagnose Lyme disease in dogs that had received the Lyme disease vaccine (known to interfere with diagnosis). Guerra did produce a quantitative rule based on likelihood ratios. She combined this rule with epidemiological data to generate posttest probabilities. None of the animals were sick. No ROC analysis was performed, nor was there an attempt to determine the specificity or sensitivity of the technique. While a predictive rule could be generated, its performance was unclear because the epidemiological data was poorly utilized.
As demonstrated above, the LD field is limited by the lack of a theoretical basis for test strategy. There has been remarkably little work applying multivariate analysis to Lyme disease. Multiple tests exist to diagnose LD, but little is known about which tests are optimal or how to use tests together to enhance diagnostic power. U.S. Pat. No. 6,665,652 described an algorithm that enabled diagnosis of LD using multiple simultaneous immunoassays; this method required that the antibody response to antigens selected for diagnostic use be highly associated with LD (i.e., few false-positive results) and conditionally independent among controls. The disclosure of the above patent, particularly as it relates to LD diagnosis, is incorporated herein by reference.
Diagnostic methods are usually compared based on misclassification costs (utility loss), a value tied to the prevalence of LD in the general population. While the dollar cost of diagnostic tests is one means to compare outcomes, another and possibly more important goal is to estimate the loss of productive life (regret) from a given outcome. The two factors that generate regret are false-negative and false-positive serology.
The cost associated with false-negative results is the difference in regret between those with false-negative and true-positive serology, for which the increased personal, economic, and social costs of delaying disease treatment are factors. The cost associated with false-positive results is the difference in regret between those with false-positive and true-negative serology, for which the personal, economic, and social costs of administering powerful intravenous antibiotics to healthy patients are all factors.
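One way to make the comparison concrete is to compute the expected misclassification cost per patient tested. The sketch below is an illustration under stated assumptions: the function name, the parameter names, and the example figures are hypothetical, and the two regret values are taken to be the differences defined above (false-negative versus true-positive, and false-positive versus true-negative), expressed in the same arbitrary unit.

```python
def expected_regret(prevalence: float,
                    sensitivity: float,
                    specificity: float,
                    regret_false_negative: float,
                    regret_false_positive: float) -> float:
    """Expected misclassification cost per patient tested.

    prevalence is the proportion of tested patients who have the disease;
    the regret values are per-patient costs of each kind of error.
    """
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    false_negative_fraction = prevalence * (1.0 - sensitivity)
    false_positive_fraction = (1.0 - prevalence) * (1.0 - specificity)
    return (false_negative_fraction * regret_false_negative
            + false_positive_fraction * regret_false_positive)
```

For instance, with a hypothetical prevalence of 10%, sensitivity of 60%, specificity of 90%, and regret values of 10 and 2 units, the expected regret is about 0.58 units per patient tested. The dependence on prevalence is what ties the comparison of diagnostic methods to the population in which they are used.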
The foregoing issues also exist for many other infectious and non-infectious diseases. There remains a need for a predictive model that enables the selection of the smallest number of tests that contribute significantly to disease diagnosis, thereby limiting the cost of testing without sacrificing diagnostic sensitivity.