Alzheimer's disease (AD) is the leading cause of dementia in the elderly (Cummings J L. and Cole G., “Alzheimer Disease,” JAMA 287:2335-2338 (2002)). Current antemortem methods of AD diagnosis correctly identify the disease in 80 to 90% of cases through the use of patient history, brain imaging, and neuropsychological testing at expert academic research centers (Kennard M., “Diagnostic Markers for Alzheimer's Disease,” Neurobiol Aging 19:131-132 (1998)), but the typical clinical diagnostic accuracy is probably lower. Typically, a diagnosis cannot be made until the disease has progressed far enough that dementia is present and, even then, the patient is classified as having possible AD or probable AD (McKhann et al., “Clinical Diagnosis of Alzheimer's Disease: Report of the NINCDS-ADRDA Work Group Under the Auspices of Department of Health and Human Services Task Force on Alzheimer's Disease,” Neurology 34:939-944 (1984)). Thus, a definitive diagnosis of AD currently requires a postmortem examination of the brain. A molecular biomarker for AD could complement current methods to increase the accuracy of diagnoses and make earlier diagnoses possible (Biomarkers Definitions Working Group, “Biomarkers and Surrogate Endpoints: Preferred Definitions and Conceptual Framework,” Clin Pharmacol Ther 69:89-95 (2001)). Many studies have examined the cerebrospinal fluid (CSF) as a possible source of biomarkers for neurological diseases, because CSF is in direct contact with the brain and the molecular composition of CSF can reflect biochemical changes in the brain (Fishman R., “Cerebrospinal Fluid in Diseases of the Nervous System,” 2nd ed. New York: W.B. Saunders Co. (1992)).
In particular, there has been a focus on the proteins in CSF. Some AD CSF biomarker studies have compared AD and non-AD patients on the basis of one (or a few) CSF proteins that had previously been shown to play a role in the pathogenesis of AD in the brain (Bonelli et al., “Cerebrospinal Fluid Tissue Transglutaminase as a Biochemical Marker for Alzheimer's Disease,” Neurobiol Dis 11:106-110 (2002); Peskind et al., “Cerebrospinal Fluid S100B is Elevated in the Earlier Stages of Alzheimer's Disease,” Neurochem Int 39:409-413 (2001); Hampel et al., “Discriminant Power of Combined Cerebrospinal Fluid Tau Protein and of the Soluble Interleukin-6 Receptor Complex in the Diagnosis of Alzheimer's Disease,” Brain Res 823:104-112 (1999)). While this approach is useful for testing proposed biomarkers, it does not allow new biomarkers to be discovered. To complement this approach, other studies have compared the entire CSF proteome of AD and non-AD patients to look for differences in protein expression. The proteome is defined as the protein complement of the genome and includes information about which proteins and peptides are present and at what levels they are expressed. Previous studies have examined the CSF proteome for AD biomarkers using several different techniques (Carrette et al., “A Panel of Cerebrospinal Fluid Potential Biomarkers for the Diagnosis of Alzheimer's Disease,” Proteomics 3:1486-1494 (2003); Davidsson et al., “Proteome Analysis of Cerebrospinal Fluid Proteins in Alzheimer Patients,” Neuroreport 13:611-615 (2002); Puchades et al., “Proteomic Studies of Potential Cerebrospinal Fluid Protein Markers for Alzheimer's Disease,” Brain Res Mol Brain Res 118:140-146 (2003); Choe et al., “Studies of Potential Cerebrospinal Fluid Molecular Markers for Alzheimer's Disease,” Electrophoresis 23:2247-2251 (2002)).
Carrette et al. (Carrette et al., “A Panel of Cerebrospinal Fluid Potential Biomarkers for the Diagnosis of Alzheimer's Disease,” Proteomics 3:1486-1494 (2003)) used surface-enhanced laser desorption/ionization (SELDI) coupled with time-of-flight (TOF) mass spectrometry (MS) to characterize the antemortem CSF proteome of nine AD patients and ten non-AD patients. The AD patients had a diagnosis of probable AD (no postmortem confirmation of diagnoses), and the non-AD patients were normal controls. The data in the Carrette study were analyzed using a Mann-Whitney U test, and a panel of five polypeptides was identified that could classify AD patients with a specificity of 100% and a sensitivity of 66%. In a study by Davidsson et al. (Davidsson et al., “Proteome Analysis of Cerebrospinal Fluid Proteins in Alzheimer Patients,” Neuroreport 13:611-615 (2002)), proteins were separated by two-dimensional gel electrophoresis (2DE). The protein spots on the gel images from 15 AD patients (no postmortem confirmation of diagnoses) and 12 normal controls were compared using a Mann-Whitney U test. Fifteen protein isoforms were found to have a significant (p<0.05) change in their CSF concentration. In the study by Puchades et al. (Puchades et al., “Proteomic Studies of Potential Cerebrospinal Fluid Protein Markers for Alzheimer's Disease,” Brain Res Mol Brain Res 118:140-146 (2003)), 2DE was also used, and samples from seven AD patients (no postmortem confirmation of diagnoses) and seven normal controls were compared using a Student's t-test. Nine proteins were found to be significantly (p<0.05) altered between the AD and non-AD patients. Choe et al. (Choe et al., “Studies of Potential Cerebrospinal Fluid Molecular Markers for Alzheimer's Disease,” Electrophoresis 23:2247-2251 (2002)) applied multivariate statistical methods to analyze 2DE gels from ten AD patients (diagnoses confirmed postmortem), five neurologically normal patients, and two patients with Creutzfeldt-Jakob disease (CJD).
Using a canonical correlation analysis, Choe et al. found a set of nine proteins that could differentiate between AD and normal patients with 100% sensitivity and specificity. Using a principal factor analysis on a subset of ten patients (four AD, four normal, and two CJD), they found a set of 12 spots that had a sensitivity of 100% and a specificity of 83%. These studies have generated interesting preliminary data; however, there are several considerations that are not fully addressed in any of the previously published work. First, antemortem CSF samples should be used, and the antemortem diagnosis of AD patients should be confirmed by autopsy. Second, neurological controls should be included among the non-AD samples. Third, a reasonably large number of CSF samples should be used, and multivariate statistics should be considered for the data analysis.
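The sensitivity and specificity figures quoted above follow the standard definitions: sensitivity is the fraction of AD patients that a test classifies as AD, and specificity is the fraction of non-AD patients that it classifies as non-AD. A minimal sketch of the calculation, assuming two-class string labels; the helper name `sensitivity_specificity` and the labels in the usage lines are hypothetical and are not data from any cited study:

```python
def sensitivity_specificity(y_true, y_pred, positive="AD"):
    """Return (sensitivity, specificity) for a two-class problem.

    sensitivity = TP / (TP + FN): share of true positives called positive
    specificity = TN / (TN + FP): share of true negatives called negative
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # AD patient correctly called AD
            else:
                fn += 1  # AD patient missed
        elif pred == positive:
            fp += 1  # non-AD patient wrongly called AD
        else:
            tn += 1  # non-AD patient correctly called non-AD
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical labels for three AD and three non-AD patients
y_true = ["AD", "AD", "AD", "non-AD", "non-AD", "non-AD"]
y_pred = ["AD", "AD", "non-AD", "non-AD", "non-AD", "non-AD"]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these made-up labels the sketch reports a specificity of 1.00 and a sensitivity of 0.67: a classifier that never mislabels a control but misses one AD patient in three.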
An important factor in biomarker studies is the use of appropriate samples. Antemortem samples should be used for CSF biomarker studies, because the CSF protein composition changes after death (Lescuyer et al., “Identification of Post-Mortem Cerebrospinal Fluid Proteins as Potential Biomarkers of Ischemia and Neurodegeneration,” Proteomics 4:2234-2241 (2004)). The use of antemortem CSF samples from AD patients whose diagnosis has been definitively confirmed postmortem is essential, given that a significant fraction of antemortem AD diagnoses are incorrect, roughly 10 to 20% even at expert research centers (Kennard M., “Diagnostic Markers for Alzheimer's Disease,” Neurobiol Aging 19:131-132 (1998)). Including incorrectly diagnosed patients would undermine the reliability of the sensitivity and specificity estimated for the biomarkers.
A second key element is the selection of control samples. Although a comparison of AD and normal CSF may result in the identification of biomarkers, the inclusion of neurological controls is essential for the development of clinically relevant tests. Many characteristics of AD (e.g. inflammation, memory loss, etc.) can be common to other forms of dementia and the key clinical challenge is to establish a differential diagnosis. For example, some changes in protein expression, which may be useful in segregating AD from normal, may not be useful in segregating AD from other dementias.
A third consideration is the need to identify and validate markers in a cohort of reasonable size, so that the statistical power of the identified markers can be better established. Prior AD proteomic studies have used between 10 and 27 total CSF samples. The larger the sample set, the more likely it is that the results of the statistical analysis represent the larger population. Nonetheless, the results from any preliminary dataset, including the one presented herein using 68 samples, must be validated by multiple investigators using large numbers of samples.
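One way to see why cohort size matters is to attach a confidence interval to an observed sensitivity. The sketch below uses the Wilson score interval for a binomial proportion, assuming independent patients and the conventional z = 1.96 (about 95% coverage); the function name `wilson_interval` and the cohort sizes in the loop are illustrative choices, not values from the studies discussed here:

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion such as sensitivity.

    z = 1.96 gives an approximate 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes
    return max(0.0, center - half), min(1.0, center + half)


# A "perfect" observed sensitivity (every AD patient detected) in
# hypothetical cohorts of increasing size
for n in (5, 10, 50):
    lo, hi = wilson_interval(n, n)
    print(f"{n}/{n} detected: 95% CI {lo:.2f} to {hi:.2f}")
```

For a perfect score the lower bound reduces to n / (n + z²), about 0.57, 0.72 and 0.93 for the three cohort sizes, so 100% sensitivity observed in a handful of patients remains compatible with a considerably lower true sensitivity.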
Finally, it is important to consider the application of appropriate multivariate statistical methods in the identification of biomarkers. AD is a complex disease, and a multivariate statistical approach can yield biomarkers that better represent the disease's multifactorial nature. Many previous studies (Carrette et al., “A Panel of Cerebrospinal Fluid Potential Biomarkers for the Diagnosis of Alzheimer's Disease,” Proteomics 3:1486-1494 (2003); Davidsson et al., “Proteome Analysis of Cerebrospinal Fluid Proteins in Alzheimer Patients,” Neuroreport 13:611-615 (2002); Puchades et al., “Proteomic Studies of Potential Cerebrospinal Fluid Protein Markers for Alzheimer's Disease,” Brain Res Mol Brain Res 118:140-146 (2003)) have used univariate statistical methods to determine which proteins show a change in concentration between the diseased and normal states. Univariate methods evaluate each protein in isolation, treating any observed change in its expression between diseased and normal patients as independent of changes in other proteins. Thus, these methods cannot take interactions among proteins or biochemical pathways into account. Multivariate statistical methods do not rely on variable independence and can combine information from multiple variables to improve disease diagnosis (Harris R J., “A Primer of Multivariate Statistics,” 3rd ed. Mahwah, N.J.: Lawrence Erlbaum Associates (2001)). The importance of combining information from multiple variables has already been demonstrated in AD biomarker research: using the CSF levels of both Aβ1-42 and tau yields a higher sensitivity and specificity for AD diagnosis than using either protein alone (Blennow K., “Cerebrospinal Fluid Protein Biomarkers for Alzheimer's Disease,” NeuroRx 1:213-225 (2004)).
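For contrast, the univariate approach can be sketched as a Mann-Whitney U test applied to each protein separately, with a protein flagged when p < 0.05. This is a minimal illustration, not the exact procedure of any cited study: it assumes the large-sample normal approximation (without a tie correction in the variance) and applies no correction for multiple testing, and `mann_whitney_u`, `univariate_screen` and the toy intensities are hypothetical. Because each protein is tested on its own, nothing about how proteins co-vary enters the decision:

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U for sample x, p-value). Tied values share their average rank,
    but the variance is not tie-corrected.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


def univariate_screen(ad, control, alpha=0.05):
    """Test every protein (column) independently; return indices with p < alpha."""
    hits = []
    for j in range(len(ad[0])):
        _, p = mann_whitney_u([row[j] for row in ad], [row[j] for row in control])
        if p < alpha:
            hits.append(j)
    return hits


# Toy intensities (rows = patients, columns = proteins), not real CSF data
ad = [[10, 1], [11, 3], [12, 5], [13, 7], [14, 9]]
control = [[1, 2], [2, 4], [3, 6], [4, 8], [5, 10]]
print(univariate_screen(ad, control))
```

In this toy table only protein 0 separates the groups, so the screen flags index 0; the test says nothing about whether protein 1 might become informative in combination with protein 0, which is the gap that multivariate methods address.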
One challenge in applying proteomic analyses to AD biomarker studies is that such studies are often underspecified: there are significantly more variables than samples (i.e., more proteins than CSF samples). This situation prevents many multivariate statistical methods from being applied appropriately to proteomic data. In 2001, Breiman introduced a method for multivariate statistical analysis based on classification trees, the random forest (RF) method (Breiman L., “Random Forests,” Machine Learning 45:5-32 (2001)). The RF method can be used to analyze underspecified systems and, unlike some other multivariate methods (such as support vector machines or artificial neural networks), it can be used even when a large number of the variables are irrelevant to the classification of the samples (Izmirlian G., “Application of the Random Forest Classification Algorithm to a SELDI-TOF Proteomics Study in the Setting of a Cancer Prevention Trial,” Ann N Y Acad Sci 1020:154-174 (2004)). This is important because only a small percentage of proteins may show an expression change in response to a disease. An RF analysis is also less affected by noise in the variables than some other methods, because the RF method does not concentrate weight on any subset of samples (Breiman L., “Random Forests,” Machine Learning 45:5-32 (2001)). Another feature of RF is its ability to measure the importance of individual variables in sample classification. This is especially relevant to proteomic systems where, as mentioned above, a large percentage of the variables may not show a change in expression. Identifying which proteins are most important in sample classification may give insight into the biology of the system (i.e., which pathways the disease affects) or even allow the development of an antibody-based assay for sample classification.
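The RF ideas described above can be sketched in a few dozen lines: each tree is grown on a bootstrap sample, only a random subset of `mtry` variables is considered at each split, the trees vote on the class, and a variable's importance is estimated by permuting its values among each tree's out-of-bag (OOB) samples and measuring the drop in accuracy. The pure-Python sketch below is illustrative only; the function names, the Gini split criterion, and the sqrt(p) default for `mtry` are assumptions of this sketch rather than details taken from the text, and in practice a tuned implementation such as scikit-learn's `RandomForestClassifier` would normally be used:

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, rng, mtry):
    """Grow an unpruned tree; each node considers only `mtry` random variables."""
    if len(set(y)) == 1:
        return majority(y)  # pure node becomes a leaf
    best = None  # (weighted Gini impurity, variable index, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # sampled variables are constant at this node
        return majority(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], rng, mtry),
            build_tree([X[i] for i in ri], [y[i] for i in ri], rng, mtry))


def predict_tree(node, row):
    while isinstance(node, tuple):  # internal node: (var, threshold, left, right)
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=100, mtry=None, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))  # sqrt(p), a common default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))      # out-of-bag samples
        tree = build_tree([X[i] for i in boot], [y[i] for i in boot], rng, mtry)
        forest.append((tree, oob))
    return forest


def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree, _ in forest])


def permutation_importance(forest, X, y, seed=0):
    """Mean drop in per-tree OOB accuracy after permuting each variable."""
    rng = random.Random(seed)
    p = len(X[0])
    drops, used = [0.0] * p, 0
    for tree, oob in forest:
        if not oob:
            continue
        used += 1
        base = sum(predict_tree(tree, X[i]) == y[i] for i in oob)
        for f in range(p):
            shuffled = [X[i][f] for i in oob]
            rng.shuffle(shuffled)
            correct = 0
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[f] = v
                correct += predict_tree(tree, row) == y[i]
            drops[f] += (base - correct) / len(oob)
    return [d / used for d in drops] if used else drops


# Toy, randomly generated data (NOT patient data): 20 samples x 30 variables,
# i.e. more variables than samples; only variable 3 differs between groups.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(30)] for _ in range(20)]
y = ["AD"] * 10 + ["non-AD"] * 10
for i in range(10):
    X[i][3] += 5.0
forest = fit_forest(X, y, n_trees=100, seed=0)
imp = permutation_importance(forest, X, y, seed=0)
top = max(range(len(imp)), key=imp.__getitem__)
print("most important variable:", top)
```

In this toy setting the 29 irrelevant variables mostly contribute splits whose errors the vote averages out, while permuting the single informative variable sharply reduces OOB accuracy, which is the behavior that makes RF variable importance attractive for underspecified proteomic data.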
The RF method has been applied to a variety of biological studies, including the analysis of protein data related to cancer diagnosis (Izmirlian G., “Application of the Random Forest Classification Algorithm to a SELDI-TOF Proteomics Study in the Setting of a Cancer Prevention Trial,” Ann N Y Acad Sci 1020:154-174 (2004)) and the identification of gene mutations that lead to antibiotic resistance (Cummings M P., “Few Amino Acid Positions in rpoB are Associated with Most of the Rifampin Resistance in Mycobacterium Tuberculosis,” BMC Bioinformatics 5:157 (2004)). RF was compared with several other multivariate statistical methods, including linear discriminant analysis, k-nearest neighbor, and support vector machines, for determining biomarkers of ovarian cancer based on the protein mass spectra of serum (Wu et al., “Comparison of Statistical Methods for Classification of Ovarian Cancer Using Mass Spectrometry Data,” Bioinformatics 19:1636-1643 (2003)). The authors found that the RF method resulted in a lower overall misclassification rate of serum samples and a more stable assessment of classification errors.
The present invention is directed to overcoming these and other deficiencies in the art.