Genomic studies are now approaching “industrial” speed and scale, thanks to advances in gene sequencing and the increasing availability of high-throughput methods for studying genes, the proteins they encode, and the pathways in which they are involved. The development of DNA microarrays has enabled massively parallel studies of gene expression as well as genomic DNA variations.
DNA microarrays have shown promise in advanced medical diagnostics. More specifically, several groups have shown that when the gene expression patterns of normal and diseased tissues are compared at the whole-genome level, patterns of expression characteristic of the particular disease state can be observed (Bittner et al. (2000) Nature 406:536-540; Clark et al. (2000) Nature 406:532-535; Huang et al. (2001) Science 294:870-875; Hughes et al. (2000) Cell 102:109-126). For example, tissue samples from patients with malignant forms of prostate cancer display a pattern of mRNA expression recognizably different from that of tissue samples from patients with a milder form of the disease (cf. Dhanasekaran et al. (2001) Nature 412:822-826).
However, as James Watson recently pointed out, proteins are the real “actors in biology” (“A Cast of Thousands,” Nature Biotechnology, March 2003). A more attractive approach would be to monitor key proteins directly. These might be biomarkers identified by DNA microarray analysis, in which case the required assay could be relatively simple, examining only 5-10 proteins. Another approach would be to use an assay that detects hundreds or thousands of protein features, for example in the direct analysis of blood, sputum, or urine samples. It is reasonable to believe that the body reacts in a specific way to a particular disease state and produces a distinct “biosignature” in a complex data set, such as the levels of 500 proteins in the blood. One could imagine that, in the future, a single blood test could be used to diagnose most conditions.
The motivation for developing large-scale protein detection assays as basic research tools differs from the motivation for developing them for medical diagnostics. Researchers value biosignatures as a means of understanding the molecular basis of a cell's response to a particular genetic, physiological, or environmental stimulus. DNA microarrays perform well in this role, but detecting proteins directly would allow protein levels to be determined more accurately and, more importantly, could be designed to quantitate different splice variants or isoforms. These variants, to which DNA microarrays are largely or completely blind, often have pronounced effects on protein activity.
This has sparked great interest in developing devices such as protein-detecting microarrays (PDMs) that allow similar experiments to be done at the protein level, particularly devices capable of monitoring the levels of hundreds or thousands of proteins simultaneously.
Prior to the present invention, PDMs that even approached the complexity of DNA microarrays did not exist. Current approaches to massively parallel (e.g., cell-wide or proteome-wide) protein detection suffer from several problems. First, reagent generation is difficult: to obtain a detection agent against every protein in an organism, one must first isolate each individual target protein and then develop a detection agent against the purified protein. Since the number of proteins in the human organism is currently estimated at about 30,000, this requires years of work and substantial resources. Furthermore, detection agents raised against native proteins have poorly defined specificity, because it is difficult to determine which part of the protein a given agent recognizes. This leads to considerable cross-reactivity when multiple detection agents are arrayed together, making large-scale protein detection arrays difficult to construct. Second, current methods achieve poor coverage of all possible proteins in an organism. These methods typically include only the soluble proteins in biological samples. They often fail to distinguish splice variants, which are now appreciated as being ubiquitous. They exclude a large number of proteins that are bound in organellar and cellular membranes or that become insoluble when the sample is processed for detection. Third, current methods are not general to all proteins or to all types of biological samples. Proteins vary widely in their chemical character, and different groups of proteins require different processing conditions to keep them stably solubilized for detection; no single condition suits all proteins. Biological samples likewise vary in their chemical character. Individual cells considered identical express different proteins over the course of their generation and ultimate death. Physiological fluids such as urine and blood serum are relatively simple, but biopsy tissue samples are very complex. Different protocols must therefore be used to process each type of sample and achieve maximal solubilization and stabilization of proteins.
Fourth, current detection methods are either not effective uniformly across all proteins or cannot be highly multiplexed to enable simultaneous detection of a large number of proteins (e.g., >5,000). Optical detection methods would be the most cost-effective, but they perform non-uniformly across different proteins. Proteins in a sample must be labeled with dye molecules, and differences in the chemical character of proteins lead to inconsistent labeling efficiency. Labels may also interfere with the interactions between the detection agents and the analyte protein, leading to further errors in quantitation. Non-optical detection methods have been developed, but they require expensive instrumentation and are very difficult to multiplex for parallel detection of even moderately large numbers of samples (e.g., >100 samples).
Another problem with current technologies is that they are confounded by intracellular processes involving a complex web of protein complex formation, multiple enzymatic reactions that alter protein structure, and protein conformational changes. These processes can mask or expose binding sites on the proteins in a sample. For example, prostate-specific antigen (PSA) is known to exist in serum in multiple forms, including free (unbound) forms, e.g., pro-PSA and BPSA (BPH-associated free PSA), and complexed forms, e.g., PSA-ACT, PSA-A2M (PSA-alpha2-macroglobulin), and PSA-API (PSA-alpha1-protease inhibitor) (see Stephan C. et al. (2002) Urology 59:2-8). Similarly, cyclin E is known to exist not only as a full-length 50 kD protein but also in five other low molecular weight forms ranging in size from 34 to 49 kD. In fact, the low molecular weight forms of cyclin E are believed to be more sensitive markers for breast cancer than the full-length protein (see Keyomarsi K. et al. (2002) N. Engl. J. Med. 347(20):1566-1575).
Sample collection and handling prior to a detection assay may also alter the forms of the proteins present in a sample and, thus, the ability to detect them. As indicated by Evans M. J. et al. (2001) Clinical Biochemistry 34:107-112 and Zhang D. J. et al. (1998) Clinical Chemistry 44(6):1325-1333, standardizing immunoassays is difficult because of variability in sample handling and in protein stability in plasma or serum. For example, the handling of PSA samples, such as freezing, affects the stability and the relative levels of the different forms of PSA in the sample (Leinonen J, Stenman U H (2000) Tumour Biol. 21(1):46-53).
Finally, current technologies are compromised by the presence of autoantibodies, which affect the outcome of immunoassays in unpredictable ways, e.g., by causing analytical errors (Fitzmaurice T. F. et al. (1998) Clinical Chemistry 44(10):2212-2214).
These problems have raised the question of whether it is even possible to standardize immunoassays for heterogeneous protein antigens (Stenman U-H. (2001) Immunoassay Standardization: Is it possible? Who is responsible? Who is capable? Clinical Chemistry 47(5):815-820). Thus, a great need exists in the art for efficient and simple methods of parallel detection of the proteins expressed in a biological sample, and particularly for methods that overcome the imprecision caused by the complexity of protein chemistry, methods that can detect all or most of the proteins expressed in a given cell type at a given time, and methods for proteome-wide detection and quantitation of proteins expressed in biological samples.