Genomics is the study of the collective set of genes (the genome) of a species, as well as the study of the function and activity of those genes in different cells and in the same cell, temporally, developmentally, and under varying environmental conditions. Differential gene function and activity play a significant role in the development of a cell for a specialized activity in the body and in the transformation of a cell from healthy to pathologic.
The expression of genetic information in a cell is carried out through the transcription of an intermediate molecule, mRNA. The cell translates expressed mRNAs into polypeptides, or proteins. Proteins carry out the majority of functions encoded by the genes. The study of the collective set of proteins (the proteome) of a species, and of the activity and function of those proteins in a cell, is the subject of a new field of biology called “proteomics.”
Because the character of a cell depends on the genes expressed by the cell, gene expression profiling has become an important method in genomics. Gene expression profiling seeks to determine which genes are expressed in a cell and the level of their expression. Thus, the gene expression profile of a cell provides a “fingerprint” that is characteristic of the cell, indicating both the identity of the cell and its activity. Comparing the gene expression profiles of different cells is a process called “differential gene expression.” This method can provide information about the genes that are responsible for the different phenotypes of cells. Genes that are differentially expressed in healthy and pathologic cells can function as diagnostic markers and are candidate targets for therapeutic intervention. Thus, obtaining accurate profiles of gene expression in different cell types is an important goal.
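As an illustration only, the comparison of two expression profiles described above can be sketched as a log2 fold-change calculation. The gene names, expression values, pseudocount, and two-fold cutoff below are hypothetical choices made for this example and are not taken from any cited method.

```python
import math


def log2_fold_changes(profile_a, profile_b, pseudocount=1.0):
    """Return log2(level in B / level in A) for every gene seen in either profile.

    A pseudocount (an assumption of this sketch) is added to both levels so
    that genes absent from one profile do not cause division by zero.
    """
    genes = sorted(set(profile_a) | set(profile_b))
    return {
        gene: math.log2(
            (profile_b.get(gene, 0.0) + pseudocount)
            / (profile_a.get(gene, 0.0) + pseudocount)
        )
        for gene in genes
    }


def differentially_expressed(profile_a, profile_b, min_abs_log2fc=1.0):
    """Keep genes whose level changes at least two-fold (|log2 FC| >= 1 by default)."""
    changes = log2_fold_changes(profile_a, profile_b)
    return {g: fc for g, fc in changes.items() if abs(fc) >= min_abs_log2fc}


# Hypothetical expression levels (arbitrary units) for two cell types.
healthy = {"GENE_A": 7, "GENE_B": 15, "GENE_C": 3}
pathologic = {"GENE_A": 31, "GENE_B": 15, "GENE_C": 0}

candidates = differentially_expressed(healthy, pathologic)
```

In this sketch, genes with a positive value are expressed at a higher level in the pathologic profile and genes with a negative value at a lower level; a real analysis would also account for replicate measurements and normalization between samples.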
Numerous methods are presently used to generate gene expression profiles of a cell. These include traditional methods such as northern blots, RT-PCR, nuclease protection, differential display, cDNA fingerprinting, and subtractive hybridization, as well as newer techniques such as the generation of expressed sequence tag (“EST”) libraries and arrays, cDNA arrays, mRNA arrays, oligonucleotide arrays, and serial analysis of gene expression (“SAGE”) (see generally Lockhart & Winzeler, Nature 405:827-836 (2000); see also Velculescu et al., Science 270:484-487 (1995)).
In one example, nucleic acid arrays such as oligonucleotide arrays are used for expression profiling. These arrays are collections of specifically chosen oligonucleotides that are bound to a solid support at predetermined and addressable locations. In certain embodiments, these arrays comprise an oligonucleotide that specifically identifies each of the known genes in a genome. Messenger RNAs or cDNAs derived from a cell are applied to the array. Each mRNA or cDNA hybridizes with an oligonucleotide that corresponds to the particular gene from which it was transcribed. Because the identity and location of each immobilized oligonucleotide are predetermined, each hybridization event indicates that a particular gene has been expressed by the cell. One commercialized version of an oligonucleotide array is the GeneChip™ from Affymetrix. In yet another example of commercialized array methodology, beads coated with an array, or cells, are each attached to an optical sensor molecule. To provide an address, the beads are then drawn into wells at the end of fibers in a fiber optic bundle (see, e.g., Bead Array™ (Illumina)). In yet another example, arrays can be made from EST libraries. EST libraries are generated by reverse-transcribing the set of expressed mRNAs in a cell. Frequently, the entire mRNA is not reverse transcribed, but a portion sufficient to uniquely identify the gene from which the mRNA was expressed is. The ESTs are sequenced and identified in a genomic database.
Despite the power of existing gene expression technologies, it is acknowledged that levels of mRNA transcription do not always correlate directly with levels of protein expression, for a number of reasons: (1) different mRNAs may be translated into polypeptides with different efficiencies; (2) an mRNA may be differentially spliced to produce different proteins in different cells; (3) expressed polypeptides may be degraded at different rates; and (4) polypeptides can be subject to post-translational modifications so that the same polypeptide can assume a different form or function in the same cell and in different cells. Thus, there is a need to correlate mRNA expression with protein expression (see, e.g., Hancock et al., Anal. Chem. News & Features, Nov. 1, 1999, pages 742A-748A; Nelson et al., Electrophoresis 21:1823-1831 (2000)).
At the same time, current methods of protein expression profiling, such as mass spectrometry, 2D gel electrophoresis, and chromatography, may suffer from limitations in sensitivity and resolution (see, e.g., Pandey & Mann, Nature 405:837-846 (2000)). The present invention therefore addresses this issue by combining gene expression profiling and protein profiling to more quickly and accurately identify proteins of interest in a particular cell type. Gene expression profiling is used to select a candidate transcript or transcripts that are expressed in a cell. The transcripts are typically sequenced and used to deduce the amino acid sequence of the encoded protein. The amino acid sequence is then used to predict and identify physico-chemical characteristics of the protein encoded by the transcript, e.g., molecular weight, isoelectric point, hydrophobicity, hydrophilicity, glycosylation, phosphorylation, epitope sequence, ligand binding sequence, charge at specified pH, or metal chelate binding. The physico-chemical characteristics are then employed to improve the sensitivity and resolution of protein profiling, thereby providing improved information about the proteins encoded by mRNA expressed in a particular cell type. This invention provides methods for making such a correlation and provides other advantages as well.
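By way of a minimal, non-limiting sketch, the step of deducing an amino acid sequence from a transcript and predicting characteristics from it might look as follows. The function names and the example sequence are hypothetical. The sketch assumes the standard genetic code, an open reading frame that begins at the first base, approximate average residue masses, and the Kyte-Doolittle hydropathy scale as one of several possible hydrophobicity measures; it ignores post-translational modifications.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Approximate average residue masses in daltons (each residue is the amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(coding_sequence):
    """Translate a coding sequence in frame from its first base to the first stop codon."""
    seq = coding_sequence.upper().replace("U", "T")  # accept mRNA or cDNA letters
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":  # stop codon ends the polypeptide
            break
        protein.append(residue)
    return "".join(protein)


def molecular_weight(protein):
    """Average mass of the unmodified polypeptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS


def gravy(protein):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(HYDROPATHY[aa] for aa in protein) / len(protein)


# Hypothetical transcript fragment: ATG GCC AAA TAA GGG
protein = translate("ATGGCCAAATAAGGG")  # "MAK"
predicted = {"mw_da": molecular_weight(protein), "gravy": gravy(protein)}
```

Predicted values of this kind could then be used, for example, to narrow the mass or hydrophobicity range examined during protein profiling; other listed characteristics, such as isoelectric point or charge at a specified pH, would require additional parameters not shown here.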