The present invention relates generally to proteomics and more specifically to quantitative proteomics analysis.
Complete genomic sequences and large partial (EST) sequence databases can potentially allow the identification of every gene in a species. However, the sequences alone do not explain the mechanism of biological and clinical processes. Neither genomic analysis nor the analysis of the expressed mRNA transcripts allows the quantity, structure, activity or state of modification of the translated protein products to be predicted. Furthermore, the gene sequence alone cannot be used to reliably predict whether and how a gene will be spliced, or how and at what position a protein is modified.
In order to assess the physiological state of a cell or organism using proteomics, it is important to understand the nature of protein modifications and the quantities of expressed proteins. Because biological systems are dynamic, the technologies used for such an assessment need to be quantitative. Such an analysis requires methods for the determination of the absolute quantity of each protein in a biological or clinical sample and for the determination of the precise composition of the proteins, including splice forms and modifications.
A number of approaches have been used to address the needs of proteomics analysis. For example, the combination of two-dimensional gel electrophoresis (2DE) and protein identification by mass spectrometry (MS) or tandem MS (MS/MS) constitutes such a method. However, a limitation of this approach is that 2DE-MS analysis does not provide a true representation of the proteins in a biological sample because specific classes of proteins are known to be absent or underrepresented in 2D gel patterns. These include very acidic or basic proteins, excessively large or small proteins, membrane proteins and other proteins of poor solubility in aqueous solvents, and low abundance proteins.
Other methods for proteome analysis include quantitative mass spectrometry based on multidimensional peptide separation and isotope-coded affinity tagging of proteins. This method allows relative quantitation, that is, the determination of the abundance ratio of each protein in two samples, but does not allow determination of the absolute quantity of the proteins in a sample. Also, chip technology using arrays of reagents with known specificity for target proteins, such as antibody arrays or arrays of aptamers, can be used for proteomics analysis. However, the use of such arrays can be limited by the need to selectively capture representative proteins or to preserve the three-dimensional structure of the proteins, depending on the particular use of the chip.
Mass spectrometry (MS)-based methods for proteomics have in common that the currency of protein identification and quantification is a peptide generated by the sequence-specific fragmentation of a protein. Therefore, proteins need to be enzymatically or chemically fragmented prior to mass spectrometric analysis. Furthermore, the MS-based proteomic methods, alone or in conjunction with other methods, have in common that throughput is limited by the need to sequence each peptide in each sample in each experiment to determine the sequence identity of the protein analyzed. A protein generally generates a large number of peptides, and hence a large number of peptides have to be sequenced per experiment. The yeast proteome is estimated to contain approximately 6000 open reading frames (ORFs), which would generate approximately 300,000 to 400,000 tryptic peptides, depending on how specifically the enzyme works to cleave the yeast proteins. Thus, a huge number of peptides would need to be analyzed for determination of the physiological state in a sample, even if only a subset of all possible genes is expressed in a cell at a given state.
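By way of illustration only, the sequence-specific fragmentation and the peptide counts discussed above can be sketched in Python. The sketch assumes a commonly used simplification of trypsin specificity (cleavage C-terminal to lysine or arginine, except when the next residue is proline). The function names, the `missed_cleavages` parameter and the example sequence are hypothetical and are not taken from the present disclosure.

```python
def cleavage_fragments(sequence):
    """Split a protein sequence at assumed tryptic sites.

    Assumption: cleavage occurs after K or R unless the next residue is P.
    Real enzyme behavior is less specific than this simplified rule.
    """
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the C-terminal remainder that does not end in K or R.
        fragments.append(sequence[start:])
    return fragments


def tryptic_digest(sequence, missed_cleavages=0):
    """List peptides, optionally joining adjacent fragments to model
    incomplete cleavage (up to `missed_cleavages` skipped sites)."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(fragments):
                peptides.append("".join(fragments[i:i + extra + 1]))
    return peptides


def peptides_per_orf(total_peptides, orf_count):
    """Average number of peptides per open reading frame."""
    return total_peptides / orf_count


# Hypothetical sequence, chosen only to show the proline exception.
example = "MKRPAKGR"
print(tryptic_digest(example))
print(tryptic_digest(example, missed_cleavages=1))

# Figures quoted in the text: about 6000 ORFs, 300,000 to 400,000 peptides.
print(peptides_per_orf(300_000, 6_000), peptides_per_orf(400_000, 6_000))
```

Under these assumptions, allowing even a single missed cleavage increases the number of distinct peptides obtained from the same protein, which is consistent with the statement that the total depends on how specifically the enzyme cleaves. The quoted figures correspond to roughly 50 to 67 peptides per open reading frame on average.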
Thus, there exists a need for methods of high throughput and quantitative proteome analysis. The present invention satisfies this need and provides related advantages as well.