This invention relates generally to proteome analysis and, more specifically, to methods of identifying and/or quantifying a protein or proteins contained in a mixture of proteins.
The classical biochemical approach to study biological processes has been based on the purification to homogeneity by sequential fractionation and assay cycles of the specific activities that constitute a process, the detailed structural, functional and regulatory analysis of each isolated component, and the reconstitution of the process from the isolated components. The Human Genome Project and other genome sequencing programs are turning out in rapid succession the complete genome sequences of specific species and, thus, in principle the amino acid sequence of every protein potentially encoded by that species. It is to be expected that this information resource unprecedented in the history of biology will enhance traditional research methods and catalyze progress in fundamentally different research paradigms, one of which is Proteomics.
Efforts to sequence the entire human genome along with the genomes of a number of other species have been extraordinarily successful. The genomes of 46 microbial species (TIGR Microbial Database; www.tigr.org) have been completed and the genomes of over one hundred twenty other microbial species are in the process of being sequenced. Additionally, the more complex genomes of eukaryotes, in particular those of the genetically well characterized unicellular organism Saccharomyces cerevisiae and the multicellular species Caenorhabditis elegans and Drosophila melanogaster, have been sequenced completely. Furthermore, a “draft sequence” of the rice genome has been published, and completion of the human and Arabidopsis genomes is imminent. Even in the absence of complete genomic sequences, rich DNA sequence databases have been made publicly available, including those containing over 2.1 million human and over 1.2 million murine expressed sequence tags (ESTs).
ESTs are stretches of approximately 300 to 500 contiguous nucleotides representing partial gene sequences that are being generated by systematic single pass sequencing of the clones in cDNA libraries. On the timescale of most biological processes, with the notable exception of evolution, the genomic DNA sequence can be viewed as static, and a genomic sequence database therefore represents an information resource akin to a library. Intensive efforts are underway to assign “function” to individual sequences in sequence databases. This is attempted by the computational analysis of linear sequence motifs or higher order structural motifs that indicate a statistically significant similarity of a sequence to a family of sequences with known function, or by other means such as comparison of homologous protein functions across species. Other methods have also been used to determine function of individual sequences, including experimental methods such as gene knockouts and suppression of gene expression using antisense nucleotide technology, which can be time consuming and in some cases still insufficient to allow assignment of a biological function to a polypeptide encoded by the sequence.
The proteome has been defined as the protein complement expressed by a genome. This somewhat restrictive definition implies a static nature of the proteome. In reality the proteome is highly dynamic since the types of expressed proteins, their abundance, state of modification, and subcellular locations are dependent on the physiological state of the cell or tissue. Therefore, the proteome can reflect a cellular state or the external conditions encountered by a cell, and proteome analysis can be viewed as a genome-wide assay to differentiate and study cellular states and to determine the molecular mechanisms that control them. Considering that the proteome of a differentiated cell is estimated to consist of thousands to tens of thousands of different types of proteins, with an estimated dynamic range of expression of at least 5 orders of magnitude, the prospects for proteome analysis appear daunting. However, the availability of DNA databases listing the sequence of every potentially expressed protein, combined with rapid advances in technologies capable of identifying the proteins that are actually expressed, now makes proteomics a realistic proposition. Mass spectrometry is one of the essential legs on which current proteomics technology stands.
Quantitative proteomics is the systematic analysis of all proteins expressed by a cell or tissue with respect to their quantity and identity. The proteins expressed in a cell, tissue, biological fluid or protein complex at a given time precisely define the state of the cell or tissue at that time. The quantitative and qualitative differences between protein profiles of the same cell type in different states can be used to understand the transitions between the respective states. Traditionally, proteome analysis was performed using a combination of high resolution gel electrophoresis, in particular two-dimensional gel electrophoresis, to separate proteins and mass spectrometry to identify proteins. This approach is sequential and tedious, but more importantly is fundamentally limited in that biologically important classes of proteins are essentially undetectable.
Thus, there exists a need for rapid, efficient, and cost effective methods of proteome analysis. The present invention satisfies this need and provides related advantages as well.