A proteome is defined as the totality of all the proteins of one cell type under precisely defined boundary conditions. Because higher life forms contain several hundred types of cells, there are also hundreds of proteomes. At the same time, there are proteins that are common to all the cell types of the life form (housekeeping proteins), and those that are specific to one type of cell. The proteome, moreover, is not fixed: it changes both qualitatively and quantitatively with boundary conditions such as age, or stress on the cell community resulting from the administration of medication.
Of special interest, of course, are the proteins of a proteome that are not yet known, both for their application as pharmaceutical target proteins and also as possible independent active substances, i.e., proteins suitable for pharmaceutical use. (Insulin provides one example of a protein suitable for pharmaceutical use; there are, however, many other examples.) Those proteins that may be suitable as active substances are in most cases only present in very small concentrations, and frequently escape the classic methods of proteome analysis.
Also of great value for understanding the function of cell communities are those proteins whose quantity changes when the cell community is stressed, such as through age, the administration of medicine, or diseases.
It is estimated that mammals possess well over 100,000 proteins, whose structural plans are to be found in somewhere between 30,000 and 40,000 genes. There are estimates which indicate that, from one gene alone, the process of “splicing” gives rise, as a statistical average, to about three and a half different types of protein; on top of this, many more proteins are created through post-translational modifications. A proteome contains from some thousands up to some tens of thousands of proteins. Not even half the human proteins are known today.
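The figures quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the function name is hypothetical, and the calculation counts splice variants alone, ignoring post-translational modifications.

```python
# Gene-count range and splicing average are taken from the text above.
GENES_LOW, GENES_HIGH = 30_000, 40_000
VARIANTS_PER_GENE = 3.5  # statistical average of protein types per gene


def splice_variant_estimate(genes: int, variants_per_gene: float = VARIANTS_PER_GENE) -> int:
    """Protein types expected from splicing alone (hypothetical helper)."""
    return round(genes * variants_per_gene)


low = splice_variant_estimate(GENES_LOW)
high = splice_variant_estimate(GENES_HIGH)
print(f"{low:,} to {high:,} protein types before post-translational modification")
# prints "105,000 to 140,000 protein types before post-translational modification"
```

The result is consistent with the estimate of "well over 100,000" proteins once post-translational modifications are added on top.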
Current analytic procedures for the proteins of a proteome are generally based on separating the dissolved proteins by 2D gel electrophoresis, punching out the dyed proteins, and digesting them enzymatically in the gel chips. The digested peptides are then analyzed by MALDI mass spectrometry in time-of-flight mass spectrometers. This yields the precise masses of the digestion peptides and, in a rather complex process using what is known as the PSD (post source decay) method, their daughter ion spectra as well. The precise masses of the digestion peptides allow the proteins to be found in protein sequence databases, assuming that they are included in the database. If the identification is ambiguous, daughter ion spectra from individual digestion peptides can be exploited. If the protein is not contained in the protein sequence database, it is also possible to search in EST data (expressed sequence tags) that has been obtained from RNA, in cDNA data or in the DNA data of the genome.
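The mass-based database lookup described above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the text: the enzyme is taken to be trypsin (cleavage after K or R, but not before P, with no missed cleavages), the measured values are neutral monoisotopic masses, the 50 ppm tolerance is arbitrary, and the two database entries are invented toy sequences. All function names are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def digest(sequence):
    """Assumed trypsin rule: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def score_proteins(measured, database, tol_ppm=50.0):
    """Rank database entries by how many measured masses they explain."""
    scores = {}
    for name, seq in database.items():
        theoretical = [peptide_mass(p) for p in digest(seq)]
        scores[name] = sum(
            any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
            for m in measured
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy database with invented sequences, for illustration only.
database = {
    "toy_A": "MKWVTFISLLFLFSSAYSRGVFRR",
    "toy_B": "GLSDGEWQLVLNVWGKVEADIPGHGQEVLIR",
}
# Pretend the spectrometer measured exactly the peptides of toy_A.
measured = [peptide_mass(p) for p in digest(database["toy_A"])]
ranking = score_proteins(measured, database)
print(ranking)
```

Real search programs use more elaborate scoring, allow for missed cleavages and modifications, and work on measured ion masses rather than neutral masses; the sketch shows only the principle that a set of precise peptide masses can single out one database entry.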
This procedure has the advantage that the association of the digestion peptide with a protein is guaranteed by the procedure itself, at least in cases where the separation by 2D gel electrophoresis was of sufficiently good quality. However, the digestion peptides generally cover only somewhere between 10 and 70 percent of a protein's sequence, and in most cases rather less than 50 percent. This fraction is referred to as coverage. If the protein is contained in the database, then, as has already been described, knowledge of the precise masses of some digestion peptides is often sufficient for identification. Ambiguous results occur most often when the mass determination is insufficiently precise; in such cases, an additional daughter ion spectrum of the peptide, characterizing its sequence of amino acids, yields certain identification.
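Coverage, as used above, is simply the percentage of residues in the protein sequence that lie inside at least one identified digestion peptide. A minimal sketch, with a hypothetical function name and an invented toy sequence:

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in `protein` covered by at least one peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # an empty string would match everywhere
        start = protein.find(pep)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Invented 24-residue toy sequence; two of its peptides were "identified".
protein = "MKWVTFISLLFLFSSAYSRGVFRR"
print(sequence_coverage(protein, ["MK", "GVFR"]))  # prints 25.0
```

Overlapping peptides are counted only once per residue, which is why a boolean mask is used rather than a sum of peptide lengths.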
It is perfectly possible for several thousand spots to be dyed and detected in the 2D gel, yet the subsequent analysis shows that at most about a thousand (and in most cases only a few hundred) different proteins of a proteome can actually be identified using this procedure. A proteome, however, is expected to include many times this number of proteins.
Another analytic procedure that has been introduced involves the analysis of mixtures of a few proteins by the digestion of all the proteins of this mixture, liquid chromatographic separation of the digestion peptides, ionization by electrospraying (ESI) and automatic MS/MS procedures for peptide structure determination in ion trap mass spectrometers or quadrupole-quadrupole time-of-flight mass spectrometers.
This common digestion of the proteins and the liquid chromatographic separation mean that the association of peptides with one protein is no longer given by the analytic procedure, and the association of various digestion peptides with a protein can only be made by the database search. Very good programs have now been developed for searching the databases and for searching for the peptides associated with a protein.
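Once the peptides have been sequenced, associating them with proteins by database search amounts to asking which database entries contain each peptide. The sketch below uses plain substring matching on invented toy sequences; the function name is hypothetical, and actual search programs work from spectra and scores rather than from exact sequence strings.

```python
from collections import defaultdict


def assign_peptides(peptides, database):
    """Group peptide sequences by every database protein that contains them."""
    by_protein = defaultdict(list)
    unassigned = []
    for pep in peptides:
        owners = [name for name, seq in database.items() if pep in seq]
        if not owners:
            unassigned.append(pep)  # not explained by any database entry
        for name in owners:
            by_protein[name].append(pep)
    return dict(by_protein), unassigned


# Toy database with invented sequences, for illustration only.
database = {
    "toy_A": "MKWVTFISLLFLFSSAYSRGVFRR",
    "toy_B": "GLSDGEWQLVLNVWGKVEADIPGHGQEVLIR",
}
groups, unassigned = assign_peptides(["GVFR", "VEADIPGHGQEVLIR", "AAAAK"], database)
print(groups)
print(unassigned)
```

A peptide that occurs in several proteins is listed under each of them, which mirrors the ambiguity that arises when the analytic procedure itself no longer ties a peptide to a single protein.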
This procedure of real-time LC/MS analysis runs in ion trap mass spectrometers or in time-of-flight spectrometers with orthogonal injection and with preliminary separation and fragmentation in upstream quadrupole filters. These devices have a recording time for daughter ion spectra of somewhere between one and two seconds. It is therefore only possible to record at most five, but most often significantly fewer, different daughter ion spectra within one peak of a high-resolution liquid chromatogram, given a peak width of about 10 seconds. These procedures are therefore restricted to protein mixtures of low complexity. Mixtures of around five to ten proteins can be analyzed effectively, but more complex mixtures, such as an entire proteome with several tens of thousands of proteins, or even just part of a proteome with a few thousand proteins, cannot be analyzed in this way. The applicability of the frequently employed procedure of real-time LC/MS analysis is thus restricted by the time pressure resulting from the chromatography. Even what are known as “stop-flow” methods are only of limited help, as they impair the separation capacity of the chromatography.
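The time pressure can be made concrete with a rough budget calculation. This is a sketch under stated assumptions: the function names are hypothetical, only the recording time of the daughter ion spectra is counted (survey scans and precursor selection are ignored, so the bound is optimistic), and the number of co-eluting peptides is an invented illustration value.

```python
import math


def spectra_per_peak(peak_width_s, acquisition_time_s):
    """Upper bound on daughter ion spectra recordable within one LC peak."""
    return math.floor(peak_width_s / acquisition_time_s)


def fraction_sampled(coeluting_peptides, budget):
    """Share of co-eluting peptides that can be fragmented at all."""
    return min(1.0, budget / coeluting_peptides)


# 10 s peak width and 2 s per spectrum, both figures from the text.
budget = spectra_per_peak(10, 2)
print(budget)  # prints 5

# Hypothetical case: 50 different peptides eluting within the same peak.
print(fraction_sampled(50, budget))
```

With only a handful of proteins in the mixture, few peptides co-elute and the budget suffices; with thousands of proteins, most peptides in any given peak pass through without a daughter ion spectrum ever being recorded.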