This invention relates to the analysis of complex protein mixtures, such as whole proteomes, by joint enzymatic digestion, followed by chromatographic or electrophoretic separation and mass spectrometric analysis of the digest peptides. A proteome is defined as the community of all the proteins of a species (e.g. human proteome; mouse proteome), of an organ (brain, liver, blood plasma proteome), of a type of cell (cell proteome) or even a type of organelle (organelle proteome). Since there are hundreds of cell types in a higher organism (about 230 human cell types), there are also hundreds of cell proteomes. There are proteins which are common to all cells of the organism (housekeeping proteins), and those which are specific to one cell type. Moreover, a proteome is not constant in its composition; it changes qualitatively and quantitatively with age, state of health or stress of an organism, e.g. stress brought about by the administration of medication or stress caused by a tumor. Unusual over-expressions and under-expressions of certain proteins can provide information on the stress.
Naturally, hitherto unknown proteins of a proteome are of special interest. In pharmacology, for instance, they are of interest both as pharmaceutical target proteins (targets) and as active substances that may be used as pharmaceutical products. An example of a protein used as a pharmaceutical product is insulin; there are hundreds of other examples. Proteins that act as active substances, such as enzymes, are mostly present only in very low concentrations and often escape the classical methods of proteome analysis. Proteins whose quantities change through over-expression or under-expression in response to stress on the cell community provide valuable information about how the cells function.
Mammals are estimated to have well over 100,000 proteins, whose basic blueprints are found in around 10,000 to 30,000 genes (according to current knowledge, the human genome comprises about 20,300 genes). It is estimated that so-called “alternative splicing” produces, on average, around three and a half different proteins from a single gene; in addition, many more proteins result from post-translational modifications (PTM), such as shortening or lengthening of the protein, methylation, phosphorylation, glycosylation, formation of lipoproteins and many others. A cell proteome may contain from a few thousand to a few tens of thousands of proteins. According to present estimates, not even half of the human proteins are known.
In order to jointly analyze as many proteins of complete proteomes as possible, there are essentially two different approaches: “top-down” and “bottom-up”. In the top-down method, the proteins are first separated chromatographically or electrophoretically and only then fragmented (for example by enzymatic digestion, or by fragmentation methods commonly used in mass spectrometers, such as collision-induced fragmentation or multi-photon absorption), and the fragment peptides are analyzed mass spectrometrically. Since the separation establishes in advance which fragment peptides belong to a protein, an accurate mass determination of the fragment peptides is then usually sufficient to identify the protein with the aid of databases. In the bottom-up method, in contrast, a mixture containing all the proteins is digested enzymatically as a whole; a daughter ion spectrum of each digest peptide must then be measured in order to identify every single digest peptide by recognizing parts of its amino acid sequence and to assign it to a protein. In this method, the digest peptides are usually separated by liquid chromatography. The term “daughter ion mass spectrum” means a mass spectrum of the fragment ions of a selected ion species; the ions of an ion species selected for fragmentation are usually called “parent ions”.
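Both approaches start from an enzymatic digest. The following minimal sketch illustrates an in-silico tryptic digest and the monoisotopic masses of the resulting peptides. It assumes the textbook trypsin rule (cleavage after K or R unless the next residue is P), no missed cleavages and no modifications; the function names are hypothetical and do not refer to any particular software.

```python
# Monoisotopic residue masses in dalton (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # proton mass, used for [M+zH]z+ ions


def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence after K or R, except when P follows (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def monoisotopic_mass(peptide: str) -> float:
    """Neutral monoisotopic mass M of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int = 1) -> float:
    """m/z of the protonated ion [M+zH]z+."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge


for pep in tryptic_digest("MKWVTFISLLRPAGK"):
    print(pep, round(mz(pep, 1), 4), round(mz(pep, 2), 4))
```

Here `tryptic_digest("MKWVTFISLLRPAGK")` returns `["MK", "WVTFISLLRPAGK"]`: the R inside the second peptide is not a cleavage site because P follows it.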
A frequently used top-down analytical method for the proteins of a proteome is essentially based on the separation of the dissolved proteins by 2D-gel electrophoresis, staining the proteins, punching out little gel pieces with stained proteins, de-staining, enzymatic digestion within the piece of gel, and subsequent MALDI mass spectrometric investigation of the digest peptides in time-of-flight mass spectrometers, whereby precise masses of the digest peptides as well as daughter ion spectra of the digest peptides can be obtained. If the proteins are present in protein sequence databases, they can be found via the precise masses of the digest peptides. If the identification is ambiguous, daughter ion spectra of individual digest peptides can be used for confirmations. If the protein is not present in the protein sequence database, it is possible to search in EST databases (Expressed Sequence Tags) which have been obtained from RNA, in cDNA data or in “open reading frames” of the DNA data of the genome.
This method has the advantage that the protein to which a digest peptide belongs is known in advance, at least if the separation by 2D-gel electrophoresis was sufficiently good. As a rule, only 10 to around 70 percent of the sequence of a protein, in most cases below 50 percent, is covered by digest peptides; this fraction is called the “coverage”. If the protein is present in the database, identification often requires only the precise masses of several digest peptides, as stated above; if the results are ambiguous, which frequently occurs when the mass determination is not accurate enough, an additional daughter ion spectrum of a peptide, which reflects at least parts of its amino acid sequence, leads to an unambiguous identification.
Although several thousand stained spots are commonly found in a 2D gel, the analyses usually show that this method identifies only a few hundred different proteins in one proteome. A proteome, however, is expected to comprise many times this number of proteins.
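Identification via precise digest-peptide masses and the resulting coverage can be illustrated with a toy peptide mass fingerprint search. This is a hypothetical sketch: the two-entry “database”, the 10 ppm tolerance and the ranking by number of matched peptides are assumptions chosen for illustration, whereas real search engines use probabilistic scores. Masses are compared as singly protonated ions [M+H]+, which MALDI predominantly produces.

```python
# Hypothetical toy peptide mass fingerprint (PMF) search with sequence coverage.
RESIDUE_MASS = {  # monoisotopic residue masses in dalton
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_peptides(seq: str) -> list[tuple[int, str]]:
    """Tryptic peptides (after K/R, not before P; no missed cleavages) with start index."""
    result, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            result.append((start, seq[start:i + 1]))
            start = i + 1
    if start < len(seq):
        result.append((start, seq[start:]))
    return result


def mh_plus(peptide: str) -> float:
    """Monoisotopic m/z of the singly protonated peptide [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def pmf_search(measured: list[float], database: dict[str, str],
               tol_ppm: float = 10.0) -> list[tuple[str, int, float]]:
    """Rank proteins by number of matched peptides and report percent sequence coverage."""
    ranking = []
    for name, seq in database.items():
        covered = [False] * len(seq)
        n_matched = 0
        for start, pep in tryptic_peptides(seq):
            theo = mh_plus(pep)
            # A theoretical peptide matches if any measured mass lies within tol_ppm.
            if any(abs(m - theo) / theo * 1e6 <= tol_ppm for m in measured):
                n_matched += 1
                covered[start:start + len(pep)] = [True] * len(pep)
        ranking.append((name, n_matched, 100.0 * sum(covered) / len(seq)))
    ranking.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranking


# Toy usage: hypothetical sequences; one "measured" mass deviates by 2 ppm,
# the other is an unrelated background value.
DATABASE = {"protein_A": "PEPTIDEKWVTFISLLRPAGK", "protein_B": "SAMPLERGGK"}
MEASURED = [mh_plus("PEPTIDEK") * (1 + 2e-6), 1234.5678]
print(pmf_search(MEASURED, DATABASE))
```

In this toy case protein_A is ranked first with one matched peptide covering 8 of its 21 residues (about 38 percent), which is in the typical coverage range mentioned above; a single match like this is exactly the ambiguous situation in which a daughter ion spectrum would be used for confirmation.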
A bottom-up analytical method basically analyzes protein mixtures by joint digestion of all the proteins of the mixture, liquid chromatographic (LC) separation of the digest peptides, electrospray ionization (ESI), and automatic acquisition of daughter ion spectra in tandem mass spectrometers (MS/MS) to determine at least parts of the amino acid sequences of the digest peptides. When joint digestion and liquid chromatographic separation are used, the analytical method itself no longer provides information about the protein to which a peptide belongs. In this case, the protein to which different digest peptides belong can only be determined with the aid of daughter ion spectra and database searches. Excellent computer programs have been developed for searching the databases and for collating the peptides that make up a protein.
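Collating identified peptides into proteins can be sketched as a greedy parsimony step: repeatedly pick the protein that explains the most still-unexplained peptides. This is one simple technique (greedy set cover) chosen for illustration, not the method of any particular program; real software typically also weighs peptide scores. The peptide and protein identifiers below are hypothetical.

```python
from collections import defaultdict


def infer_proteins(peptide_to_proteins: dict[str, set[str]]) -> list[str]:
    """Greedy parsimony: a small protein list that explains all identified peptides.

    Greedy set cover is an approximation; it does not guarantee the true
    minimal set and ignores peptide scores.
    """
    protein_to_peptides: dict[str, set[str]] = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    unexplained = set(peptide_to_proteins)
    chosen: list[str] = []
    while unexplained and protein_to_peptides:
        # Protein explaining most unexplained peptides; sorting makes ties deterministic.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        gained = protein_to_peptides[best] & unexplained
        if not gained:  # remaining peptides map to no protein at all
            break
        chosen.append(best)
        unexplained -= gained
    return chosen


peptide_map = {
    "pep1": {"P1"},
    "pep2": {"P1", "P2"},  # shared peptide
    "pep3": {"P2"},
    "pep4": {"P1", "P3"},  # P3 is explained away by P1
}
print(infer_proteins(peptide_map))  # ['P1', 'P2']
```

Shared peptides such as `pep2` are the reason this step is needed at all: without knowing which protein a peptide came from, the smallest consistent protein list is one reasonable answer.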
This method of real-time LC/MS/MS analysis is performed, for example, in RF ion trap mass spectrometers or in time-of-flight mass spectrometers with orthogonal ion injection, in which the ions are previously selected and fragmented by means of upstream quadrupole filters (Q-OTOF). These instruments have a total acquisition time of around half a second or even less for a primary mass spectrum and subsequent daughter ion spectra. In a high-resolution liquid chromatogram, a maximum of about twenty to thirty different daughter ion spectra can be acquired within a chromatographic peak of around ten seconds full width at half maximum. In a chromatography run of three hours, this amounts to a maximum of 20,000 to 30,000 daughter ion spectra, provided that so many digest peptides are in fact detected in the primary mass spectra. Usually, however, this is not the case: for one proteome, usually only a few thousand digest peptides are found above the detection limit in the primary spectra. From these, around 500 to 1,000 proteins can be identified, in the best cases around 1,500 proteins using mass spectrometers of the highest sensitivity. At present, this seems to be a kind of magic limit, even though a proteome should contain many times this number of proteins.
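The spectrum budget quoted above follows from simple division. The sketch below assumes, as a simplification, that each daughter ion spectrum costs about 0.5 s (or 1/3 s in the faster case) and neglects the time for the primary spectrum; the function name is hypothetical.

```python
def spectra_budget(seconds_per_spectrum: float, peak_fwhm_s: float,
                   run_time_h: float) -> tuple[float, float]:
    """Upper bounds on daughter ion spectra per chromatographic peak and per run."""
    per_peak = peak_fwhm_s / seconds_per_spectrum
    per_run = run_time_h * 3600.0 / seconds_per_spectrum
    return per_peak, per_run


for t in (0.5, 1 / 3):
    per_peak, per_run = spectra_budget(t, peak_fwhm_s=10.0, run_time_h=3.0)
    print(f"{t:.2f} s/spectrum: {per_peak:.0f} per peak, {per_run:,.0f} per run")
```

At 0.5 s per spectrum this gives 20 spectra per 10 s peak and 21,600 per three-hour run; at 1/3 s it gives 30 per peak and 32,400 per run, the same order as the figures in the text. The real limit, however, is the number of peptides visible in the primary spectra, not this budget.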
A different bottom-up method for the mass spectrometric analysis of a complex protein mixture is described in DE 101 58 860 B4 (D. Suckau et al., 2001). It comprises the following steps: a) joint enzymatic digestion of all the proteins in the protein mixture, b) liquid chromatographic separation of the digest peptides in the mixture, c) collection of several hundred fractions of the chromatographic eluent, each on a sample site of a sample support precoated with matrix substance, d) acquisition of mass spectra and daughter ion mass spectra with ionization by matrix-assisted laser desorption (MALDI) in suitable time-of-flight mass spectrometers, and e) identification of the associated proteins by searches in protein sequence, EST, cDNA or DNA databases. This method has the advantage that as many daughter ion spectra can be acquired as there are peptides in a sample, until the sample is completely consumed; it finds around five times more proteins than the 2D-gel electrophoresis method, but is still limited to about 500 to 1,500 proteins.
The publication “Precursor Acquisition Independent from Ion Count: How to Dive Deeper into the Proteomics Ocean” by A. Panchaud et al., Anal. Chem. 2009, 81, 6481-6488 describes a further bottom-up method, carried out in RF ion trap mass spectrometers, that makes it possible to find far more digest peptides than any previous method. The authors gave the method the acronym “PACIFIC” (Precursor Acquisition Independent From Ion Count).
The method is based on the long-recognized observation that, with RF ion trap mass spectrometers, the detection limits for daughter ions are significantly lower than those for the unfragmented primary ions. If no signal stands out above the background noise at a given mass in a primary spectrum, but an ion species is assumed to be present at that mass, the ion trap can be filled to a very high level with ions; all ions except the assumed ones are then removed, the assumed ions are fragmented, and their daughter ion spectrum is acquired. It can be shown that the detection limits for daughter ion spectra acquired in this way are up to a factor of a hundred lower than those of the primary spectra. This observation can be used to search “blindly” for peptide ions far below the detection limit.
The method of A. Panchaud et al. thus blindly isolates mass windows each about 2.5 dalton wide, fragments the ions by collisions with residual gas molecules and measures the daughter ion spectrum. A cycle of eleven such daughter ion measurements covers a mass range of 13.5 dalton, with overlaps inserted as a precaution, and takes around three seconds; this cycle is repeated throughout the whole HPLC run, so that several cycles are completed during each HPLC peak of around ten seconds full width at half maximum. In consecutive HPLC runs, further new mass ranges of 13.5 dalton each are scanned, so that, after 67 HPLC runs of three hours each, daughter ion mass spectra are obtained for the mass range from 400 to 1,400 dalton. Applied to bacteria, the method identified far more than 2,000 proteins from around 50,000 measured digest peptides in 4.2 days. For human plasma, more than twice as many proteins were identified as with previously used methods.
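The repeat cycle described above can be sketched as a list of overlapping isolation windows. The step between window starts is not stated in the text; the sketch assumes it is chosen so that eleven 2.5 dalton windows exactly span 13.5 dalton, which gives a step of 1.1 dalton. This is an illustration derived from the quoted numbers, not the published acquisition parameters.

```python
def pacific_cycle(start_mz: float, width: float = 2.5, n_windows: int = 11,
                  span: float = 13.5) -> list[tuple[float, float]]:
    """Overlapping isolation windows of one repeat cycle that tile `span` dalton."""
    step = (span - width) / (n_windows - 1)  # 1.1 Da with the default values (assumed)
    return [(start_mz + i * step, start_mz + i * step + width)
            for i in range(n_windows)]


def cycles_per_peak(cycle_time_s: float = 3.0, peak_fwhm_s: float = 10.0) -> float:
    """Number of complete repeat cycles falling within one chromatographic peak."""
    return peak_fwhm_s / cycle_time_s


windows = pacific_cycle(400.0)
print(windows[0], windows[-1])      # first and last isolation window of the cycle
print(round(cycles_per_peak(), 1))  # 3.3 cycles per 10 s peak
```

With a three-second cycle, each ten-second HPLC peak is sampled about three times per window, which is what “several cycles are passed through for each HPLC peak” amounts to.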
As advantageous as this method may be with respect to the detectability of proteins, the time of several days required is a disadvantage for routine applications.
The article “Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra” by J. D. Venable et al. (Nature Methods, Vol. 1, No. 1, 2004) describes a similar method, which, however, uses a broader isolation window of 10 to 15 dalton; the ions in the window are fragmented, and the daughter ions of peptides from a mixture of two proteomes with isotopically labeled proteins are compared quantitatively.
The methods described are carried out in mass spectrometers containing RF ion traps and rely essentially on the special characteristics of these ion traps. In principle, both two-dimensional (linear) and three-dimensional ion traps can be used. As those skilled in the art are aware, the ions are held in these ion traps by so-called pseudopotentials, whose effect on the ions is inversely proportional to their mass-to-charge ratio m/z. No ions can be stored in the ion trap below a cut-off mass, which can be set via the RF voltage. Of the ions above the cut-off mass (m/z)_lim, the lightest collect in the center of the ion trap and the heavier ones lie further toward the outside the heavier they are, because the space charge drives the heavier ions outward against the pseudopotential, which acts more weakly on them. This type of ion trap can be filled with around 10⁷ to 10⁸ ions in total; further filling causes heavier ions to be increasingly lost and lighter ions to be enriched. A mass spectrum cannot be acquired with an RF ion trap filled with such high numbers of ions, however, because the space charge disturbs the mass-selective ejection of the ions. Modern RF ion trap mass spectrometers can provide a qualitatively good mass spectrum only with at most around 10,000 to 50,000 ions, but then with a resolution that makes it possible to clearly recognize the isotopic pattern even of quadruply or quintuply charged ions. Mass spectrometers of this type with mass ranges up to m/z 3,000 are commercially available.
To acquire qualitatively good daughter ion spectra of selected parent ions, it is advantageous to first fill the ion trap with so many parent ions that, after their isolation and fragmentation, enough daughter ions remain for a good daughter ion spectrum. Depending on the concentration of the parent ions in the ion mixture, this can often only be achieved by first greatly overfilling the ion trap, for instance with 10⁶ or even 10⁷ ions, and then specifically ejecting the undesired ions. This process is known as “isolation of the parent ions”; the manufacturers of RF ion trap mass spectrometers provide suitable methods for it, which are executed by the control software of the mass spectrometers. Usually not only the monoisotopic parent ions but all the ions of an isotopic group are isolated. After the isolation, the parent ions are fragmented to daughter ions, which are measured as a mass spectrum. As described above, this acquisition method for daughter ion spectra can be used not only for ions that become visible in the primary mass spectrum, but also “blindly” or “non-selectively” for ions that do not stand out from the background but are only assumed to be present. The background in HPLC-coupled ion traps originates from many sources: ions from solvent complexes, from impurities in the solvents, from “column bleed”, from impurities from the ion source or the inlet capillary, and from impurities in the mass spectrometer, which are ionized by protonation through injected ions. These background ions are usually singly charged; multiply charged ions are the exception.
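The dependence of the cut-off mass on the RF voltage, and the 1/(m/z) dependence of the pseudopotential, can be sketched numerically. The sketch uses the commonly quoted Mathieu parameter for a three-dimensional quadrupole ion trap without DC voltage, q_z = 8eV / (m (r0² + 2 z0²) Ω²), with V as the zero-to-peak RF amplitude and the stability limit q_z = 0.908, plus the Dehmelt approximation D_z ≈ q_z V / 8 for the well depth, which is only reasonable for small q_z. Electrode dimensions, frequency and voltage below are hypothetical values, not data of any particular instrument, and sign and amplitude conventions differ between textbooks.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
DALTON = 1.66053906660e-27   # atomic mass constant in kg
Q_LIMIT = 0.908              # q_z at the edge of the first stability region (a_z = 0)


def q_z(mz_da: float, v_rf: float, r0: float, z0: float, freq_hz: float) -> float:
    """Mathieu q_z for an ion of the given m/z (dalton per elementary charge)."""
    omega = 2 * math.pi * freq_hz
    return 8 * E_CHARGE * v_rf / (mz_da * DALTON * (r0**2 + 2 * z0**2) * omega**2)


def low_mass_cutoff(v_rf: float, r0: float, z0: float, freq_hz: float) -> float:
    """Cut-off m/z at which q_z reaches 0.908; it scales linearly with v_rf."""
    omega = 2 * math.pi * freq_hz
    m = 8 * E_CHARGE * v_rf / (Q_LIMIT * (r0**2 + 2 * z0**2) * omega**2)
    return m / DALTON


def well_depth_volts(mz_da: float, v_rf: float, r0: float, z0: float,
                     freq_hz: float) -> float:
    """Dehmelt pseudopotential depth D_z ≈ q_z * V / 8, i.e. proportional to 1/(m/z)."""
    return q_z(mz_da, v_rf, r0, z0, freq_hz) * v_rf / 8


# Hypothetical trap: r0 = 1 cm, z0 = 0.707 cm, 1 MHz drive, 1,000 V amplitude.
geometry = dict(r0=0.01, z0=0.00707, freq_hz=1.0e6)
print(round(low_mass_cutoff(1000.0, **geometry), 1))
for mz_value in (500.0, 1000.0, 2000.0):
    print(mz_value, round(well_depth_volts(mz_value, 1000.0, **geometry), 2))
```

Doubling the RF amplitude doubles the cut-off mass, and the well depth halves each time m/z doubles, which is why heavier ions are pushed outward more easily by space charge.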
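The reasoning about overfilling the trap can be put into a rough budget: the number of ions to inject is the desired number of daughter ions divided by the fraction of injected ions that belong to the parent isotopic group and survive isolation and fragmentation. All efficiencies and the target count below are hypothetical; only the storage capacity of 10⁷ to 10⁸ ions comes from the text, and the lower figure is assumed as the default.

```python
def required_fill(target_daughter_ions: float, parent_fraction: float,
                  isolation_efficiency: float, fragmentation_efficiency: float) -> float:
    """Ions to inject so that enough daughter ions remain after isolation and fragmentation."""
    survival = parent_fraction * isolation_efficiency * fragmentation_efficiency
    if survival <= 0:
        raise ValueError("parent ions must be present and survive isolation/fragmentation")
    return target_daughter_ions / survival


def fits_in_trap(n_ions: float, capacity: float = 1e7) -> bool:
    """Compare with the trap capacity (10^7 to 10^8 ions; 10^7 assumed by default)."""
    return n_ions <= capacity


# Hypothetical numbers: 80 % of parent ions survive isolation, 50 % become daughter ions.
for fraction in (5e-3, 1e-4):  # parent isotopic group as a fraction of all injected ions
    n = required_fill(10_000, fraction, isolation_efficiency=0.8,
                      fragmentation_efficiency=0.5)
    print(f"parent fraction {fraction:g}: {n:.1e} ions needed, fits: {fits_in_trap(n)}")
```

An abundant parent (0.5 percent of the ions) needs about 5 × 10⁶ injected ions, within the overfilling range mentioned above, whereas a parent fifty times rarer would need about 2.5 × 10⁸ ions and exceeds the trap capacity; this is where the detection limit for blind daughter ion acquisition comes from.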
In view of the above, there is a need for a method by which the presence and/or the identity of digest peptides of an enzymatically digested complex mixture of proteins can be determined with detection limits much lower than those of presently known methods.