The mass spectrometric determination of the terminal sequence patterns of larger proteins, including possible modifications to these sequences, is usually difficult. Even non-mass spectrometric methods, for example Edman sequencing, which usually makes it possible to at least determine the N-terminal sequence, fail in the case of N-terminal modifications.
In mass spectrometers with MALDI ion sources it is not possible to achieve a uniform fragmentation of larger proteins right up to the termini. While it is possible to ionize larger proteins directly and fragment them at the same time by so-called in-source decay (ISD), the terminal sequence patterns cannot be identified because the spectra are extremely noisy in the lower mass range.
The use of electrospray ionization, suitable for large molecules, is usually coupled with the use of ion trap mass spectrometers or quadrupole filter mass spectrometers. Although in such cases the ions of the proteins can be fragmented (with difficulty), the small ions of the terminal fragments cannot be measured because of the lack of storage capability of these spectrometers in the lower mass-to-charge range.
With a preceding enzymic digest of the larger proteins, as used for the identification of proteins by means of ‘fingerprint spectra’, the terminal digest peptides can, in principle, no longer be identified as such. One solution is the biochemical labeling of the N or C termini in such a way that they can be identified as terminal in the mass spectra. This method also fails, however, when the enzymic splitting occurs relatively close to one of the termini: the digest peptide which is produced is then too short to be detected using mass spectrometric methods. The reasons for this differ considerably between the different types of mass spectrometer.
The determination of the terminal sequences is, however, important in various areas of application. One example is the examination of the terminal sequences for the quality control of synthesized proteins. The frequently used method of recombinant synthesis using introduced genes in production bacteria (usually E. coli) experiences problems with the clean splitting off of the detector sequences (for example His tags) or with the detection of the stop codons by the bacteria's own polymerases. Chemical synthesis also experiences problems with undesired modifications to the protein ends.
To investigate sequences of biopolymers, tandem mass spectrometers are usually employed. These originally consisted of two spatially separated mass spectrometers with a collision cell to fragment the ions placed between them. In the first spectrometer, ions of a particular type were selected, and these were then at least partially fragmented in the collision cell. The ‘fragment ions’ or ‘daughter ions’ thus generated were then analyzed in the second spectrometer; the result is a ‘daughter ion spectrum’. One example is the ‘triple quad mass spectrometer’, which has an additional quadrupole as a collision cell between two quadrupole filter mass spectrometers. In addition to this principle of ‘tandem in space’, the principle of ‘tandem in time’ has recently appeared: in a storage mass spectrometer, ion selection, fragmentation and the scanning of the daughter ion spectra are carried out consecutively in the same storage cell. High frequency ion trap mass spectrometers or Fourier transform mass spectrometers (FTMS) are used as the storage mass spectrometers. The measuring methods for scanning the daughter ion spectra are often known as MS/MS methods for short.
Tandem time-of-flight mass spectrometers, among others, are used to acquire daughter ion spectra with ionization by means of matrix-assisted laser desorption (MALDI). They consist of a first spectrometer with an ion selector to select the ions which are to be investigated and the daughter ions formed from them, and a second spectrometer for analyzing the daughter ions. Tandem time-of-flight mass spectrometers for this measurement method with ionization by matrix-assisted laser desorption (MALDI) are available commercially and are often known as TOF/TOF mass spectrometers. Ionization using MALDI is favorable in that, in the main, only singly charged molecular ions of the analyte substances are formed. This makes it possible to mix different substances in one sample, for example the different tryptic digest peptides of a protein, and to then investigate the ions of these substances one after the other to look at the daughter ions they produce.
For ease of expression, the ions of the ion type primarily formed, whose structure is to be examined using fragmentation, are termed ‘selection ions’. This nomenclature is chosen because the ions of this type have to be selected from the mixture of primary ions in some way, regardless of whether this selection occurs before or after subsequent fragmentation. This selection usually occurs with the first mass spectrometer of a tandem mass spectrometer. Out of these selection ions, different decay processes now create ‘fragment ions of a first fragmentation generation’ and neutral fragments which are invisible to a mass spectrometer. These fragment ions of the first fragmentation generation are here termed simply ‘fragment ions’ or ‘daughter ions’. After a decay, the selection ion ceases to exist, of course. Usually, however, there are always sufficient selection ions remaining in an undecayed state so that their signal can also be seen in the daughter ion spectrum.
The daughter ions can be broken down further by different processes to form ‘fragment ions of the second generation’, generally termed ‘granddaughter’ ions here. In the granddaughter ion spectrum, daughter ions can usually still be seen since even the second fragmentation does not usually affect all the daughter ions.
As is widely known, two different fragmentation methods are available in TOF/TOF devices for the fragmentation of the selection ions into daughter ions: collisional fragmentation in a collision gas in a collision cell (CID=Collisionally Induced Decomposition) and metastable decay of ions as a result of increased energy absorption in the laser generated plasma within the MALDI ion source (LID=Laser Induced Decomposition). Both types of decay occur in the field-free flight path of the first time-of-flight mass spectrometer, that is, after the ions, including the selection ions, have been accelerated. In both cases, the daughter ions therefore fly at the same velocity as the selection ions which have not decayed. After a velocity dispersive flight path, it is therefore possible to use an ion selector to time select the daughter ions, together with the non-decayed selection ions, from all the ions generated in the ion source, for subsequent analysis in the second spectrometer. The precise procedure for this selection and the subsequent analysis of the daughter ions after an intermediate acceleration before the second time-of-flight mass spectrometer will not be discussed further here. The basic principle of a TOF/TOF mass spectrometer is described in U.S. Pat. No. 6,300,627 (corresponding to DE 198 56 014 C1).
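The consequence of the equal velocities can be made concrete with a short calculation. Because a daughter ion formed after acceleration keeps the velocity of its parent, its kinetic energy is only the fraction (daughter mass / parent mass) of the parent's kinetic energy. The following Python sketch illustrates this relation only; the function name and the example values (a 2000 u selection ion accelerated to 20 keV) are assumptions chosen for illustration and are not taken from the description above.

```python
def daughter_kinetic_energy_ev(parent_mass_u, daughter_mass_u, parent_energy_ev):
    """Kinetic energy of a daughter ion formed AFTER acceleration.

    The daughter keeps the parent's velocity, so E = 1/2 * m * v^2 scales
    linearly with mass. Any kinetic energy released by the decay itself is
    neglected in this sketch.
    """
    if not 0 < daughter_mass_u <= parent_mass_u:
        raise ValueError("daughter mass must be positive and not exceed parent mass")
    return parent_energy_ev * daughter_mass_u / parent_mass_u


if __name__ == "__main__":
    # Hypothetical example: 2000 u selection ion at 20 keV.
    for daughter in (500.0, 1000.0, 1500.0):
        energy = daughter_kinetic_energy_ev(2000.0, daughter, 20000.0)
        print(f"daughter {daughter:7.1f} u -> {energy:8.1f} eV")
```

Under these assumptions, light daughter ions carry only a small part of the original energy, which is one way to see why an intermediate acceleration precedes the second time-of-flight spectrometer.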
The non-spontaneous metastable decay (LID) of the ions is preceded by an internal thermalization, i.e. a statistical equipartitioning of the excess energy absorbed in the laser plasma over all the oscillation systems of the ion. The bonds in the chain-like ion resemble a complicated system of coupled oscillations in which the available energy is continuously statistically redistributed by the coupled oscillation processes. If, at a relatively weak bonding point in the chain of the molecule, the momentarily accumulated energy exceeds the bonding energy at this point, the chain may be broken at this point. This creates primarily daughter ions of the b and y fragmentation series; in addition there are frequently also ions of the (b-17) series.
The nomenclature used here is based on that of Roepstorff and Fohlman as revised by Johnson, Martin, and Biemann in 1988 (Int. J. Mass Spectrom. Ion Proc. 86, 137-154). The basic fragmentation series a, b, c, x, y and z and their indices are shown schematically in FIG. 1. If a singly charged protein ion is divided at the bonding point between the amino group of the preceding amino acid and the carboxyl group of the subsequent amino acid, we refer to fragment ions of the b series when they are N-terminal ions, and of the y series when they are C-terminal ions. To distinguish between the fragmentation series b and y it is important to know on which of the two resulting fragments the ionizing proton remains. In each case, the other end becomes the neutral fragment. (In the case of fragmentation of doubly charged protein ions, both N- and C-terminal fragment ions can be formed; this is scarcely possible in the case of the singly charged MALDI ions which form the vast majority.) Indices on the letters b or y indicate which fragmentation point in the ion has been split; for b ions, the count begins at the N-terminus, for y ions at the C-terminus. If the splitting takes place one carbon atom further towards the N-terminal end, we speak of ions of the a or x series, with an analogous count for the indices. If it takes place further towards the C-terminus (at the other side of the nitrogen atom), we obtain ions of the c or z series. In addition there are frequently ions from which, for example, NH3, and occasionally also H2O, has been split off; they are then termed ions of the (b-17) series (or the (b-18) series), with indices which are appended to the parentheses. (Sometimes the parentheses are omitted in spectra in the interests of a short representation.)
Since the weakest bonds between the amino acids of a peptide (those bonds whose breakage leads to the b or y series) do not all possess the same bonding energy but instead have very different energies, certain bonds are broken less frequently than others. As an example, the bonds of proline to proline are stronger than the average bond between amino acids; these bonds are therefore fragmented far less frequently, and the resulting fragment ions are correspondingly rare. The corresponding mass signals (peaks) in the spectrum are therefore much weaker than those of other fragment ions, if they are visible at all.
Collisionally induced decay (CID) is distinguished from metastable decay by the fact that ions additionally appear which are generated by side chain fragmentations (so-called d or w ions). Ions can also suffer double fragmentations, so that non-terminal (‘internal’) fragment ions can arise, which are then termed bₙyₘ ions.
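The nomenclature can be illustrated with a short calculation of singly charged fragment masses. The Python sketch below is only an illustration: the function name, the example peptide and the restriction to the a, b, c and y series are assumptions made here, not part of the description. It uses standard monoisotopic residue masses and the usual relations b = sum of residues + proton, y = sum of residues + H2O + proton, a = b − CO and c = b + NH3.

```python
# Monoisotopic residue masses in u for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276    # mass of the ionizing proton
WATER = 18.010565    # H2O
AMMONIA = 17.026549  # NH3
CO = 27.994915       # carbon monoxide


def fragment_series(sequence):
    """Return m/z values of singly charged a, b, c and y ions.

    One row per cleavage site; 'n_index' counts from the N-terminus
    (a, b, c ions) and 'y_index' counts from the C-terminus (y ion).
    """
    total = sum(MONO[res] for res in sequence)
    rows = []
    prefix = 0.0
    for i, res in enumerate(sequence[:-1], start=1):
        prefix += MONO[res]
        b = prefix + PROTON
        rows.append({
            "n_index": i,
            "a": b - CO,
            "b": b,
            "c": b + AMMONIA,
            "y_index": len(sequence) - i,
            "y": (total - prefix) + WATER + PROTON,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical example peptide, chosen only for illustration.
    for row in fragment_series("GASPK"):
        print(f"b{row['n_index']}={row['b']:.4f}  c{row['n_index']}={row['c']:.4f}  "
              f"y{row['y_index']}={row['y']:.4f}")
```

A useful consistency check is that, for each cleavage site, the b and y values add up to the mass of the intact peptide plus two protons.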
The daughter ion spectra of digest peptides generated in TOF/TOF mass spectrometers are generally used for the confirmation of protein identifications which have been initially obtained from so-called ‘fingerprint spectra’ of the enzymic digest peptides of the protein. The confirmation of these identifications by means of the daughter ion spectra of individual digest peptides is carried out with the help of so-called search programs which work in databases with hundreds of thousands of stored protein sequences. Various search programs of this type are commercially available.
Furthermore, the daughter ion spectra are used for the de-novo sequence determination of proteins whose sequences are not contained in the database. However, these sequence determinations, for which commercial computer programs are also available, are difficult and usually not unequivocal, so that one usually receives a number of suggestions. The daughter ion spectra with their mix of b, y, internal and (b-17) ions are very complex, hence the sequence determination is often not unequivocal and frequently also only successful for partial sequences. For this reason, methods are urgently required which permit a simpler and clearer cut de-novo sequence determination.
A third type of ion fragmentation in MALDI mass spectrometers has been known for a long time, although until now it has rarely been used and, for reasons not yet established, it has not functioned with the same level of success in all commercial MALDI ion sources: in-source decay (ISD), which is generated simply by a higher laser energy density in the MALDI process (see, for example, D. C. Reiber et al., “Unknown Peptide Sequencing Using Matrix-Assisted Laser Desorption/Ionization and In-Source Decay”, Anal. Chem. 1998, 70, 1214-1222). A fundamental difference between this method of daughter ion generation and the other two fragmentation methods is that, in this case, the fragmentation occurs spontaneously (within 10⁻⁸ seconds at the most) before the acceleration of the ions in the ion source. The delayed acceleration of the ions (DE=delayed extraction) nowadays used without exception for MALDI ion sources produces a clear separation of the decay period and the acceleration phase; the spontaneously decaying ions can thus be cleanly detected since these spontaneous decay processes are more or less complete when, after a few times 10⁻⁸ seconds, the acceleration sets in. After leaving the ion source, the fragment ions thus have different velocities depending on their mass. These types of fragment ions can therefore be separated and analyzed in a simple time-of-flight mass spectrometer. This fragmentation functions particularly well for intact proteins in a molecular weight range of 2000 to about 70,000 atomic mass units.
For the application of in-source fragmentation it is, however, necessary that the analyte substances are introduced into the ion source separately and are moderately pure, since otherwise the spectra which are produced are so complex that it is no longer possible to interpret them. For this method, therefore, when the sample is prepared on a sample support, only one single analyte substance mixed with the matrix substance is applied. The application of a digest mixture is no longer required; the great advantage of this ionization and fragmentation is precisely that it can be used for large peptides up to large proteins.
In-source fragmentation is not an MS/MS method in the real sense because the first mass spectrometer, which selects the ions to be investigated from the complex ion mix, is missing. This selection of the ‘selection ions’ is instead performed (before any ionization) by an external cleaning process for the investigated substance, for example chromatographic cleaning. The use of a synthesized substance, for example a recombinant protein, is also possible at this point. The application of a single pure substance is particularly necessary for more complex substances, such as proteins. If several proteins, whose ions decay into a very large number of daughter ions, were applied in comparable concentrations, the resulting spectra would be so complex that it would no longer be possible to decode them. This method is therefore only suitable for determining the structure of relatively pure substances.
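Because ISD fragments are formed before acceleration, each receives the full acceleration energy, and its flight time in a field-free drift path grows with the square root of its mass. The Python sketch below illustrates this idealized relation. The acceleration voltage of 20 kV and the drift length of 1 m are assumed values, the function name is hypothetical, and the time spent in the acceleration region as well as the effect of delayed extraction are ignored.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulomb
AMU = 1.66053906660e-27     # atomic mass unit in kilogram


def flight_time_us(mass_u, accel_voltage_v=20000.0, drift_length_m=1.0, charge=1):
    """Idealized drift time in microseconds for an ion formed before acceleration.

    The ion gains the energy z*e*U, so its velocity is sqrt(2*z*e*U / m)
    and the drift time is the drift length divided by that velocity.
    """
    energy_j = charge * E_CHARGE * accel_voltage_v
    velocity = math.sqrt(2.0 * energy_j / (mass_u * AMU))
    return drift_length_m / velocity * 1e6


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):
        print(f"{mass:7.1f} u -> {flight_time_us(mass):6.2f} microseconds")
```

In this simplified picture, quadrupling the fragment mass doubles the flight time, which is what allows the fragments to be separated in a simple time-of-flight mass spectrometer.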
Even though one cannot speak of an MS/MS method, it is without doubt the case that, as in an MS/MS method, fragment or daughter ions of the substance under investigation are measured. The method is sometimes called a ‘pseudo-MS/MS’ method.
The daughter ion spectra obtained by in-source decay (ISD) are very different in appearance to the daughter ion spectra obtained by CID or LID. The type of fragmentation of the in-source decay is, in all probability, a so-called electron capture dissociation (ECD), and it is indeed possible to observe that the fragment ion spectra of the ISD, with their strong preference for the c fragment ion series, have a high degree of similarity to the daughter ion spectra obtained in suitable mass spectrometers by means of ECD.
The spontaneous fragmentations which occur in the explosion plasma of the laser bombardment when the laser energy density is slightly increased probably occur in those ions which were initially doubly charged in the hot laser plasma as a result of double protonation, as is the case with ECD. If these ions are neutralized by one charge state by the electrons present at the same time, then the ionization energy (more accurately: the proton affinity energy) is released and transformed into oscillation energy. The energy transferred to the ion at a single point is so high that it immediately (in less than 10⁻⁸ seconds) causes the chain-shaped molecule in the immediate vicinity of the recombination point to break. One of the halves of the molecule carries the remaining charge and is thus an ion which can be analyzed, while the other half becomes a neutral particle which eludes further mass spectrometric analysis.
Protein spectra which arise through ISD primarily contain daughter ions of the c series for N-terminal ions, which are present in noticeably high ion signals, and the y series for C-terminal ions. As a result of the ring structure of proline, however, the c ions from the fragmentations in front of the prolines are completely absent because they would have to break open a double bond. The c ions do not end in COH at the C-terminus, as the b ions do, but instead end in an amide structure (CONH2). For smaller ions, a ions also occur. The fragmentation is strongly matrix and size dependent, however. When α-cyano-4-hydroxy-cinnamic acid (CHCA) is used as matrix, there are significantly more daughter ions of the a series; their intensity is much lower when 2,5-dihydroxy-benzoic acid (DHB) is used, and when 3,5-dimethoxy-4-hydroxy-cinnamic acid (sinapic acid) is used as matrix, only very few are found.
As a result of the localization probability of the neutralized proton, which is distributed roughly equally over the length of the protein, all the bonds between the amino acids are equally affected by the fragmentations, one exception being the fragmentations toward the prolines. All fragment ions of different lengths are therefore formed in roughly the same concentrations; this is completely different to the situation with CID and LID. Here, as well, a similarity to ECD spectra can be seen.
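A uniform ladder of fragment ions lends itself to a simple reading scheme: the mass difference between neighboring ions of one series corresponds to one amino acid residue. The Python sketch below shows this idea in a greatly simplified form. It is not the evaluation method of any particular program; the function name, the tolerance of 0.02 u and the synthetic peak list are assumptions, and real spectra with missing peaks, noise and overlapping series require considerably more care.

```python
# Monoisotopic residue masses in u for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(mz_values, tol=0.02):
    """Assign residues to the gaps between consecutive ions of one series.

    Returns one entry per gap: the matching residue letter(s), joined by '/'
    when several residues fit (for example the isobaric L and I), or '?'
    when no single residue matches within the tolerance.
    """
    peaks = sorted(mz_values)
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        delta = high - low
        matches = [res for res, mass in MONO.items() if abs(mass - delta) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


if __name__ == "__main__":
    # Synthetic ladder starting at an arbitrary m/z of 1000.0 (illustration only).
    ladder = [1000.0]
    for res in "ASLK":
        ladder.append(ladder[-1] + MONO[res])
    print(read_ladder(ladder))
```

The sketch also shows a known limit of any mass-based reading: leucine and isoleucine have identical masses and cannot be told apart from the ladder alone.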
In-source fragmentation, however, has one disadvantage which cannot be ignored: the spectra in the range of the light ions up to a mass of approximately 1000 atomic mass units, which are important for the analysis of protein ends, for example, are very strongly contaminated and noisy because of numerous fragments of the matrix substance and their oligomers and also, possibly, because of other small ions which have arisen as a result of reactions of many different types in the hot MALDI plasma. A meaningful interpretation of the spectra is only possible above around 1000 atomic mass units; to avoid overloading the ion detector with an excess of small ions, ISD spectra are only ever recorded above a mass of 1000 atomic mass units. For proteins, this means that the sequence of the first eight amino acids is unidentifiable. It is precisely this terminal sequence, which is of such interest for many analytical purposes, which evades analysis.
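The figure of about eight unidentifiable amino acids follows from a rough estimate. The Python sketch below divides the unusable mass range by an average residue mass. The value of about 111.13 u (the ‘averagine’ average commonly used in the literature) and the approximate offsets for the terminal groups are assumptions introduced here for illustration; the actual number depends on the sequence.

```python
import math

AVERAGE_RESIDUE_MASS = 111.1254  # assumed average residue mass in u ('averagine')
PROTON = 1.01                    # approximate mass of the ionizing proton in u
AMMONIA = 17.03                  # approximate extra mass of a c ion (NH3) in u
WATER = 18.02                    # approximate extra mass of a y ion (H2O) in u


def hidden_residues(cutoff_mz=1000.0, terminal_offset=AMMONIA + PROTON):
    """Estimate how many terminal residues give fragment ions below the cutoff.

    terminal_offset is the mass a fragment ion carries in addition to its
    residues: NH3 + proton for c ions, H2O + proton for y ions.
    """
    return math.floor((cutoff_mz - terminal_offset) / AVERAGE_RESIDUE_MASS)


if __name__ == "__main__":
    print("c ions hidden below cutoff:", hidden_residues())
    print("y ions hidden below cutoff:", hidden_residues(terminal_offset=WATER + PROTON))
```

With a cutoff of 1000 atomic mass units, this estimate gives eight residues for both the c series and the y series, in line with the figure stated above.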
It is, however, advantageous that the spectra of the ISD fragment ions extend uniformly up to the higher mass ranges at around 5000 mass units when larger proteins are measured. There is one exception: when the protein has a cross link, for example a disulfide bridging bond between two cysteines. At the cross link, the respective b or y ISD fragment ion series abruptly breaks off because, from this point on, two bonds would have to be broken in each case.
The fragment ions created by in-source decay are highly excited and are therefore themselves strongly metastable, decaying to a large extent during the flight through the mass spectrometer. In line with current thinking, they are therefore measured only in linear mass spectrometers (without a reflector, or without using the reflector) since, in this case, the fragment ions and the granddaughter ions arising from some of them by decay arrive at the same time and provide strong signals, even though the mass resolving power is not satisfactory for a good mass determination. Since the decays also always convert any bonding energy which is released into kinetic energy, i.e. some ions accelerate and others decelerate, the ion signal broadens, which worsens the mass resolving power. According to current opinion, a measurement in a reflector time-of-flight mass spectrometer would lead to spectra which are even noisier than those measured in linear spectrometers.
Contrary to current teaching, the use of a reflector, on the other hand, leads to good spectra with a much improved mass resolution, whereby the noise in the mass range between 1000 and 5000 atomic mass units, which is actually quite strong in the linear mode, is not noticeable. In this mass range, good resolutions of the isotope structure are consistently achieved, with a corresponding accuracy of the mass determination.
Its uniform fragmentation and good mass accuracy for the fragments would thus make this fragmentation method eminently suitable for determining sequences of larger proteins. It fails, however, because it is precisely the terminal sequences of proteins which cannot be detected because of the noisy spectrum. The objective of the invention, therefore, is the generation of biopolymer spectra which contain information about the terminal sequences of the building blocks of biopolymers.