The invention relates to the fast, inexpensive analysis of amino acid sequences of proteins with mass spectrometers that use ionization by matrix assisted laser desorption (MALDI). The standard method for protein sequencing is Edman degradation, which allows a total of about 30 to 40 amino acids from the N-terminus to be read in suitable machines over a period of around 10 hours, using well-purified protein samples and relatively expensive chemicals. C-terminal amino acids cannot be determined. If the N-terminus is blocked, the method does not work. Since this method is no longer adequate to meet modern demands in terms of cost, speed of analysis and sequencing lengths, automatic Edman sequencers are no longer manufactured. At present, the search is on for methods that operate more quickly, economically and with greater sequencing lengths.
A device (conceivable, although not yet existent) for large-scale protein sequencing capable of a thousand sequence analyses of proteins or split segments with up to 150 amino acids or more per hour would today give a stimulus to many fields of application and open up many new ones. It would permit extensive research into the changes in many different protein types attributable to the evolution of species, and would enormously facilitate the taxonomic classification of species, which is at present carried out on the basis of slow and expensive DNA analyses. In particular, however, it would allow examination of the variations of proteins in the individuals of a species. Our genotype includes hundreds of thousands of SNPs (single nucleotide polymorphisms) that distinguish one person from another. It can be expected that a significant proportion of these polymorphisms are also reflected in variations of the proteins that, in turn, manifest in our phenotype. Genetically conditioned functional changes in many proteins (reduced function, hyperfunction, malfunctions) undoubtedly occur, which can produce altered appearance, altered behavior, altered tolerance of external and internal influences such as foodstuffs, chemicals, pharmaceuticals and many more effects. This could in turn lead to diagnostic methods for the discovery of many abnormalities, including genetically conditioned intolerances and predispositions to disease.
A method that shows high promise as a basis for such automatic sequencing machines and for corresponding assays is the MALDI analysis of protein molecules with randomly generated spontaneous fragmentation, which has become known by the abbreviation “ISD” (in-source decay).
MALDI (ionization by matrix assisted laser desorption) is an important type of ionization for biomolecules, which was developed about 20 years ago by M. Karas and F. Hillenkamp. MALDI ionizes the biomolecules, which are present at high dilution in a matrix substance in predominantly solid samples on sample supports, by firing laser light pulses at them. Each laser light pulse creates a tiny, short-lived cloud of hot plasma containing neutral molecules and positive and negative ions from the sample.
Even today, the ions from the plasma created by each individual laser light pulse are in most cases accelerated axially, after a short delay of several hundred nanoseconds, into the flight path of a MALDI time-of-flight mass spectrometer (MALDI-TOF MS) specially designed for this purpose; after transiting the flight path, the ions are passed to a detector that measures the mass-dependent arrival time of the ions and their quantity, and saves the digitized measurements as a time-of-flight spectrum. The delayed extraction (DE) of the ions serves to increase mass resolution for the ions of the expanding plasma plume (see, for instance, A. Holle et al., U.S. Pat. No. 5,654,545 A). Repetition frequencies for the laser light pulses used to be between 20 and 200 hertz; today MALDI-TOF mass spectrometers are available with light pulse frequencies of up to two kilohertz. Nowadays, however, time-of-flight mass spectrometers with orthogonal ion injection (OTOF) are also increasingly being equipped with MALDI ion sources; these record mass spectra at repetition rates of about five kilohertz.
In both types of mass spectrometer, the ion detector consists of a special secondary electron multiplier (SEM) followed by a transient recorder. The transient recorder contains an extremely fast analog-to-digital converter (ADC) that works at between 2 and 4 gigahertz but has a rather low intensity resolution of usually only 8 bits. The time-of-flight spectra may be up to 200 microseconds long and therefore comprise up to 800,000 measurements. The measurements from several hundreds or thousands of time-of-flight ion spectra acquired in sequence in this way are added to form a sum spectrum. This is subjected to a peak detection process, and the list of time-of-flight peaks is converted by means of a calibration curve into a list of the masses per number of elementary charges, m/z, and their intensities i. By using energy-focusing reflectors and other measures such as the delayed extraction (DE) described above, the mass spectra from both types of mass spectrometer can achieve mass resolutions of R=m/Δm=20,000 to 50,000, where Δm is the half-height width of the ion peak for the mass m.
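The data path described above (sampling, summation, calibration, resolution) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the two-parameter calibration t = t0 + k·√(m/z) is an assumed idealized form of the calibration curve, which the text does not specify.

```python
def samples_per_spectrum(duration_us: float, adc_rate_ghz: float) -> int:
    """Number of ADC samples in one time-of-flight spectrum.

    microseconds * gigahertz = 1e-6 s * 1e9 1/s = 1e3 samples.
    """
    return round(duration_us * adc_rate_ghz * 1000)


def sum_spectra(spectra):
    """Add individual spectra sample by sample to form a sum spectrum."""
    return [sum(column) for column in zip(*spectra)]


def tof_to_mz(t_ns: float, t0_ns: float, k: float) -> float:
    """Assumed calibration t = t0 + k*sqrt(m/z), inverted to give m/z.

    t0_ns and k are hypothetical calibration constants that would be
    fitted from peaks of known mass.
    """
    return ((t_ns - t0_ns) / k) ** 2


def mass_resolution(mz: float, fwhm: float) -> float:
    """R = m / delta_m, with delta_m the half-height width of the peak."""
    return mz / fwhm


if __name__ == "__main__":
    print(samples_per_spectrum(200, 4))           # longest spectrum, fastest ADC
    print(sum_spectra([[1, 2, 3], [0, 1, 5]]))    # two tiny toy spectra
    print(mass_resolution(5000.0, 0.25))
```

With a 200 microsecond spectrum and a 4 gigahertz ADC, the first function reproduces the figure of 800,000 measurements given above.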
The term “mass spectrum” refers, a little ambiguously, either to the list of masses per charge m/z mentioned above with their intensities i_{m/z}, or to their graphical representation i_{m/z}=f(m/z) (“line spectrum”), or to the quasi-analog function of the measured values i_n=f(m/z), where n represents the index of the measurement in the time-of-flight spectrum.
Whenever the expression “acquisition of a mass spectrum” is used below, it usually means acquiring hundreds or thousands of individual spectra, combining them into a sum spectrum and converting this into a mass spectrum, as described above. This applies equally to mass spectra from molecular ions and to daughter ion spectra.
When the term “mass of the ions”, or simply “mass” in connection with ions, is used in the context of mass spectrometry, it always means the ratio of the mass m to the number z of elementary charges, m/z; in other words, the physical mass m of the ions divided by the dimensionless, absolute number z of the positive or negative elementary charges carried by the ion. The rather unfortunate term “mass-to-charge ratio” is often used for m/z, even though it has the physical dimension of a mass. Since, however, MALDI delivers practically only singly charged ions (z=1), the distinction between “mass” and “mass-to-charge ratio” is in most cases irrelevant here. In this document, the “dalton” (Da) is used as the unit of mass, since this unit is generally used in biochemistry, rather than the “unified atomic mass unit”, a non-SI unit accepted for use with the SI and abbreviated “u” (formerly “amu”).
For protein sequencing, it is necessary to record daughter ion spectra (fragment ion spectra) of the protein molecule ions. Two different methods for generating and measuring daughter ions from selected parent ions can be carried out with MALDI:
1) A method using “ergodic” (or “thermal”) fragmentation through the decomposition of metastable ions in the mass spectrometer after they have been accelerated in the ion source, a method which primarily creates b and y fragment ions (abbreviated PSD=post-source decay, i.e. decay after acceleration of the ions). If PSD is to be used in such a way that daughter ion spectra are acquired in a single process, this requires a special and, unfortunately, expensive MALDI-TOF-TOF mass spectrometer equipped with switchable post-acceleration units (see Köster et al., DE 198 56 014 C2; GB 2 344 454 B; U.S. Pat. No. 6,300,627 B1 in this respect).
2) A method of spontaneous, non-ergodic fragmentation of the molecular ions (abbreviated ISD=in-source decay; decomposition before acceleration of the ions in the ion source). The fragmentation of the ions takes place prior to their acceleration, which is delayed, for better mass resolution, by a few hundred nanoseconds (DE); this method primarily yields c and z fragment ions. ISD appears to be particularly suitable for sequence analysis, since it can, in principle, be carried out in simpler and more inexpensive mass spectrometers without post-acceleration units. Within minutes, very good and easily evaluated fragment ion spectra can be generated from protein samples of a purity and quantity approximately equivalent to that of proteins prepared for Edman sequencing, but without the time consumption and material costs of Edman sequencing. In the mass range between about one kilodalton and eight kilodaltons, the c fragment ions yield an outstanding and easily detectable sequence of signals in the mass spectrum, all of which consist roughly of about the same number of ions. The z fragment ions also yield a sequence of signals, each with about the same number of ions, but the mean signal value is lower than that of the c fragment ions by a factor of 5 or 10.
The intensity variations of both sequences of fragment ions lie within a relatively narrow band, extending only by a factor of 1.3 above and below the mean value. All the amino acids thus fragment with about the same probability. Proline is an exception; it has a unique annular structure and therefore, although it may split, nevertheless does not yield two separate fragments.
The various ion detectors all function on the basis of secondary electron multiplication, and their sensitivity therefore falls with increasing mass. For this reason the mean values of the intensities of the c fragment ions in the mass spectrum also fall with increasing mass, and the possibility of analyzing the c fragment ions in current MALDI-TOF mass spectrometers finishes at a maximum of about 70 amino acids away from the N-terminal end. Starting from the C-terminal end, an evaluation of the z fragment ions can determine a sequence of at most about 50 amino acids.
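How a sequence is read from such a ladder of c fragment ion signals can be illustrated with a short Python sketch. It relies on the general fact that neighboring signals in a fragment ladder differ by the mass of one amino acid residue. The function name, the tolerance value and the use of monoisotopic residue masses are assumptions made for this illustration and are not taken from the text.

```python
# Monoisotopic residue masses in daltons. Leucine and isoleucine are
# isobaric and cannot be told apart by mass, so both appear as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peak_masses, tol=0.02):
    """Translate a ladder of fragment ion masses into residue letters.

    Each difference between neighboring peaks is matched to the closest
    residue mass; differences that match no residue within `tol` daltons
    (a hypothetical tolerance) are reported as "?".
    """
    ordered = sorted(peak_masses)
    letters = []
    for low, high in zip(ordered, ordered[1:]):
        delta = high - low
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - delta))
        matched = abs(RESIDUE_MASS[best] - delta) <= tol
        letters.append(best if matched else "?")
    return "".join(letters)


if __name__ == "__main__":
    # Toy ladder built from an arbitrary starting mass of 1000 Da.
    ladder = [1000.0]
    for residue in "GAKF":
        ladder.append(ladder[-1] + RESIDUE_MASS[residue])
    print(read_ladder(ladder))
```

A missing signal, as expected at proline, produces a mass difference spanning two residues, which this sketch usually reports as “?”. Some two-residue sums coincide with a single residue mass (for example, two glycines with asparagine), so a practical evaluation would need additional checks.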
Unfortunately, the mass spectra that are acquired in this way in currently available MALDI-TOF mass spectrometers still leave much to be desired. For instance, the lower mass range up to about m/z=1000 daltons is masked by such a strong chemical background that it is not possible to evaluate the mass spectra. The background originates to a large extent from molecules of the matrix substance. When these are smashed by laser light pulses with the currently usual pulse durations and energy densities, they come together in the hot, but rapidly adiabatically cooling plasma of the desorption cloud to form complex ions of widely varying masses, so generating an almost continuous chemical background. It is therefore not possible to read the sequence of the first eight to ten terminal amino acids.
A special method for also reading the terminal sequences through metastable decay of a selected type of ISD fragment ion, as well as for a more detailed structural analysis of ISD fragment ions, consists in exploiting the instability of these fragment ions and measuring the granddaughter ions created by metastable decay, using a MALDI-TOF-TOF mass spectrometer equipped for recording ergodically generated fragment ions (D. Suckau and A. Resemann: DE 103 01 522 A1; GB 2 399 218 B; U.S. Pat. No. 7,396,686 B2). The method is, however, disadvantageous for large-scale protein sequencing because, after a first spectral evaluation, at least two further granddaughter ion spectra of c and z ISD fragment ions have to be acquired. In addition, an expensive MALDI-TOF-TOF mass spectrometer with a post-acceleration unit is needed in order to also acquire spectra of the fragment ions created by metastable decay.
In MALDI mass spectrometry, considerable skill is required in order to set the detector amplification and the MALDI conditions so as to optimally exploit the 8-bit range of the analog-to-digital converter (ADC) in the transient recorder, without either exceeding its dynamic measurement range (the “measurement window”) of only 1:255 counts through oversaturation or failing to detect each of the ions that has been created as a result of a signal that is too weak. The impact of the ions on the secondary electron multiplier (SEM) only generates a small quantity of between zero and about six electrons; the numbers accord with a Poisson distribution. The Poisson distribution is characterized by having a variance equal to its mean value, so that its standard deviation is the square root of the mean; with such small mean values, there are always some ion impacts that do not generate secondary electrons (null events), and the number of these becomes larger as the mean value becomes smaller.
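The fraction of null events follows directly from the Poisson distribution. Writing λ for the mean number of secondary electrons per ion impact (the symbol λ is introduced here for illustration only):

```latex
P(k;\lambda) = \frac{\lambda^{k}\, e^{-\lambda}}{k!}, \qquad
\operatorname{Var}(k) = \lambda, \qquad
P(0;\lambda) = e^{-\lambda}
```

For example, a mean of three electrons per impact gives a null-event fraction of e^{-3} ≈ 0.05, whereas a mean of one electron gives e^{-1} ≈ 0.37.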
According to the prior art, the amplification of the SEM in a MALDI time-of-flight mass spectrometer is considered to be optimal if a single ion with a mass of about m/z=1000 daltons and an energy of around 30 kilo-electronvolts generates, on average, a signal of about 2.5 counts of the ADC in the transient recorder; the measuring range for ions in the measuring period of 0.5 or 0.25 nanoseconds is then about 1:100, and the loss of signals from individual ions is negligibly small, at least in the mass range normally measured, which extends up to m/z=3000 daltons. As the ion signal usually extends over several measuring periods, there must not be more than a few hundred ions in an ion signal containing ions of the same mass if oversaturation is to be avoided. Adjusting the quantity of ions in this way has, however, the effect that the ions are no longer all detected in the higher mass range above three kilodaltons because more and more null events occur as the mass increases. The sensitivity of any SEM, i.e. the number of electrons generated in the maximum of the Poisson distribution, decreases with mass m at least in proportion to 1/√m; for this reason sequencing is at present limited to a maximum of 70 amino acids at the N-terminus (about eight kilodaltons) and 50 amino acids at the C-terminus (around six kilodaltons). Optimal adjustment of the MALDI conditions calls for a great deal of knowledge about the effect of the laser light parameters on the MALDI processes.
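The figures in this paragraph can be checked with a small Python sketch. The measuring range follows from the 8-bit ADC and the 2.5 counts per ion stated above. The mass dependence is modeled with the most favorable case of exactly 1/√m scaling, and the mean of three secondary electrons at 1000 daltons is a purely hypothetical value chosen for illustration; all function names are likewise assumptions.

```python
from math import exp, sqrt

ADC_MAX_COUNTS = 255  # full scale of an 8-bit converter


def measuring_range(counts_per_ion: float) -> float:
    """Largest number of ions per sample that fits in the ADC range."""
    return ADC_MAX_COUNTS / counts_per_ion


def mean_electrons(mz: float, mean_at_ref: float = 3.0,
                   ref_mz: float = 1000.0) -> float:
    """Mean secondary electrons per ion, assuming exact 1/sqrt(m) scaling.

    mean_at_ref is a hypothetical value, not a measured one. The text
    says sensitivity falls at least this fast, so this is optimistic.
    """
    return mean_at_ref * sqrt(ref_mz / mz)


def detection_probability(mz: float, mean_at_ref: float = 3.0,
                          ref_mz: float = 1000.0) -> float:
    """Probability that an ion impact releases at least one electron.

    For Poisson statistics the null-event probability is exp(-mean).
    """
    return 1.0 - exp(-mean_electrons(mz, mean_at_ref, ref_mz))


if __name__ == "__main__":
    print(f"measuring range 1:{measuring_range(2.5):.0f}")
    for mz in (1000, 3000, 6000, 8000):
        print(mz, round(detection_probability(mz), 3))
```

Under these assumptions the detection probability drops steadily from 1000 daltons toward eight kilodaltons, which is the trend the paragraph describes; the absolute values depend entirely on the assumed mean at the reference mass.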
Matrix assisted laser desorption uses (with a few exceptions) solid sample preparations on a sample support. The samples essentially consist of small crystals of the matrix substance mixed with a small proportion (only about one hundredth of one percent) of molecules of the analyte substances. The analyte molecules are individually incorporated in the crystal lattice of the matrix crystals, or are located at the boundaries between the crystals. The samples prepared in this way are exposed to short UV laser light pulses. The duration of the pulse is usually a few nanoseconds, and depends on the laser being used. This creates a vaporization plasma containing neutral molecules and also ions of the matrix substance along with a few analyte ions.
The nitrogen lasers normally used in the past are not suitable for high throughputs, since they only have a lifetime of a few million laser pulses. They are nowadays increasingly being replaced by solid-state lasers, whose lifetime is more than a thousand times greater. Solid-state lasers deliver a smooth energy density profile right across the laser spot provided by the lens system. The energy density profile approximately follows a Gaussian distribution.
The introduction of solid-state lasers into MALDI technology in place of the nitrogen lasers previously used led to the surprising discovery that the smooth beam profile from these solid-state lasers actually reduced the yield of ions. The profile of the beam from a nitrogen laser consists of micro-spots, whose position varies from one laser pulse to the next. For this reason, a method for profiling the laser beam from solid-state lasers to create a number of individual spots of optimum diameter was developed, and this increased the yield of ions even above the yield obtained from nitrogen lasers. This technology has become known under the name “Smart Beam”, and is described in detail in DE 10 2004 044 196 A1; GB 2 421 352 A; U.S. Pat. No. 7,235,781 C1 (A. Haase et al.). This technology makes it possible to achieve an increase in the ion yield by optimizing the diameter and number of the laser spots. A favorable embodiment shows 4 to 30 spots with less than 10 micrometers in diameter each. It therefore provides a method where, by profiling the laser beam, a high yield of analyte ions can be achieved at the same time as optimal adaptation to the measuring window of the transient recorder.
Not every matrix substance can be used for non-ergodic ISD fragmentation. The matrix substance α-cyano-4-hydroxycinnamic acid (CHCA), which is extremely suitable for analyzing peptides by ergodic PSD fragmentation, yields hardly any ISD fragment ions. Until now, dihydroxybenzoic acid (DHB) has mainly been used as the matrix substance for ISD. It has recently become known that the yield of ISD fragment ions can be significantly increased through the use of suitable matrix substances that easily donate hydrogen radicals (K. Demeure et al.: “Rational Selection of the Optimum MALDI Matrix for Top-Down-Proteomics by In-Source Decay”, Anal. Chem. A; Web Oct. 17, 2007). One such matrix substance is 1,5-diaminonaphthalene (1,5-DAN), but it can be expected that matrix substances that work even better will soon be available. These discoveries indicate that spontaneous, non-ergodic ISD fragmentation is primarily initiated by chemical reactions. These new matrix substances that readily donate hydrogen radicals are, moreover, able to open up the disulfide bridges in large proteins that lead to ring structures. Until now, the ring structures have prevented sequence decoding by ISD beyond the disulfide bridge, since fragmentation here results in split segments that still cohere across the ring structure. Opening of the disulfide bridges is caused in the basic environment of the plasma by the amino groups of the 1,5-DAN and by the donated hydrogen radicals.
Although the ISD method has been known for some 10 years, only recently has the progress been made that allows ISD to be applied easily. This progress is attributable, at least in part, to recent discoveries about the MALDI processes, but a great deal of research work is still required before the phenomena will be fully understood.