The invention concerns mass spectrometric analysis of known mutation sites in the genome, such as single nucleotide polymorphisms (SNPs).
The subject of this invention is a diagnostic method for detecting the actual mutation state of genomic DNA, for which the possible mutation site has to be known beforehand. Compared with the standardized sequence of a “wild type”, these mutative sequence changes may be the exchange of a base (“point mutation”), the introduction of nucleotides (“insertion”), or the removal of nucleotides (“deletion”). Point mutations occurring with a frequency above one percent in a population are called “single nucleotide polymorphisms”; the abbreviation SNP has become particularly widespread in the recent literature. In humans, there are estimated to be about 10 million SNPs, which account for most of the individually inherited differences between humans and control individual phenotypes. Roughly three million SNPs are estimated to occur with frequencies in the range of 30 to 70 percent of the population. By the end of 2001, more than one and a quarter million SNPs had been discovered and listed in the public NCBI database by the worldwide SNP Consortium.
For the genome of a species, it is customary to define a “wild type”, which is regarded as free of mutations, and a “mutant”, which contains a mutation. Given the frequency of mutations such as SNPs, and the equal standing of mutants and wild types, the definition of the wild type is arbitrary or at least purely accidental, as the term “polymorphism” already reflects.
Nearly all DNA mutations, including all those defined above, change the mass of the DNA segment containing the mutation compared with the mass of the corresponding wild-type segment. Precise mass determination of a DNA segment can therefore be used to determine a mutation. Exceptions to this rule are the relatively rare “rotations”, i.e. interchanges of two bases within a sequence, which leave the base composition, and hence the mass, unchanged.
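To illustrate why composition alone determines mass, the following minimal Python sketch computes the neutral monoisotopic mass of a linear oligonucleotide (5′-OH and 3′-OH ends) from its base counts. The residue masses, the end-group correction and the example sequences are assumptions introduced for illustration, not values from this description; real spectra additionally reflect charge state and adducts.

```python
from collections import Counter

# Assumed monoisotopic masses (Da) of DNA nucleotide residues within a chain
# (nucleoside monophosphate minus water).
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}

# Assumed end-group correction for a linear 5'-OH / 3'-OH oligo: + H2O - HPO3
END_GROUPS = 18.0106 - 79.9663


def oligo_mass(seq: str) -> float:
    """Neutral monoisotopic mass computed from base counts only."""
    counts = Counter(seq.upper())
    unknown = set(counts) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    # Summing in a fixed base order makes equal compositions give identical floats.
    return sum(RESIDUE_MASS[b] * counts[b] for b in "ACGT") + END_GROUPS


def mass_shift(wild_type: str, mutant: str) -> float:
    """Mass of the mutant segment minus mass of the wild-type segment."""
    return oligo_mass(mutant) - oligo_mass(wild_type)


if __name__ == "__main__":
    wild = "ACGTAC"  # hypothetical wild-type segment
    variants = {
        "point mutation G>A": "ACATAC",
        "insertion of T": "ACGTTAC",
        "deletion of G": "ACTAC",
        "rotation GT>TG": "ACTGAC",
    }
    for label, seq in variants.items():
        print(f"{label:20s} {mass_shift(wild, seq):+10.4f} Da")
```

Because the mass depends only on the base counts, the rotation yields a shift of exactly zero, which is why such interchanges escape detection by mass alone.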
Mass spectrometry is a very powerful and precise tool for determining the mass of a biomolecule. With time-of-flight mass spectrometry (TOF-MS) and ionization by matrix-assisted laser desorption/ionization (MALDI), for example, the masses of the resulting ions can be determined. Alternatively, the ions can be generated by electrospray ionization (ESI), usually in combination with mass spectrometers of a different type.
Using the well-known polymerase chain reaction (PCR) with a pair of “selection primers”, i.e. single-stranded oligonucleotides about 20 bases long, it is possible to produce billions of double-stranded PCR products with a length of at least 40 base pairs. The number of products increases exponentially with the number of applied temperature cycles (“thermocycles”); such processes are generally known as “amplification”. By suitably choosing the sequences of the two selection primers, the mutation site is included in the products.
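The exponential growth can be sketched with the idealized relation N = N₀(1 + E)ⁿ, where n is the number of thermocycles and E an assumed per-cycle efficiency (1.0 meaning perfect doubling). The function names and example numbers below are hypothetical, and the late-cycle plateau of real PCR is not modeled.

```python
import math


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` thermocycles: N = N0 * (1 + E) ** n."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target: float, efficiency: float = 1.0) -> int:
    """Smallest cycle number for which the idealized copy number reaches `target`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    if target <= initial_copies:
        return 0
    return math.ceil(math.log(target / initial_copies) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    # Hypothetical start: 1000 template molecules, target one billion copies
    for e in (1.0, 0.9):
        n = cycles_needed(1_000, 1e9, e)
        print(f"E = {e}: {n} cycles -> {pcr_copies(1_000, n, e):.3g} copies")
```

Under these assumptions, about 20 perfect cycles turn a thousand templates into a billion copies, and a modest loss of efficiency adds only a few cycles.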
The obvious approach of simply measuring the mass of the PCR-amplified oligonucleotides directly by mass spectrometry has proved almost unworkable: precise measurement of these DNA products with more than 40 base pairs is nearly impossible. The reasons are the extremely low sensitivity for long DNA products because of difficult ionization, the high probability of adduct formation with undefined numbers of sodium or potassium cations, and the easy fragmentation of the fragile DNA products. These oligonucleotides have a poly-anionic character; each phosphate group of the DNA backbone forms an anion and has to be neutralized by a proton during ionization (protons are readily replaced by alkali cations if these are present). A method was therefore needed to provide oligonucleotides that are as short as possible while still containing the mutation site.
To this end, several methods of limited, mutation-dependent primer extension using terminating derivatives of the nucleotide triphosphates have been developed. They generate extended primers only about 12 to 25 nucleotides long, which are better suited for identifying the nature of the mutation by mass spectrometry.
These methods basically consist of the following steps. First, a sufficient number of copies of the DNA segment containing the mutation site is produced by PCR using a pair of selection primers. Second, after extraction and washing, these DNA segments serve as templates for the enzymatic, mutation-dependent extension of an “extension primer” in a second phase of thermocycling. In this second phase, one to four of the nucleotide triphosphates are derivatized so that they act as terminators of the extension: once a terminator has been built in at the 3′ end, no further elongation is possible because the binding site is occupied. The extension primer may be identical to one of the two selection primers; however, it is usually much better to use a different extension primer.
The extension primer is a short DNA chain of approximately 10 to 20 nucleotides and serves as a recognition sequence for the site of a possible mutation. It is synthesized with a base sequence that is the exact complement of the template strand in the vicinity of a known point mutation site, so that it can “hybridize” or “anneal” to the template strand. (The attachment of a complementary strand is known as “hybridization” or “annealing”.)
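The annealing rule and the terminated extension can be modeled with plain strings, as in the Python sketch below. It assumes that the primer's 3′ end lies directly next to the mutation site, that terminating nucleotides are given by their base letter (here, hypothetically, only ddCTP), and it uses a 4-base primer instead of 10 to 20 bases for brevity. All function names and sequences are illustrative.

```python
# Complement table for DNA bases (upper case only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_extension_primer(template: str, site: int, length: int) -> str:
    """Hypothetical helper: primer (5'->3') whose 3' end pairs with the base
    immediately 3' of `site` on `template` (template written 5'->3')."""
    region = template.upper()[site + 1 : site + 1 + length]
    if len(region) < length:
        raise ValueError("not enough template sequence 3' of the site")
    return reverse_complement(region)


def extend_with_terminators(template: str, site: int, primer: str, terminators: set[str]) -> str:
    """Extend `primer` by copying `template` from `site` toward its 5' end.

    Each incorporated base is the complement of the template base; extension
    stops right after a terminating nucleotide has been incorporated.
    """
    template = template.upper()
    product = primer
    for i in range(site, -1, -1):
        base = template[i].translate(COMPLEMENT)
        product += base
        if base in terminators:
            break
    return product


if __name__ == "__main__":
    wild_type = "GAAGTTTGG"  # hypothetical template, G at index 3 is the site
    mutant = "GAAATTTGG"     # G>A point mutation
    primer = design_extension_primer(wild_type, site=3, length=4)
    for name, seq in (("wild type", wild_type), ("mutant", mutant)):
        # Only ddCTP acts as terminator in this hypothetical reaction.
        print(name, extend_with_terminators(seq, 3, primer, {"C"}))
```

With ddCTP as the only terminator, the wild type stops after one added base while the mutant runs on for four, giving products of different lengths for the two alleles.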
Different types of primer extension procedures have been developed. Some generate products with equal numbers of bases for mutants and wild types, which differ only by the mass difference between the exchanged bases (9 to 40 atomic mass units); others generate products with different numbers of bases for mutants and wild types (a difference of at least about 300 atomic mass units). The latter are easier to measure by mass spectrometry but somewhat more complicated to generate. In both cases, however, the PCR products of the first amplification have to be cleaned of nucleotide triphosphates and primers, new nucleotide triphosphates (including the terminating derivatives) and extension primers have to be added, and another series of thermocycles has to be applied. The final products, about 12 to 25 bases long, again have to be washed thoroughly before mass spectrometric analysis. Primer extension procedures are therefore complicated: they use two successive thermocycling and washing procedures, roughly doubling the effort of a pure PCR amplification.
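The gap between the two product types can be checked with approximate monoisotopic residue masses; the values below are assumptions for illustration, not figures from this description. The sketch lists the mass change for each possible single-base exchange and the smallest change caused by one residue more or less (a dC residue, about 289 Da, i.e. roughly the 300 Da quoted above).

```python
from itertools import combinations

# Assumed monoisotopic residue masses (Da) of DNA nucleotides in a chain.
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}


def substitution_differences() -> dict[str, float]:
    """Absolute mass change for every possible single-base exchange."""
    return {
        f"{a}<->{b}": abs(RESIDUE_MASS[a] - RESIDUE_MASS[b])
        for a, b in combinations("ACGT", 2)
    }


def smallest_length_difference() -> float:
    """Smallest mass change caused by one extra or missing residue."""
    return min(RESIDUE_MASS.values())


if __name__ == "__main__":
    for pair, delta in sorted(substitution_differences().items(), key=lambda kv: kv[1]):
        print(f"{pair}: {delta:7.4f} Da")
    print(f"one residue more or less: >= {smallest_length_difference():.4f} Da")
```

The exchanges span roughly 9 Da (A/T) to 40 Da (C/G), whereas any length difference is at least an order of magnitude larger.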
The primer extension methods are widely covered by U.S. Pat. No. 6,258,538 (H. Köster et al.).
All primer extension methods have to use rather expensive types of polymerases because not all polymerases can handle the terminating dNTP derivatives. The use of thermosequenase, developed especially for Sanger sequencing, is highly recommended, because less expensive polymerases do not correctly incorporate the terminators. Inexpensive polymerases, such as Taq polymerase, can only be used in the first amplification by PCR.
Unfortunately, precise mass determination of even these relatively short primer extension oligonucleotides is still difficult. With a primer extension method delivering products with the same number of bases, the mass differences between wild-type and mutant oligonucleotides amount to only 9 to 40 atomic mass units. Because of the poly-anionic character of DNA, varying numbers of the ubiquitous sodium (23 atomic mass units) or potassium ions (39 atomic mass units) are particularly likely to attach to the oligonucleotides instead of protons, forming so-called “adducts”. The uncertainty in the extent of adduct formation makes any precise mass determination exceptionally difficult; at the very least, cleaning has to be extremely thorough to remove the otherwise ubiquitous sodium and potassium cations, and all relevant process parameters have to be carefully monitored and kept constant.
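Because an alkali cation replaces a proton, the net mass shift per adduct is the cation mass minus the hydrogen mass, about 22 Da for sodium and 38 Da for potassium. The sketch below, using assumed monoisotopic masses and a hypothetical 2.5 Da tolerance, lists which base exchanges have mass changes that lie close to a combination of up to two sodium and two potassium adducts.

```python
from itertools import combinations, product

# Assumed monoisotopic residue masses (Da) of DNA nucleotides in a chain.
RESIDUE_MASS = {"A": 313.0576, "C": 289.0464, "G": 329.0525, "T": 304.0460}
# Net mass change when an alkali cation replaces one proton (cation atom minus H atom).
CATION_SHIFT = {"Na": 22.98977 - 1.00783, "K": 38.96371 - 1.00783}


def adduct_shifts(max_per_cation: int = 2) -> dict[str, float]:
    """Mass shifts of all Na/K adduct combinations up to `max_per_cation` each."""
    shifts = {}
    for n_na, n_k in product(range(max_per_cation + 1), repeat=2):
        if n_na == 0 and n_k == 0:
            continue
        shifts[f"{n_na}Na+{n_k}K"] = n_na * CATION_SHIFT["Na"] + n_k * CATION_SHIFT["K"]
    return shifts


def near_coincidences(tolerance: float = 2.5) -> list[tuple[str, str, float]]:
    """(base exchange, adduct, gap) triples whose mass shifts differ by <= tolerance Da."""
    hits = []
    for a, b in combinations("ACGT", 2):
        exchange = abs(RESIDUE_MASS[a] - RESIDUE_MASS[b])
        for label, shift in adduct_shifts().items():
            gap = abs(exchange - shift)
            if gap <= tolerance:
                hits.append((f"{a}<->{b}", label, round(gap, 3)))
    return hits


if __name__ == "__main__":
    for exchange, adduct, gap in near_coincidences():
        print(f"{exchange} lies {gap} Da from a {adduct} adduct")
```

Under these assumptions, a single sodium adduct falls about 2 Da from the A/C exchange and a single potassium adduct about 2 Da from the C/G exchange, so incompletely removed alkali ions can mimic or mask a mutation at modest resolution.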
Procedures have therefore been sought to shorten the relatively short primer extension products even further, including partial enzymatic digestion and chemical or enzymatic cleavage. These shortening procedures require additional washing steps, even if these need not be quite as thorough.
One method of shortening the products to be analyzed by mass spectrometry was proposed by Monforte et al. (J. A. Monforte, C. H. Becker, T. A. Shaler, D. J. Pollart, WO 96/37630), who use linkers that can be cleaved chemically or enzymatically. However, the chemicals needed for the cleavage always carry the disadvantage of introducing traces of impurities that may again form adducts. In addition, chemical cleavage requires adjusting other parameters of the solution, for instance the pH, which means adding further chemicals with the risk of introducing, e.g., alkali ions. Enzymatic cleavage, e.g. by restriction endonucleases, severely restricts primer design, because the primers have to provide a recognition pattern for the nucleases; it also requires adjusted buffer conditions, so that washing after cleavage becomes a necessary step.
Gut and Beck (WO 96/27681) developed another method, which shortens the DNA products by partial digestion and combines this with a neutralization of the DNA products, yielding more peptide-like products.
In the MALDI preparation and measurement procedure, the analyte molecules are first embedded on a sample support in small crystals of a solid UV-absorbing matrix, usually an organic acid. The sample support is introduced into the evacuated ion source of the mass spectrometer. The matrix is then evaporated by a short laser pulse of about 3 nanoseconds, producing a so-called plume, a weakly ionized plasma that lasts for some tens of nanoseconds before it quickly expands into the surrounding vacuum. The evaporation process also transports the analyte molecules into the plasma plume. The analyte molecules are ionized by collisions with matrix ions in the plume but, unfortunately, a condition-dependent and length-dependent percentage of the fragile DNA analyte molecules is fragmented. The voltage applied to the ion source apertures accelerates the ions into the field-free flight tube. Because the ions acquire the same energy per charge but differ in mass, they reach different velocities: lighter ions reach the detector earlier than heavier ones. The flight times are measured and converted into ion masses.
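The conversion between flight time and mass follows from the energy balance zeU = mv²/2, which gives t = L·√(m / 2zeU) for a drift length L. The sketch below implements this idealized relation and its inverse; the acceleration voltage and drift length are hypothetical, and acceleration time, initial velocities and reflectors are ignored (real instruments are calibrated with reference masses).

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
DALTON = 1.66053906660e-27            # kg per unified atomic mass unit


def flight_time(mass_da: float, charge: int, voltage: float, drift_length: float) -> float:
    """Idealized flight time (s) through a field-free drift tube.

    The energy balance z*e*U = m*v**2/2 gives the velocity v; acceleration
    time, initial velocities and reflectors are ignored.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2.0 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return drift_length / velocity


def mass_from_time(time_s: float, charge: int, voltage: float, drift_length: float) -> float:
    """Inverse of flight_time: m = 2*z*e*U*(t/L)**2, returned in Da."""
    return 2.0 * charge * ELEMENTARY_CHARGE * voltage * (time_s / drift_length) ** 2 / DALTON


if __name__ == "__main__":
    U, L = 20_000.0, 1.5  # hypothetical acceleration voltage (V) and drift length (m)
    for m in (1500.0, 3000.0, 6000.0):
        print(f"{m:6.0f} Da -> {flight_time(m, 1, U, L) * 1e6:.2f} us")
```

Since the flight time scales with the square root of the mass, quadrupling the mass only doubles the flight time.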
MALDI is ideally suited for analyzing peptides and proteins. The analysis of nucleic acid chains is somewhat more difficult. Even for short nucleic acid chains, ionization in the MALDI process is approximately 100 times less efficient than for peptides, and the sensitivity decreases disproportionately with increasing mass. The reason is that a peptide or protein has to capture only a single proton to be ionized, whereas nucleic acids carry multiple negative charges on their poly-anionic sugar-phosphate backbone (one negative charge for each nucleotide), so that the ionization process, which involves so many protons, is considerably less efficient. The DNA products to be detected must therefore be as short as possible.
Alternatively, an ionization method that starts from a solution of the samples can be used; it is known as electrospray ionization (ESI). Different types of mass spectrometers are equipped with ESI ion sources, such as ion traps, FTMS instruments, and time-of-flight mass spectrometers with orthogonal ion injection. ESI is also ideally suited for the detection of peptides and proteins but has similar problems with oligonucleotides. Here too, the oligonucleotides to be detected have to be as short as possible.
The invention provides a simple procedure that produces sufficient amounts of ultrashort and ultraclean DNA products for mass spectrometric analysis, if at all possible with only a single amplification and a single washing process, thus reducing the time, cost, and effort of sample preparation compared with the primer extension methods used so far. The invention is based on a single application of a cyclic enzymatic amplification process such as the polymerase chain reaction (PCR), but uses in this process a mixture of primers with and without built-in photocleavable “linkers” with specified properties. The linker-containing primers cause short by-products to be generated during amplification that cannot be amplified further. After amplification, the short by-products are extracted, e.g. by affinity binding to substrates, washed, and cleaved by UV light to produce even shorter analytical products, ready for mass spectrometric analysis. Using “blockers” with specified properties in one type of primer allows even shorter analytical products.
Thus the procedure according to the invention consists of only one thermocycling and one washing process, followed by a simple, non-contaminating cleavage step using a UV lamp, which delivers the final analytical DNA products for mass spectrometric analysis.
The photocleavable linkers have the following properties:
the linker can replace any nucleotide in a primer and maintains approximately the same distance between the neighboring nucleotides as the replaced nucleotide;
the linker does not hinder proper annealing of the primer to a complementary counter strand, whereby the primer can anneal to a complementary counter strand with an arbitrary nucleotide opposite the linker;
the linker does not hinder enzymatic elongation at the 3′ end by the polymerase copying process if the linker is a few nucleotides away from the 3′ end;
the linker stops the polymerase copying procedure if encountered in a template; and
the linker is cleavable by UV light, thereby cleaving the DNA sequence.
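The last two properties can be illustrated with a toy string model in which strands are written 5′→3′ and the hypothetical symbol `*` stands for the photocleavable linker. The sketch assumes that copying starts at the template's 3′ end (primer annealing is not modeled) and ignores the chemistry of the linker itself.

```python
LINKER = "*"  # hypothetical symbol standing for the photocleavable linker
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def copy_strand(template: str) -> str:
    """Strand synthesized on `template` (5'->3'), starting at the template's 3' end.

    The polymerase reads the template toward its 5' end and stops as soon as
    it meets a linker, so bases 5' of the linker are never copied.
    """
    copied = []
    for symbol in reversed(template):
        if symbol == LINKER:
            break
        copied.append(COMPLEMENT[symbol])
    return "".join(copied)


def uv_cleave(strand: str) -> tuple[str, str]:
    """Split a strand at its (first) linker into its 5' and 3' fragments."""
    if LINKER not in strand:
        return strand, ""
    five_prime, three_prime = strand.split(LINKER, 1)
    return five_prime, three_prime


if __name__ == "__main__":
    template = "GGT*ACGTTT"        # hypothetical strand with a linker near its 5' end
    print(copy_strand(template))   # copy ends where the linker blocks the polymerase
    print(uv_cleave(template))     # UV light releases the 3' part
```

The copy covers only the six bases on the 3′ side of the linker, and cleavage leaves the 5′ part (in practice, the immobilized end) separated from the released 3′ fragment.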
Building blocks from the class of o-nitrobenzyl derivatives are particularly suitable as photocleavable linkers with the properties listed above. After the o-nitrobenzyl derivatives have been converted into DNA building blocks or analogues, they can be built into the primer at any position, replacing a regular nucleotide. Such o-nitrobenzyl derivatives do not interfere with annealing and only slightly lower the optimum annealing temperature during a DNA polymerase reaction. Various polymerases accept them without interference with elongation at the 3′ end, provided they are positioned a small number of nucleotides away from the 3′ end. The synthesis and mechanism of photocleavable 1-(2-nitrophenyl)ethyl esters of various phosphates and thiophosphates have already been examined in detail by Walker et al. (J. Am. Chem. Soc. 1988, 110, 7170-7177) and by Ordoukhanian and Taylor (J. Am. Chem. Soc. 1995, 117, 9570-9571), but no application to mass spectrometry was mentioned. It should be understood that these linkers are by no means simply nucleotide derivatives in which the usual bases have been replaced by other groups. The linker does not hinder elongation of the primer at the 3′ end by polymerases: some polymerases require four nucleotides between the linker and the 3′ end, while others can reliably start the copying process with only three. It is preferred to position the linker as close to the 3′ end as possible.
The blockers, which may alternatively be built into one type of analytical primer, are defined by the following properties:
the blocker can replace a nucleotide in a primer;
the blocker does not hinder the annealing of the primer to a complementary counter strand;
the blocker does not hinder enzymatic elongation at the 3′ end by the polymerase copying process even if the blocker holds the 3′ position; and
the blocker stops the polymerase copying procedure if encountered in a template.
Many different nucleotide derivatives can be used as blockers. There may be one blocker for each of the four types of nucleotides, but this is not necessary. One of the simplest derivatives usable as a blocker is the nucleoside thiophosphate, which anneals properly, can be elongated by the polymerase, and stops the copying process when encountered in a template. To stop the polymerase from copying a template reliably, it is advantageous to use not just one nucleotide thiophosphate as a blocker but two or three in a row.
Other types of blockers are nucleotide derivatives in which the base bonded to the sugar-phosphate backbone is replaced by a chemical group that does not form correct hydrogen bonds with the opposite nucleotide, or forms no hydrogen bonds at all. The blockers are preferably positioned directly at the 3′ end of the primer. Where the polymerase has difficulty starting elongation, a single regular nucleotide can be placed at the 3′ end, directly adjacent to the blocker nucleotide or nucleotides.
PCR amplification is thus performed with a mixture of two pairs of primers: a first pair of “selection” primers controlling the PCR process and a second pair of “analytical” primers, one of which contains a linker while the other contains either a linker or a blocker. The two pairs of primers can be identical except for the linker or blocker site, but preferably the linker/blocker-containing analytical primer pair is “nested” within the PCR products generated by the pair of selection primers. The linker-containing analytical primers can be biotinylated at their 5′ end for easy immobilization on a streptavidin-coated surface and subsequent washing. Any other affinity capture group can of course be used instead of biotin, or part of the sequence itself may be used for immobilization by hybridization.
Preferably, the primers contain the photocleavable linker about two to five nucleotides away from the 3′ end. If the second primer of the pair contains a blocker, the blocker should be positioned at the 3′ end, or at least in the position next to the 3′ end.
PCR amplification with the mixture of the two primer pairs yields a large number of linker-containing DNA by-products that already end shortly beyond the mutation site, because an earlier copying step had encountered a linker or blocker in the template being copied (see FIGS. 2 and 3). If the linker-containing primers are biotinylated, these final products can be immobilized on a streptavidin-coated surface, washed, and cleaved. The overall process yields the expected short DNA products, mixed with considerably longer products that still contain, at their 3′ end, the complement of the full selection primer. These longer DNA products could be removed by size-specific adsorption, but this is not really necessary because they usually do not interfere with the MALDI or ESI analysis.
If both analytical primers contain a linker in the fifth position from the 3′ end, the short product consists of the four bases of one analytical primer on the 3′ side of its linker, four bases complementary to the other analytical primer, and the sequence between the two primers. With only the mutation site between the primers, the short product is exactly 9 bases long. If the linker can be placed closer to the 3′ end, the product can be even shorter. Samples from both strands are produced at the same time, so the analytical result from one strand is corroborated by that of the other. If only one of the linker-containing analytical primers is biotinylated, only one strand is analyzed.
If the second, analytical pair consists of a linker-containing primer and a blocker-containing primer, the final product for mass spectrometric analysis is shorter still: it may contain four bases from the linker-containing primer plus the sequence between the analytical primers. With only the mutation site between the primers and with the blocker in the 3′ position, the total length is only five bases, i.e. a pentamer is produced.
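The length arithmetic of the two primer-pair variants can be reproduced with a small sequence model. In the sketch below, positions are counted from a primer's 3′ end (3′ base = 1); the primer binding sequences and the function name are hypothetical. The released fragment keeps the forward-primer bases on the 3′ side of its linker, the bases between the primers, and the reverse-primer bases that were copied before the polymerase hit the reverse primer's linker or blocker.

```python
def released_fragment(fwd_region: str, site: str, rev_region: str,
                      fwd_linker_pos: int, rev_stop_pos: int) -> str:
    """Fragment released by UV cleavage of the biotinylated forward product.

    fwd_region      top-strand sequence bound by the forward analytical primer (5'->3')
    site            top-strand bases between the two analytical primers
    rev_region      top-strand sequence complementary to the reverse analytical primer (5'->3')
    fwd_linker_pos  linker position in the forward primer, counted from its 3' end
    rev_stop_pos    linker or blocker position in the reverse primer, counted the same way
    """
    kept = fwd_linker_pos - 1    # primer bases 3' of the linker stay on the released piece
    copied = rev_stop_pos - 1    # copying of the reverse strand halts at its linker/blocker
    if not (0 <= kept <= len(fwd_region) and 0 <= copied <= len(rev_region)):
        raise ValueError("linker/blocker position outside the primer")
    return fwd_region[len(fwd_region) - kept:] + site + rev_region[:copied]


if __name__ == "__main__":
    fwd, snp, rev = "GATCCTAGGA", "G", "TTACGGCATC"  # hypothetical sequences
    two_linkers = released_fragment(fwd, snp, rev, fwd_linker_pos=5, rev_stop_pos=5)
    linker_blocker = released_fragment(fwd, snp, rev, fwd_linker_pos=5, rev_stop_pos=1)
    print(two_linkers, len(two_linkers))
    print(linker_blocker, len(linker_blocker))
```

With linkers in the fifth position of both primers the model gives a 9-mer, and with a blocker in the 3′ position of the reverse primer a pentamer. In the two-linker case the opposite strand yields a corresponding 9-mer by symmetry.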
The PCR yield of the short products and the amount of longer chains in the final products depend strongly on the ratio of linker/blocker-containing to linker/blocker-free primers in the mixture. If both primers of the analytical pair contain only linkers, and if all primers anneal with the same probability, the following relations hold: the highest yield of the desired short DNA samples for analysis, obtained with the lowest number of thermocycles, is achieved with a mixture of roughly 7% linker-containing analytical primers. A 1.5-fold larger amount of longer PCR products is intermixed, but these oligonucleotides are not seen in the MALDI analysis. The PCR products generated by the selection primers amount to a 16-fold surplus. This surplus can be reduced by larger percentages of analytical primers, but the ratio of analytical primers to PCR selection primers turns out not to be very critical. A good compromise is a mixture of 10 to 20% linker-containing analytical primers, but ratios anywhere in the range of 3 to 30 percent are easily acceptable.
It is one of the special advantages of this invention that, unlike all chemical or enzymatic cleavage methods, photolytic cleavage does not introduce any additional contaminants.
Following the PCR process, the analytical by-products are immobilized for thorough washing, e.g. on streptavidin-coated surfaces if the products are biotinylated. After washing, the linkers of the still immobilized products are cleaved with a UV lamp. The free cleavage products now consist of the desired short oligonucleotides, about five to eleven bases long for an analytical primer pair with linkers only, or four to six bases long for an analytical primer pair with a linker and a blocker, in both cases mixed with a slightly larger amount of products that contain the full primer length beyond the mutation site.
In the case of MALDI ionization, immobilization can take place directly on the sample support plate if biotinylated primers with linkers are used and the sample locations are coated with streptavidin. Such sample support plates can be given a highly hydrophobic coating that leaves only hundreds of small hydrophilic anchors for sample preparation. The anchors are coated with streptavidin, and the PCR solution is simply pipetted from a well of the microtitre plate used for PCR onto such an anchor. Because of the hydrophobicity of the plate surface surrounding each anchor, samples from different wells remain separated on the plate. The final biotinylated analytical oligonucleotides are immobilized on the anchors, and the plate with its hundreds of samples is thoroughly washed. After cleavage and drying, the free cleavage products are taken up in a pipetted drop of solvent containing the matrix substance for the MALDI process. After a second drying step, the support plate is ready for MALDI analysis in a time-of-flight mass spectrometer.