Approximately 4,000 human disorders are attributed to genetic causes. Hundreds of genes responsible for various disorders have been mapped, and sequence information is being accumulated rapidly. A principal goal of the Human Genome Project is to find all genes associated with each disorder. The definitive diagnostic test for any specific genetic disease (or predisposition to disease) will be the identification of polymorphic variations in DNA sequence in affected cells that result in alterations of gene function. Furthermore, response to specific medications may depend on the presence of polymorphisms. Developing DNA (or RNA) screening as a practical tool for medical diagnostics requires a method that is inexpensive, accurate, expeditious, and robust.
Genetic polymorphisms and mutations can manifest themselves in several forms, such as point polymorphisms or point mutations where a single base is changed to one of the three other bases, deletions where one or more bases are removed from a nucleic acid sequence and the bases flanking the deleted sequence are directly linked to each other, insertions where new bases are inserted at a particular point in a nucleic acid sequence adding additional length to the overall sequence, and expansions and reductions of repeating sequence motifs. Large insertions and deletions, often the result of chromosomal recombination and rearrangement events, can lead to partial or complete loss of a gene. Of these forms of polymorphism, in general the most difficult type of change to screen for and detect is the point polymorphism because it represents the smallest degree of molecular change. Wild type is a standard or reference nucleotide sequence to which variations are compared. As defined, any variation from wild type is considered a polymorphism including naturally occurring sequence variations and pathogenic mutations.
Although a number of genetic defects can be linked to a specific single point mutation within a gene, e.g. sickle cell anemia, many are caused by a wide spectrum of different mutations throughout the gene. A typical gene that might be screened using the methods described here could be anywhere from 1,000 to 100,000 bases in length, though smaller and larger genes do exist. Of that amount of DNA, only a fraction of the base pairs actually encode the protein. These discontinuous protein coding regions are called exons and the remainder of the gene is referred to as introns. Of these two types of regions, exons often contain the most important sequences to be screened. Several complex procedures have been developed for scanning genes in order to detect polymorphisms, which are applicable to both exons and introns.
In terms of current use, most of the methods to scan or screen genes employ slab or capillary gel electrophoresis for the separation and detection step in the assays. Gel electrophoresis of nucleic acids primarily provides relative size information based on mobility through the gel matrix. If calibration standards are employed, gel electrophoresis can be used to measure absolute and relative molecular weights of large biomolecules with some moderate degree of accuracy; even then typically the accuracy is only 5% to 10%. Also the molecular weight resolution is limited. In cases where two DNA fragments with identical number of base pairs can be separated, using high concentration polyacrylamide gels, it is still not possible to identify which band on a gel corresponds to which DNA fragment without performing secondary labeling experiments. Gel electrophoresis techniques can only determine size and cannot provide any information about changes in base composition or sequence without performing more complex sequencing reactions. Gel-based techniques, for the most part, are dependent on labeling or staining methods to visualize and discriminate between different nucleic acid fragments.
All of the methods in use today capable of screening broadly for genetic polymorphisms suffer from technical complication and are labor and time intensive. Single strand conformational polymorphism (SSCP) (Orita et al., "Detection of Polymorphisms of Human DNA by Gel Electrophoresis as Single-Stranded Conformation Polymorphisms," Proc. Natl. Acad. Sci. USA 86, 2766 (1989)), denaturing gradient gel electrophoresis (DGGE) (Abrams et al., "Comprehensive Detection of Single Base Changes in Human Genomic DNA Using Denaturing Gradient Gel Electrophoresis and a GC Clamp," Genomics 7, 463 (1990)), chemical cleavage at mismatch (CCM) (J. A. Saleeba and R. G. H. Cotton, "Chemical Cleavage of Mismatch to Detect Mutations," Methods in Enzymology 217, 286 (1993)), enzymatic mismatch cleavage (EMC) (R. Youil et al., "Screening for Mutations by Enzyme Mismatch Cleavage with T4 Endonuclease VII," Proc. Natl. Acad. Sci. USA 92, 87 (1995)), and "cleavase" fragment length polymorphism (CFLP) procedures are currently gel-based, making them cumbersome to automate and perform efficiently. There is a need for new methods that can provide cost effective and expeditious means for screening genetic material in an effort to reduce medical expenses. The inventions described here address these issues by developing novel, tailor-made processes that focus on the use of mass spectrometry as a genetic analysis tool. Mass spectrometry requires minute samples, provides extremely detailed information about the molecules being analyzed including high mass accuracy, and is easily automated.
The late 1980s saw the rise of two new mass spectrometric techniques for successfully measuring the masses of intact very large biomolecules, namely, matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry (TOF MS) (K. Tanaka et al., "Protein and Polymer Analyses up to m/z 100,000 by Laser Ionization Time-of-flight Mass Spectrometry," Rapid Commun. Mass Spectrom. 2, 151-153 (1988); B. Spengler et al., "Laser Mass Analysis in Biology," Ber. Bunsenges. Phys. Chem. 93, 396-402 (1989)) and electrospray ionization (ESI) combined with a variety of mass analyzers (J. B. Fenn et al., Science 246, 64-71 (1989)). Both methods are suitable for genetic screening tests. The MALDI mass spectrometric technique can also be used with methods other than time-of-flight, for example, magnetic sector, Fourier-transform ion cyclotron resonance, quadrupole, and quadrupole trap. One of the advances in MALDI analysis of polynucleotides was the discovery of 3-hydroxypicolinic acid as a matrix for mixed-base oligonucleotides. Wu, et al., Rapid Comm'ns in Mass Spectrometry, 7:142-146 (1993).
MALDI-TOF MS involves laser pulses focused on a small sample plate comprising analyte molecules (nucleic acids) embedded in either a solid or liquid matrix comprising a small, highly absorbing compound. The laser pulses transfer energy to the matrix causing a microscopic ablation and concomitant ionization of the analyte molecules, producing a gaseous plume of intact, charged nucleic acids in single-stranded form. If double-stranded nucleic acids are analyzed, the MALDI-TOF MS typically results in mostly denatured single-strand detection. The ions generated by the laser pulses are accelerated to a fixed kinetic energy by a strong electric field and then pass through an electric field-free region in vacuum in which the ions travel with a velocity corresponding to their respective mass-to-charge ratios (m/z). The smaller m/z ions will travel through the vacuum region faster than the larger m/z ions thereby causing a separation. At the end of the electric field-free region, the ions collide with a detector that generates a signal as each set of ions of a particular mass-to-charge ratio strikes the detector. Usually for a given assay, 10 to 100 mass spectra resulting from individual laser pulses are summed together to make a single composite mass spectrum with an improved signal-to-noise ratio.
The mass of an ion (such as a charged nucleic acid) is measured by using its velocity to determine the mass-to-charge ratio by time-of-flight analysis. In other words, the mass of the molecule directly correlates with the time it takes to travel from the sample plate to the detector. The entire process takes only microseconds. In an automated apparatus, tens to hundreds of samples can be analyzed per minute. In addition to speed, MALDI-TOF MS has one of the largest mass ranges for mass spectrometric devices. The current mass range for MALDI-TOF MS is from 1 to 1,000,000 Daltons (Da) (measured recently for a protein). R. W. Nelson et al., "Detection of Human IgM at m/z ~1 MDa," Rapid Commun. Mass Spectrom. 9, 625 (1995).
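The relationship between mass and flight time described above can be illustrated with a minimal Python sketch. It assumes an idealized linear time-of-flight instrument in which every ion receives the same kinetic energy (charge times accelerating voltage) and then drifts a fixed distance at constant velocity. The 20 kV accelerating voltage, the 1 m drift length, and the function name are illustrative assumptions and are not taken from this disclosure.

```python
import math

DALTON_KG = 1.66053906660e-27        # kilograms per dalton
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, drift_length=1.0):
    """Drift time in seconds for an ion in an idealized linear TOF analyzer.

    accel_voltage (volts) and drift_length (meters) are assumed values
    chosen only for illustration.
    """
    mass_kg = mass_da * DALTON_KG
    # Every ion of the same charge leaves the source with the same kinetic energy.
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    # Solve (1/2) m v^2 = kinetic_energy for the drift velocity.
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length / velocity


wild_type = flight_time(15_000.0)
heavier = flight_time(15_009.0)
print(f"15,000 Da: {wild_type * 1e6:.2f} microseconds")
print(f"15,009 Da arrives {(heavier - wild_type) * 1e9:.1f} nanoseconds later")
```

Under these assumed settings a 15,000 Da singly charged ion drifts for some tens of microseconds, and flight time scales with the square root of m/z, so an ion of four times the mass takes twice as long.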
The performance of a mass spectrometer is measured by its sensitivity, mass resolution and mass accuracy. Sensitivity is measured by the amount of material needed; it is generally desirable and possible with mass spectrometry to work with sample amounts in the femtomole and low picomole range. Mass resolution, m/Δm, is the measure of an instrument's ability to produce separate signals from ions of similar mass. Mass resolution is defined as the mass, m, of an ion signal divided by the full width of the signal, Δm, usually measured between points of half-maximum intensity. Mass accuracy is the measure of error in designating a mass to an ion signal. The mass accuracy is defined as the ratio of the mass assignment error divided by the mass of the ion and can be represented as a percentage.
To be able to detect any point polymorphism directly by MALDI-TOF mass spectrometry, one would need to resolve and accurately measure the masses of nucleic acids in which a single base change has occurred (in comparison to the wild type nucleic acid). A single base change can be a mass difference of as little as 9 Da. This value represents the difference between the two bases with the closest mass values, A and T (A=2′-deoxyadenosine-5′-phosphate=313.19 Da; T=2′-deoxythymidine-5′-phosphate=304.20 Da; G=2′-deoxyguanosine-5′-phosphate=329.21 Da; and C=2′-deoxycytidine-5′-phosphate=289.19 Da). If during the mutation process, a single A changes to T or a single T to A, the mutant nucleic acid containing the base transversion will either decrease or increase by approximately 9 Da in total mass as compared to the wild type nucleic acid. For mass spectrometry to directly detect these transversions, it must therefore be able to detect a minimum mass change, Δm, of approximately 9 Da.
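The claim that the A/T exchange is the smallest single-base mass change can be checked with a short Python sketch that tabulates the mass difference for every pair of bases. The four mass values are the ones quoted in the paragraph above; the dictionary and function names are hypothetical and used only for illustration.

```python
from itertools import combinations

# Nucleotide masses in Da as quoted in the text above.
NUCLEOTIDE_MASS = {"A": 313.19, "T": 304.20, "G": 329.21, "C": 289.19}


def substitution_deltas(masses=NUCLEOTIDE_MASS):
    """Absolute mass change in Da for each possible single-base exchange."""
    return {
        f"{a}<->{b}": round(abs(masses[a] - masses[b]), 2)
        for a, b in combinations(sorted(masses), 2)
    }


deltas = substitution_deltas()
smallest = min(deltas, key=deltas.get)

# List the exchanges from hardest to easiest to detect by mass.
for pair, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{pair}: {delta:.2f} Da")
print(f"Hardest exchange to detect: {smallest}")
```

With these values the A/T exchange gives the smallest difference (about 9 Da) and the C/G exchange the largest (about 40 Da), so an instrument able to detect the A/T transversion can in principle detect any other single-base substitution.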
For example, in order to fully resolve (which may not be necessary) a point-mutated (A to T or T to A) heterozygote 50-base single-stranded DNA fragment having a mass, m, of ~15,000 Da from its corresponding wild type nucleic acid, the required mass resolution is m/Δm=15,000/9≈1,700. However, the mass accuracy needs to be significantly better than 9 Da to increase quality assurance and to prevent ambiguities where the measured mass value is near the half-way point between the two theoretical masses. For an analyte of 15,000 Da, in practice the mass accuracy needs to be Δm ≈ ±3 Da, that is, a 6 Da window. In this case, the absolute mass accuracy required is (6/15,000)*100=0.04%. Often a distinguishing level of mass accuracy relative to another known peak in the spectrum is sufficient to resolve ambiguities. For example, if there is a known mass peak 1000 Da from the mass peak in question, the relative position of the unknown to the known peak may be known with greater accuracy than that provided by an absolute, previous calibration of the mass spectrometer.
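The arithmetic in the preceding example can be reproduced with a minimal Python sketch that applies the definitions of mass resolution (m divided by the peak width) and mass accuracy (mass error divided by m, as a percentage). The function names are hypothetical; the numbers are those of the 50-base, 15,000 Da example.

```python
def required_resolution(mass_da, delta_m_da):
    """Resolution m / delta_m needed to separate peaks delta_m_da apart."""
    return mass_da / delta_m_da


def accuracy_percent(mass_error_da, mass_da):
    """Mass accuracy expressed as a percentage of the ion mass."""
    return mass_error_da / mass_da * 100.0


# 50-base fragment of about 15,000 Da with an A/T exchange (9 Da shift).
resolution = required_resolution(15_000, 9)
# Accuracy window of plus or minus 3 Da, i.e. 6 Da in total.
accuracy = accuracy_percent(6, 15_000)

print(f"required resolution: {resolution:.0f}")
print(f"required accuracy:   {accuracy:.2f} %")
```

The exact quotient is about 1,667, which the text rounds to 1,700. Because the required resolution grows linearly with fragment mass for a fixed 9 Da shift, halving the fragment length roughly halves the resolution the instrument must deliver.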
In order for mass spectrometry to be a useful tool for screening for polymorphisms in nucleic acids, several basic requirements need to be met. First, any nucleic acids to be analyzed must be purified to the extent that minimizes salt ions and other molecular contaminants that reduce the intensity and quality of the mass spectrometric signal to a point where either the signal is undetectable or unreliable, or the mass accuracy and/or resolution is below the value necessary to detect the type of polymorphism expected. Second, the size of the nucleic acids to be analyzed must be within the range of the mass spectrometer in which the necessary mass resolution and accuracy are available. Mass accuracy and resolution do significantly degrade as the mass of the analyte increases; currently this is especially significant above approximately 30,000 Da for oligonucleotides (~100 bases), impacting the detection of single nucleotide polymorphisms (SNPs) above said mass value. Third, because all molecules within a sample are visualized during mass spectrometric analysis (i.e. it is not possible to selectively label and visualize certain molecules and not others as one can with gel electrophoresis methods) it is necessary to partition nucleic acid samples prior to analysis in order to remove unwanted nucleic acid products from the spectrum. Fourth, the mass spectrometric methods for generalized nucleic acid screening must be efficient and cost effective in order to screen a large number of nucleic acid bases in as few steps as possible.
The methods for detecting nucleic acid polymorphisms known in the art do not satisfy these four requirements. For example, prior art methods for mass spectrometric analysis of DNA fragments have focused on double-stranded DNA fragments which result in complicated mass spectra, making it difficult to resolve mass differences between two complementary strands. See, e.g., Tang et al., Rapid Comm'n. in Mass Spectrometry, 8:183-186 (1994). Moreover, the prior art has not provided optimal methods for isolating single-stranded amplified target nucleic acids to improve mass accuracy in higher mass ranges.
Thus, there is a need for cost and time effective methods of detecting genetic polymorphisms using mass spectrometry, preferably MALDI or ESI, and with mass accuracy of a few parts in 10,000 or better.
The present invention comprises several aspects, including (1) procedures for reducing the length of target nucleic acids to remove one or more flanking polynucleotide regions that flank the regions of interest, which contain or are suspected to contain a polymorphism; (2) procedures for isolating either single-stranded or double-stranded target nucleic acids for mass spectrometric analysis; (3) procedures combining these two aspects; and (4) kits for the methods described herein.
The methods for reducing the length of target nucleic acids, preferably in amplified form, eliminate unnecessary sequences and by reducing the length also reduce the mass of the resulting single-stranded or double-stranded target nucleic acids, which increases mass resolution and accuracy. The target nucleic acids may be reduced in length by any of the methods known that will cleave within one or more flanking regions preferably without cleaving within the region of interest. Exemplary methods of reducing length include: cleaving at endogenous restriction endonuclease cleavable sites present in one or more flanking regions but absent in the region of interest; cleaving at restriction endonuclease cleavable sites at or adjacent to restriction endonuclease recognition sites incorporated into one or more flanking regions by use of one or more cleavable primers comprising said restriction endonuclease recognition sites; cleaving at a combination of restriction endonuclease cleavable sites wherein the sites are endogenous and/or introduced using mismatch or overhanging primers; and selective digestion of one or more flanking regions using exonuclease. The restriction endonucleases can include type II and type IIS restriction endonucleases. The restriction endonuclease recognition sites can be either within a primer region, or outside the primer region, so long as the restriction endonuclease cleavable sites are within one or more flanking regions and preferably not within a region of interest. For type II restriction endonucleases, the restriction endonuclease recognition site is the same as the restriction endonuclease cleavable site. For Type IIS restriction endonucleases, the cleavable site is at a defined distance away from one side of the recognition site. 
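The distinction between Type II and Type IIS cleavage, and the requirement that cleavage fall in a flanking region rather than in the region of interest, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: only the top strand and the forward orientation of each recognition site are considered, the two enzymes (EcoRI, which cuts within its GAATTC site, and FokI, which cuts 9 nucleotides downstream of GGATG on the top strand) are chosen merely as familiar examples, and the sequences, coordinates, and function names are hypothetical.

```python
# Enzyme name -> (recognition site, top-strand cut offset from the site start).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # Type II: cuts inside the site, G^AATTC
    "FokI": ("GGATG", 14),    # Type IIS: 5-nt site plus 9 nt downstream
}


def top_strand_cuts(seq, enzyme):
    """Positions c such that the top strand is cut between seq[c-1] and seq[c]."""
    site, offset = ENZYMES[enzyme]
    cuts = []
    start = seq.find(site)
    while start != -1:
        cut = start + offset
        if cut <= len(seq):  # ignore cuts that would fall past the fragment end
            cuts.append(cut)
        start = seq.find(site, start + 1)
    return cuts


def cuts_only_in_flanks(seq, enzyme, roi_start, roi_end):
    """True if the enzyme cuts at least once and never inside seq[roi_start:roi_end]."""
    cuts = top_strand_cuts(seq, enzyme)
    return bool(cuts) and all(c <= roi_start or c >= roi_end for c in cuts)


region_of_interest = "ACGTACGTAC"

# Type II site lying in the left flank of the amplified target.
type2_target = "TTGAATTCTT" + region_of_interest + "CCCCCCCCCC"
print(top_strand_cuts(type2_target, "EcoRI"),
      cuts_only_in_flanks(type2_target, "EcoRI", 10, 20))

# Type IIS site carried in a primer so that the cut lands at the region boundary.
type2s_target = "GGATG" + "A" * 9 + region_of_interest
print(top_strand_cuts(type2s_target, "FokI"),
      cuts_only_in_flanks(type2s_target, "FokI", 14, 24))
```

In the second case the recognition site sits entirely within the primer-derived flank while the cut falls exactly at the start of the region of interest, which is the property that makes Type IIS sites convenient for trimming primer sequence away from the analyzed fragment.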
Accordingly, cleavable primers may contain one or more restriction recognition sites of one or more different restriction endonucleases, one or more cleavable sites of one or more different restriction endonucleases, one or more exonuclease blocking moieties, or a combination thereof.
Another embodiment of the invention involves reducing the length of an amplified target nucleic acid and isolating a single-stranded amplified target nucleic acid at the same time by using a cleavable primer having an exonuclease blocking moiety. After amplification of the target nucleic acid, the amplified target nucleic acid comprises an exonuclease blocking moiety. The amplified target nucleic acid is then treated with a 5′ to 3′ exonuclease, which degrades the strand containing the exonuclease blocking moiety in a 5′ to 3′ direction only up to the blocking moiety. The 5′ to 3′ exonuclease can optionally completely degrade the other strand of the amplified target nucleic acid, in cases wherein the other strand does not have an exonuclease blocking moiety. The treatment with the 5′ to 3′ exonuclease leaves a reduced-length, single-stranded amplified target nucleic acid for mass spectrometric analysis.
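The outcome of the exonuclease treatment described above can be modeled with a deliberately simple Python sketch. It treats each strand as a string written 5′ to 3′ and assumes that digestion halts immediately before the blocked nucleotide, so that the blocked nucleotide and everything 3′ of it survive; whether the blocked position itself is retained depends on the chemistry of the moiety and is an assumption here. The sequences and the function name are hypothetical.

```python
def digest_5_to_3(strand, block_index=None):
    """Remainder of a strand (written 5' to 3') after 5'->3' exonuclease digestion.

    block_index is the position of the nucleotide carrying the blocking moiety.
    A strand without a blocking moiety (block_index=None) is degraded completely.
    """
    if block_index is None:
        return ""
    # Assumed behavior: nucleotides 5' of the block are removed, the rest survive.
    return strand[block_index:]


# Hypothetical amplified target: 4-nt primer-derived flank, then the region of interest.
blocked_strand = "GGCC" + "ACGTACGTAC"
unblocked_strand = "GTACGTACGT" + "GGCC"

survivors = [
    digest_5_to_3(blocked_strand, block_index=4),
    digest_5_to_3(unblocked_strand),
]
print([s for s in survivors if s])  # only the shortened, blocked strand remains
```

The sketch shows why a single treatment accomplishes both goals named in the paragraph: the blocked strand is shortened by the length of the flank upstream of the block, and the complementary strand disappears, leaving one reduced-length single strand.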
Yet another embodiment of the invention involves use of cleavable primers to reduce the length of an amplified target nucleic acid. An amplified target nucleic acid can be reduced in length by cleaving off at least a portion of one or more flanking regions comprising a cleavable site, wherein the cleavable site is introduced via a cleavable primer, wherein the cleavable site is located outside of the region of interest. Cleavable primers of the invention include those comprising an exonuclease blocking moiety, a Type IIS restriction endonuclease recognition site, and a Type II restriction endonuclease recognition site, but do not include a Type II restriction endonuclease recognition site where one of the complementary strands cannot be cleaved by a Type II restriction endonuclease.
The present invention also provides methods for isolating single-stranded or double-stranded amplified target nucleic acids. At least one strand of an amplified target nucleic acid can be bound to a solid support to permit rigorous washing to remove salt adducts, unwanted oligonucleotides and enzymes. Either a double-stranded amplified target nucleic acid or a single-stranded amplified target nucleic acid, whether the bound strand or the unbound strand, can be isolated for mass spectrometric analysis. Cleavable linkers or cleavable primers can be used to release the bound strands from the solid support. The isolation provides significantly improved mass resolution and accuracy in large mass ranges. Also, the isolation of either single-stranded or double-stranded amplified target nucleic acids occurs prior to the application of the nucleic acids to the matrix solution, which results in well-defined mass spectral peaks and enhanced mass accuracy. The matrix solution can be any of the known matrix solutions used for mass spectrometric analysis, including 3-hydroxypicolinic acid, nicotinic acid, picolinic acid, 2,5-dihydroxybenzoic acid, and nitrophenol.
The present invention provides methods of detecting polymorphisms in one or more target nucleic acids comprising: amplifying at least one target nucleic acid, wherein said amplified target nucleic acid comprises a region of interest and optionally one or more flanking regions; reducing the length of at least one of said amplified target nucleic acids comprising cleaving off a portion of one or more flanking regions, and determining the masses of each of said reduced-length amplified target nucleic acids using a mass spectrometer. This method can be used to detect polymorphisms in a single target nucleic acid by detecting variability in mass as compared to a wild type target nucleic acid or other "alleles" of said target nucleic acid.
In another embodiment, methods are provided for detecting polymorphisms in at least one target nucleic acid comprising: amplifying at least one target nucleic acid, wherein said amplified target nucleic acid comprises a region of interest and optionally one or more flanking regions, isolating either a positive or negative strand of said amplified target nucleic acid to form a single-stranded amplified target nucleic acid and determining the masses of each single-stranded amplified target nucleic acid using a mass spectrometer.
In yet another embodiment, methods are provided for detecting polymorphisms in at least one target nucleic acid comprising: amplifying at least one target nucleic acid, wherein said amplified target nucleic acid comprises a region of interest and optionally one or more flanking regions, reducing the length of at least one of said amplified target nucleic acids comprising cleaving off a portion of one or more flanking regions, isolating either a positive or negative strand of said amplified target nucleic acid to form a single-stranded amplified target nucleic acid, and determining the mass of each single-stranded amplified target nucleic acid using a mass spectrometer.
In the amplification methods, it is preferred that at least one of the primers be designed to be close to the polynucleotide region of interest, generally within 40 nucleotides.
The methods can also be used to detect polymorphisms in a set of different target nucleic acids, comprising amplifying each of said target nucleic acids, reducing length and/or isolating a single-strand of each of said amplified target nucleic acids, and determining the mass of each of said single-strands of said amplified target nucleic acids using mass spectrometry. Thus, these methods can be used to detect polymorphisms in a plurality of different target nucleic acids simultaneously.
The target nucleic acids can comprise any polynucleotide sequence that contains or is suspected of containing a polymorphism, including but not limited to short tandem repeats (STRs), simple sequence length polymorphisms (SSLP), single nucleotide polymorphisms (SNPs), and the multitude of disease markers, for example, markers for sickle cell anemia, fragile X disorder, cystic fibrosis, Tay Sachs disease, Gaucher disease, thalassemias, and cancer-related genes. The preferably single-stranded amplified target nucleic acids can be any size that can be adequately resolved by mass spectrometric analysis. Preferably, in cases where a SNP is to be detected, the final product single-stranded amplified target nucleic acids are less than 100 bases in length. Most preferably, the final product, single-stranded amplified target nucleic acids are from 10 to 90 bases in length. The nature of the mutation to be detected is a factor in the size limitations for optimum mass resolution. For example, as described above for SNPs, the maximum size limit is approximately 100 nucleotides in length. For microsatellite repeats and other two nucleotide repeats, the maximum size limit is approximately 200 nucleotides in length. For four-nucleotide repeats, the maximum size limit is approximately 300 nucleotides. One of ordinary skill in the art will appreciate that as mass spectrometric techniques for analysis of nucleic acids improve, the sizes of single-stranded amplified target nucleic acids useful in this invention can be increased. Using the methods described herein, one can uniquely identify a genomic sample by amplifying said target nucleic acids, isolating single-stranded amplified target nucleic acids, and determining the masses of said single-stranded amplified target nucleic acids using mass spectrometry. 
The resulting mass determination or mass spectrum will provide information which can be used to indicate a disease state, or propensity to disease, or to uniquely identify the source of the sample, or to map locations in a genome.
In yet another embodiment, methods are provided for detecting polymorphisms in at least one amplified target nucleic acid further comprising removing at least one flanking polynucleotide region, if present, from at least one of said amplified target nucleic acids before said isolating step.
In a further embodiment, methods for detecting polymorphisms are described wherein said isolating step comprises binding said amplified target nucleic acid to a solid support and said removing step comprises using one or more restriction endonucleases to cleave off one or more flanking polynucleotide regions.
The mass of a preferably single-stranded amplified target nucleic acid can be compared with the known or predicted mass of the corresponding wild type single-stranded amplified target nucleic acid, that is, the wild type version of the target nucleic acid being screened for polymorphism. Alternatively, the masses of more than one amplified target nucleic acid can be compared with the known or predicted masses of the corresponding wild type amplified target nucleic acids. The amplified target nucleic acid or set thereof, can optionally have one or more nucleotides replaced with mass-modified nucleotides, including mass-modified nucleotide analogs. Another optional aspect of the invention is the inclusion of internal calibrants or internal self-calibrants in the amplified target nucleic acid or set thereof to be analyzed by mass spectrometry to provide improved mass accuracy.
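The comparison of a measured mass with the predicted wild type mass can be sketched in Python as follows. This is a simplified illustration, not a definitive implementation: the predicted mass is taken as the sum of the per-nucleotide masses quoted earlier in this document, a single end-group correction is left as an adjustable parameter (defaulting to zero) because it cancels when mutant and wild type strands of the same length are compared, and the 3 Da tolerance mirrors the accuracy window discussed earlier. The sequences and function names are hypothetical.

```python
# Per-nucleotide masses in Da as quoted earlier in this document.
RESIDUE_MASS = {"A": 313.19, "T": 304.20, "G": 329.21, "C": 289.19}


def residue_sum(seq):
    """Sum of nucleotide masses for a single-stranded sequence (no end-group terms)."""
    return sum(RESIDUE_MASS[base] for base in seq)


def differs_from_wild_type(measured_mass, wild_type_seq,
                           tolerance=3.0, end_correction=0.0):
    """True if the measured mass is more than `tolerance` Da from the wild type prediction.

    end_correction is an assumed placeholder for terminal-group chemistry.
    """
    predicted = residue_sum(wild_type_seq) + end_correction
    return abs(measured_mass - predicted) > tolerance


wild_type = "ACGT" * 5
mutant = "T" + wild_type[1:]  # hypothetical A-to-T change at the first position

shift = residue_sum(mutant) - residue_sum(wild_type)
print(f"mass shift: {shift:.2f} Da")
print(differs_from_wild_type(residue_sum(mutant), wild_type))           # flagged
print(differs_from_wild_type(residue_sum(wild_type) + 1.0, wild_type))  # within tolerance
```

A measured mass that falls outside the tolerance flags a candidate polymorphism, and the sign and size of the shift narrow down which base exchange, insertion, or deletion could account for it.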
These above-described methods can also be combined with isolation methods designed to isolate a single-stranded amplified target nucleic acid or a set of single-stranded amplified target nucleic acids, for example, only those single-stranded target nucleic acids derived from the + or sense strand of the genome. The isolation methods include direct capture of one of the two strands of a double-stranded amplified target nucleic acid or set of such molecules, to a solid support or indirect capture of a single-stranded or double-stranded amplified target nucleic acid or set thereof to a solid support via a capture probe capable of binding to a solid support via covalent or noncovalent binding.
A further aspect of the invention includes the methods of detecting polymorphisms wherein said determining step optionally further comprises utilizing internal self-calibrants to provide improved mass accuracy. The isolation methods separately or together can also be combined with the use of internal self-calibrants.
The above methods, separately or in combination, can also be combined with the use of mass-modified nucleotides and mass-modified nucleotide analogs incorporated in the single-stranded or double-stranded amplified target nucleic acid or set of single-stranded or double-stranded amplified target nucleic acids to improve mass resolution between mass peaks. The methods of detecting polymorphisms may also include at least one single-stranded amplified target nucleic acid optionally having one or more nucleotides replaced with mass-modified nucleotides.
In another embodiment, kits for preparing amplified target nucleic acids for mass spectrometric analysis are also provided. The kits of the invention comprise a first primer capable of binding a first strand of one of said target nucleic acids at a region 5′ to a region of interest of said target nucleic acid; a second primer capable of binding a second strand complementary to said first strand at a region 5′ to said region of interest of said target nucleic acid; a DNA polymerase capable of extending said primers to form primer extension products of said first and second primers; wherein said first and second primers and said DNA polymerase are provided in a concentration and buffer suitable for increasing the number of target nucleic acids to form amplified target nucleic acids, and a restriction endonuclease capable of reducing length of amplified target nucleic acids. Another embodiment is a kit comprising: a first primer capable of binding a first strand of one of said target nucleic acids at a region 5′ to a region of interest of said target nucleic acid; a second primer capable of binding a second strand complementary to said first strand at a region 5′ to said region of interest of said target nucleic acid; a DNA polymerase capable of extending said primers to form primer extension products of said first and second primers; wherein at least one of said first or second primers is a cleavable primer.