This invention relates generally to methods for screening nucleic acids for mutations by analyzing fragmented nucleic acids using mass spectrometry.
Approximately 4,000 human disorders are attributed to genetic causes. Hundreds of genes responsible for various disorders have been mapped, and sequence information is being accumulated rapidly. A principal goal of the Human Genome Project is to find all genes associated with each disorder. The definitive diagnostic test for any specific genetic disease (or predisposition to disease) will be the identification of mutations in affected cells that result in alterations of gene function. Furthermore, response to specific medications may depend on the presence of mutations. Developing DNA (or RNA) screening as a practical tool for medical diagnostics requires a method that is inexpensive, accurate, expeditious, and robust.
Genetic mutations can manifest themselves in several forms, such as point mutations where a single base is changed to one of the three other bases, deletions where one or more bases are removed from a nucleic acid sequence and the bases flanking the deleted sequence are directly linked to each other, and insertions where new bases are inserted at a particular point in a nucleic acid sequence adding additional length to the overall sequence. Large insertions and deletions, often the result of chromosomal recombination and rearrangement events, can lead to partial or complete loss of a gene. Of these forms of mutation, in general the most difficult type of mutation to screen for and detect is the point mutation because it represents the smallest degree of molecular change. The term mutation encompasses all the above-listed types of differences from wild type nucleic acid sequence. Wild type is a standard or reference nucleotide sequence to which variations are compared. As defined, any variation from wild type is considered a mutation including naturally occurring sequence polymorphisms.
Although a number of genetic defects can be linked to a specific single point mutation within a gene, e.g. sickle cell anemia, many are caused by a wide spectrum of different mutations throughout the gene. A typical gene that might be screened using the methods described here could be anywhere from 1,000 to 100,000 bases in length, though smaller and larger genes do exist. Of that amount of DNA, only a fraction of the base pairs actually encode the protein. These discontinuous protein coding regions are called exons and the remainder of the gene is referred to as introns. Of these two types of regions, exons often contain the most important sequences to be screened. Several complex procedures have been developed for scanning genes in order to detect mutations, which are applicable to both exons and introns.
Gel Electrophoresis
Several of the procedures described below use some form of gel electrophoresis. Therefore it is worthwhile to briefly consider this separation technology before proceeding to the specific methods. In terms of current use, most of the methods to scan or screen genes employ slab or capillary gel electrophoresis for the separation and detection step in the assays. Gel electrophoresis of nucleic acids primarily provides relative size information based on mobility through the gel matrix. If calibration standards are employed, gel electrophoresis can be used to measure absolute and relative molecular weights of large biomolecules with some moderate degree of accuracy; even then typically the accuracy is only 5% to 10%. Also the molecular weight resolution is limited. In cases where two DNA fragments with identical number of base pairs can be separated, using high concentration polyacrylamide gels, it is still not possible to identify which band on a gel corresponds to which DNA fragment without performing secondary labeling experiments. Gel electrophoresis techniques can only determine size and cannot provide any information about changes in base composition or sequence without performing more complex sequencing reactions. Gel-based techniques, for the most part, are dependent on labeling methods to visualize and discriminate between different nucleic acid fragments.
DNA Sequencing
The principal approach currently used to screen for genetic mutations is DNA sequencing. Sequencing reactions can be performed to screen the full genetic target base by base. This process, which can pinpoint the exact location and nature of mutation, requires labeling DNA, use of polyacrylamide gels, and a multiplicity of reactions to assess all bases over the length of a gene, all of which are slow and labor intensive procedures. [J. Bergh et al., "Complete Sequencing of the p53 Gene Provides Prognostic Information in Breast Cancer Patients, Particularly in Relation to Adjuvant Systemic Therapy and Radiotherapy," Nature Medicine 1, 1029 (1995)].
For DNA sequencing, nucleic acids comprising different exons or small clusters of exons are individually amplified, often using polymerase chain reaction (PCR). The amplifications are normally performed separately although some multiplexing of reactions is possible. The amplified nucleic acids typically range from one hundred to several thousand bases in length. Following amplification, the PCR products can serve as templates for standard dideoxy-based Sanger sequencing reactions. The four different sequencing reactions are run (or for fluorescence detection, one reaction with four different dye terminators) and then analyzed by polyacrylamide gel electrophoresis. Each sequencing run yields about 300 to 600 bases of sequence which typically must be read with at least a two to three-fold redundancy in order to assure accuracy. Using slab gel, the analysis process typically takes several hours.
SSCP
The single strand conformational polymorphism assay takes advantage of structural variation within DNA that results from mutation. The method involves folding the single-stranded form of a given nucleic acid sequence into a thermodynamically directed secondary and tertiary structure. In most cases, mutated sequences form different structures than the wild type sequence, thus permitting separation of mutated and wild type sequences by gel electrophoresis. Like sequencing, this assay is complicated by the need to label molecules and run polyacrylamide gels. In a typical case, mutations can be located within a general range of 50 to 200 base pairs, but the exact nature of the mutation cannot be identified. [M. Orita et al., "Detection of Polymorphisms of Human DNA by Gel Electrophoresis as Single-Stranded Conformation Polymorphisms," Proc. Natl. Acad. Sci. USA 86, 2766 (1989)].
DGGE
Like SSCP, denaturing gradient gel electrophoresis assays also differentiate based on structural variation, but require the use of gradient gels, which are difficult to prepare. The different thermodynamic stability of structures formed by the mutant sequence, as opposed to wild type, leads to differences in the temperature and/or pH at which the molecule will denature. DGGE mutation identification and localization properties are similar to those for SSCP, though sensitivity is higher for DGGE because not all mutations cause the structural changes that the SSCP method depends upon for detection. [E. S. Abrams, S. E. Murdaugh and L. S. Lerman, "Comprehensive Detection of Single Base Changes in Human Genomic DNA Using Denaturing Gradient Gel Electrophoresis and a GC Clamp," Genomics 7, 463 (1990)].
EMC
Enzyme mismatch cleavage utilizes one or more enzymes that are capable of recognizing interruptions in base pairing within a double-stranded nucleic acid molecule, e.g. base-base mismatches, bulges, or internal loops. A given length of DNA or RNA is prepared in heterozygous form, with one strand composed of wild type nucleic acid and the other strand containing a potential mutation. At the specific site where the mutation forms a mismatch with the wild type sequence, a structural perturbation occurs. An enzyme such as T4 endonuclease VII, RuvC, RNase A, or MutY can recognize such a structural perturbation and can site-specifically cut the double-stranded nucleic acid, creating smaller molecules whose sizes indicate the presence and location of the mutation. As with the previously discussed methods, this approach, as currently used, also requires labeling and gel electrophoresis. With this method, the site of mutation can be localized to within a few base pairs but the exact nature of the mutation cannot be determined. [R. Youil, B. W. Kemper and R. G. H. Cotton, "Screening for Mutations by Enzyme Mismatch Cleavage with T4 Endonuclease VII," Proc. Natl. Acad. Sci. USA 92, 87 (1995)].
CCM
A variation of EMC is to replace the enzymatic cleavage step with chemical cleavage. Chemical cleavage mismatch analysis involves the use of reagents such as osmium tetroxide to react with mismatched thymine residues or hydroxylamine to react with mismatched cytosine residues. Cleavage of the modified mismatched residues occurs when the modified bases are subsequently treated with piperidine or another oxidizing agent. The effectiveness of the method is similar to EMC. [J. A. Saleeba and R. G. H. Cotton, "Chemical Cleavage of Mismatch to Detect Mutations," Methods in Enzymology 217, 286 (1993)].
Hybridization Arrays
Several approaches to screening for mutations involve the probing of a target nucleic acid by an array of oligonucleotides that can differentiate between normal wild type nucleic acids and mutant nucleic acids. These arrays involve the performance of hundreds or thousands of hybridization reactions in parallel with different site-directed oligonucleotides and require sophisticated and costly probe arrays. Hybridization arrays can identify the location and type of mutation in many, but not all, cases. For example, semihomologous sequential insertions or targets with repeating sequences and/or repeating sequential motifs cannot be analyzed by hybridization. [A. C. Pease et al., "Light-Generated Oligonucleotide Arrays for Rapid DNA Sequence Analysis," Proc. Natl. Acad. Sci. USA 91, 5022 (1994)].
Simple Screens
For mutations localized within a given gene, such as the cystic fibrosis ΔF508 deletion, it is also possible to perform a single PCR or ligase chain reaction (LCR) assay or simple hybridization assays tailored to these specific sites. PCR and LCR results are presently determined by the use of labeled molecules, where radioactive emissions, fluorescence, chemiluminescence or color changes are detected directly. These simple screens amount to a yes/no answer and do not directly identify the nature of the mutation, only whether or not a reaction took place. [P. Fang et al., "Simultaneous Analysis of Mutant and Normal Alleles for Multiple Cystic Fibrosis Mutations by the Ligase Chain Reaction," Human Mutation 6, 144 (1995)].
All of the methods in use today capable of screening broadly for genetic mutations suffer from technical complication and are labor and time intensive. There is a need for new methods that can provide cost effective and expeditious means for screening genetic material in an effort to reduce medical expenses. The inventions described here address these issues by developing novel, tailor-made processes that focus on the use of mass spectrometry as a genetic analysis tool. Mass spectrometry requires minute samples, provides extremely detailed information about the molecules being analyzed including high mass accuracy, and is easily automated.
The late 1980s saw the rise of two new mass spectrometric techniques for successfully measuring the masses of intact very large biomolecules, namely, matrix-assisted laser desorption/ionization (MALDI) time-of-flight mass spectrometry (TOF MS) [K. Tanaka et al., "Protein and Polymer Analyses up to m/z 100,000 by Laser Ionization Time-of-flight Mass Spectrometry," Rapid Commun. Mass Spectrom. 2, 151-153 (1988); B. Spengler et al., "Laser Mass Analysis in Biology," Ber. Bunsenges. Phys. Chem. 93, 396-402 (1989)] and electrospray ionization (ESI) combined with a variety of mass analyzers [J. B. Fenn et al., Science 246, 64-71 (1989)]. Both methods are suitable for genetic screening tests. The MALDI mass spectrometric technique can also be used with methods other than time-of-flight, for example, magnetic sector, Fourier-transform ion cyclotron resonance, quadrupole, and quadrupole trap. One of the advances in MALDI analysis of polynucleotides was the discovery of 3-hydroxypicolinic acid as an ideal matrix for mixed-base oligonucleotides. [Wu et al., Rapid Commun. Mass Spectrom. 7, 142-146 (1993)].
MALDI-TOF MS involves laser pulses focused on a small sample plate comprising analyte molecules (nucleic acids) embedded in either a solid or liquid matrix comprising a small, highly absorbing compound. The laser pulses transfer energy to the matrix causing a microscopic ablation and concomitant ionization of the analyte molecules, producing a gaseous plume of intact, charged nucleic acids in single-stranded form. If double-stranded nucleic acids are analyzed, the MALDI-TOF MS typically results in mostly denatured single-strand detection. The ions generated by the laser pulses are accelerated to a fixed kinetic energy by a strong electric field and then pass through an electric field-free region in vacuum in which the ions travel with a velocity corresponding to their respective mass-to-charge ratios (m/z). The smaller m/z ions will travel through the vacuum region faster than the larger m/z ions thereby causing a separation. At the end of the electric field-free region, the ions collide with a detector that generates a signal as each set of ions of a particular mass-to-charge ratio strikes the detector. Usually for a given assay, 10 to 100 mass spectra resulting from individual laser pulses are summed together to make a single composite mass spectrum with an improved signal-to-noise ratio.
The mass of an ion (such as a charged nucleic acid) is measured by using its velocity to determine the mass-to-charge ratio by time-of-flight analysis. In other words, the mass of the molecule directly correlates with the time it takes to travel from the sample plate to the detector. The entire process takes only microseconds. In an automated apparatus, tens to hundreds of samples can be analyzed per minute. In addition to speed, MALDI-TOF MS has one of the largest mass ranges for mass spectrometric devices. The current mass range for MALDI-TOF MS is from 1 to 1,000,000 Daltons (Da) (measured recently for a protein). [R. W. Nelson et al., "Detection of Human IgM at m/z ~1 MDa," Rapid Commun. Mass Spectrom. 9, 625 (1995)].
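The time-of-flight relationship described above can be written out explicitly. The following is a sketch for an idealized linear instrument, assuming that ions start at rest, are accelerated through a potential difference V, and then drift over a field-free length L. The symbols V, L, z (number of charges), and e (elementary charge) are introduced here for illustration only and do not come from the text above.

```latex
\[
zeV = \tfrac{1}{2}\, m v^{2}
\quad\Longrightarrow\quad
v = \sqrt{\frac{2zeV}{m}},
\qquad
t = \frac{L}{v} = L\sqrt{\frac{m}{2zeV}}
\quad\Longrightarrow\quad
\frac{m}{z} = \frac{2eV\,t^{2}}{L^{2}}
\]
```

Under these assumptions the flight time scales with the square root of m/z, so ions of smaller m/z reach the detector first. Real instruments depart from this idealization, for example through the spread of initial ion velocities.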
The performance of a mass spectrometer is measured by its sensitivity, mass resolution and mass accuracy. Sensitivity is measured by the amount of material needed; it is generally desirable and possible with mass spectrometry to work with sample amounts in the femtomole and low picomole range. Mass resolution, m/Δm, is the measure of an instrument's ability to produce separate signals from ions of similar mass. Mass resolution is defined as the mass, m, of an ion signal divided by the full width of the signal, Δm, usually measured between points of half-maximum intensity. Mass accuracy is the measure of error in designating a mass to an ion signal. The mass accuracy is defined as the ratio of the mass assignment error divided by the mass of the ion and can be represented as a percentage.
To be able to detect any point mutation directly by MALDI-TOF mass spectrometry, one would need to resolve and accurately measure the masses of nucleic acids in which a single base change has occurred (in comparison to the wild type nucleic acid). A single base change can be a mass difference of as little as 9 Da. This value represents the difference between the two bases with the closest mass values, A and T (A = 2′-deoxyadenosine-5′-phosphate = 313.19 Da; T = 2′-deoxythymidine-5′-phosphate = 304.20 Da; G = 2′-deoxyguanosine-5′-phosphate = 329.21 Da; and C = 2′-deoxycytidine-5′-phosphate = 289.19 Da). If during the mutation process, a single A changes to T or a single T to A, the mutant nucleic acid containing the base transversion will either decrease or increase by 9 Da in total mass as compared to the wild type nucleic acid. For mass spectrometry to directly detect these transversions, it must therefore be able to detect a minimum mass change, Δm, of approximately 9 Da.
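The claim that A and T are the closest pair can be checked by tabulating every pairwise mass difference. The sketch below uses only the four nucleotide masses quoted above; the function and variable names are illustrative.

```python
from itertools import combinations

# Masses (Da) of the 2'-deoxynucleoside-5'-phosphates as quoted in the text.
NUCLEOTIDE_MASS = {"A": 313.19, "T": 304.20, "G": 329.21, "C": 289.19}


def substitution_mass_shifts(masses=NUCLEOTIDE_MASS):
    """Return {(base1, base2): absolute mass difference in Da} for each pair of bases."""
    return {
        (a, b): round(abs(masses[a] - masses[b]), 2)
        for a, b in combinations(masses, 2)
    }


shifts = substitution_mass_shifts()
smallest_pair = min(shifts, key=shifts.get)
print(smallest_pair, shifts[smallest_pair])  # ('A', 'T') 8.99
```

All other single-base substitutions shift the mass by 15 Da or more, so the A/T transversion sets the minimum mass change that must be detected.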
For example, in order to fully resolve (which may not be necessary) a point-mutated (A to T or T to A) heterozygote 50-base single-stranded DNA fragment having a mass, m, of ~15,000 Da from its corresponding wild type nucleic acid, the required mass resolution is m/Δm = 15,000/9 ≈ 1,700. However, the mass accuracy needs to be significantly better than 9 Da to increase quality assurance and to prevent ambiguities where the measured mass value is near the half-way point between the two theoretical masses. For an analyte of 15,000 Da, in practice the mass accuracy needs to be Δm ~ ±3 Da = 6 Da. In this case, the absolute mass accuracy required is (6/15,000)*100 = 0.04%. Often a distinguishing level of mass accuracy relative to another known peak in the spectrum is sufficient to resolve ambiguities. For example, if there is a known mass peak 1000 Da from the mass peak in question, the relative position of the unknown to the known peak may be known with greater accuracy than that provided by an absolute, previous calibration of the mass spectrometer.
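The arithmetic in this example can be reproduced with two small helper functions. This is a minimal sketch that follows the definitions of mass resolution and mass accuracy given above; the function names are illustrative.

```python
def required_resolution(mass_da, delta_m_da):
    """Mass resolution m/Δm needed to separate two signals that differ by delta_m_da."""
    return mass_da / delta_m_da


def accuracy_percent(error_da, mass_da):
    """Mass accuracy expressed as a percentage of the ion mass."""
    return error_da / mass_da * 100.0


# 50-base fragment of ~15,000 Da carrying an A/T transversion (9 Da shift).
resolution = required_resolution(15_000, 9)  # about 1,667, quoted as ~1,700 in the text
accuracy = accuracy_percent(6, 15_000)       # 6 Da window (±3 Da), about 0.04 %
print(round(resolution), round(accuracy, 2))
```

Because the required resolution grows linearly with fragment mass for a fixed 9 Da shift, shorter fragments relax the instrument requirements.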
In order for mass spectrometry to be a useful tool for screening for mutations in nucleic acids, several basic requirements need to be met. First, any nucleic acids to be analyzed must be purified to the extent that minimizes salt ions and other molecular contaminants that reduce the intensity and quality of the mass spectrometric signal to a point where either the signal is undetectable or unreliable, or the mass accuracy and/or resolution is below the value necessary to detect single base change mutations. Second, the size of the nucleic acids to be analyzed must be within the range of the mass spectrometer, where there is the necessary mass resolution and accuracy. Mass accuracy and resolution do significantly degrade as the mass of the analyte increases; currently this is especially significant above approximately 30,000 Da for oligonucleotides (~100 bases). Third, because all molecules within a sample are visualized during mass spectrometric analysis (i.e. it is not possible to selectively label and visualize certain molecules and not others as one can with gel electrophoresis methods), it is necessary to partition nucleic acid samples prior to analysis in order to remove unwanted nucleic acid products from the spectrum. Fourth, the mass spectrometric methods for generalized nucleic acid screening must be efficient and cost effective in order to screen a large number of nucleic acid bases in as few steps as possible.
The methods for detecting nucleic acid mutations known in the art do not satisfy these four requirements. For example, prior art methods for mass spectrometric analysis of DNA fragments have focussed on double-stranded DNA fragments which result in complicated mass spectra, making it difficult to resolve mass differences between two complementary strands. See, e.g., Tang et al., Rapid Commun. Mass Spectrom. 8, 183-186 (1994).
Thus, there is a need for cost and time effective methods of detecting genetic mutations using mass spectrometry, preferably MALDI or ESI, without having to sequence the genetic material and with mass accuracy of a few parts in 10,000 or better.
The present invention provides methods of and kits for detecting mutations in a target nucleic acid comprising nonrandomly fragmenting said target nucleic acid to form a set of nonrandom length fragments (NLFs), determining masses of members of said set of NLFs using mass spectrometry, wherein said determining does not involve sequencing of said target nucleic acid.
In a preferred embodiment, the method of detecting mutations comprises obtaining a set of nonrandom length fragments in single-stranded form. The masses of the members of the set of NLFs can be compared with the known or predicted masses of a set of NLFs derived from a wild type target nucleic acid that is the wild type version of the target nucleic acid that is being screened for mutations. The members of the set of single-stranded NLFs can optionally have one or more nucleotides replaced with mass-modified nucleotides, including mass-modified nucleotide analogs. Another optional aspect of the invention is the inclusion of internal calibrants or internal self-calibrants in the set of nonrandom length fragments to be analyzed by mass spectrometry to provide improved mass accuracy.
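The comparison of measured NLF masses with predicted wild type masses can be sketched as follows. This is an illustration under stated assumptions, not the claimed method. It assumes that a fragment's mass is approximated by summing the nucleotide monophosphate masses quoted in the background and subtracting one water (18.02 Da) per phosphodiester bond, which corresponds to a linear strand with a 5'-phosphate. The 3 Da tolerance and all names are likewise illustrative.

```python
# Masses (Da) of the 2'-deoxynucleoside-5'-phosphates quoted in the background.
NUCLEOTIDE_MASS = {"A": 313.19, "T": 304.20, "G": 329.21, "C": 289.19}
WATER = 18.02  # assumption: one water is lost per phosphodiester bond formed


def fragment_mass(seq):
    """Approximate mass (Da) of a linear single-stranded fragment with a 5'-phosphate."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(NUCLEOTIDE_MASS[base] for base in seq) - WATER * (len(seq) - 1)


def flag_shifted_fragments(wild_type_fragments, measured_masses, tolerance_da=3.0):
    """Return (index, shift in Da) for fragments whose measured mass deviates
    from the predicted wild type mass by more than tolerance_da."""
    flagged = []
    for i, (seq, measured) in enumerate(zip(wild_type_fragments, measured_masses)):
        shift = measured - fragment_mass(seq)
        if abs(shift) > tolerance_da:
            flagged.append((i, round(shift, 2)))
    return flagged


# Hypothetical example: the second fragment carries an A-to-T change.
wild_type = ["ACGT", "GGATCC"]
measured = [fragment_mass("ACGT"), fragment_mass("GGTTCC")]
print(flag_shifted_fragments(wild_type, measured))  # [(1, -8.99)]
```

A flagged fragment localizes the mutation to that fragment, and the size of the shift narrows down the kind of base change, without sequencing the target.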
The present invention includes a number of nonrandom fragmentation techniques for nonrandomly fragmenting a target nucleic acid.
In one embodiment, the nonrandom fragmentation technique comprises hybridizing a single-stranded target nucleic acid to one or more sets of fragmenting probes to form hybrid target nucleic acid/fragmenting probe complexes comprising at least one double-stranded region and at least one single-stranded region, nonrandomly fragmenting said target nucleic acid by cleaving said hybrid target nucleic acid/fragmenting probe complexes at every single-stranded region with at least one single-strand-specific cleaving reagent to form a set of NLFs. The set of fragmenting probes can leave single-stranded regions between double-stranded regions formed by hybridization of said set of fragmenting probes to said target nucleic acid. A single-stranded region comprises a portion of a polynucleotide sequence as small as a single phosphodiester bridge, i.e. the phosphodiester bond across from a nick, to 450 nucleotides in length.
The fragmenting probes are oligonucleotides that are complementary to a nucleotide sequence of the target nucleic acid. A set of fragmenting probes can be created such that the nucleotide sequences of the members of the set of fragmenting probes represents the entire complement to the nucleotide sequence of the target nucleic acid. For example, a set of fragmenting probes can provide complete complementary sequence to the target nucleic acid. Alternatively, a set of fragmenting probes, when hybridized to the target nucleic acid, can leave single-stranded regions. Also, one or more sets of fragmenting probes can be used such that the members of one set of fragmenting probes contain nucleotide sequences that overlap with nucleotide sequences of members of a second set of fragmenting probes. In yet another aspect, there are provided two sets of fragmenting probes, where members of the second set of fragmenting probes comprise at least one single-stranded nucleotide sequence complementary to regions of said target nucleic acid that are not complementary to any nucleotide sequences in any members of said first set of fragmenting probes.
Once the set(s) of fragmenting probes are hybridized to the target nucleic acid, the single-stranded regions are cleaved using single-strand-specific cleaving reagents, including enzymatic reagents as well as chemical reagents. Single-strand specific chemical cleaving reagents include hydroxylamine, hydrogen peroxide, osmium tetroxide, and potassium permanganate.
Yet another nonrandom fragmentation technique comprises providing a single-stranded target nucleic acid, hybridizing the single-stranded target nucleic acid to one or more restriction site probes to form hybridized target nucleic acids comprising double-stranded regions where said restriction site probes have hybridized to said single-stranded target nucleic acid and at least one single-stranded region, nonrandomly fragmenting the hybridized target nucleic acids using one or more restriction endonucleases that cleave at restriction sites within the double-stranded regions. Another variation on this technique involves use of universal restriction probes comprising two regions, the first region being single-stranded and complementary to a specific site within the target nucleic acid, and the second region being double-stranded and containing the restriction recognition site for a particular class IIS restriction endonuclease. Class IIS restriction endonucleases cleave double-stranded DNA at a specific distance from their recognition site sequence.
Another technique for nonrandom fragmentation comprises fragmenting the target nucleic acid with one or more restriction endonucleases to form a set of NLFs. This and the other forms of nonrandom fragmentation can be combined with direct and indirect capture to a solid support to isolate single-stranded NLFs for mass spectrometric analysis.
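Restriction endonuclease fragmentation is nonrandom because the cut positions are fixed by the sequence. The sketch below simulates a digest of one strand only. The enzyme is an illustrative choice not named in the text above: EcoRI, which recognizes GAATTC and cuts between G and A. The function name and test sequence are hypothetical.

```python
def digest(seq, site, cut_offset):
    """Split seq at every occurrence of site, cutting cut_offset bases into the site.

    Only one strand is modeled; the complementary strand is ignored.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:  # skip empty fragments, e.g. a cut at the very start
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


# EcoRI-style digest (G^AATTC) of a hypothetical target sequence.
target = "AAGAATTCTTGAATTCGG"
print(digest(target, "GAATTC", 1))  # ['AAG', 'AATTCTTG', 'AATTCGG']
```

The same input sequence always yields the same set of fragment lengths. A mutation that destroys or creates a recognition site changes the number of fragments as well as their masses.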
Another nonrandom fragmentation technique comprises providing conditions permitting folding of said single-stranded target nucleic acid to form a three-dimensional structure having intramolecular secondary and tertiary interactions, and nonrandomly fragmenting said folded target nucleic acid with at least one structure-specific endonuclease to form a set of single-stranded NLFs. A set of nonrandom length fragments can comprise a nested set of NLFs, wherein each member of the set has a 5′ end of the target nucleic acid. The structure-specific endonucleases useful for nonrandom fragmentation comprise any nucleases that cleave at structural transitions within nucleic acids, including: Holliday junctions, single-strand to double-strand transitions, or at the ends of hairpin structures.
Another nonrandom fragmentation method comprises mutation-specific cleavage by hybridizing a target nucleic acid to a set of one or more wild type probes and specifically cleaving at any regions of nucleotide mismatch or base mismatch that form between the target nucleic acid and a wild type probe. The mutation-specific cleavage can be accomplished using a mutation-specific cleaving reagent comprising structure-specific endonuclease or chemical reagents.
The nonrandom fragmentation methods described herein can be combined to form different sets or subsets of nonrandom length fragments. For example, the base mismatch nonrandom fragmentation method using wild type probes can be used in concert with a set of nonrandom length fragments that have already been created using any one of the other nonrandom fragmentation methods. These nonrandom fragmentation methods can also be combined with isolation methods designed to isolate specific sets of single-stranded nonrandom length fragments, for example, only those NLFs derived from the + strand of the target nucleic acid. The isolation methods include direct capture of the set of NLFs to a solid support or indirect capture of a set of NLFs to a solid support via a capture probe capable of binding to a solid support via covalent or noncovalent binding. The fragmenting, wild type, restriction site, and universal restriction probes described herein can also be used as capture probes for isolating a particular set of NLFs.
The isolation methods also comprise the use of a solution of volatile salts to wash away undesired contaminants from the set of NLFs intended for mass determination in the mass spectrometer. The volatile salts are useful for removing background noise and can be easily removed by evaporation of the volatile salts prior to mass spectrometric analysis. Volatile salt solutions can be used in a variety of different methods to prepare organic molecules such as nucleic acids and polypeptides for mass spectrometric analysis. Thus, a method is described herein of decreasing background noise, wherein the method comprises obtaining a sample to be analyzed by a mass spectrometer, washing the sample with a solution of volatile salts, and evaporating the solution of volatile salts from the sample.
The fragmentation and isolation methods separately or together can also be combined with the use of internal self-calibrants to improve the mass accuracy of the mass spectrometric analysis.
The above methods, separately or in combination, can also be combined with the use of mass-modified nucleotides and mass-modified nucleotide analogs incorporated in the target nucleic acid or a set of NLFs to improve mass resolution between mass peaks.
Kits for detecting mutations in one or more target nucleic acids in a sample are also provided. In preferred embodiments, such kits comprise one or more single-stranded target nucleic acids, one or more sets of oligonucleotide probes, wherein each of said probes is complementary to a portion of said single-stranded target nucleic acids, and various cleaving reagents, including single-strand specific cleaving reagents, restriction endonucleases (both Class II and Class IIS), and mutation-specific cleaving reagents. The oligonucleotide probes include fragmenting probes, restriction site probes, and wild type probes. Such kits can also contain a matrix, preferably 3-hydroxypicolinic acid. The kits may also contain volatile salt buffers, and buffers providing conditions suitable for the enzymatic or chemical reactions described above for nonrandomly fragmenting target nucleic acids and isolating nonrandom length fragments in preparation for mass spectrometric analysis. Additionally, the kits may contain solid supports for purposes of isolating nonrandom length fragments.