The genetic information of all living organisms (e.g. animals, plants and microorganisms) is encoded in deoxyribonucleic acid (DNA). In humans, the complete genome comprises about 100,000 genes located on 24 chromosomes (The Human Genome, T. Strachan, BIOS Scientific Publishers, 1992). Each gene codes for a specific protein which, after its expression via transcription and translation, fulfills a specific biochemical function within a living cell. Changes in a DNA sequence are known as mutations and can result in proteins with altered or, in some cases, even lost biochemical activities; this in turn can cause genetic disease. Mutations include nucleotide deletions, insertions or alterations (i.e. point mutations). Point mutations can be either “missense”, resulting in a change in the amino acid sequence of a protein, or “nonsense”, coding for a stop codon and thereby leading to a truncated protein.
More than 3000 genetic diseases are currently known (Human Genome Mutations, D. N. Cooper and M. Krawczak, BIOS Publishers, 1993), including hemophilias, thalassemias, Duchenne Muscular Dystrophy (DMD), Huntington's Disease (HD), Alzheimer's Disease and Cystic Fibrosis (CF). In addition to mutated genes, which result in genetic disease, certain birth defects are the result of chromosomal abnormalities such as Trisomy 21 (Down's Syndrome), Trisomy 13 (Patau Syndrome), Trisomy 18 (Edwards Syndrome), Monosomy X (Turner's Syndrome) and other sex chromosome aneuploidies such as Klinefelter's Syndrome (XXY). Further, there is growing evidence that certain DNA sequences may predispose an individual to any of a number of diseases such as diabetes, arteriosclerosis, obesity, various autoimmune diseases and cancer (e.g. colorectal, breast, ovarian, lung).
Viruses, bacteria, fungi and other infectious organisms contain distinct nucleic acid sequences, which are different from the sequences contained in the host cell. Therefore, infectious organisms can also be detected and identified based on their specific DNA sequences.
Since a sequence of about 16 nucleotides is, on statistical grounds, specific even within a genome the size of the human genome, relatively short nucleic acid sequences can be used to detect normal and defective genes in higher organisms and to detect infectious microorganisms (e.g. bacteria, fungi, protists and yeast) and viruses. DNA sequences can even serve as a fingerprint for detection of different individuals within the same species (Thompson, J. S. and M. W. Thompson, eds., Genetics in Medicine, W. B. Saunders Co., Philadelphia, Pa. (1991)).
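The statistical argument can be illustrated with a short calculation. The sketch below assumes a haploid human genome of roughly 3 x 10^9 base pairs and equiprobable, independent bases (both simplifying assumptions introduced here, not figures taken from the text) and finds the shortest length n for which the number of possible sequences, 4^n, exceeds the genome size.

```python
def min_unique_length(genome_size: int, alphabet: int = 4) -> int:
    """Return the smallest n such that alphabet**n exceeds genome_size."""
    n = 0
    combos = 1
    while combos <= genome_size:
        combos *= alphabet
        n += 1
    return n


# Assumed approximate haploid genome size in base pairs (illustrative only).
HUMAN_GENOME_BP = 3_000_000_000

n = min_unique_length(HUMAN_GENOME_BP)
# Expected number of chance occurrences of one given n-mer on one strand
# of a random sequence of this length.
expected_hits = HUMAN_GENOME_BP / 4**n
print(n, expected_hits)
```

Under this random-sequence model the expected number of chance matches falls below one at the computed length. Real genomes contain repeats and biased base composition, so the figure is a rough lower bound rather than a guarantee of uniqueness.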
Several methods for detecting DNA are currently being used. For example, nucleic acid sequences can be identified by comparing the mobility of an amplified nucleic acid fragment with a known standard by gel electrophoresis, or by hybridization with a probe that is complementary to the sequence to be identified. Identification, however, can only be accomplished if the nucleic acid fragment is labeled with a sensitive reporter function (e.g. radioactive (32P, 35S), fluorescent or chemiluminescent). Radioactive labels can be hazardous, and the signals they produce decay over time. Non-isotopic labels (e.g. fluorescent) suffer from a lack of sensitivity and fading of the signal when high intensity lasers are used. Additionally, labeling, electrophoresis and subsequent detection are laborious, time-consuming and error-prone procedures. Electrophoresis is particularly error-prone, since the size or the molecular weight of the nucleic acid cannot be directly correlated to the mobility in the gel matrix. Sequence-specific effects, secondary structures and interactions with the gel matrix are known to cause artifacts.
In general, mass spectrometry provides a means of “weighing” individual molecules by ionizing the molecules in vacuo and making them “fly” by volatilization. Under the influence of combinations of electric and magnetic fields, the ions follow trajectories depending on their individual mass (m) and charge (z). In the range of molecules with low molecular weight, mass spectrometry has long been part of the routine physical-organic repertoire for analysis and characterization of organic molecules by the determination of the mass of the parent molecular ion. In addition, by arranging collisions of this parent molecular ion with other particles (e.g., argon atoms), the molecular ion is fragmented, forming secondary ions by the so-called collision induced dissociation (CID). The fragmentation pattern/pathway very often allows the derivation of detailed structural information. Many applications of mass spectrometric methods are known in the art, particularly in biosciences, and can be found summarized in Methods in Enzymology, Vol. 193: “Mass Spectrometry” (J. A. McCloskey, editor), 1990, Academic Press, New York.
Due to the apparent analytical advantages of mass spectrometry in providing high detection sensitivity, accuracy of mass measurements, detailed structural information by CID in conjunction with an MS/MS configuration and speed, as well as on-line data transfer to a computer, there has been considerable interest in the use of mass spectrometry for the structural analysis of nucleic acids. Recent reviews summarizing this field include K. H. Schram, “Mass Spectrometry of Nucleic Acid Components, Biomedical Applications of Mass Spectrometry” 34, 203-287 (1990); and P. F. Crain, “Mass Spectrometric Techniques in Nucleic Acid Research,” Mass Spectrometry Reviews 9, 505-554 (1990).
However, nucleic acids are very polar biopolymers that are very difficult to volatilize. Consequently, mass spectrometric detection has been limited to low molecular weight synthetic oligonucleotides. In such work, the already known oligonucleotide sequence is confirmed either by determining the mass of the parent molecular ion or through the generation of secondary ions (fragment ions) via CID in an MS/MS configuration, with ionization and volatilization carried out, in particular, by fast atom bombardment (FAB mass spectrometry) or plasma desorption (PD mass spectrometry). As an example, the application of FAB to the analysis of protected dimeric blocks for chemical synthesis of oligodeoxynucleotides has been described (Wolter et al. Biomedical Environmental Mass Spectrometry 14, 111-116 (1987)).
Two more recent ionization/desorption techniques are electrospray/ionspray (ES) and matrix-assisted laser desorption/ionization (MALDI). ES mass spectrometry was introduced by Yamashita et al. (J. Phys. Chem. 88, 4451-59 (1984); PCT Application No. WO 90/14148) and current applications are summarized in recent review articles (R. D. Smith et al., Anal. Chem. 62, 882-89 (1990) and B. Ardrey, Electrospray Mass Spectrometry, Spectroscopy Europe, 4, 10-18 (1992)). The molecular weights of a tetradecanucleotide (Covey et al. “The Determination of Protein, Oligonucleotide and Peptide Molecular Weights by Ionspray Mass Spectrometry,” Rapid Communications in Mass Spectrometry, 2, 249-256 (1988)), and of a 21-mer (Methods in Enzymology, 193, “Mass Spectrometry” (McCloskey, editor), p. 425, 1990, Academic Press, New York) have been published. As a mass analyzer, a quadrupole is most frequently used. The determination of molecular weights in femtomole amounts of sample is very accurate due to the presence of multiple ion peaks, all of which can be used for the mass calculation.
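The way multiple ion peaks contribute to a single mass determination can be sketched as follows. The example assumes negative-ion mode in which each charge arises from the loss of one proton, so that a peak at charge z appears at m/z = (M - z x 1.00728)/z; this ion model, the function names and the 6000 Da test mass are illustrative assumptions, not details taken from the cited work. Each pair of adjacent peaks yields the charge state and an independent estimate of the neutral mass M, and the estimates are averaged.

```python
PROTON_MASS = 1.00728  # Da, approximate


def simulate_peaks(neutral_mass, charges):
    """m/z values of [M - zH]^z- ions for the given charge states."""
    return [(neutral_mass - z * PROTON_MASS) / z for z in charges]


def deconvolve_pair(mz_low_charge, mz_high_charge):
    """Recover (z, M) from two adjacent peaks whose charges differ by one.

    mz_low_charge is the peak at charge z (larger m/z);
    mz_high_charge is the peak at charge z + 1 (smaller m/z).
    """
    z = round((mz_high_charge + PROTON_MASS) / (mz_low_charge - mz_high_charge))
    mass = z * (mz_low_charge + PROTON_MASS)
    return z, mass


def average_mass(peaks):
    """Average the neutral-mass estimates from every adjacent peak pair."""
    ordered = sorted(peaks, reverse=True)  # descending m/z = ascending charge
    estimates = [deconvolve_pair(a, b)[1] for a, b in zip(ordered, ordered[1:])]
    return sum(estimates) / len(estimates)


# Hypothetical oligonucleotide of 6000 Da observed at charge states 4 to 7.
peaks = simulate_peaks(6000.0, [4, 5, 6, 7])
print(round(average_mass(peaks), 2))
```

With measured rather than simulated peaks, each pair carries its own measurement error, which is why averaging over several charge states tends to improve the accuracy of the final mass.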
MALDI mass spectrometry, in contrast, can be particularly attractive when a time-of-flight (TOF) configuration is used as a mass analyzer. MALDI-TOF mass spectrometry was introduced by Hillenkamp et al. (“Matrix Assisted UV-Laser Desorption/Ionization: A New Approach to Mass Spectrometry of Large Biomolecules,” Biological Mass Spectrometry (Burlingame and McCloskey, editors), Elsevier Science Publishers, Amsterdam, pp. 49-60, 1990). Since, in most cases, no multiple molecular ion peaks are produced with this technique, the mass spectra, in principle, look simpler compared to ES mass spectrometry.
Although DNA molecules up to a molecular weight of 410,000 daltons have been desorbed and volatilized (Williams et al., “Volatilization of High Molecular Weight DNA by Pulsed Laser Ablation of Frozen Aqueous Solutions,” Science, 246, 1585-87 (1989)), this technique has so far only shown very low resolution (oligothymidylic acids up to 18 nucleotides, Huth-Fehre et al., Rapid Communications in Mass Spectrometry, 6, 209-13 (1992); DNA fragments up to 500 nucleotides in length, K. Tang et al., Rapid Communications in Mass Spectrometry, 8, 727-730 (1994); and a double-stranded DNA of 28 base pairs (Williams et al., “Time-of-Flight Mass Spectrometry of Nucleic Acids by Laser Ablation and Ionization from a Frozen Aqueous Matrix,” Rapid Communications in Mass Spectrometry, 4, 348-351 (1990))).
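The principle behind a time-of-flight analyzer can be illustrated with the textbook relation for an ideal linear instrument: an ion of mass m and charge z accelerated through a potential U acquires kinetic energy zeU and crosses a field-free drift length L in time t = L x sqrt(m / (2zeU)). The sketch below is a minimal model under that idealization; it ignores the initial velocity spread, the time spent in the acceleration region and detector response, and the 20 kV and 1 m values are hypothetical.

```python
import math

DALTON_KG = 1.66053906660e-27        # kilograms per dalton
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Ideal drift time in seconds for an ion in a linear TOF analyzer."""
    mass_kg = mass_da * DALTON_KG
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length / velocity


def mass_from_time(time_s, charge, accel_voltage, drift_length):
    """Invert flight_time: recover the ion mass in daltons from its drift time."""
    velocity = drift_length / time_s
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    return 2.0 * kinetic_energy / velocity**2 / DALTON_KG


# Hypothetical settings: singly charged ion, 20 kV acceleration, 1 m drift tube.
t = flight_time(5000.0, 1, 20_000.0, 1.0)
print(f"{t * 1e6:.1f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion four times heavier arrives only twice as late, so singly charged ions of different mass appear as separate peaks along a single time axis.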
Japanese Patent No. 59-131909 describes an instrument that detects nucleic acid fragments separated by electrophoresis, liquid chromatography or high speed gel filtration. Mass spectrometric detection is achieved by incorporating into the nucleic acids atoms that normally do not occur in DNA, such as S, Br, I or Ag, Au, Pt, Os, Hg.
The instant invention provides mass spectrometric processes for detecting a particular nucleic acid sequence in a biological sample. Depending on the sequence to be detected, the processes can be used, for example, to diagnose (e.g. prenatally or postnatally) a genetic disease or chromosomal abnormality; a predisposition to or an early indication of a gene-influenced disease or condition (e.g. obesity, atherosclerosis, cancer); an infection by a pathogenic organism (e.g. virus, bacteria, parasite or fungus); or to provide information relating to identity (e.g., mini- and micro-satellites), heredity, or compatibility (e.g. HLA phenotyping).
In a first embodiment, a nucleic acid molecule containing the nucleic acid sequence to be detected (i.e. the target) is initially immobilized to a solid support. Immobilization can be accomplished, for example, based on hybridization between a portion of the target nucleic acid molecule, which is distinct from the target detection site, and a capture nucleic acid molecule, which has been previously immobilized to a solid support. Alternatively, immobilization can be accomplished by direct bonding of the target nucleic acid molecule and the solid support. Preferably, there is a spacer (e.g. a nucleic acid molecule) between the target nucleic acid molecule and the support. A detector nucleic acid molecule (e.g. an oligonucleotide or oligonucleotide mimetic), which is complementary to the target detection site, can then be contacted with the target detection site, and formation of a duplex, indicating the presence of the target detection site, can be detected by mass spectrometry. In preferred embodiments, the target detection site is amplified prior to detection and the nucleic acid molecules are conditioned. In a further preferred embodiment, the target detection sequences are arranged in a format that allows multiple simultaneous detections (multiplexing), as well as parallel processing using oligonucleotide arrays (“DNA chips”).
In a second embodiment, immobilization of the target nucleic acid molecule is an optional rather than a required step. Instead, once a nucleic acid molecule has been obtained from a biological sample, the target detection sequence is amplified and directly detected by mass spectrometry. In preferred embodiments, the target detection site and/or the detector oligonucleotides are conditioned prior to mass spectrometric detection. In another preferred embodiment, the amplified target detection sites are arranged in a format that allows multiple simultaneous detections (multiplexing), as well as parallel processing using oligonucleotide arrays (“DNA chips”).
In a third embodiment, nucleic acid molecules which have been replicated from a nucleic acid molecule obtained from a biological sample can be specifically digested using one or more nucleases (using deoxyribonucleases for DNA or ribonucleases for RNA) and the fragments captured on a solid support carrying the corresponding complementary sequences. Hybridization events and the actual molecular weights of the captured target sequences provide information on whether and where mutations in the gene are present. The array can be analyzed spot by spot using mass spectrometry. Further, the fragments generated can be ordered to provide the sequence of the larger target fragment.
Examples of preferred methods for generating specifically terminated fragments include: 1) using a base-specific ribonuclease (e.g. the G-specific T1, the A-specific U2, the A/U specific PhyM and the U/C specific ribonuclease A), for example after a transcription reaction; 2) performing a combined amplification and base-specific termination reaction (e.g. using two appropriate polymerases); and 3) contacting an appropriate amount of the target nucleic acid with a specific endonuclease (e.g., a restriction enzyme).
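The first of these methods, base-specific ribonuclease digestion, can be simulated in a few lines. The sketch below is illustrative only: it assumes complete digestion and cleavage on the 3′ side of every recognized base, uses the base specificities listed above, and the function name and example transcript are hypothetical.

```python
# Recognized bases per enzyme, as listed in the text above.
SPECIFICITY = {
    "T1": "G",       # G-specific
    "U2": "A",       # A-specific
    "PhyM": "AU",    # A/U-specific
    "RNaseA": "UC",  # U/C-specific
}


def digest(rna, enzyme):
    """Split an RNA sequence after every base recognized by the enzyme.

    Assumes complete digestion and cleavage 3' of each recognized base.
    """
    cut_after = SPECIFICITY[enzyme]
    fragments = []
    current = ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # trailing fragment that does not end in a recognized base
        fragments.append(current)
    return fragments


transcript = "AUGCCGUAGCA"  # hypothetical transcript
for name in SPECIFICITY:
    print(name, digest(transcript, name))
```

Each enzyme produces a different set of fragments from the same transcript, so a sequence change that creates or destroys a recognized base alters the fragment pattern, and therefore the set of masses, in at least one of the digests.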
In preferred embodiments, the 5′ and/or 3′ end of the target nucleic acid is tagged to facilitate the ordering of fragments. Tagging of the 3′ end is also useful to rule out or compensate for the influence of 3′ heterogeneity, premature termination and nonspecific elongation. In other preferred embodiments, modified nucleotides are included in the transcription reaction with unmodified nucleotides. Most preferably, the modified nucleotides and unmodified nucleotides are added to the transcription reaction at appropriate concentrations, so that both moieties are incorporated at a preferential rate of about 1:1. Alternatively, two separate transcriptions of the target DNA sequence, one with the modified and one with the unmodified nucleotides can be performed and the results compared.
In a fourth embodiment, at least one primer with 3′ terminal base complementarity to an allele (mutant or normal) is hybridized with a target nucleic acid molecule, which contains the allele. An appropriate polymerase and a complete set of nucleoside triphosphates or only one of the nucleoside triphosphates are used in separate reactions to furnish a distinct extension of the primer. Only if the primer is appropriately annealed (i.e. no 3′ mismatch) and if the correct (i.e. complementary) nucleotide is added, will the primer be extended. Products can be resolved by molecular weight shifts as determined by mass spectrometry.
In a fifth embodiment, a nucleic acid molecule containing the nucleic acid sequence to be detected (i.e. the target) is initially immobilized to a solid support. Immobilization can be accomplished, for example, based on hybridization between a portion of the target nucleic acid molecule, which is distinct from the target detection site, and a capture nucleic acid molecule, which has been previously immobilized to a solid support. Alternatively, immobilization can be accomplished by direct bonding of the target nucleic acid molecule and the solid support. Preferably, there is a spacer (e.g. a nucleic acid molecule) between the target nucleic acid molecule and the support. A nucleic acid molecule that is complementary to a portion of the target detection site that is immediately 5′ of the site of a potential mutation (X) is then hybridized with the target nucleic acid molecule. The addition of a complete set of dideoxynucleosides or 3′-deoxynucleoside triphosphates (e.g. pppAdd, pppTdd, pppCdd and pppGdd) and a DNA dependent DNA or RNA polymerase allows for the addition only of the one dideoxynucleoside or 3′-deoxynucleoside triphosphate that is complementary to X. The hybridization product can then be detected by mass spectrometry.
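The mass readout of such a single-base extension can be sketched numerically. The example below uses approximate average residue masses for unmodified DNA, treats each dideoxy terminator as 16.00 Da lighter than the corresponding deoxynucleotide, and applies a terminal correction of -61.96 Da for an unphosphorylated oligonucleotide; these values, the function names and the four-base primer are illustrative assumptions rather than parameters of the disclosed process.

```python
# Approximate average residue masses in Da (illustrative assumptions).
DNMP = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
# A dideoxy terminator lacks the 3'-hydroxyl oxygen (about 16.00 Da lighter).
DDNMP = {base: mass - 16.00 for base, mass in DNMP.items()}
TERMINAL_CORRECTION = -61.96  # unphosphorylated 5'-OH / 3'-OH oligonucleotide
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def oligo_mass(seq):
    """Approximate average mass of a single-stranded DNA oligonucleotide."""
    return sum(DNMP[base] for base in seq) + TERMINAL_CORRECTION


def extended_mass(primer, template_base):
    """Mass of the primer after adding the terminator complementary to X."""
    return oligo_mass(primer) + DDNMP[COMPLEMENT[template_base]]


def call_base(primer, observed_mass):
    """Return the template base X whose predicted product is closest in mass."""
    return min("ACGT", key=lambda x: abs(extended_mass(primer, x) - observed_mass))


primer = "ACGT"  # hypothetical primer, far shorter than a practical one
for x in "ACGT":
    print(x, round(extended_mass(primer, x), 2))
```

With these assumed values the closest pair of terminators, A and T, differs by about 9 Da, which indicates the mass resolution needed to distinguish all four possible bases at X.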
In a sixth embodiment, a target nucleic acid is hybridized with a complementary oligonucleotide that hybridizes to the target within a region that includes a mutation M. The heteroduplex is then contacted with an agent that can specifically cleave at an unhybridized portion (e.g. a single strand specific endonuclease), so that a mismatch, indicating the presence of a mutation, results in the cleavage of the target nucleic acid. The two cleavage products can then be detected by mass spectrometry.
In a seventh embodiment, which is based on the ligase chain reaction (LCR), a target nucleic acid is hybridized with a set of ligation educts and a thermostable DNA ligase, so that the ligation educts become covalently linked to each other, forming a ligation product. The ligation product can then be detected by mass spectrometry and compared to a known value. If the reaction is performed in a cyclic manner, the ligation product obtained can be amplified to better facilitate detection of small amounts of the target nucleic acid. Selection between wildtype and mutated primers at the ligation point can result in the detection of a point mutation.
In an eighth embodiment, at least one primer with a 3′-terminal base is hybridized to the target nucleic acid near a site where possible mutations are to be detected. The primer is then reacted with an appropriate polymerase and a set of three nucleoside triphosphates (NTPs), with the fourth added as a terminator. The extension reaction products are measured by mass spectrometry and are indicative of the presence and the nature of a mutation. The set of three NTPs and one ddNTP (or three NTPs and one 3′-deoxy NTP) can be varied to discriminate between several mutations (including compound heterozygotes) in the target nucleic acid sequence.
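How a single terminator separates alleles by mass can be sketched as follows. The example assumes approximate average residue masses for unmodified DNA, a dideoxy terminator 16.00 Da lighter than its deoxynucleotide, and extension that proceeds until the first position calling for the terminator; the sequences, names and mass values are hypothetical illustrations and not data from the disclosure.

```python
# Approximate average residue masses in Da (illustrative assumptions).
DNMP = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
DDNMP = {base: mass - 16.00 for base, mass in DNMP.items()}
TERMINAL_CORRECTION = -61.96  # unphosphorylated 5'-OH / 3'-OH oligonucleotide


def oligo_mass(seq):
    """Approximate average mass of a single-stranded DNA oligonucleotide."""
    return sum(DNMP[base] for base in seq) + TERMINAL_CORRECTION


def extension_product(primer, downstream, terminator):
    """Extend the primer along `downstream` until the terminator base is needed.

    `downstream` is the sequence the polymerase would synthesize after the
    primer, written 5' to 3'. Returns (added bases, product mass).
    """
    added = ""
    for base in downstream:
        if base == terminator:
            mass = oligo_mass(primer + added) + DDNMP[terminator]
            return added + "dd" + terminator, mass
        added += base
    # No terminator position reached: run-off product with a normal 3' end.
    return added, oligo_mass(primer + added)


primer = "ACGT"  # hypothetical primer
wild_type = extension_product(primer, "CAGT", "T")
mutant = extension_product(primer, "CTGT", "T")  # A-to-T change at position 2
print(wild_type, mutant)
```

In this example the mutation introduces an earlier termination site, so the mutant product is shorter and lighter than the wild-type product. Choosing a different terminator changes which alleles are resolved, which is the rationale for varying the nucleotide set.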
The processes of the invention provide for increased accuracy and reliability of nucleic acid detection by mass spectrometry. In addition, the processes allow for rigorous controls to prevent false negative or positive results. The processes of the invention avoid electrophoretic steps, labeling, and subsequent detection of a label. In fact, it is estimated that the entire procedure, including nucleic acid isolation, amplification, and mass spectrometry analysis, requires only about 2-3 hours. Therefore, the instant disclosed processes of the invention are faster and less expensive to perform than existing DNA detection systems. In addition, because the instant disclosed processes allow the nucleic acid fragments to be identified and detected at the same time by their specific molecular weights (an unambiguous physical standard), the disclosed processes are also much more accurate and reliable than currently available procedures.