The present invention relates generally to comparing DNA sequences and, more particularly, to the use of mass spectrometry for the determination of base (nucleotide) compositions of DNA fragments. This invention was made with government support under Contract No. W-7405-ENG-36 awarded by the US Department of Energy to The Regents of The University of California. The government has certain rights in this invention.
As the emphasis of the Human Genome Project shifts toward large-scale sequence analysis of the human genome, efficient quality control (QC) measures and rapid identification of genetic variations become increasingly important in sequencing process control, genotyping and clinical diagnosis. Analysis of a large amount of genomic sequence data requires techniques that are fast, accurate, cost-effective, and easily automated. These analyses can be efficiently accomplished by comparison of nucleotide compositions of samples from various related sources. The application of new, accurate methods to determine the nucleotide composition of oligonucleotides will avoid the intensive labor and cost involved in DNA sequencing. In the absence of a redetermination of the sequence of the target DNA region, an accurate determination of the nucleotide composition of a DNA fragment is useful for: (i) verifying the accuracy of a previously determined DNA sequence; (ii) providing low-cost error checking and proofreading of a newly determined DNA sequence; and (iii) making an efficient comparison of a known DNA sequence with a related but previously undetermined DNA sequence. For example, a comparison of the newly determined nucleotide composition with a previously determined DNA sequence will indicate variation(s) in the number of a particular type of nucleotide, which implies the nature (base type) and number of errors in these regions. Genetic variations can also be revealed by changes in the nucleotide composition between wild-type and mutant genes. The comparison of nucleotide compositions can easily be extended to score the known single nucleotide polymorphisms (SNPs), the most common genetic variation in the eukaryotic genome.
In the past few years, mass spectrometry (MS) has emerged as a powerful alternative to the techniques of gel electrophoresis for DNA sequencing and diagnosis. Mass spectrometers produce a direct mass measurement, whereas gel electrophoresis separates ions according to their mobilities, which are correlated with ion masses and charges. Electrophoresis typically takes hours, but mass spectra can be acquired in seconds or minutes in the femtomolar to picomolar range. Recently, matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS has been successfully used for fast DNA sequencing and the efficient size determination of DNA molecules. However, in spite of its high accuracy and speed, DNA sequencing by MALDI-TOF MS has an upper size range of 63-100 base pairs (bp), which is much lower than the limit of 500-1000 bp for gel electrophoresis. Coupled with Sanger sequencing reactions (See, e.g., “Sequencing DNA Using Mass Spectrometry For Ladder Detection” by N. I. Taranenko et al., Nucleic Acids Research 26, 2488 (1998)), MALDI-TOF MS is employed to determine the molecular weights of Sanger ladders. Signals due to false stops, fragmentations, and unidentified peaks appear increasingly with larger DNA molecules and severely complicate the sequence assignments. By contrast, the advent of MALDI-TOF MS has made it easier to ionize intact large DNA molecules and measure their mass-to-charge ratios. Single-stranded and double-stranded polymerase chain reaction (PCR) products of 500 nucleotides (nt) in length have been detected by MALDI-TOF MS. Most recently, with optimized matrix-laser combinations that reduce DNA fragmentation, infrared MALDI mass spectra of synthetic DNA, restriction enzyme fragments of plasmid DNA, and RNA transcripts up to a size of 2180 nt have been reported with an accuracy of ±0.5-1%. Although large oligomers have been detected by MALDI-TOF MS, it is generally accepted that up to a 100 mer is routine at present.
However, the potential application of MALDI-TOF MS in the determination of molecular weights of intact DNA molecules has yet to be fully explored. For a review of electrospray ionization (ESI) and MALDI-TOF MS, see “Applications Of Mass Spectrometry To The Characterization Of Oligonucleotides And Nucleic Acids” by Pamela F. Crain and James A. McCloskey, Current Opinion in Biotechnology 9, 25 (1998). Except for DNA sequencing, there is at present no technique capable of directly relating the molecular weight of a DNA molecule to its base composition.
In natural abundance, biological molecules are composed of over 99% of the isotopes, 12C, 14N, and 1H, the isotopes 13C, 15N, and 2H (D) comprising less than 1% of the mass of such molecules. Thus, the molecular weights of biological molecules are effectively the sum of the masses of most abundant isotopes. Stable isotope enrichment or labeling results in molecules where the amounts of less abundant atomic isotope(s) are increased to artificial levels. Changing the isotopic content in a DNA molecule results in a change in the mass of the molecule without substantially altering any of its chemical or physical properties, such as charge, sequence, length or structure. If a specific type of dNTP is labeled, “mass tags” for these species can be introduced into oligonucleotides prepared by PCR or by rolling-circle amplification, and provide characteristic signatures for the labeled nucleotides in the resulting oligonucleotides. See, e.g., “Mutation Detection And Single-Molecule Counting Using Isothermal Rolling-Circle Amplification” by Paul M. Lizardi et al., Nature Genetics 19, 225 (1998). The most common labeling approach is to enrich the isotopic levels of 13C for carbon, 15N for nitrogen, and 2H (D) for hydrogen atoms, since these isotopes are readily taken up by living species by cell growth.
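As a rough illustration of the size of such a mass tag, the following Python sketch estimates the mass added to one nucleotide when all of its carbon and nitrogen atoms are replaced by 13C and 15N. Uniform 13C/15N labeling, the `enrichment` parameter, and the names `ATOM_COUNTS` and `shift_per_nucleotide` are assumptions made for illustration only; the text above does not prescribe a particular labeling pattern.

```python
# Mass gained per substituted atom, in daltons (isotope masses are standard values).
DELTA_13C = 13.003355 - 12.000000  # 13C in place of 12C
DELTA_15N = 15.000109 - 14.003074  # 15N in place of 14N

# Carbon and nitrogen atoms in each deoxynucleotide (base plus deoxyribose).
ATOM_COUNTS = {
    "A": {"C": 10, "N": 5},
    "C": {"C": 9, "N": 3},
    "G": {"C": 10, "N": 5},
    "T": {"C": 10, "N": 2},
}


def shift_per_nucleotide(base, enrichment=1.0):
    """Approximate mass shift (Da) for one 13C/15N-labeled nucleotide.

    `enrichment` is an assumed average fraction of atoms actually
    substituted (1.0 means complete labeling).
    """
    counts = ATOM_COUNTS[base]
    full_shift = counts["C"] * DELTA_13C + counts["N"] * DELTA_15N
    return enrichment * full_shift


if __name__ == "__main__":
    for base in "ACGT":
        print(base, round(shift_per_nucleotide(base), 4))
```

Under these assumptions a fully labeled A or G is about 15 Da heavier than its unlabeled form and a fully labeled C or T about 12 Da heavier, so each incorporated labeled nucleotide contributes a shift well above 1 Da.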
After completion of the human genome project, there will be an even greater need for routinely comparing small segments of the genome to the reference genome. These comparisons of base (nucleotide) compositions will often involve DNA fragments no more than 80-100 bp long.
Accordingly, it is an object of the present invention to determine the base (nucleotide) composition of oligonucleotides using mass spectrometry.
Another object of the invention is to determine the base (nucleotide) composition of oligonucleotides without having to sequence the oligonucleotide.
Additional objects, advantages, and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
To achieve the foregoing and other objects, and in accordance with the purposes of the present invention as embodied and broadly described herein, the method for determining the base (nucleotide) composition of an oligonucleotide hereof includes: incorporating a stable, isotope-labeled form of one of the four nucleotide units of an oligonucleotide into the oligonucleotide under investigation in place of the ordinary nucleotide therein, the other three types of nucleotides in the oligonucleotide being unlabeled; measuring the mass peak of the unlabeled oligonucleotide using mass spectrometry; measuring the mass peak of the labeled oligonucleotide using mass spectrometry; obtaining the magnitude of the mass shift between the labeled oligonucleotide and the unlabeled oligonucleotide, whereby the number of isotope-labeled nucleotides in the oligonucleotide under investigation is determined; and comparing the number of isotope-labeled nucleotides with the number of that type of nucleotide in a reference oligonucleotide.
Preferably, the step of incorporating the stable, isotope-labeled nucleotide into the oligonucleotide under investigation is achieved by polymerase chain reaction (PCR) amplification of the oligonucleotide using isotope-labeled dNTP corresponding to the isotope-labeled nucleotide.
It is also preferred that PCR primers are chosen which contain a sequence for the type IIS restriction enzyme.
Preferably also the steps of measuring the mass peak of the unlabeled oligonucleotide using mass spectrometry and measuring the mass peak of the labeled oligonucleotide using mass spectrometry are achieved using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry or electrospray ionization (ESI) mass spectrometry.
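The arithmetic behind the single-label method summarized above can be sketched in Python as follows. This is a minimal illustration, not the procedure of the invention as practiced: the function names, the single-stranded reference string, and the per-nucleotide shift passed in as `shift_per_nt` are all hypothetical, and rounding to the nearest integer assumes the measurement error is well below half of one per-nucleotide shift.

```python
def count_labeled(mass_unlabeled, mass_labeled, shift_per_nt):
    """Estimate how many labeled nucleotides were incorporated.

    The mass shift between the labeled and unlabeled peaks is divided by
    the (assumed known) shift contributed by one labeled nucleotide and
    rounded to the nearest whole number.
    """
    shift = mass_labeled - mass_unlabeled
    return round(shift / shift_per_nt)


def compare_with_reference(reference_seq, base, observed_count):
    """Return observed minus expected count of `base` in the reference.

    Zero means the composition agrees with the reference for this base;
    a non-zero value flags a possible error or variation.
    """
    expected = reference_seq.upper().count(base.upper())
    return observed_count - expected


if __name__ == "__main__":
    # Hypothetical peak masses (Da) for an 8-mer labeled at T.
    n_t = count_labeled(2400.0, 2424.0552, shift_per_nt=12.0276)
    print(n_t, compare_with_reference("ACGTTAGC", "T", n_t))
```

A non-zero difference indicates only that the number of that nucleotide differs from the reference; it does not locate the change within the fragment.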
In another aspect of the present invention in accordance with its objects and purposes, the method for determining the nucleotide composition of an oligonucleotide hereof includes: incorporating a stable, isotope-labeled form of two of the four nucleotide units of an oligonucleotide into the oligonucleotide to be determined in place of the ordinary nucleotides therein, each nucleotide having a distinct mass, the other two types of nucleotides in the oligonucleotide being unlabeled; measuring the mass peak of the unlabeled oligonucleotide using mass spectrometry; measuring the mass peak of the labeled oligonucleotide using mass spectrometry; comparing the magnitude of the mass shift between the labeled oligonucleotide and the unlabeled oligonucleotide, whereby the number of each of the isotope-labeled nucleotides in the oligonucleotide under investigation is determined; and comparing the number of isotope-labeled nucleotides with the number of that type of nucleotide in a reference oligonucleotide.
Preferably, the step of incorporating two stable, isotope-labeled nucleotides into the oligonucleotide under investigation is achieved by PCR amplification of the oligonucleotide using an isotope-labeled dNTP corresponding to each of the isotope-labeled nucleotides.
It is preferred that PCR primers are chosen which contain a sequence for the type IIS restriction enzyme.
Preferably also, the steps of measuring the mass peak of the unlabeled oligonucleotide using mass spectrometry and measuring the mass peak of the labeled oligonucleotide using mass spectrometry are achieved using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry or electrospray ionization mass spectrometry.
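For the two-label method summarized above, one way to picture how a single mass shift can constrain two counts is to enumerate the whole-number pairs consistent with it. The Python sketch below does this by brute force. The function name, the tolerance `tol`, and the example per-nucleotide shifts are hypothetical, and the enumeration is an illustrative assumption rather than the procedure described in the text.

```python
def candidate_counts(mass_shift, shift_a, shift_b, max_len, tol=0.5):
    """List (n_a, n_b) pairs whose combined shift matches the measurement.

    shift_a and shift_b are the distinct mass shifts (Da) assumed for one
    labeled nucleotide of each type; max_len is the oligonucleotide
    length, which bounds n_a + n_b; tol is the allowed mass error (Da).
    """
    hits = []
    for n_a in range(max_len + 1):
        for n_b in range(max_len + 1 - n_a):
            predicted = n_a * shift_a + n_b * shift_b
            if abs(predicted - mass_shift) <= tol:
                hits.append((n_a, n_b))
    return hits


if __name__ == "__main__":
    # Hypothetical shifts for two labeled nucleotide types in an 8-mer.
    print(candidate_counts(69.1113, 15.0187, 12.0276, max_len=8))
    print(candidate_counts(60.10, 15.0187, 12.0276, max_len=8))
```

With these example values the first call yields a single pair, while the second yields two, because four nucleotides of the heavier type and five of the lighter type add nearly the same mass. When more than one pair is returned, the measured shift alone does not resolve the two counts under these assumptions, and comparison with the reference composition or a better-separated pair of labels would be needed.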
Benefits and advantages of the present invention include the determination of the base (nucleotide) composition in oligonucleotides without using gel electrophoresis with radioactive isotope or fluorescent labeling.