The present invention relates to methods for the use of a new class of dyes for DNA sequencing and labelling of DNA fragments for genetic analysis. The ability to determine the sequence of DNA is critical for understanding the function and control of genes and for applying many of the basic techniques of molecular biology. Native DNA consists of two linear polymers, or strands, of nucleotides. Each strand is a chain of nucleosides linked by phosphodiester bonds. The two strands are held together in an antiparallel orientation by hydrogen bonds between complementary bases of the nucleotides of the two strands: deoxyadenosine (A) pairs with thymidine (T) and deoxyguanosine (G) pairs with deoxycytidine (C).
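The pairing rule and antiparallel orientation described above can be illustrated with a minimal Python sketch. The function name and the example sequence are chosen for illustration only and do not come from any cited reference.

```python
# Illustrative only: Watson-Crick pairing as stated in the text (A-T, G-C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3):
    """Return the complementary strand, also written 5' to 3'."""
    # Pair each base, reading the input backwards because the two
    # strands run in antiparallel orientation.
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3.upper()))


print(complementary_strand("CAGGA"))  # TCCTG
```

Applying the function twice returns the original strand, which reflects the fact that each strand fully determines the other.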
The development of reliable methods for sequence analysis of DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) has been essential to the success of recombinant DNA and genetic engineering. When used with the other techniques of modern molecular biology, nucleic acid sequencing allows dissection of animal, plant and viral genomes into discrete genes with defined chemical structure. Since the function of a biological molecule is determined by its structure, defining the structure of a gene is crucial to the eventual useful manipulation of this basic unit of hereditary information. Once genes can be isolated and characterized, they can be modified to produce desired changes in their structure that allow the production of gene products (proteins) with properties different from those of the original gene products.
The development of modern nucleic acid sequencing methods involved parallel developments in a variety of techniques. One was the emergence of simple and reliable methods for cloning small to medium-sized strands of DNA into bacterial plasmids, bacteriophages, and small animal viruses. Cloning allowed the production of pure DNA in sufficient quantities to allow chemical analysis. Another was the use of gel electrophoretic methods for high resolution separation of oligonucleotides on the basis of size. The key development, however, was the introduction of methods of generating sets of fragments of cloned, purified DNA that contain, in their collection of lengths, the information necessary to define the sequence of the nucleotides comprising the parent DNA molecules.
Presently there are several approaches to DNA sequence determination, see, e.g., the dideoxy chain termination method, Sanger et al., Proc. Natl. Acad. Sci., 74:5463-67 (1977); the chemical degradation method, Maxam et al., Proc. Natl. Acad. Sci., 74:560-564 (1977); and hybridization methods, Drmanac et al., Genomics, 4:114-28 (1989), Khrapko, FEB 256:118-22 (1989). The chain termination method has been improved in several ways, and serves as the basis for all currently available automated DNA sequencing machines. See, e.g., Sanger et al., J. Mol. Biol., 143:161-78 (1980); Schreier et al., J. Mol. Biol., 129:169-72 (1979); Smith et al., Nucleic Acids Research, 13:2399-2412 (1985); Smith et al., Nature, 321:674-79 (1986) and U.S. Pat. No. 5,171,534; Prober et al., Science, 238:336-41 (1987); Section II, Meth. Enzymol., 155:51-334 (1987); Church et al., Science, 240:185-88 (1988); Swerdlow and Gesteland, Nucleic Acids Research, 18:1415-19 (1989); Ruiz-Martinez et al., Anal. Chem., 2851-58 (1993); Studier, PNAS, 86:6917-21 (1989); Kieleczawa et al., Science, 258:1787-91; and Connell et al., Biotechniques, 5:342-348 (1987).
The method developed by Sanger is referred to as the dideoxy chain termination method. In a commonly used variation of this method, a DNA segment is cloned into a single-stranded DNA phage such as M13. These phage DNAs can serve as templates for the primed synthesis of the complementary strand by conventional DNA polymerases. The primer is either a synthetic oligonucleotide or a restriction fragment isolated from the parental recombinant DNA that hybridizes specifically to a region of the M13 vector near the 3' end of the cloned insert. In each of four sequencing reactions, the primed synthesis is carried out in the presence of enough of the dideoxy analog of one of the four possible deoxynucleotides so that the growing chains are randomly terminated by the incorporation of these "dead-end" nucleotides. The relative concentration of dideoxy to deoxy forms is adjusted to give a spread of termination events corresponding to all the possible chain lengths that can be resolved by gel electrophoresis. The products from each of the four primed synthesis reactions are loaded into individual lanes and are separated by polyacrylamide gel electrophoresis. Radioactive label incorporated in the growing chains is used to develop an autoradiogram image of the pattern of the DNA in each electrophoresis lane. The sequence of the deoxynucleotides in the cloned DNA is determined from an examination of the pattern of bands in the four lanes. Because the products from each of the four synthesis reactions must be run on separate gel lanes, there are problems with comparing band mobilities in the different lanes.
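The logic of reading a sequence from four termination lanes can be sketched in Python as follows. This is an idealized illustration, not a model of the chemistry: it assumes every possible termination position is populated, that fragment lengths are counted from the end of the primer, and that bands resolve perfectly. The function names and the example strand are hypothetical.

```python
def termination_lengths(new_strand):
    """For each dideoxy reaction, list the lengths of the terminated chains.

    new_strand is the strand synthesized from the primer, written 5' to 3'.
    Idealized: a terminated chain is assumed to exist at every position.
    """
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        # A chain that incorporated the dideoxy analog here has this length.
        lanes[base].append(position)
    return lanes


def read_gel(lanes):
    """Read the sequence from the shortest band to the longest across lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = termination_lengths("GATTACA")
print(lanes)            # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_gel(lanes))  # GATTACA
```

The sketch also shows why lane-to-lane mobility differences matter: the read depends entirely on ordering band lengths correctly across the four lanes.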
Turning to automated DNA sequencing machines, in general, fragments having different terminating bases can be labeled with different fluorescent dyes, which are attached either to a primer, e.g., Smith et al. (1986, cited above), or to the base of a terminal dideoxynucleotide, e.g., Prober et al. (cited above). A fluorescence detector then can be used to detect the fluorophore-labeled DNA fragments. The four different dideoxy-terminated samples can be run in four separate lanes or, if labeled differentially, in the same lane. The method of Fung, et al., U.S. Pat. No. 4,855,225, uses a set of four chromophores or fluorophores with different absorption or fluorescence maxima. Each of these tags is coupled chemically to the primer used to initiate the synthesis of the fragment strands. In turn, each tagged primer is then paired with one of the dideoxynucleotides and used in the primed synthesis reaction with conventional DNA polymerases. The labeled fragments are then combined and loaded onto the same gel column for electrophoretic separation. Base sequence is determined by analyzing the fluorescent signals emitted by the fragments as they pass a stationary detector during the separation process.
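The single-lane, four-dye readout described above can be illustrated with a minimal Python sketch in which each band passing the detector is assigned to the dye channel with the strongest signal. The channel-to-base assignment and the intensity values are invented for illustration; real instruments also correct for spectral overlap between channels, which this sketch omits.

```python
# Hypothetical channel order; each channel stands for one dye and thus
# for one terminating base.
CHANNEL_BASES = ("A", "C", "G", "T")


def call_bases(peaks):
    """Call one base per detected band from its four-channel intensities."""
    calls = []
    for intensities in peaks:
        # Pick the index of the channel with the highest signal.
        strongest = max(range(4), key=lambda i: intensities[i])
        calls.append(CHANNEL_BASES[strongest])
    return "".join(calls)


# Made-up intensities for three successive bands, in detection order.
peaks = [
    (0.05, 0.10, 0.90, 0.08),
    (0.85, 0.12, 0.07, 0.10),
    (0.06, 0.09, 0.11, 0.80),
]
print(call_bases(peaks))  # GAT
```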
Obtaining a set of dyes to label the different fragments is a major difficulty in automated DNA sequencing systems. First, it is difficult to find three or more dyes that do not have emission bands which overlap significantly, since the typical emission band halfwidth for organic fluorescent dyes is about 40-80 nanometers (nm) and the width of the visible spectrum is only about 350-400 nm. Second, even if dyes with non-overlapping emission bands are found, the set may still be unsuitable for DNA sequencing if the respective fluorescence efficiencies are too low. For example, Pringle et al., DNA Core Facilities Newsletter, 1:15-21 (1988), present data indicating that increased gel loading cannot compensate for low fluorescence efficiencies.
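The arithmetic behind the first difficulty can be made explicit with a rough Python calculation. It assumes, as a simplification not stated in the text, that each emission band occupies exactly one halfwidth and that bands can be packed edge to edge; because real emission bands have tails extending beyond the halfwidth, the counts below are optimistic upper bounds.

```python
def max_side_by_side_bands(spectrum_width_nm, band_halfwidth_nm):
    """Crude upper bound on the number of bands that fit without overlap."""
    return int(spectrum_width_nm // band_halfwidth_nm)


# Ranges quoted in the text: spectrum 350-400 nm, halfwidth 40-80 nm.
for width in (350, 400):
    for halfwidth in (40, 80):
        print(width, halfwidth, max_side_by_side_bands(width, halfwidth))
# 350 40 8
# 350 80 4
# 400 40 10
# 400 80 5
```

Even under these generous assumptions, broad dyes leave room for only about four bands, which is the number needed for four-color sequencing.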
Another difficulty with obtaining an appropriate set of dyes is that when several fluorescent dyes are used concurrently, excitation becomes difficult because the absorption bands of the dyes are often widely separated. The most efficient excitation occurs when each dye is illuminated at the wavelength corresponding to its absorption band maximum. Thus, one often is forced to compromise between the sensitivity of the detection system and the increased cost of providing separate excitation sources for each dye. In addition, when the number of differently sized fragments in a single column of a gel is greater than a few hundred, the physicochemical properties of the dyes and the means by which they are linked to the fragments become critical because the charge, molecular weight, and conformation of the dyes and linkers must not adversely affect the electrophoretic mobilities of closely sized fragments. Changes in electrophoretic mobility can result in extensive band broadening or reversal of band positions on the gel, thereby destroying the correspondence between the order of bands and the order of the bases in the nucleic acid sequence. Because of the many problems associated with altered electrophoretic mobility, correction of mobility discrepancies by computer software is necessary in prior art systems. Finally, the fluorescent dyes must be compatible with the chemistry used to create or manipulate the fragments. For example, in the chain termination method the dyes used to label primers and/or the dideoxy chain terminators must not interfere with the activity of the polymerase or reverse transcriptase employed.
Because of these severe constraints only a few sets of fluorescent dyes have been found that can be used in DNA sequencing, particularly automated DNA sequencing, and in other diagnostic and analytical techniques, e.g., Smith et al. (1985, cited above); Prober et al. (cited above); Hood et al., European patent application 8500960; Bergot et al. (cited above); Fung et al. (cited above); Connell et al. (cited above); and Menchen et al., U.S. Pat. No. 5,188,934.
In view of the above, DNA sequencing would be advanced significantly by the availability of new sets of fluorescent dyes which (1) are physicochemically similar, (2) permit detection of spatially overlapping target substances, such as closely spaced bands of DNA on a gel, (3) extend the number of bases that can be determined on a single gel column by current methods of automated DNA sequencing, (4) are amenable to use with a wide range of preparative and manipulative techniques, and (5) otherwise satisfy the numerous requirements listed above. See, Bergot, et al. (cited above).
Until the present invention, one problem encountered was that each fluorophore altered the "normal" electrophoretic mobility of the corresponding termination products during gel electrophoresis such that software correction files were needed to generate accurate, evenly spaced DNA sequences. See, Smith et al., Nature, 321:674-79 (1986) and U.S. Pat. No. 5,171,534. Thus, the set of discriminating fluorophores described in the literature is small, and the search for improved, alternative dyes has been difficult at best.
Several different chemical modifications have been attempted to correct for differences in gel mobility between different dye-labeled primers in automated DNA sequencing. Generally, DNA sequencing reactions labeled with fluorescein and its derivative dyes have different gel mobilities from reactions labeled with rhodamine and its derivative dyes. Reactions labeled with fluorescein and its derivatives typically move through the gel faster (sometimes by more than one base position) than reactions labeled with rhodamine and its derivatives. For example, with the -21M13 universal sequencing primer, the fluorophores are coupled to the primer via linker arms of different lengths. Both fluoresceins are coupled to the primer using a two-carbon amino linker arm while both rhodamines are coupled to the primer using a six-carbon amino linker arm. Mobility correction software, however, is additionally required to generate properly spaced DNA termination fragments. Another example involves custom sequencing primers. The term refers to any oligonucleotide sequence that can act as a suitable DNA sequencing primer. A 5'-leader sequence (5'-CAGGA) must be coupled to every custom sequencing primer, and the M13RP1 mobility correction software must be used to generate properly spaced DNA termination fragments. The leader sequence is the first five bases of the reverse M13RP1 sequencing primer. M13RP1 is the mobility software file used to generate properly spaced DNA termination fragments for the reverse sequencing primer.
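The effect of dye-dependent mobility, and the kind of correction a mobility file applies, can be illustrated with a simplified Python sketch. The offsets are hypothetical values chosen so that the fluorescein-labeled fragments run more than one base position faster; the sketch does not reproduce the M13RP1 file or any actual correction software.

```python
# Hypothetical per-dye offsets in base positions; a negative value means
# the labeled fragment runs faster and so appears shorter than it is.
DYE_OFFSET = {"fluorescein": -1.2, "rhodamine": 0.0}


def apparent_position(length, dye):
    """Band position on the gel, in base units, including the dye shift."""
    return length + DYE_OFFSET[dye]


def band_order(fragments, corrected):
    """Return the bases in the order their bands are read off the gel."""
    def key(fragment):
        length, dye, _base = fragment
        position = apparent_position(length, dye)
        if corrected:
            position -= DYE_OFFSET[dye]  # undo the dye-specific shift
        return position

    return "".join(base for _, _, base in sorted(fragments, key=key))


# (true length, dye on the primer, terminating base); invented example.
fragments = [
    (20, "rhodamine", "G"),
    (21, "fluorescein", "A"),
    (22, "rhodamine", "T"),
    (23, "fluorescein", "C"),
]
print(band_order(fragments, corrected=False))  # AGCT
print(band_order(fragments, corrected=True))   # GATC
```

Without correction, each fluorescein-labeled band overtakes the rhodamine-labeled band one base shorter, so the read order no longer matches the base order.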
A new class of dyes, the 4,4-difluoro-4-bora-3a,4a-diaza-s-indacene (BODIPY®) fluorophores, has been recently described. See, Haugland, et al., Molecular Probes: Handbook of Fluorescent Probes and Research Chemicals, pp. 24-32, and U.S. Pat. No. 4,774,339. The parent heterocyclic molecule of the BODIPY® fluorophores is a dipyrrometheneboron difluoride compound, which is modified to create a broad class of spectrally discriminating fluorophores, see FIG. 1. The conventional naming of these dyes is BODIPY® followed by their approximate absorption/emission maxima, e.g., BODIPY® 530/550.
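The naming convention can be read mechanically, as in the following Python sketch. The parser is a hypothetical helper written for illustration; it assumes the two numbers are the approximate absorption and emission maxima in nanometers, in that order, and tolerates an optional trademark marker after the dye name.

```python
import re


def parse_bodipy_name(name):
    """Split a name like 'BODIPY 530/550' into (absorption_nm, emission_nm)."""
    # An optional trademark marker (".RTM." or the registered sign) may
    # follow the word BODIPY.
    pattern = r"BODIPY(?:\.RTM\.|\u00ae)?\s+(\d+)/(\d+)"
    match = re.fullmatch(pattern, name.strip())
    if match is None:
        raise ValueError(f"not in 'BODIPY <abs>/<em>' form: {name!r}")
    return int(match.group(1)), int(match.group(2))


absorption, emission = parse_bodipy_name("BODIPY 530/550")
print(absorption, emission, emission - absorption)  # 530 550 20
```

The difference between the two values gives the approximate separation between the absorption and emission maxima for that dye.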
In addition to the specifically cited references above, additional prior art techniques include the following:
U.S. Pat. No. 4,318,846 to Khanna et al. is drawn to diether symmetrically-substituted fluoresceins having at least one anionic group and a linking functionality. Depending on the site of substitution, the compounds can be used as fluorescers absorbing at wavelengths in excess of 500 nm. The compounds can be used as labels in fluorescent immunoassays.
U.S. Pat. No. 4,811,218 to Hunkapiller et al. is drawn to a real-time, automated nucleic acid sequencing apparatus which permits more than one clone to be sequenced at the same time.
U.S. Pat. No. 4,855,225 to Fung et al. is drawn to a method for detecting up to four sets of oligonucleotides that have been differentially labeled with fluorophores, two of the sets with substituted fluoresceins and two sets with substituted rhodamines, and separated by gel electrophoresis.
U.S. Pat. No. 5,366,860 to Bergot et al. is drawn to a method for detecting up to four sets of oligonucleotides that have been differentially labeled with fluorophores, all rhodamines with different substitutions, and separated by gel electrophoresis.
U.S. Pat. No. 5,188,934 to Menchen et al. is drawn to a method for detecting up to four sets of oligonucleotides that have been differentially labeled with fluorophores, all fluoresceins with different substitutions, and separated by gel electrophoresis.
U.S. Pat. No. 5,171,534 to Smith et al. describes a system for the electrophoretic analysis of DNA fragments produced in DNA sequencing operations. The system comprises a source of chromophore or fluorescent tagged DNA fragments, a zone for contacting an electrophoresis gel, means for introducing said tagged DNA fragments to said zone and photometric means for monitoring the tagged DNA fragments as they move through the gel.
U.S. Pat. No. 5,366,603 is drawn to automatic DNA sequencing wherein the DNA is marked with near infrared fluorescent dyes.