I. Field of the Invention
The present invention relates generally to the fields of high-throughput genetic analysis and fluorescence spectroscopy. More particularly, it provides a variety of compositions and methods for use in high-throughput DNA sequence identification.
II. Description of Related Art
The Human Genome Project (HGP) holds tremendous promise for discoveries of the molecular mechanisms that trigger the onset of many common diseases over the next several decades. The initial HGP goals, whether underway or already met, are to provide the complete and accurate genome sequences of human and multiple well-studied genetic model organisms, such as mouse, rat, fruit fly, nematode, yeast and numerous bacteria. From this foundation of reference genome sequences, the elucidation of complete gene sets, coupled with comparative cross-species studies, is expected to assist significantly in assigning protein functions and disease associations to specific human genes. Other technologies complement the assignment of biological functions: gene and protein expression profiling, mouse gene-knockouts, and techniques that measure protein-protein interactions. The elucidation of gene structure-protein function relationships is key to understanding how genomic sequence variation between individuals can increase the risk of, or predisposition to, certain complex diseases, or can even be the etiologic agent responsible for the onset of particular diseases. However, the use of genetic variation in clinical practice is only beginning, and technology to facilitate its use is greatly needed.
The most commonly observed form of human sequence variation is the single nucleotide polymorphism (SNP); SNPs occur at a frequency of approximately 1-in-300 to 1-in-1000 base pairs. In general, 10%-to-15% of SNPs will affect protein function by altering specific amino acid residues, affect the proper processing of genes by changing splicing mechanisms, or affect the normal level of expression of the gene or protein by varying regulatory mechanisms. Several recent examples are the associations of mutations in the NOTCH4 gene with schizophrenia (Wei et al., 2000), in the peroxisome proliferator-activated receptor gamma (PPARγ) gene with severe insulin resistance (Deeb et al., 1998), and in the melanocortin-4 receptor (MC4R) gene with inherited obesity (Yeo et al., 1998).
The identification of informative SNPs will lead to more accurate diagnosis of inherited diseases and better assessment of risk susceptibilities, and such SNPs could be assayed in specific tissue biopsies for sporadic mutations. An individual's SNP profile could be used to offset and significantly delay the progression of disease by helping in the choice of prophylactic drug therapies. A SNP profile of drug metabolizing genes could be used to prescribe a specific drug regimen to provide safer and more efficacious results. To accomplish goals like these, genome sequencing will move into a resequencing phase that covers not just a handful of individuals, but potentially the partial sequencing of most of the population. Resequencing simply means sequencing in parallel specific regions or single nucleotides that are distributed throughout the human genome to obtain the SNP profile for a given complex disease.
For this technology to be applicable and practicable for routine usage in medical practice, it must be robust, easy-to-use, highly sensitive, flexible, and portable, and it must deliver accurate results rapidly. While current technologies at large genome centers are robust and their results are accurate, they are inadequate and inflexible for resequencing millions of individuals in routine clinical practice. It is therefore advantageous to develop a DNA sequencing instrument that meets these needs. Miniaturization of this technology is also advantageous because smaller instruments potentially require less sample and reagents and can be more readily transported and located in areas such as clinics or doctors' offices.
Ideally, DNA sequencing technology would have the sensitivity for direct assays without DNA amplification, and be simple and portable for routine usage in basic, applied, and clinical laboratories. Currently, DNA sequencing technology for high-throughput analyses is specialized and centralized in large genome centers and requires numerous molecular biology manipulations that take days or weeks of preparation before DNA sequence analysis can be performed. Thereafter, the state-of-the-art technology involves the attachment of four different fluorescent dyes or fluorophores to the four bases of DNA (i.e., A, C, G, and T) that can be discriminated by their respective emission wavelengths, the electrophoretic separation of the nested set of dye-labeled DNA fragments into base-pair increments, and the detection of the dye fluorescence following irradiation by a single argon-ion laser source. Current instrumentation for electrophoretic separation comprises a 96-capillary array whose fluorescent signals are dispersed using a prism, diffraction grating, spectrograph, or other dispersing element, and the four colors are imaged onto a charge-coupled device (CCD) camera. The throughput of each 96-capillary instrument is approximately 800 DNA samples per day, and the success of the HGP in large-scale genomic sequencing has been attributed to the use of hundreds of these machines throughout the world. The main disadvantages of the current technology are the laborious cloning or amplification steps needed to provide sufficient DNA material for analyses, the relatively large size of the instruments (roughly the size of a 4-foot refrigerator), and the inadequate sensitivity of detection (i.e., inefficient excitation of fluorescent dyes with absorption maxima far from the laser excitation wavelength).
Although the resolution of spectral emission wavelengths is the mainstream technology used in commercial and academic prototype instruments, several groups have explored other physical properties of fluorescence as a method for discriminating multicolor systems for DNA sequence determination. Recently, Lieberwirth et al. (1998) described a diode-laser based time-resolved fluorescence confocal detection system for DNA sequencing by capillary electrophoresis. In this system, a semiconductor laser (630 nm) was modulated using a tunable pulse generator at a repetition rate of 22 MHz (454 psec pulses) and focused by a microscope objective. The fluorescence was collected by the same objective and imaged on a single-photon-counting avalanche photodiode (APD) module.
The Luryi group at SUNY Stony Brook has proposed a multiple laser excitation approach using different radio frequency (RF) modulations and demodulations to discriminate a mixture of fluorophores (U.S. Pat. Nos. 5,784,157 and 6,038,023). U.S. Pat. No. 5,784,157 describes a 4-laser based fiber optic single capillary monitoring device, which initially discriminates fluorophores without a wavelength component, although the patent later discusses coupling in spectral resolution for fluorophore discrimination. Three significant flaws are apparent in this system: enhanced fluorescence cross-talk and scattered laser light, low-sensitivity detection, and a design that does not appear to scale beyond one capillary.
As described, the target capillary is illuminated simultaneously by all four lasers, which are modulated by different RF signals. The different RF signals for all of the dyes are summed together, and the outputs of the detector photodiodes are demodulated by additional heterodyne RF signals. Interestingly, Gorfinkel and Luryi describe the creation of Bragg reflectors to eliminate cross-talk modulation for a given dye set. Fluorescence cross-talk, however, will not be eliminated using this technique. Signal from the “wrong” dye, which is weakly excited off-resonance by a particular laser, will be encoded with the corresponding “wrong” frequency, decoded, and added to the signal for the target dye. Moreover, scattered laser light will also be modulated, and is likewise not rejected by the heterodyne detection.
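The persistence of fluorescence cross-talk under frequency encoding can be illustrated with a minimal numerical sketch. The modulation frequencies, the excitation-efficiency matrix, and the function names are hypothetical values chosen for illustration only; they are not taken from U.S. Pat. No. 5,784,157 or from any measured dye set. The sketch assumes an idealized noiseless detector and simple synchronous demodulation in place of the heterodyne electronics.

```python
import math

# Hypothetical modulation frequencies (Hz), one per laser. Illustrative only.
MOD_FREQS = [1000.0, 2000.0, 3000.0, 4000.0]

# Hypothetical relative excitation efficiencies, EXCITATION[dye][laser].
# Diagonal entries are on-resonance; off-diagonal entries model weak
# off-resonance excitation of a dye by a neighboring laser line.
EXCITATION = [
    [1.00, 0.10, 0.00, 0.00],
    [0.10, 1.00, 0.10, 0.00],
    [0.00, 0.10, 1.00, 0.10],
    [0.00, 0.00, 0.10, 1.00],
]


def detector_signal(dye_amounts, t):
    """Total fluorescence at time t with all four lasers on simultaneously."""
    total = 0.0
    for laser, freq in enumerate(MOD_FREQS):
        # Each laser intensity is sinusoidally modulated between 0 and 1.
        intensity = 0.5 * (1.0 + math.cos(2.0 * math.pi * freq * t))
        for dye, amount in enumerate(dye_amounts):
            total += amount * EXCITATION[dye][laser] * intensity
    return total


def demodulate(dye_amounts, laser, n_samples=4000, duration=0.01):
    """Recover the signal amplitude encoded at one laser's frequency.

    The window holds a whole number of periods of every frequency used,
    so contributions at the other modulation frequencies average to zero.
    """
    freq = MOD_FREQS[laser]
    acc = 0.0
    for i in range(n_samples):
        t = i * duration / n_samples
        acc += detector_signal(dye_amounts, t) * math.cos(2.0 * math.pi * freq * t)
    # Factor 4 undoes the 1/2 modulation depth and the 1/2 from averaging cos^2.
    return 4.0 * acc / n_samples


# Only dye 1 is present, yet the channel for laser 0 reports a nonzero signal.
only_dye_1 = [0.0, 1.0, 0.0, 0.0]
channel_readings = [demodulate(only_dye_1, k) for k in range(4)]
```

Under these assumptions, the reading at laser 0's frequency equals the off-resonance efficiency of dye 1 at that laser (about 0.10) even though dye 0 is absent: demodulation separates the lasers, not the dyes.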
The simultaneous multi-modulation method also has a serious shortcoming for the detection of low light levels, which is a specific aim of the current invention. All the lasers are proposed to operate simultaneously, followed by detection of substantially all of the fluorescence, and conversion of the collected fluorescence to an electrical signal. This design potentially creates a correspondingly high quantum statistical noise level, which would be distributed to all the detectors. The demultiplexing of the RF signals does not remove this excess random noise, even if the corresponding signal is small (Meaburn, 1976). In comparison, the Pulse-Multiline Excitation (PME) system described in the current invention exhibits noise levels in proportion to the signal, so that a weak signal originating from a particular laser pulse has a correspondingly low detected noise level during that laser's sub-cycle. Optimizing the optical system for producing low noise levels is essential in establishing the optimum contrast between the presence and absence of a given dye.
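The noise argument can be made concrete with a simple shot-noise estimate. This is a sketch under stated assumptions: photon detection follows Poisson statistics, constant factors from demodulation and duty cycle are ignored, and each channel collects the same number of signal photons in both schemes. The photon counts and function names are hypothetical.

```python
import math


def snr_simultaneous(counts, channel):
    """Signal-to-noise ratio when all lasers fire at once.

    Every detected photon contributes shot noise, so the noise scales
    with the square root of the TOTAL count, whichever channel is read.
    """
    total = sum(counts)
    if total <= 0:
        return 0.0
    return counts[channel] / math.sqrt(total)


def snr_sequential(counts, channel):
    """Signal-to-noise ratio when each laser has its own sub-cycle.

    Only photons detected during that laser's sub-cycle contribute, so
    the noise scales with the square root of that channel's own count.
    """
    n = counts[channel]
    if n <= 0:
        return 0.0
    return n / math.sqrt(n)


# Hypothetical photon counts: a weak dye (100) next to a strong one (1,000,000).
counts = [100, 1_000_000, 0, 0]
weak_simultaneous = snr_simultaneous(counts, 0)  # noise set by the strong dye
weak_sequential = snr_sequential(counts, 0)      # noise set by the weak dye only
```

With these illustrative counts, the weak channel's signal-to-noise ratio is about 0.1 under simultaneous excitation and 10 under sequential excitation. The gap grows with the intensity of the brightest dye that shares the detector.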
Finally, U.S. Pat. No. 5,784,157 describes a rather complicated array of optical fibers, combiners, splitters, and four heterodyne detectors with their associated spectral filters for a single capillary channel. Scaling this system to a 2-capillary system would entail doubling the mentioned detector components. Unfortunately, a CCD camera is not readily adapted for high-frequency RF modulation, as it is an “inherently discrete-time” device. In a more recent document, U.S. Pat. No. 6,038,023, the multiplicity of spectral filters has been replaced with a dispersing prism spectrometer and a high-speed one-dimensional array detector for use with a single capillary channel device; scaling up to a capillary array system is more feasible, as discussed by the Luryi group, but may require a multiplicity of such spectrometer units.
The current invention comprises a novel fluorescence device that is capable of significant improvements in the limit of detection of multi-color fluorescence reactions and may be applied to direct measurement of such reactions from biological sources (i.e., without the need for PCR or cloning amplifications). Moreover, this technology, called Pulse-Multiline Excitation or “PME”, can be configured on a small work surface or in a small instrument, compared to the current DNA sequencing instruments. Thus, a DNA sequencer the size of a suitcase or smaller is described.
The development of improved DNA sequencing chemistries will likely increase the number of independent assays that can be run in parallel. This technology will have broad application in both general sequencing and forensic applications.