This specification includes a microfiche appendix containing a listing of the computer programs of this invention, this appendix comprising 2 microfiche of 173 total frames.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This invention relates to a method and apparatus for analysis of biopolymers by the electrophoretic separation of biopolymer fragments. More particularly, it relates to a method and apparatus for automated, high-capacity, concurrent analysis of multiple DNA samples.
Molecular biology research depends on biopolymer analysis. Conventionally, for this analysis, a biopolymer sample is first fragmented into shorter length biopolymer fragments by enzymatic or chemical means. The fragments are distinctively labeled with detection labels and then separated, often electrophoretically. The fragment pattern is then detected to obtain information about the structure and nature of the original biopolymer sample. These steps are typically performed separately with human intervention required to transfer the sample from one step to another.
A well-known example of biopolymer analysis is DNA sequencing. See F. Sanger, et al., DNA Sequencing with Chain Terminating Inhibitors, 74 Proc. Nat. Acad. Sci. USA 5463 (1977); Lloyd M. Smith, et al., Fluorescence detection in automated DNA sequence analysis, 321 Nature 674 (1986); Lloyd M. Smith, The Future of DNA Sequencing, 262 Science 530 (1993), which are incorporated herein by reference. A prevalent sequencing method comprises the following steps. A DNA sample is first amplified, that is, the DNA chains are made to replicate identically, usually by the polymerase chain reaction (PCR). From the amplified sample, nested sets of DNA fragments are produced by chain terminating polymerase reactions (Sanger reactions). Each chain fragment is labeled with one of four fluorescent dyes according to the chain terminating base (either ddATP, ddCTP, ddGTP, or ddTTP). These fragments are then separated according to their molecular size by polyacrylamide gel electrophoresis and the unique dyes detected by their fluorescence. The DNA base sequence can be simply reconstructed from the detected pattern of chain fragments.
Electrophoresis is the separation of molecules by differential molecular migration in an electric field. For biopolymers, this is ordinarily performed in a polymeric gel, such as agarose or polyacrylamide, whereby separation of biopolymers with similar electric charge densities, such as DNA and RNA, ultimately is a function of molecular weight. The prevalent configuration is to have the gel disposed as a sheet between two flat, parallel, rectangular glass plates. An electric field is established along the long axis of the rectangular configuration, and molecular migration is arranged to occur simultaneously on several paths, or lanes, parallel to the electric field.
DNA sequence information is key to much modern genetics research. The Human Genome Project seeks to sequence the entire human genome of roughly three billion bases by 2006. This sequencing goal is roughly two orders of magnitude (factor of 100) beyond the total, current yearly worldwide DNA sequencing capacity. Sequencing of other biopolymers, for example RNA or proteins, is also crucial in other fields of biology. Other DNA fragment analysis techniques, such as PCR based diagnostics, genotyping (Ziegle, J. S. et al., Application of Automated DNA Sizing Technology for Genotyping Microsatellite Loci. Genomics, 14, 1026-1031 (1992)) and expression analysis are increasing in use and importance.
The need for methods to identify genes which are differentially expressed in specific diseases such as cancer is of paramount importance, for both the diagnosis of the disease and for therapeutic intervention. Identification of genes specifically expressed in different diseases will lead to better classification of these diseases with regard to their biological behavior. A molecular understanding of disease progression is fundamental to an understanding of a specific disease. The identification of molecular diagnostics that correlate with variations in disease state, growth potential, malignant transformation and prognosis will have tremendous implication in clinical practice, including the diagnosis and treatment of the disease.
No current method adequately or efficiently addresses the need to identify, isolate, and clone disease-specific genes. A new biopolymer fragment analysis method has been developed based on the use of arbitrarily primed PCR (Williams, J. G., Kubelik, A. R., Livak, K. J., Rafalski, J. A., and Tingey, S. V., DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res. 18, 6531-6535 (1990); Welsh, J. and McClelland M., Genomic fingerprinting using arbitrarily primed PCR and a matrix of pairwise combinations of primers. Nucleic Acids Res., 19, 5275-9 (1991)). When applied to mRNA, samples are first reverse transcribed into cDNA and then amplified with a combination of arbitrary and specific labelled primers (Froussard, P., A random-PCR method (rPCR) to construct whole cDNA library from low amounts of RNA. Nucleic Acids Res. 20, 2900 (1992); Welsh, J. et al., Arbitrarily primed PCR fingerprinting of RNA. Nucleic Acids Res., 20, 4965-70 (1992)). The resulting labeled DNA fragments from each mRNA source are then electrophoresed in separate gel lanes, producing a "banding pattern" or "fingerprint" of that source (Liang, P. and Pardee, A. B., Differential Display of Eukaryotic Messenger RNA by Means of the Polymerase Chain Reaction. Science, 257, 967-971 (1992)). Differences in gene expression are then found by manually comparing the fingerprints obtained from two mRNA sources. Following this, fragments of interest are extracted from the gel. This method is severely limited by its reliance on autoradiographic methods to allow for the isolation of the genes of interest. Refinements of PCR based techniques have, however, led to the ability to produce more reproducible banding patterns, and to the use of an automated DNA sequencing machine to record the banding patterns produced with fluorescently labeled primers (Liang, P., Averboukh, L. and Pardee A. B., Distribution and cloning of eukaryotic mRNAs by means of differential display: refinements and optimization. Nucleic Acids Res. 21, 3269-3275 (1993)).
Further refinements in biological methods for expression analysis place ever-increasing demands on the quality of the data obtained from commercial instruments, as well as on the flexibility of those instruments. For example, the fragment sizing resolution of the instruments must be high. Commercial automatic sequencing instruments (Applied Biosystems Inc., Foster City, Calif., DNA sequencer) do not allow for the spectral resolution of many dye labels or allow for the isolation of the fluorescently labeled samples after they are run. In an automated machine the sample is simply lost. Arbitrarily primed PCR methods would be much more attractive if their limitations could be addressed. In general, commercial instruments do not provide the open architecture and design flexibility required to enable successful research in the rapidly evolving field of genomics.
To address these limitations, our invention allows these gene fragments to be detected fluorescently and to be directly isolated, without human intervention, as they are identified. This is accomplished by electrophoretically separating the individual bands, and hence the differentially expressed genes, from the rest of the sample as it is running. This approach incorporates the advantages of the PCR based methods of differential screening, while raising the level of speed, sensitivity and resolution well beyond that achievable with radiographic techniques. To ensure high separation resolution, it is advantageous for the gel throughout a migration lane to be kept as uniform as possible and for the lanes to be sufficiently separated to be clearly distinguishable.
To achieve these required improvements in the analysis capacity for DNA and for other biopolymers, machines are needed for the rapid, concurrent analysis of large numbers of minute biopolymer samples. Further, the analysis must be done with minimal human intervention and at low cost. Since electrophoresis will remain the dominant biological separation technology for the foreseeable near future, the technical demands of more rapid electrophoresis will shape the design of such machines.
More rapid electrophoresis requires, primarily, higher voltages and stronger electric fields to exert greater forces on migrating molecules and move them at greater velocities. However, higher fields and velocities lead to increased resistive heating and consequent thermal gradients in the gel. Gel non-uniformities result, impairing separation resolution. To preserve resolution, ever smaller gel geometries must be used so that this damaging heat may be more readily conducted away. Moreover, parallel, narrow migration lanes are advantageous to increase the number of samples analyzed simultaneously. While electrophoresis has been described in geometries where the parallel glass plates are spaced from 25 to 150 μm apart, instead of the usual 400 μm, it is not possible to ensure long, parallel, narrow, and closely spaced migration lanes in such a thin sheet. Alternatively, electrophoresis has been described in arrays of capillary tubes down to 25 μm in diameter which completely define migration lanes. However, although the conventional plate arrangement is relatively easy to load with gel and samples, arrays of capillary tubes are much more difficult to load. Easy loading is advantageous to minimize analysis setup time and human intervention.
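The trade-off between field strength and gel thickness can be illustrated with a rough estimate. The sketch below is not taken from this specification: it assumes uniform resistive heating in a gel slab whose two faces are held at the same temperature, for which the steady-state temperature rise at the center is q d^2 / (8 k), with q the heating power density (conductivity times field squared), d the gel thickness, and k the thermal conductivity. The numerical values for conductivity, field, and thermal conductivity are illustrative assumptions only.

```python
def joule_heating_density(field_v_per_m: float, conductivity_s_per_m: float) -> float:
    """Resistive heating power per unit volume (W/m^3): q = sigma * E^2."""
    return conductivity_s_per_m * field_v_per_m ** 2


def center_temperature_rise(field_v_per_m: float,
                            conductivity_s_per_m: float,
                            thickness_m: float,
                            thermal_conductivity_w_per_m_k: float) -> float:
    """Center-to-face temperature rise (K) for a uniformly heated slab
    with both faces at the same temperature: dT = q * d^2 / (8 * k)."""
    q = joule_heating_density(field_v_per_m, conductivity_s_per_m)
    return q * thickness_m ** 2 / (8.0 * thermal_conductivity_w_per_m_k)


# Illustrative (assumed) values, not measurements from this specification.
FIELD = 1.0e4          # V/m, i.e. 100 V/cm
SIGMA = 0.1            # S/m, assumed buffer/gel conductivity
K_GEL = 0.6            # W/(m K), roughly that of water

rise_400 = center_temperature_rise(FIELD, SIGMA, 400e-6, K_GEL)
rise_100 = center_temperature_rise(FIELD, SIGMA, 100e-6, K_GEL)
print(f"400 um gel: {rise_400:.4f} K, 100 um gel: {rise_100:.4f} K")
print(f"ratio: {rise_400 / rise_100:.1f}")
```

Under these assumptions the temperature rise scales with the square of both the field and the thickness, so a fourfold thinner gel tolerates a fourfold higher field for the same internal gradient.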
Indeed, from a human engineering perspective the business of high throughput electrophoresis is difficult. Small gel geometries require intense cleaning of parts. Chemical variability in the polymerization of separation media requires extensive quality control. Even assembly and operation of electrophoresis devices on a very large scale poses workflow challenges. An electrophoresis system that includes a disposable, recyclable or easily reusable electrophoresis module would provide significant potential improvement to overall system utility. Moreover, a manufacturing system for such electrophoresis modules that would provide cleaning and surface preparation quality control at a central facility rather than end-user site would provide throughput and uniformity advantages as well as cost reductions via scale economics.
The small geometries required by high resolution, high voltage electrophoretic analysis create additional technical demands. Where fluorescent dye fragment labeling is used, sensitive spectral detection devices are needed. These detection devices must respond quickly, since rapid migration presents fragment samples for detection with only slight time separation. Most significantly, rapid parallel analysis of many biopolymer samples requires the detection device to simultaneously detect fragments migrating in separate lanes while maintaining uniform spectral calibration across the entire imaging domain. Conventional detectors cannot meet these demands. One design uses rotatable filters to select spectral ranges to present to a single active detector element, this assembly being scanned mechanically across all the migration lanes. However, such mechanical single detector assemblies waste most of the available fluorescence energy from the fragment samples, limit detection speed, prohibit simultaneous detection, and slow sample analysis. Use of spectrally fixed filters also limits dynamic adaptation to different detection labels.
While a spatially compact disposition of the migration lanes might permit simultaneous observation, sample loading into the migration lanes prior to an analysis run requires physical access to the migration lanes. Access is easier and more rapid for widely spaced lanes. Furthermore, this access should be robust with respect to sample cross-contamination, and should be suitable for robotic and other types of mechanical integration. Conventional, flat-plate techniques have only straight, parallel lanes and cannot accommodate these divergent requirements.
A high throughput analysis machine would generate voluminous detection data representing the rapidly migrating biopolymer fragment samples. Manual analysis of such data is not feasible, so automated analysis methods are required. To minimize human post-analysis checking, these methods should achieve accuracies of 99% or greater. Further, the data would contain fragment detection events closely spaced, even overlapping, in time. Moreover, small electrophoretic geometries and small fragment sizes would generate only weak signals with increased noise. Prior electrophoretic devices, on the other hand, generated only clearly separated detection events with good signal intensities.
Once fragment events are discriminated, the entire data for a run must be assembled to determine the nature of the original biopolymer sample. For DNA sequencing, this is conventional: the bases and their order in the DNA sample are the terminating bases of the fragments in the order of increasing molecular weight. When sequencing on a genomic scale, the bases and their order must be assembled into an ordered listing of the bases of the genome of the organism being studied.
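The conventional reconstruction described above can be sketched in a few lines. This is a minimal illustration, assuming each detected fragment has already been reduced to a pair of (fragment length, terminating base); the function name `read_sequence` and the sample data are hypothetical.

```python
from typing import List, Tuple


def read_sequence(fragments: List[Tuple[int, str]]) -> str:
    """Order fragments by increasing size and read off the terminating bases.

    Each fragment is (length_in_bases, terminating_base). Shorter fragments
    migrate faster, so sorting by length mirrors the order of detection.
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)


# Hypothetical fragments, listed out of order.
fragments = [(3, "G"), (1, "A"), (4, "T"), (2, "C")]
print(read_sequence(fragments))  # ACGT
```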
All the foregoing technical requirements have prevented creation of an integrated machine for rapid, concurrent generation and analysis of large numbers of biopolymer fragment samples. Indeed, no commercial instrument provides the end user with a system highly suited to very large scale DNA analysis. The need for such a machine is widely felt in such areas as biological research, for example the Human Genome Project, the biotechnology and genomics industries, and clinical diagnosis.
The apparatus and method of this invention have for their object the solution of these problems in electrophoretic biopolymer fragment analysis, and in particular, in DNA sequencing and gene expression analysis. In one aspect, the invention is an integrated, high capacity, low-cost machine for the automatic, concurrent analysis of numerous biopolymer fragment samples. More comprehensively, it is a manufacturing system for disposable, reusable or recyclable electrophoresis parts together with a collection of instruments comprising a high throughput DNA analysis facility. The system comprises a transmission imaging spectrograph with charge-coupled device (CCD) detection and a microfabricated electrophoretic module. Among its objects are the provision of: easily loaded, simultaneously observable, electrophoretic geometries comprising multiple migration lanes each of the order of 100 μm and down to 25 μm or smaller; a spectral detection system which is capable of sensitive, simultaneous response to signals emitted by all the migration lanes and which is dynamically adaptable, without physical intervention, to different dyes, different numbers of dyes, and different coding of fragments with dyes; automatic generation of multiple biopolymer fragments directly on the analysis machine from crudely purified biopolymer samples and bulk reagents (for DNA, sequencing reactions would be automatically carried out); and an automatic data analysis method which transforms time series of spectral signals into biopolymer sequences and which is adapted to the unique problems of discriminating overlapping and weak fragment recognition events while achieving 99% or greater recognition accuracies.
A high capacity analysis machine according to this invention includes elements for concurrent loading of multiple samples for analysis onto the machine, an electrophoretic module for actually performing the sample separation, a spectrometer capable of simultaneous spatial and spectral resolution and detection of light signals representative of sample fragments as they are separated by the electrophoretic module, and elements for converting the detected signals into the sequence and character of the biopolymer samples analyzed. Ideally, the electrophoresis module is easily reused, such as by handling multiple sample loadings and separations, or is disposable or recyclable.
Different sample loading techniques are used by different versions of this invention. One technique consists of simply loading small liquid volumes containing fragment samples, manually or automatically, into wells in the electrophoretic medium. Preferably, access to the sample wells is obtained via holes or ports machined in the top part of the electrophoresis module. The holes are aligned to be in register with the migration lanes, and are sealed in such a way that each migration lane is in contact with exactly one port. This enables loading and analysis of samples with zero crosstalk between adjacent loading ports or migration lanes. Further, the samples can be loaded at different points along the migration path in staggered ports, generating offset data streams for adjacent migration lanes which are easily distinguishable even given nearly identical samples. Finally, the loading ports provide for easy mechanical integration, such as robotic loading, guidance of manual loading by syringe, or injection from the outlets of integrated reactor arrays. Most preferable is solid phase loading. Here a comb-like device has teeth which are sized and spaced to fit concurrently into all the sample wells in the electrophoretic medium. Each tooth carries a fragment sample attached by various denaturable binding methods. All the samples are released concurrently when the teeth are dipped into the sample wells. Advantageously, combs may have 50 to 100 teeth for concurrent loading of that number of samples. Notches or holes machined in the comb insertion region can aid the sample loading by aligning the comb with the sample wells. Regardless of how the samples are loaded, the DNA fragments can be collected at a low voltage focusing electrode prior to the electrophoretic separation, thereby increasing the intensity and resolution of the analysis signals detected.
Most preferable, especially for DNA sequencing, is a reactor array to generate fragment samples from crude DNA and to inject them onto the electrophoretic module. The reactor array comprises an array of micro-reactor chambers each with a minute inlet port and capillary inlet and outlet passages. The capillary passages are controlled by micro-machined valves. In one example a bubble, created by heating the capillary fluid, is used to control fluid flow through a capillary tube. The heating is by a resistive micro heating element formed by depositing a resistive thin film in the wall of the capillary. Leads are deposited to conduct current from an external controller to the heating element. To use this array, samples are introduced through the inlet ports; reagents are successively introduced through the capillary inlets; and fragment samples are ejected through the capillary outlets when reactions are complete. Reactions are facilitated by thermal control and heating elements located within each reactor.
Enabling the use of such a micro-reactor array for DNA sequencing is the use of dUTP rich PCR primers, a method of this invention. PCR amplification and Sanger sequencing can proceed sequentially without interference in one reactor by using the enzyme Uracil DNA Glycosylase (UDG). UDG digests dUTP rich PCR sequencing primers into fragments ineffective for initiating chain elongation in the subsequent Sanger sequencing reactions.
Also enabling the use of the microreactor array for DNA sequencing is the enzymatic pretreatment of PCR products using a combination of Exonuclease I and shrimp alkaline phosphatase (United States Biochemicals, Cleveland, Ohio). The activity of both of these enzymes in PCR buffer eliminates the need for buffer exchanges. The Exonuclease I enzyme removes the residual PCR primers, while the shrimp alkaline phosphatase dephosphorylates the dNTPs, inactivating them. The removal of both the primers and excess dNTPs prevents them from interfering in the subsequent Sanger sequencing reactions.
Enabling the use of the microreactor array for other DNA fragment analysis methods including expression analysis, genotyping, forensics, and positional cloning is the direct incorporation of fluorescent labels onto the 5′ end of the original PCR primers. These primers can be either specific for known sequences, as in the case of genotyping, or arbitrary, as in the case of expression analysis. A series of different dyes can be used to allow the PCR amplification step to take place in a multiplex fashion within a single reactor.
Once the samples are loaded, separation occurs in the electrophoretic module. The invention is adaptable to use different such modules. One such module comprises rectangular plates spaced slightly apart to define a rectangular sheet of electrophoretic medium. Migration occurs in straight, parallel lanes through this medium. Another version uses ultra-thin plate spacing, down to 25 μm, and high electrophoresis voltages, thereby achieving rapid fragment separation.
The preferred electrophoretic module is constructed using two plates with a photolithographically generated or other formation of channels formed between the plates. Numerous non-intersecting grooves etched or otherwise formed on the top plate, together with the bottom plate, define the migration lanes. The lanes are therefore separate non-communicating channels for holding separation medium, accessed via loading ports, each port accessing exactly one migration lane. Different groove and migration lane geometries are possible. One geometry is straight, parallel lanes. The preferred geometry spaces lanes widely at the loading end of the module, to ease the physical aspects of loading, but converges the lanes closely at the detection end, to permit simultaneous detection of separated fragments in all lanes. Groove size may be as small as 25 μm to allow high voltage rapid electrophoresis.
Different techniques can be used to form the grooves, including deposition of a material such as an adhesive or other polymer, or glass fibers or capillaries between the plates, serving to separate adjacent grooves. The grooves are preferably fabricated with standard photolithography techniques and, if necessary, subsequent etching and coating. Various combinations of substrates and processes are available, including patterning insulators on conductive surfaces, patterning polymers (for example, dry film resist) on insulating/conductive surfaces, or patterning conductors and coating with insulators. Alternatively, a master mold can be formed photolithographically, followed by duplication of the grooves in disposable substrates such as plastics (for example, polymethylmethacrylate) or glass via casting. In addition, different techniques can be used to bond the plates together, such as thermal annealing or the use of adhesives. Some combination of these techniques comprises a manufacturing system for manufacturing disposable, reusable, or replaceable electrophoresis module parts.
In all versions, the highest allowable electrophoretic voltages are used, where the maximum voltage is determined as that at which the mobility of biopolymer fragments is no longer sufficiently length dependent. Thermal control is achieved with a thermal control module in contact with the bottom plate sufficient to substantially eliminate thermal gradients. The preferred electrophoresis module provides especially good thermal control, since the small separation medium channels are in close contact on all sides with the top and bottom plates. The preferred thermal control module has a heat sink adapted to heat exchange with an air or water exchange fluid. Between the heat sink and the bottom plate of the electrophoretic module are bi-directional heat transfer devices. Preferably, these are Peltier thermo-electric modules disposed for pumping heat in both directions. Thereby, the bottom plate can be heated and cooled as needed and thermal gradients eliminated.
In one version, a transmission imaging spectrograph is used to detect separated fragments. The invention is particularly adapted to DNA sequence or other DNA analysis methods, in which each of the different fragment types is labelled with a different spectrally distinctive fluorescent dye. One or more lasers at the separation end of the electrophoresis module excite the dyes to emit light. Emitted light from samples in the migration lanes is incident on a collection lens. The light then passes first through a laser light filter, then through a transmission dispersion element, which spectrally separates the light, and finally through a focusing lens. The focused light is incident on a charge coupled device (CCD) array which detects the simultaneously spatially focused and spectrally diverged light from the detection regions of all the migration channels. Electronic signals from the CCD array provide information about the character or sequence of the DNA sample.
In the preferred version, a microfabricated set of components replaces the large scale imaging spectrograph. Here the function of the two camera lenses and diffraction grating is integrated within a single binary optic diffractive element. The diffractive element can be fabricated either on a glass surface, or on a separate material to be inserted between glass pieces.
The analysis system converts the electronic signals into biopolymer information which in one example is DNA base sequence. It comprises a standard programmable computer with short and long term memory and loaded with analysis programs particularly adapted to the preferred version of this invention. Interface devices place the electronic CCD output signals in the computer memory as binary signals. These signals are grouped both into spatial groups, one group for each migration lane, and into spectral groups, one group for each spectrally distinctive dye label. The grouped signals are filtered to minimize noise: high-pass filtering removes baseline low frequency noise, and low-pass filtering removes high-frequency single spike noise. If multiple samples are contained within a single migration lane, as enabled by the spectral multiplexing, the signals associated with each of the samples can be distinguished and grouped together using knowledge of the dyes associated with each of the samples.
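The two filtering steps can be sketched for a single lane and dye channel. The filters used here are simple stand-ins chosen for illustration, not the filters of the appended program listing: baseline removal subtracts a long moving average (a crude high-pass filter), and spike removal uses a short running median in place of the low-pass step. The function names and window widths are assumptions.

```python
from statistics import median
from typing import List


def moving_average(trace: List[float], width: int) -> List[float]:
    """Centered moving average; the window is truncated at the trace ends."""
    half = width // 2
    out = []
    for i in range(len(trace)):
        window = trace[max(0, i - half): min(len(trace), i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def remove_baseline(trace: List[float], width: int) -> List[float]:
    """High-pass step: subtract the slowly varying baseline estimate."""
    baseline = moving_average(trace, width)
    return [value - base for value, base in zip(trace, baseline)]


def remove_spikes(trace: List[float], width: int = 3) -> List[float]:
    """Spike suppression: a running median discards single-sample spikes."""
    half = width // 2
    return [median(trace[max(0, i - half): min(len(trace), i + half + 1)])
            for i in range(len(trace))]


def filter_trace(trace: List[float], baseline_width: int = 51,
                 spike_width: int = 3) -> List[float]:
    """Apply baseline removal, then spike suppression, to one channel."""
    return remove_spikes(remove_baseline(trace, baseline_width), spike_width)


# A flat trace with one single-sample spike: the spike is suppressed.
print(remove_spikes([0.0, 0.0, 5.0, 0.0, 0.0]))
```

In a full pipeline this function would be applied once per spatial group (migration lane) and spectral group (dye), with the baseline window chosen much wider than a fragment peak.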
The filtered signals are then compared to fragment recognition prototypes and the best prototype is chosen for each segment of filtered signals. The best prototype is that prototype whose averaged signal behavior for nearby times is closest to the observed signal behavior for the same nearby times. Closeness is simply measured by the ordinary distance between the observed signals and the prototypes. The base generating the input signals is identified as the base associated with the closest prototype. The sequence of closest prototypes thereby determines the DNA sequence and this sequence is output from the analysis system. In one embodiment, distances to each of the prototypes, or averages of the distances to the prototypes associated with the four possible bases, are also output. These values can be used to judge the confidence which one should assign to the DNA sequence, and in particular can be used to aid the comparison and assembly of multiple instances of the sequence of a given segment of DNA.
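Prototype matching as described above amounts to a nearest-prototype classification. The sketch below takes "ordinary distance" to mean Euclidean distance and represents each signal segment as a flat vector of dye-channel intensities. The one-sample, four-channel prototypes and the function names are hypothetical simplifications; real prototypes would span several nearby time samples.

```python
import math
from typing import List, Sequence, Tuple

Prototype = Tuple[str, Sequence[float]]  # (base, averaged signal window)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length signal windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def call_base(window: Sequence[float],
              prototypes: List[Prototype]) -> Tuple[str, float]:
    """Return the base of the closest prototype and the distance to it."""
    base, proto = min(prototypes, key=lambda p: distance(window, p[1]))
    return base, distance(window, proto)


def call_sequence(windows: List[Sequence[float]],
                  prototypes: List[Prototype]) -> Tuple[str, List[float]]:
    """Call one base per window; also return distances as confidence values."""
    calls = [call_base(window, prototypes) for window in windows]
    return "".join(b for b, _ in calls), [d for _, d in calls]


# Toy prototypes: one dye channel per base, in the order A, C, G, T.
prototypes = [("A", (1.0, 0.0, 0.0, 0.0)), ("C", (0.0, 1.0, 0.0, 0.0)),
              ("G", (0.0, 0.0, 1.0, 0.0)), ("T", (0.0, 0.0, 0.0, 1.0))]
windows = [(0.9, 0.1, 0.0, 0.0), (0.0, 0.2, 0.8, 0.1)]
sequence, distances = call_sequence(windows, prototypes)
print(sequence)  # AG
```

Smaller distances indicate closer matches, so the returned list can serve as the per-base confidence output mentioned above.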
The prototypes are the averages of filtered signals generated in the apparatus of this invention from the analysis of known DNA. They are carefully chosen to be adapted to the characteristics of this invention. Preferably, they are chosen to include the signals generated by two sequential DNA fragments.
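Building prototypes by averaging can be sketched as follows. The example assumes training data in the form of (label, window) pairs extracted from runs on known DNA, where a label may name two sequential bases such as "AC". The function name `build_prototypes` and the data layout are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def build_prototypes(
    labelled_windows: Iterable[Tuple[str, Sequence[float]]]
) -> Dict[str, List[float]]:
    """Average all filtered signal windows that share a label.

    Every window for a given label must have the same length; the prototype
    is the element-wise mean of those windows.
    """
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for label, window in labelled_windows:
        groups[label].append(window)
    return {
        label: [sum(column) / len(windows) for column in zip(*windows)]
        for label, windows in groups.items()
    }


# Hypothetical training windows labelled by the two sequential bases they span.
training = [("AC", [1.0, 3.0]), ("AC", [3.0, 5.0]), ("GT", [0.0, 2.0])]
print(build_prototypes(training))  # {'AC': [2.0, 4.0], 'GT': [0.0, 2.0]}
```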
Further analysis is done in one embodiment of the invention. Any DNA sequences which are known (vector DNA) are trimmed out of the observed sequence. The remaining sequence is proofread by Monte Carlo simulated annealing. At random observation times a random alteration to the determined base sequence is made. The closeness between the entire resulting sequence and the entire filtered observed signal is evaluated. If a probabilistic test based on this closeness is met, the sequence alteration is retained; otherwise it is discarded. Alter and test activity is repeated until no further significant improvements occur. This step permits global improvements to be made in the overall sequence determined.
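The alter-and-test loop can be sketched with a Metropolis acceptance rule. This is a simplified illustration under stated assumptions, not the procedure of the appended listing. The observed signal is reduced to one four-channel vector per base position, and closeness is the sum of squared differences from an ideal one-hot signal. Alterations are single-base substitutions, and the loop runs for a fixed number of steps on a geometric cooling schedule rather than stopping when improvement stalls. With this toy score each position is independent, so annealing is not strictly needed; a score over overlapping prototypes would couple neighboring positions.

```python
import math
import random
from typing import List, Sequence

BASES = "ACGT"


def energy(seq: Sequence[str], signal: List[Sequence[float]]) -> float:
    """Squared distance between the ideal signal for seq and the observation."""
    total = 0.0
    for base, observed in zip(seq, signal):
        for channel, channel_base in enumerate(BASES):
            ideal = 1.0 if channel_base == base else 0.0
            total += (observed[channel] - ideal) ** 2
    return total


def anneal(seq: str, signal: List[Sequence[float]], steps: int = 2000,
           t_start: float = 1.0, t_end: float = 0.01, seed: int = 0) -> str:
    """Proofread seq by random substitutions accepted with a Metropolis test."""
    rng = random.Random(seed)
    current = list(seq)
    e_current = energy(current, signal)
    best, e_best = current[:], e_current
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(BASES)
        e_new = energy(current, signal)
        if e_new <= e_current or rng.random() < math.exp((e_current - e_new) / t):
            e_current = e_new          # keep the alteration
            if e_new < e_best:
                best, e_best = current[:], e_new
        else:
            current[pos] = old         # discard the alteration
    return "".join(best)


# Hypothetical observed signal and a draft call with one suspect base.
signal = [[0.9, 0.1, 0.0, 0.0], [0.2, 0.7, 0.1, 0.0],
          [0.0, 0.1, 0.8, 0.1], [0.1, 0.0, 0.1, 0.8]]
draft = "AAGT"
proofread = anneal(draft, signal)
print(draft, "->", proofread)
```

Because the best sequence seen so far is tracked separately, the returned sequence never scores worse than the draft, even though worse alterations may be accepted temporarily at high temperature.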