The present invention is directed to optical systems, methods and products for analyzing polymers, and more particularly to optical systems, methods and products that utilize highly localized optical radiation for characterizing individual units of polymers.
Cells have a complex microstructure that determines the functionality of the cell. Much of the diversity associated with cellular structure and function is due to the ability of a cell to assemble various building blocks into diverse chemical compounds. The cell accomplishes this task by assembling polymers from a limited set of building blocks referred to as monomers or units. The key to the diverse functionality of polymers lies in the primary sequence of the monomers within the polymer, which is integral to understanding the basis for cellular function, such as why a cell differentiates in a particular manner or how a cell will respond to treatment with a particular drug.
The ability to identify the structure of polymers by identifying their sequence of monomers is integral to the understanding of each active component and the role that component plays within a cell. By determining the sequences of polymers it is possible to generate expression maps, to determine what proteins are expressed, to understand where mutations occur in a disease state, and to determine whether a polysaccharide has better function or loses function when a particular monomer is absent or mutated.
Expression maps relate to determining mRNA expression patterns. The need to identify differentially expressed mRNAs is critical in the understanding of genetic programming, both temporally and spatially. Different genes are turned on and off during the temporal course of an organism's life development, comprising embryonic, growth, and aging stages. In addition to developmental changes, there are also temporal changes in response to varying stimuli such as injury, drugs, foreign bodies, and stress. The ability to chart expression changes for specific sets of cells in time either in response to stimuli or in growth allows the generation of what are called temporal expression maps. On the other hand, there are also body expression maps, which include knowledge of differentially expressed genes for different tissues and cell types. Since generation of expression maps involves the sequencing and identification of cDNA or mRNA, more rapid sequencing necessarily means more rapid generation of multiple expression maps.
Currently, only 1% of the human genome and an even smaller amount of other genomes have been sequenced. In addition, only one very incomplete human body expression map using expressed sequence tags has been achieved (Adams et al., 1995). Current protocols for genomic sequencing are slow and involve laborious steps such as cloning, generation of genomic libraries, colony picking, and sequencing. The time to create even one partial genomic library is on the order of several months. Even after the establishment of libraries, there are time lags in the preparation of DNA for sequencing and the running of actual sequencing steps. Given the multiplicative effect of these unfavorable facts, it is evident that the sequencing of even one genome requires an enormous investment of money, time, and effort.
In general, DNA sequencing is performed using one of two methods. The first and more popular method is the dideoxy chain termination method described by Sanger et al. ("DNA sequencing with chain-terminating inhibitors," Proc. Natl. Acad. Sci. USA. 74:5463-7, 1977). This method involves the enzymatic synthesis of DNA molecules terminating in dideoxynucleotides. By using the four ddNTPs, a population of molecules terminating at each position of the target DNA can be synthesized. Subsequent analysis yields information on the length of the DNA molecules and the base at which each molecule terminates (either A, C, G, or T). With this information, the DNA sequence can be determined. The second method is Maxam and Gilbert sequencing (Maxam and Gilbert, "A new method for sequencing DNA," Proc. Natl. Acad. Sci. USA. 74:560-4, 1977), which uses chemical degradation to generate a population of molecules degraded at certain positions of the target DNA. With knowledge of the cleavage specificities of the chemical reactions and the lengths of the fragments, the DNA sequence is generated. Both methods rely on polyacrylamide gel electrophoresis and photographic visualization of the radioactive DNA fragments. Each process takes about 1-3 days. The Sanger sequencing reactions can only generate 300-800 bases in one run.
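The way fragment length and terminal base together determine a sequence can be illustrated with a minimal Python sketch. It assumes an idealized ladder in which every length from 1 to N is observed exactly once, with lengths counted in bases from the primer end; the function name `read_sanger_ladder` and the (length, base) input format are hypothetical and are used only for illustration.

```python
def read_sanger_ladder(fragments):
    """Reconstruct the synthesized strand from (length, terminal_base) pairs.

    Assumption: an idealized ladder with exactly one fragment per length.
    """
    # Sorting by length mimics the size separation performed by the gel.
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    lengths = [length for length, _ in ordered]
    if lengths != list(range(1, len(ordered) + 1)):
        raise ValueError("ladder has a missing or duplicated fragment length")
    # The terminal base of the fragment of length i is base i of the read.
    return "".join(base for _, base in ordered)


# Fragments are given in arbitrary order, as they would be before separation.
fragments = [(3, "G"), (1, "A"), (4, "T"), (2, "C")]
print(read_sanger_ladder(fragments))  # ACGT
```

The read obtained this way is the newly synthesized strand, which is complementary to the template.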
Sanger-based methods have been proposed to improve the output of sequence information. The Sanger-based methods include multiplex sequencing, capillary gel electrophoresis, and automated gel electrophoresis. Recently, there has also been increasing interest in developing Sanger-independent methods. Sanger-independent methods use a completely different methodology to realize the base information. This category contains the most novel techniques, which include scanning tunneling microscopy (STM), mass spectrometry, enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA) sequencing, exonuclease sequencing, and sequencing by hybridization.
Currently, automated gel electrophoresis is the most widely used method of large-scale sequencing. Automation requires reading of fluorescently labeled Sanger fragments in real time with a charge coupled device (CCD) detector. The four different dideoxy chain termination reactions are run with different labeled primers. The reaction mixtures are combined and co-electrophoresed down a slab of polyacrylamide. Using laser excitation at the end of the gel, the separated DNA fragments are resolved and the sequence determined by computer. Many automated machines are available commercially, each employing different detection methods and labeling schemes. The most efficient of these is the Applied Biosystems Model 377XL, which generates a maximum actual rate of 115,200 bases per day.
In the method of capillary gel electrophoresis, reaction samples are analyzed by small diameter, gel-filled capillaries. The small diameter of the capillaries (50 μm) allows for efficient dissipation of heat generated during electrophoresis. Thus, high field strengths (400 V/m) can be used without excessive Joule heating, lowering the separation time to about 20 minutes per reaction run. Not only are the bases separated more rapidly, there is also increased resolution over conventional gel electrophoresis. Furthermore, many capillaries are analyzed in parallel (Woolley and Mathies, "Ultra-high-speed DNA sequencing using capillary electrophoresis chips," Anal. Chem. 67:3676-3680, 1995), allowing amplification of base information generated (actual rate is equal to 200,000 bases/day). The main drawback is that there is not continuous loading of the capillaries since a new gel-filled capillary tube must be prepared for each reaction. Capillary gel electrophoresis machines have recently been commercialized.
Multiplex sequencing is a method which more efficiently uses electrophoretic gels (Church and Kieffer-Higgins, "Multiplex DNA sequencing," Science. 240:185-88, 1988). Sanger reaction samples are first tagged with unique oligomers and then up to 20 different samples are run on one lane of the electrophoretic gel. The samples are then blotted onto a membrane. The membrane is then sequentially probed with oligomers that correspond to the tags on the Sanger reaction samples. The membrane is washed and reprobed successively until the sequences of all 20 samples are determined. Even though there is a substantial reduction in the number of gels run, the washing and hybridizing steps are as laborious as running electrophoretic gels. The actual sequencing rate is comparable to that of automated gel electrophoresis.
Sequencing by mass spectrometry was first introduced in the late 1980s. Recent developments in the field have allowed for better sequence determination (Crain, Mass Spectrom. Rev. 9:505-54, 1990; Little et al., J. Am. Chem. Soc. 116:4893-4897, 1994; Keough et al., Rapid Commun. Mass Spectrom. 7:195-200, 1993; Smirnov et al., 1996). Mass spectrometry sequencing first entails creating a population of nested DNA molecules that differ in length by one base. Subsequent analysis of the fragments is performed by mass spectrometry. In one example, an exonuclease is used to partially digest a 33-mer (Smirnov, "Sequencing oligonucleotides by exonuclease digestion and delayed extraction matrix-assisted laser desorption ionization time-of-flight mass spectrometry," Anal. Biochem. 238:19-25, 1996). A population of molecules with similar 5′ ends and varying points of 3′ termination is generated. The reaction mixture is then analyzed. The mass spectrometer is sensitive enough to distinguish mass differences between successive fragments, allowing sequence information to be generated.
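The step from mass differences to base calls can be sketched as follows. This is an illustrative Python example, not the procedure of the cited references: the residue masses are approximate average masses of DNA nucleotide residues, and the function name `call_bases`, the 1.0 Da tolerance, and the starting mass in the example are all assumptions.

```python
# Approximate average masses (Da) of DNA nucleotide residues.
# Illustrative values only, not taken from the cited references.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def call_bases(ladder_masses, tolerance=1.0):
    """Call one base per mass step between successive nested fragments."""
    masses = sorted(ladder_masses)
    calls = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        # Pick the residue whose mass is closest to the observed difference.
        base = min(RESIDUE_MASS, key=lambda b: abs(delta - RESIDUE_MASS[b]))
        if abs(delta - RESIDUE_MASS[base]) > tolerance:
            raise ValueError(f"mass step {delta:.2f} Da matches no residue")
        calls.append(base)
    return "".join(calls)


# Hypothetical ladder: a 1000.00 Da fragment extended by G, then A, then T.
ladder = [1000.00, 1329.21, 1642.42, 1946.62]
print(call_bases(ladder))  # GAT
```

Because the fragments share a 5′ end and differ at the 3′ end, reading the ladder from lightest to heaviest gives the bases in 5′ to 3′ order.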
Mass spectrometry sequencing is highly accurate, inexpensive, and rapid compared to conventional methods. The major limitation, however, is that the read length is on the order of tens of bases. Even the best method, matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectroscopy (Smirnov et al., "Sequencing oligonucleotides by exonuclease digestion and delayed extraction matrix-assisted laser desorption ionization time-of-flight mass spectrometry," Anal. Biochem. 238:19-25, 1996), can only achieve maximum read lengths of 80-90 base pairs. Much longer read lengths are physically impossible due to fragmentation of longer DNA at guanidines during the analysis step. Mass spectrometry sequencing is thus limited to verifying short primer sequences and has no practical application in large-scale sequencing.
The scanning tunneling microscope (STM) sequencing method (Ferrell, "Scanning tunneling microscopy in sequencing of DNA." In Molecular Biology and Biotechnology, R. A. Meyers, Ed. VCH Publishers, New York, 1997) was conceived at the time the STM was commercially available. The initial promise of being able to read base-pair information directly from the electron micrographs no longer holds true. DNA molecules must be placed on conducting surfaces, which are usually highly ordered pyrolytic graphite (HOPG) or gold. These lack the binding sites to hold DNA strongly enough to resist removal by the physical and electronic forces exerted by the tunneling tip. With difficulty, DNA molecules can be electrostatically adhered to the surfaces. Even with successful immobilization of the DNA, it is difficult to distinguish base information because of the extremely high resolutions needed. With current technology, purines can be distinguished from pyrimidines, but the individual purines and pyrimidines cannot be identified. The ability to achieve this feat requires electron microscopy to be able to distinguish between aldehyde and amine groups on the purines and the presence or absence of methyl groups on the pyrimidines.
Enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA) sequencing uses the detection of pyrophosphate release from DNA polymerization to determine the addition of successive bases. The pyrophosphate released by the DNA polymerization reaction is converted to ATP by ATP sulfurylase and the ATP production is monitored continuously by firefly luciferase. To determine base specificity, the method uses successive washes of ATP, CTP, GTP, and TTP. If a wash for ATP generates pyrophosphate, one or more adenines are incorporated. The number of incorporated bases is directly proportional to the amount of pyrophosphate generated. Enhancement of generated sequence information can be accomplished with parallel analysis of many ELIDA reactions simultaneously.
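The proportionality between signal and number of incorporated bases suggests a simple decoding rule, sketched below in Python. The sketch assumes an idealized, noise-limited signal and a known per-base signal level; `decode_flows`, `flow_order`, and `unit_signal` are hypothetical names introduced for illustration only.

```python
def decode_flows(flow_order, signals, unit_signal):
    """Convert per-wash light signals into the incorporated base sequence.

    flow_order:  the base offered in each successive wash, e.g. "ACGT..."
    signals:     the measured signal for each wash
    unit_signal: the signal produced by one incorporated base (assumed known)
    """
    sequence = []
    for base, signal in zip(flow_order, signals):
        # Signal is proportional to the number of bases added in this wash.
        count = round(signal / unit_signal)
        sequence.append(base * count)
    return "".join(sequence)


flow_order = "ACGTACGT"
signals = [1.02, 0.0, 2.1, 0.05, 0.0, 0.97, 0.0, 3.0]
print(decode_flows(flow_order, signals, unit_signal=1.0))  # AGGCTTT
```

In this example a wash that yields about twice the unit signal is read as two identical bases in a row. The decoded string is the strand being synthesized, which is complementary to the template.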
Exonuclease sequencing involves a fluorescently labeled, single-stranded DNA molecule which is suspended in a flowing stream and sequentially cleaved by an exonuclease. Individual fluorescent bases are then released and passed through a single molecule detection system. The temporal sequence of labeled nucleotide detection corresponds to the sequence of the DNA (Ambrose et al., "Application of single molecule detection to DNA sequencing and sizing," Ber. Bunsenges. Phys. Chem. 97:1535-1542, 1993; Davis et al., "Rapid DNA sequencing based on single-molecule detection," Los Alamos Science. 20:280-6, 1992; Jett et al., "High-speed DNA sequencing: an approach based upon fluorescence detection of single molecules," J. Of Bio. Structure and Dynamics. 7:301-9, 1989). Using a processive exonuclease, it theoretically is possible to sequence 10,000 bp or larger fragments at a rate of 10 bases per second.
In the sequencing by hybridization method, a target DNA is sequentially probed with a set of oligomers consisting of all the possible oligomer sequences. The sequence of the target DNA is generated with knowledge of the hybridization patterns between the oligomers and the target (Bains, "Hybridization methods for DNA sequencing," Genomics. 11:294-301, 1991; Cantor et al., "Reporting on the sequencing by hybridization workshop," Genomics. 13:1378-1383, 1992; Drmanac et al., "Sequencing by hybridization." In Automated DNA Sequencing and Analysis Techniques, J. Craig Venter, Ed. Academic Press, London, 1994). There are two possible methods of probing target DNA. The "Probe Up" method includes immobilizing the target DNA on a substrate and probing successively with a set of oligomers. "Probe Down," on the other hand, requires that a set of oligomers be immobilized on a substrate and hybridized with the target DNA. With the advent of the "DNA chip," which applies microchip synthesis techniques to DNA probes, arrays of thousands of different DNA probes can be generated on a 1 cm² area, making Probe Down methods more practical. Probe Up methods would require, for an 8-mer, 65,536 successive probes and washings, which would take an enormous amount of time. On the other hand, Probe Down hybridization generates data in a few seconds. With perfect hybridization, 65,536 octamer probes would determine a maximum of 170 bases. With 65,536 "mixed" 11-mers, 700 bases can be generated.
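The figure of 65,536 probes for an 8-mer follows from the four possible bases at each of eight positions (4^8 = 65,536). The Python sketch below shows this count together with a deliberately simplified reconstruction from hybridization results. It assumes perfect hybridization, a known starting probe, and a target in which every (k-1)-base overlap is unique; the function names are hypothetical.

```python
def probe_count(k):
    """Number of distinct oligomer probes of length k over A, C, G, T."""
    return 4 ** k


def spectrum(target, k):
    """Set of k-mers that would hybridize to the target (idealized)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(hybridized, start):
    """Chain hybridized probes by their (k-1)-base overlaps.

    Assumptions: the first probe is known and every overlap is unambiguous.
    Repeated subsequences violate this and are reported as an error.
    """
    k = len(start)
    remaining = set(hybridized) - {start}
    sequence = start
    while remaining:
        suffix = sequence[-(k - 1):]
        candidates = [probe for probe in remaining if probe[:-1] == suffix]
        if len(candidates) != 1:
            raise ValueError("ambiguous or missing overlap")
        sequence += candidates[0][-1]
        remaining.remove(candidates[0])
    return sequence


print(probe_count(8))  # 65536
hits = spectrum("ATGCGTAC", 3)
print(reconstruct(hits, "ATG"))  # ATGCGTAC
```

The uniqueness assumption is what limits the reconstructable length in practice: once a (k-1)-base overlap repeats within the target, the probe set alone no longer determines a single order.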
The most common limitation of most of these techniques is a short read length. In practice a short read length means that additional genetic sequence information needs to be sequenced before the linear order of a target DNA can be deciphered. The short fragments have to be bridged together with additional overlapping fragments. Theoretically, with a 500 base read length, a minimum of 9×10⁹ bases need to be sequenced before the linear sequence of all 3×10⁹ bases of the human genome are properly ordered. In reality, the number of bases needed to generate a believable genome is approximately 2×10¹⁰ bases. Comparisons of the different techniques show that only the impractical exonuclease sequencing has the theoretical capability of long read lengths. The other methods have short theoretical read lengths and even shorter realistic read lengths. To reduce the number of bases that need to be sequenced, it is clear that the read length must be improved.
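The scale of these figures can be made concrete with a short Python calculation. It uses only numbers stated in this background (a genome of 3×10⁹ bases, a 500 base read length, totals of 9×10⁹ and 2×10¹⁰ bases, and the 115,200 bases per day quoted for automated gel electrophoresis). Treating that rate as the sustained output of a single machine is an assumption made for illustration, and the function names are hypothetical.

```python
GENOME_SIZE = 3 * 10**9          # bases in the human genome
READ_LENGTH = 500                # bases per read
THEORETICAL_BASES = 9 * 10**9    # stated theoretical minimum
PRACTICAL_BASES = 2 * 10**10     # stated practical requirement
MACHINE_RATE_PER_DAY = 115_200   # bases/day, assumed for one machine


def coverage(total_bases, genome_size=GENOME_SIZE):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def reads_needed(total_bases, read_length=READ_LENGTH):
    """Number of reads required, rounded up to a whole read."""
    return -(-total_bases // read_length)


def machine_days(total_bases, rate=MACHINE_RATE_PER_DAY):
    """Days of continuous operation for one machine at the given rate."""
    return total_bases / rate


print(coverage(THEORETICAL_BASES))               # 3.0
print(reads_needed(THEORETICAL_BASES))           # 18000000
print(reads_needed(PRACTICAL_BASES))             # 40000000
print(f"{machine_days(PRACTICAL_BASES):.0f}")    # 173611
```

Under these assumptions the theoretical minimum corresponds to threefold coverage, and the practical requirement to roughly 174,000 machine-days on a single instrument.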
Protein sequencing generally involves chemically induced sequential removal and identification of the terminal amino acid residue, e.g., by Edman degradation. See Stryer, L., Biochemistry, W. H. Freeman and Co., San Francisco (1981) pp. 24-27. Edman degradation requires that the polypeptide have a free amino group which is reacted with an isothiocyanate. The isothiocyanate is typically phenyl isothiocyanate. The adduct intramolecularly reacts with the nearest backbone amide group of the polymer thereby forming a five-membered ring. This adduct rearranges and the terminal amino acid residue is then cleaved using strong acid. The released phenylthiohydantoin (PTH) of the amino acid is identified and the shortened polymer can undergo repeated cycles of degradation and analysis.
Further, several new methods have been described for carboxy terminal sequencing of polypeptides. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Carboxy terminal sequencing methods mimic Edman degradation but involve sequential degradation from the opposite end of the polymer. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Like Edman degradation, the carboxy-terminal sequencing methods involve chemically induced sequential removal and identification of the terminal amino acid residue.
More recently, polypeptide sequencing has been described by preparing a nested set (sequence defining set) of polymer fragments followed by mass analysis. See Chait, B. T. et al., Science 257:1885-94 (1992). Sequence is determined by comparing the relative mass difference between fragments with the known masses of the amino acid residues. Though formation of a nested (sequence defining) set of polymer fragments is a requirement of DNA sequencing, this method differs substantially from the conventional protein sequencing method consisting of sequential removal and identification of each residue. Although this method has potential, in practice it has encountered several problems and has not been demonstrated to be an effective method.
Each of the known methods for sequencing polymers has drawbacks. For instance, most of the methods are slow and labor intensive. The gel-based DNA sequencing methods require approximately 1 to 3 days to identify the sequence of 300-800 units of a polymer. Methods such as mass spectroscopy and ELIDA sequencing can only be performed on very short polymers.
A need exists for de novo polymer sequence determination. The rate of sequencing has limited the capability to generate multiple body and temporal expression maps which would undoubtedly aid the rapid determination of complex genetic function. A need also exists for improved systems and methods for analyzing polymers in order to speed up the rate at which diagnosis of diseases and preparation of new medicines is carried out.
The invention relates to new systems, methods and products for analyzing polymers and in particular new systems, methods and products useful for determining the sequence of polymers. The invention has numerous advantages over prior art systems and methods used to sequence polymers. Using the methods of the invention the entire human genome could be sequenced several orders of magnitude faster than could be accomplished using conventional technology. In addition to sequencing the entire genome, the systems, methods and products of the invention can be used to create comprehensive and multiple expression maps for developmental and disease processes. The ability to sequence an individual's genome and to generate multiple expression maps will greatly enhance the ability to determine the genetic basis of any phenotypic trait or disease process.
According to one aspect, a system for optically analyzing a polymer of linked units includes an optical source, an interaction station, an optical detector, and a processor. The optical source is constructed to emit radiation of a selected wavelength. The interaction station is constructed to receive the emitted radiation and produce a localized radiation spot from the radiation emitted from the optical source. The interaction station is also constructed to sequentially receive units of the polymer and arranged to irradiate sequentially the units at the localized radiation spot. The optical detector is constructed to detect radiation including characteristic signals resulting from interaction of the localized radiation spot with the units. The processor is constructed and arranged to analyze the polymer based on the detected radiation.
Preferred embodiments of this aspect include one or more of the following features:
The interaction station is constructed to sequentially receive the units being selectively labeled with a radiation sensitive label and the interaction includes interaction of the localized radiation with the radiation sensitive label.
The radiation sensitive label includes a fluorophore.
The interaction station includes a waveguide constructed to receive the emitted radiation and provide the evanescent radiation in response thereto.
The interaction station includes a slit having a width in the range of 1 nm to 500 nm, wherein the slit produces the localized radiation spot.
The interaction station includes a microchannel and a slit having a submicron width arranged to produce the localized radiation spot. The microchannel is constructed to receive and advance the polymer units through the localized radiation spot.
The width of the slit is in the range of 10 nm to 100 nm.
The system may include a polarizer and the optical source is a laser constructed to emit a beam of radiation and the polarizer is arranged to polarize the laser beam prior to reaching the slit.
The polarizer may be arranged to polarize the laser beam in parallel to the width of the slit, or perpendicular to the width of the slit.
The interaction station may include several slits located perpendicular to the microchannel that is arranged to receive the polymer in a straightened form.
The interaction station may include a set of electrodes constructed and arranged to provide electric field for advancing the units of the polymer through the microchannel.
The system may further include an alignment station constructed and arranged to straighten the polymer and provide the straightened polymer to the interaction station.
In another embodiment, a method for optically analyzing a polymer of linked units comprises:
labeling selected units of the polymer with radiation sensitive labels;
sequentially passing the units of the polymer through a microchannel;
generating radiation of a selected wavelength to produce therefrom a localized radiation spot;
irradiating sequentially the labeled units of the polymer at the localized radiation spot;
detecting sequentially radiation providing characteristic signals resulting from interactions of the localized radiation spot with the labels or the units; and
analyzing the polymer based on the detected radiation.
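The detecting and analyzing steps above can be pictured with a minimal Python sketch that turns a detector intensity trace into estimated label positions along the polymer. It is offered only as an illustration under stated assumptions and is not the processing described in this specification: the polymer is assumed to move through the localized radiation spot at a constant, known velocity, labels are detected by a simple fixed threshold, and the names `label_positions_nm`, `sample_interval_s`, `velocity_nm_per_s`, and `threshold` are hypothetical.

```python
def label_positions_nm(trace, sample_interval_s, velocity_nm_per_s, threshold):
    """Estimate label positions (nm) from a sampled detector trace.

    Assumptions: constant polymer velocity and fixed-threshold detection.
    Each run of samples at or above the threshold is treated as one label,
    located at the centre of the run.
    """
    nm_per_sample = sample_interval_s * velocity_nm_per_s
    positions = []
    peak_start = None
    for index, intensity in enumerate(trace):
        if intensity >= threshold and peak_start is None:
            peak_start = index                      # a peak begins
        elif intensity < threshold and peak_start is not None:
            center = (peak_start + index - 1) / 2   # a peak has ended
            positions.append(center * nm_per_sample)
            peak_start = None
    if peak_start is not None:                      # trace ended inside a peak
        center = (peak_start + len(trace) - 1) / 2
        positions.append(center * nm_per_sample)
    return positions


# Hypothetical trace: two labels pass the spot, 1 nm of travel per sample.
trace = [0, 0, 5, 6, 0, 0, 0, 7, 0]
print(label_positions_nm(trace, sample_interval_s=0.5,
                         velocity_nm_per_s=2.0, threshold=3))  # [2.5, 7.0]
```

The distances between successive positions give the spacing between labeled units, which is the kind of information a processor could use when analyzing the polymer.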
In another embodiment, an article of manufacture used for optically analyzing a polymer of linked units comprises an interaction station fabricated on a substrate and constructed to receive radiation and produce therefrom a localized radiation spot. The interaction station is further constructed to sequentially receive units of the polymer and arranged to irradiate sequentially the units at the localized radiation spot to generate characteristic signals of radiation.
According to another aspect, a system for optically analyzing a polymer of linked units includes an optical source, an interaction station, an optical detector, and a processor. The optical source is constructed to emit radiation of a selected wavelength. The interaction station is constructed to receive the emitted radiation and constructed to sequentially receive units of the polymer and arranged to irradiate sequentially the units of the polymer with evanescent radiation excited by the radiation emitted from the source. The optical detector is constructed to detect radiation including characteristic signals resulting from interaction of the evanescent radiation with the units. The processor is constructed and arranged to analyze the polymer based on the detected radiation.
Preferred embodiments of this aspect include one or more of the following features:
The interaction station is constructed to sequentially receive the units being selectively labeled with a radiation sensitive label and the interaction includes interaction of the evanescent radiation with the radiation sensitive label.
The radiation sensitive label includes a fluorophore.
The interaction station includes a waveguide constructed to receive the emitted radiation and provide the evanescent radiation in response thereto.
The waveguide is a dielectric waveguide constructed to achieve total internal reflection of introduced light. The waveguide is a rectangular mirror waveguide with a dielectric surrounded by metallic mirror layers constructed to have a low loss of introduced light. The waveguide includes a tip including an aperture in the metallic mirror layers and arranged to emit the evanescent radiation. The waveguide includes a tip constructed to emit the evanescent radiation.
The interaction station includes a nanochannel located at the tip of the waveguide and arranged to receive the polymer in a straightened form.
The interaction station includes a set of electrodes constructed and arranged to provide electric field for advancing the units of the polymer through the nanochannel. The electrodes are internal electrodes.
The electrodes are external electrodes. The nanochannel is between 2 and 50 nanometers.
The waveguide is further constructed and arranged to receive the radiation including the characteristic signals and optically couple the received radiation to the optical detector.
The interaction station includes another waveguide constructed and arranged to receive the radiation including the characteristic signals and optically couple the received radiation to the optical detector.
The system further includes an alignment station constructed and arranged to straighten the polymer and provide the straightened polymer to the interaction station.
In yet another aspect the invention is a system for optically analyzing a polymer utilizing confocal fluorescence illumination of linked units. The system includes an optical source constructed to emit optical radiation; a filter constructed to receive and filter said optical radiation to a known wavelength; a dichroic mirror constructed to receive said filtered optical radiation; an interaction station constructed to receive said filtered optical radiation and produce a localized radiation spot from said filtered optical radiation, said interaction station being also constructed to sequentially receive units of said polymer and arranged to irradiate sequentially said units at said localized radiation spot; an optical detector constructed to detect radiation including characteristic signals resulting from interaction of said units at said localized radiation spot; and a processor constructed and arranged to analyze said polymer based on said detected radiation including said characteristic signals.
In one embodiment the interaction station is constructed to sequentially receive said units being selectively labeled with a radiation sensitive label producing said characteristic signals at said localized radiation spot. In another embodiment the radiation sensitive label includes a fluorophore. In some embodiments the filter is a laser line filter.
The system may also include an objective, wherein the objective focuses said filtered optical radiation.
The proposed system and method for analyzing polymers is particularly useful for determining the sequence of units within a DNA molecule and can eliminate the need for generating genomic libraries, cloning, and colony picking, all of which constitute lengthy pre-sequencing steps that are major limitations in current genomic-scale sequencing protocols. The methods disclosed herein provide much longer read lengths than achieved by the prior art and a million-fold faster sequence reading. The proposed read length is on the order of several hundred thousand nucleotides. This translates into significantly less need for overlapping and redundant sequences, lowering the real amount of DNA that needs to be sequenced before genome reconstruction is possible. The actual time taken to read a given number of units of a polymer is a million-fold more rapid than current methods because of the tremendous parallel amplification supplied by a novel apparatus also claimed herein, which is referred to as a nanochannel plate or a microchannel plate. The combination of all these factors translates into a method of polymer analysis including sequencing that will provide enormous advances in the field of molecular and cell biology.