Capillary electrophoresis (CE) of DNA through polymeric separation matrices is presently the dominant technology for high-throughput sequencing. Although a final draft of the Human Genome was published recently (see, e.g., Marshall, E. Science 2000, 288, 2294-2295; Smaglik, P. Nature 2000, 404 (6774), 111-111; each herein incorporated by reference in their entireties), hundreds of other genome projects (see, e.g., Bernal, A., et al. Nucleic Acids Research 2001, 29 (1), 126-127; herein incorporated by reference in its entirety), as well as individualized genomics, still require long DNA sequencing read lengths at low cost. Novel polymeric matrices that provide longer read lengths than commercially available sequencing matrices will be instrumental for the throughput increases and cost reductions that are still required to make the long-term goal of personalized genomics economically feasible. In the present application, the inventors have developed a novel material for DNA sequencing with the clear potential to combine the high-selectivity separations of a crosslinked slab gel with the replaceable nature of a separation matrix composed of linear polymers.
Originally, DNA sequencing was performed on highly crosslinked polyacrylamide slab gels (see, e.g., Chrambach, A., Rodbard, D. Science 1971, 172, 440-450; Bishop, D. H., et al. Journal of Molecular Biology 1967, 26 (3), 373; each herein incorporated by reference in their entireties). Crosslinked polyacrylamide yields excellent DNA separations, allowing for long reads under optimized conditions (see, e.g., Ansorge, W., et al. Electrophoresis 1992, 13 (9), 616-619; herein incorporated by reference in its entirety). However, unless very low electric fields are used (which translates into long run times, typically 5-8 hours), large DNA sequencing fragments (>300 bases) rapidly enter the “biased reptation with orientation” (see, e.g., Slater, G. W., Noolandi, J. Biopolymers 1986, 25, 431-454; herein incorporated by reference in its entirety) migration mode in these highly crosslinked media. The process of obtaining long DNA sequencing reads using ultra-thin slab gels is time and labor intensive. For this reason, high-throughput genome sequencing centers largely abandoned slab gels in the late 1990s, in favor of automated capillary array electrophoresis (CAE) (see, e.g., Cheng, J. Prog. Biochem. Biophys. 1995, 22 (3), 223-227; herein incorporated by reference in its entirety).
During the initial stages of the development of capillary electrophoresis (CE), researchers used in situ-polymerized, highly crosslinked polyacrylamide within the lumen of the capillary. Sequencing in crosslinked polyacrylamide capillary gels was demonstrated by Karger and co-workers (see, e.g., Heiger, D. N., et al. Chromatogr. 1990, 33-48; Cohen, A. S., et al. Journal of Chromatography 1990, 516 (1), 49; Guttman, A., et al. Anal. Chem. 1990, 62 (2), 137-141; each herein incorporated by reference in their entireties), Dovichi and co-workers (see, e.g., Harke, H. R., et al. J. Chromatogr. 1992, 608, 143-150; herein incorporated by reference in its entirety), Smith and co-workers (see, e.g., Smith, L. M. Nature 1991, 349 (6312), 812-813; Drossman, H., et al. Analytical Chemistry 1990, 62 (9), 900-903; Luckey, J. A., et al. Nucleic Acids Research 1990, 18 (15), 4417-4421; each herein incorporated by reference in their entireties), and Baba et al. (see, e.g., Baba, Y., et al. Anal. Chem. 1992, 64 (11), 1221-1225; herein incorporated by reference in its entirety), with read lengths of up to 350 bases. Crosslinked polyacrylamide capillary gels were typically produced using a total monomer concentration of up to 5% and a concentration of the crosslinker N,N′-methylenebisacrylamide (Bis) of up to 5%; these short-read sequencing separations typically required 60-70 minutes.
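The monomer recipe for such gels can be made concrete with the conventional %T/%C notation for polyacrylamide. The sketch below assumes that the "total monomer concentration" and "concentration of Bis" quoted above correspond to %T (grams of acrylamide plus Bis per 100 mL of solution) and %C (Bis as a percentage of total monomer mass); the function name and the 10 mL batch size are hypothetical, chosen only for illustration.

```python
def acrylamide_bis_masses(percent_t: float, percent_c: float, volume_ml: float) -> tuple[float, float]:
    """Return (acrylamide_g, bis_g) needed for a gel of the given %T and %C.

    Assumed definitions (standard polyacrylamide gel notation):
      %T = total monomer (acrylamide + Bis), in g per 100 mL of solution
      %C = Bis as a percentage of the total monomer mass
    """
    total_monomer_g = percent_t / 100.0 * volume_ml  # g of acrylamide + Bis
    bis_g = percent_c / 100.0 * total_monomer_g      # crosslinker share
    acrylamide_g = total_monomer_g - bis_g           # remainder is acrylamide
    return acrylamide_g, bis_g


# Hypothetical 10 mL batch at the upper limits quoted above (5%T, 5%C).
acrylamide_g, bis_g = acrylamide_bis_masses(5.0, 5.0, 10.0)
```

For a 5%T, 5%C gel this gives 0.475 g of acrylamide and 0.025 g of Bis per 10 mL, which shows how small the crosslinker fraction is even in these "highly crosslinked" capillary gels.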
The direct transfer of “slab gel” technology to the micron-sized channels typical of fused-silica capillaries was not effective for several reasons. First, voids left within the capillary due to the increased density of the polymer relative to its monomer are detrimental to highly efficient separations (see, e.g., Guttman, A., et al. Anal. Chem. 1990, 62 (2), 137-141; Bae, Y. C., et al. J. Chromatogr. A 1993, 652, 17-22; each herein incorporated by reference in their entireties). Second, an in situ-polymerized, highly crosslinked structure is difficult to remove from the capillary, so each prepared capillary is useful for only a small number of separations. Finally, because the final polymer properties cannot be known a priori, rigorous quality control is not possible for in situ-polymerized matrices.
The use of a replaceable DNA sequencing matrix, in particular a highly entangled solution of linear polyacrylamide (LPA), provided resolution of ssDNA fragments without the use of an infinitely crosslinked polymer network (see, e.g., Heiger, D. N., et al. Chromatogr. 1990, 33-48; Bae, Y. C., et al. J. Chromatogr. A 1993, 652, 17-22; each herein incorporated by reference in their entireties). A 6% solution of relatively low molar mass LPA (~1×10⁶ g/mol) provided a read length of over 350 bases in close to 30 minutes, indicating that a highly crosslinked polymer network was not required for DNA sequencing within capillaries (see, e.g., Ruiz-Martinez, M. C., et al. Anal. Chem. 1993, 65, 2851-2858; herein incorporated by reference in its entirety). Also, compared to sequencing separations by CE using crosslinked gels, comparable sequencing reads could be achieved in a shorter time (i.e., with a higher field), since a more open network shifts the “biased reptation with orientation” threshold to larger DNA sizes (see, e.g., Slater, G. W., Noolandi, J. Biopolymers 1986, 25, 431-454; herein incorporated by reference in its entirety).
Importantly, the use of physically entangled, linear polymer solutions for the separation of DNA sequencing fragments within capillaries also allowed for relatively facile loading and replacement of the separation matrix between runs (see, e.g., Ruiz-Martinez, M. C., et al. Anal. Chem. 1993, 65, 2851-2858; herein incorporated by reference in its entirety). This enabled, for the first time, complete automation of DNA sequencing. Moreover, production and characterization of polymers ex situ has allowed researchers to correlate polymer physical and chemical properties, including weight-average molar mass (see, e.g., Goetzinger, W., et al. Electrophoresis 1998, 19, 242-248; herein incorporated by reference in its entirety), polydispersity (see, e.g., Barron, A. E., et al. Electrophoresis 1996, 17, 744-757; Salas-Solano, O., et al. Anal. Chem. 1998, 70 (19), 3996-4003; each herein incorporated by reference in their entireties), and hydrophobicity (see, e.g., Albarghouthi, M. N., et al. Electrophoresis 2001, 22, 737-747; Chiari, M., et al. Electrophoresis 1994, 15, 177-186; Gelfi, C., et al. Electrophoresis 1996, 17, 738-743; each herein incorporated by reference in their entireties), with DNA separation performance.
The chemical and physical properties of polymers used for microchannel DNA sequencing are important, as they control the time scale of polymer-polymer and polymer-DNA interactions within the entangled polymer network, which in turn influences the mechanism of DNA separation (see, e.g., Bae, Y. C., et al. J. Chromatogr. A 1993, 652, 17-22; herein incorporated by reference in its entirety). An ideal polymer matrix for DNA sequencing should be hydrophilic, physically and chemically stable under sequencing conditions, and relatively low in viscosity (during loading and replacement). Typically, high-molar mass polymers (Mw > 2×10⁶ g/mol) give the best performance because they form robust entangled networks (see, e.g., Albarghouthi, M., Barron, A. E. Electrophoresis 2000, 21, 4096-4111; herein incorporated by reference in its entirety).
A range of linear polymers has shown good utility for DNA sequencing, including linear polyacrylamide (LPA) (see, e.g., Carrilho, E., et al. Anal. Chem. 1996, 68 (19), 3305-3313; Zhou, H., et al. Anal. Chem. 2000, 72 (5), 1045-1052; each herein incorporated by reference in their entireties), poly(N,N-dimethylacrylamide) (PDMA) (see, e.g., Madabhushi, R. S., et al. Electrophoresis 1997, 18, 104-111; Madabhushi, R. S., et al. Electrophoresis 1998, 19, 224-230; each herein incorporated by reference in their entireties), poly(ethylene oxide) (PEO) (see, e.g., Fung, E. N., et al. Anal. Chem. 1995, 67 (13), 1913-1919; herein incorporated by reference in its entirety), poly(vinyl pyrrolidone) (PVP) (see, e.g., Gao, Q. F., et al. Anal. Chem. 1998, 70 (7), 1382-1388; herein incorporated by reference in its entirety), poly(N-hydroxyethylacrylamide) (polyDuramide™) (see, e.g., Albarghouthi, M. N., et al. Electrophoresis 2002, 23 (10), 1429-1440; herein incorporated by reference in its entirety), and copolymers of N,N-dimethylacrylamide (DMA) and N,N-diethylacrylamide (DEA) (see, e.g., Albarghouthi, M. N., et al. Electrophoresis 2001, 22, 737-747; Buchholz, B. A., et al. Anal. Chem. 2001, 73, 157-164; each herein incorporated by reference in their entireties). To date, high-molar mass LPA gives the best sequencing performance, able to produce a 1000-base read in about 1 hour (see, e.g., Salas-Solano, O., et al. Anal. Chem. 1998, 70 (19), 3996-4003; herein incorporated by reference in its entirety) and 1300 bases in 2 hours (see, e.g., Zhou, H., et al. Anal. Chem. 2000, 72 (5), 1045-1052; herein incorporated by reference in its entirety) with highly optimized polymer molar mass distribution, matrix formulation, sample preparation and clean-up, and base-calling algorithms.
The long reads demonstrated by Karger and co-workers using high-molar mass LPA were accomplished using blends of high and low molar mass LPA. The 1000-base read was performed in a matrix blend composed of 2.0 wt % 9×10⁶ g/mol and 0.5 wt % 5×10⁴ g/mol LPA (see, e.g., Salas-Solano, O., et al. Anal. Chem. 1998, 70 (19), 3996-4003; herein incorporated by reference in its entirety); 1300 bases were sequenced in a matrix blend composed of 2 wt % 1.7×10⁷ g/mol and 0.5 wt % 2.7×10⁵ g/mol LPA (see, e.g., Zhou, H., et al. Anal. Chem. 2000, 72 (5), 1045-1052; herein incorporated by reference in its entirety). The inclusion of a low percentage of low-molar mass polymer increases the total polymer concentration of the matrix, allowing smaller ssDNA fragments to be separated without significantly decreasing the selectivity for the large ssDNA fragments provided by the highly entangled, high-molar mass polymer (see, e.g., Heller, C., et al. Electrophoresis 1999, 20, 1962-1977; Duke, T., et al. Biopolymers 1994, 34, 239-247; Cottet, H., et al. Electrophoresis 1998, 19, 2151-2162; each herein incorporated by reference in their entireties). Significantly shorter read lengths are common in commercial CAE instruments such as the ABI PRISM 3700™ (550 bases in 3-4 hours) and the MegaBACE 1000™ (600 bases in 2 hours), due to the use of lower-viscosity, less entangled matrices for practical reasons and the lower quality of actual genomic DNA samples, among other factors.
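The effect of the low-molar mass additive can be illustrated numerically. For a blend, the weight-average molar mass is the mass-fraction-weighted sum of the components' weight-average molar masses, Mw,blend = Σ wᵢ·Mw,ᵢ. The sketch below applies that identity to the two blends cited above, treating each quoted molar mass as that component's Mw (an assumption; the function name is hypothetical).

```python
def blend_properties(components: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (total_wt_percent, blend_mw) for a polymer blend.

    components: (wt_percent, mw_g_per_mol) pairs, where mw is assumed to be
    each component's weight-average molar mass. The blend's weight-average
    molar mass is sum(w_i * Mw_i), with w_i the mass fraction of component i.
    """
    total_wt = sum(wt for wt, _ in components)
    blend_mw = sum(wt * mw for wt, mw in components) / total_wt
    return total_wt, blend_mw


# Blends quoted above: high-Mw LPA plus 0.5 wt % low-Mw LPA.
blend_1000_base = blend_properties([(2.0, 9e6), (0.5, 5e4)])
blend_1300_base = blend_properties([(2.0, 1.7e7), (0.5, 2.7e5)])
```

Both blends reach 2.5 wt % total polymer, while Mw falls only from 9×10⁶ to about 7.2×10⁶ g/mol (and from 1.7×10⁷ to about 1.37×10⁷ g/mol). This is consistent with the point that the short chains raise total concentration for small fragments while the long chains still dominate the entangled network.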
Polyacrylamide is a near-ideal polymer for DNA sequencing because of its high hydrophilicity (and hence its excellent ability to entangle with other polyacrylamide chains in aqueous solution) and its facile production to high molar mass using standard free-radical polymerization chemistry. Polyacrylamide is also relatively easy to purify, as it readily precipitates from water upon addition of acetone or methanol. A highly entangled LPA matrix suitable for long-read sequencing has an extremely high zero-shear viscosity (60,000-120,000 cP); consequently, high pressure (i.e., 6895 kPa (1000 psi)) is required to initiate flow into the microchannel.
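Why such pressures are needed can be seen from a rough Hagen-Poiseuille estimate. For a Newtonian fluid in a filled capillary of radius r and length L under pressure drop ΔP, the mean velocity is r²ΔP/(8ηL), so displacing one capillary volume takes t = 8ηL²/(r²ΔP). The sketch below assumes a hypothetical 50 µm i.d., 50 cm capillary and treats the zero-shear viscosity as constant; real entangled LPA solutions shear-thin, so this is an upper-bound illustration, not a prediction of actual loading times.

```python
PSI_TO_PA = 6894.757  # 1 psi in pascals (1000 psi is about 6895 kPa)


def newtonian_fill_time_s(viscosity_cp: float, pressure_psi: float,
                          inner_diameter_um: float, length_cm: float) -> float:
    """Time (s) to push one capillary volume of a Newtonian fluid through a
    filled capillary, from Hagen-Poiseuille flow: t = 8 * eta * L^2 / (r^2 * dP).

    Assumes constant (zero-shear) viscosity, i.e. no shear thinning.
    """
    eta = viscosity_cp * 1e-3            # cP -> Pa*s
    dp = pressure_psi * PSI_TO_PA        # psi -> Pa
    r = inner_diameter_um * 1e-6 / 2.0   # inner diameter in um -> radius in m
    length = length_cm * 1e-2            # cm -> m
    return 8.0 * eta * length**2 / (r**2 * dp)


# Hypothetical capillary: 50 um i.d., 50 cm long, loaded at 1000 psi.
t_low = newtonian_fill_time_s(60_000, 1000, 50, 50)
t_high = newtonian_fill_time_s(120_000, 1000, 50, 50)
```

At the low end of the quoted viscosity range this gives roughly 2.8×10⁴ s (about 7.7 h) even at 1000 psi, and twice that at 120,000 cP. Because t scales as 1/ΔP, lower pressures would be impractical, which illustrates why the text stresses both the pressure requirement and the appeal of lower-viscosity loading.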
The use of branched copolymer structures for DNA sequencing matrices has been explored as a way to improve the performance of lower-molar mass LPA by modifying its network properties. Viovy and co-workers have produced a relatively low-molar mass, branched copolymer with a polyacrylamide backbone. Poly(acrylamide-g-N-isopropylacrylamide) (see, e.g., Sudor, J., et al. Electrophoresis 2001, 22, 720-728; herein incorporated by reference in its entirety), when heated above the lower critical solution temperature (LCST) of its N-isopropylacrylamide (NIPA) grafts, forms micelle-like aggregates of the NIPA grafts, stabilizing the branched polymer network with “transient crosslinks” and increasing the matrix viscosity by about two orders of magnitude (from 100 to 10,000 cP). This polymer showed excellent utility for the separation of dsDNA in published work, but was not tested as a DNA sequencing matrix. Although branched polymers of this class improve the loading properties of LPA (since lower-molar mass, less viscous solutions can be used), they do not provide the highly entangled network presented by solutions of ultra-high molar mass LPA (Mw > 9×10⁶ g/mol) previously demonstrated by Karger and co-workers to give long read lengths.
Chemical crosslinks within slab gels produce an infinitely crosslinked separation medium and provide a mechanically stabilized pore structure for the migration of DNA. An abundance of crosslinks within a slab gel decreases the effective pore size and limits sample diffusion and dispersion during separation (see, e.g., Ugaz, V. M., et al. Electrophoresis 2002, 23 (16), 2777-2787; herein incorporated by reference in its entirety), which is desirable to produce narrow bands on the gel. Ideally, the presence of chemical crosslinks in a high-molar mass polymer for CE would provide the same benefits. Occasional crosslinks within a physically entangled matrix composed of high molar mass polyacrylamide should decrease sample diffusion as well as provide a more robust network for migrating DNA, allowing for significantly longer read lengths when compared to a commercially available linear polyacrylamide having a similar molar mass and extent of physical entanglements.
Three major challenges exist when attempting to incorporate crosslinks into a polymer to be used as a sequencing matrix for CAE. First, the formation of an infinitely crosslinked polymer gel during the polymerization process should be avoided. Second, large polymer structures of colloidal dimensions should be avoided, as these particles would scatter incident light, preventing sequencing using laser-induced fluorescence (LIF) detection. Finally, the crosslink density should be limited so that the final sequencing matrix retains good fluidity, and so that individual polymer structures may physically entangle with each other (see FIG. 1).
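How little crosslinking separates a fluid matrix from an infinite gel can be estimated with Flory's classical gel-point criterion for random crosslinking of primary chains: gelation occurs when the fraction ρ of crosslinked repeat units reaches ρc = 1/(ȳw − 1), where ȳw is the weight-average degree of polymerization. This is an idealized model (random crosslinking of preformed chains, no intramolecular loops), not a description of in situ copolymerization with Bis; the function name is hypothetical, and acrylamide's repeat-unit molar mass of 71.08 g/mol is the only material constant used.

```python
ACRYLAMIDE_MOLAR_MASS = 71.08  # g/mol per acrylamide repeat unit


def critical_crosslink_fraction(mw_g_per_mol: float,
                                repeat_unit_mass: float = ACRYLAMIDE_MOLAR_MASS) -> float:
    """Flory gel-point estimate for random crosslinking of primary chains.

    Returns rho_c = 1 / (y_w - 1), the fraction of repeat units that must be
    crosslinked for an infinite network to form, where y_w = Mw / repeat-unit
    mass is the weight-average degree of polymerization. Idealized: ignores
    intramolecular crosslinks and unequal monomer reactivities.
    """
    y_w = mw_g_per_mol / repeat_unit_mass
    if y_w <= 1.0:
        raise ValueError("chains must contain more than one repeat unit")
    return 1.0 / (y_w - 1.0)


# Molar masses mentioned in the text for high-performance LPA.
rho_c_2e6 = critical_crosslink_fraction(2e6)
rho_c_9e6 = critical_crosslink_fraction(9e6)
```

For 9×10⁶ g/mol polyacrylamide (ȳw ≈ 1.27×10⁵) this gives ρc ≈ 7.9×10⁻⁶, i.e. roughly 8 crosslinked units per million. Because ρc falls as molar mass rises, the longer the chains, the more tightly the crosslink density must be limited to stay below the gel point.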