Proteins substantially comprised of short, repeating amino acid sequence motifs, often called “protein polymers” in the biomaterials literature, can be produced using bacterial protein expression systems, through genetic engineering (e.g., cloning, PCR) and other well-known, widely used techniques of molecular and cellular biology. However, the E. coli-based expression of long (>250 amino acid) polypeptides comprising highly repetitive, non-naturally derived amino acid sequences, in good yield and with high purity and substantial chain length homogeneity, can be challenging, especially if the designed proteins have the unusual property of being substantially unstructured (“unfolded”) in solution. These particular challenges in achieving high yield and high monodispersity of chain length typically are not so problematic in the E. coli-based expression of well-folded proteins with naturally derived amino acid sequences. In this class of protein-based materials, the repeating amino acid motifs (e.g., Gly-Ala-Gly-Thr-Gly-Ser-Ala; SEQ ID NO: 1) are “macromonomers” that constitute the repetitive protein-based polymer. A chosen amino acid sequence might “mimic” a motif within natural proteins, such as silks or elastins (Prince, et al., Biochemistry 1995, 34, 10879-10885; Huang, et al., Polym. Rev. 2007, 47, 29-62; Rabotyagova, et al., Biomacromolecules 2009, 10, 229-236; Simnick, et al., Polym. Rev. 2007, 47, 121-154), or instead might be designed “from scratch” (de novo) in the anticipation of fulfilling a particular purpose (Farmer, et al., Macromolecules 2006, 39, 162-170; Farmer, et al., Pharm. Res. 2008, 25, 700-708). Biosynthetically produced protein polymers are increasingly employed for biomedical applications (e.g., as constituents of tissue engineering scaffolds or as agents enabling the facile purification of other, desired protein targets to which the protein polymers are biosynthetically fused).
Such biosynthetic polymers offer certain advantages, if used as an alternative to abiological, synthetic polymers (Davis, et al., Biomacromolecules 2009, 10, 1125-1134; Lim, et al., Biomacromolecules 2008, 9, 222-230; Petka, et al., Science 1998, 281, 389-392; Xu, et al., Biomacromolecules 2005, 6, 1739-1749). A polymer that is chemically—as opposed to biosynthetically—produced will inevitably have some degree of polydispersity, i.e., a certain “breadth” in chain length or molar mass distribution, which will depend upon the nature of the chemical reaction used for polymer synthesis. Polydispersity, or chain length inhomogeneity, is non-ideal for certain applications of polymers that have repetitive monomer or macromonomer sequences. This may be the case, in particular, when such polymers are desired for uses in biotechnology or medicine, where purity and homogeneity are often prized and even necessary attributes for proper functioning and characterization.
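The notion of polydispersity used above can be made concrete with a short calculation. The following is a minimal sketch, assuming the conventional definition of the polydispersity index as the ratio of weight-average to number-average molar mass (PDI = Mw/Mn); the function name and the example molar masses are hypothetical and chosen only for illustration.

```python
def dispersity(counts_by_mass):
    """Return (Mn, Mw, PDI) for a mapping {molar_mass: number_of_chains}.

    Mn is the number-average molar mass, Mw the weight-average molar mass,
    and PDI = Mw / Mn (equal to 1 only for a perfectly monodisperse sample).
    """
    total_chains = sum(counts_by_mass.values())
    total_mass = sum(m * n for m, n in counts_by_mass.items())
    mn = total_mass / total_chains
    mw = sum(m * m * n for m, n in counts_by_mass.items()) / total_mass
    return mn, mw, mw / mn


# Hypothetical samples (molar masses in g/mol are illustrative only).
# Every chain identical, as a biosynthesized protein may be in principle:
monodisperse = {10000.0: 1000}
# A narrow but non-zero spread, as in a chemically synthesized polymer:
polydisperse = {9000.0: 250, 10000.0: 500, 11000.0: 250}

for label, sample in (("monodisperse", monodisperse),
                      ("polydisperse", polydisperse)):
    mn, mw, pdi = dispersity(sample)
    print(f"{label}: Mn={mn:.0f}, Mw={mw:.0f}, PDI={pdi:.4f}")
```

In this sketch, any spread in chain length raises the PDI above 1, whereas a preparation in which every chain is identical gives exactly 1.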
The properties of protein-based polymers can be customized according to interest by the choice of different DNA sequences to encode the desired amino acid sequence. The only limitations are those of the genetic code (i.e., there are ~20 natural amino acids to choose from), although proteins have been engineered to incorporate certain non-canonical amino acids, and these technologies are becoming increasingly accessible (van Hest, et al., Chem. Commun. 2001, 1897-1904; Kiick, et al., Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 19-24; Connor, et al., Polym. Rev. 2007, 47, 9-28; Kim, et al., J. Am. Chem. Soc. 2005, 127, 18121-18132). In addition, the chain length of the desired amino acid sequence may, in principle, be specified precisely according to the length of the gene encoding the protein. Protein-based materials produced in biological systems such as the bacterium E. coli can be produced with more precise monomer sequences and more homogeneous chain lengths than conventional, chemically synthesized polymers, which typically are not sequence-specific to the same high degree that proteins are, and which are not completely monodisperse in the way that a biosynthesized protein may be (Davis, et al., Biomacromolecules 2009, 10, 1125-1134; van Hest, et al., Chem. Commun. 2001, 1897-1904; Kiick, K. L., Polym. Rev. 2007, 47, 1-7).
Free-Solution Conjugate Electrophoresis (FSCE), which in the past has been called End-Labeled Free-Solution Electrophoresis (ELFSE), uses a pure, substantially monodisperse polymeric tag (sometimes called a “drag-tag”, if it fulfills the purpose of adding hydrodynamic drag), tethered end-on to a DNA molecule, to enable size-based separation of a mixture of DNA molecules by free-solution microchannel electrophoresis. Alternatively, when a particular, monodisperse DNA molecule is attached to a polydisperse preparation of “drag-tags”, it is then possible to achieve size-based separation of the drag-tags themselves, and profile their size distribution. FSCE enables the development of substantially novel approaches to DNA sequencing and genotyping, and indeed offers a new method to achieve the size-based separation of DNA for bioanalytical applications. FSCE is better suited than the more typically used method of gel electrophoresis for DNA separations on microfluidic devices, since FSCE obviates the need for fixed hydrogels or viscous polymer solutions to provide size-based DNA separation. Free-solution operation will save time, reduce cost and complication, and avoid challenges associated with the loading and unloading of the gel or polymer solution (“sieving matrix”) in microfabricated electrophoresis devices (“microfluidic chips”). For the purposes of FSCE, an aqueous buffer can be loaded into microchannels using a low applied pressure (e.g., <15 psi) or perhaps by capillary action, which is easy and will facilitate the automation of such microdevices for bioanalytical applications.
Free-solution conjugate electrophoresis of DNA or other biomolecules can be more easily implemented in microdevices than any gel-based method, and such devices then could use a wide variety of biomolecule detection schemes, or a wide variety of methods and strategies to assess or control the particular attributes or behavior of a nucleic acid sample or other type of biomolecular sample. Moreover, the types of bioconjugates that are described herein (protein polymer conjugates with nucleic acids) or other conjugates that comprise water-soluble protein polymers with repetitive amino acid sequences, substantially random coil solution conformations, and a high degree of monodispersity could be developed for applications that lie outside of the field of bioanalytical science, for instance, for pharmaceutical or other biomedical or therapeutic uses.
In FSCE, which is primarily aimed at bioanalytical applications, a monodisperse perturbing entity or “drag-tag” with a different molecular charge-to-molecular friction ratio than DNA is attached to nucleic acid polymers. The use of a drag-tag in this manner “breaks” the equivalence of the size-dependence of DNA charge and hydrodynamic friction, the ratio of which dictates electrophoretic mobility. Because DNA's molecular charge and molecular friction depend on chain length in the same way, their ratio shows little or no size dependence; this is understood to be a consequence of DNA's unique behavior as a “free-draining coil” during electrophoresis, an attribute that usually prevents its high-resolution, size-based separation in “free solution”, i.e., in the absence of a sieving medium such as a porous gel or polymer solution. The presence of a “drag-tag” (i.e., a conjugated molecular modifier that alters the molecular properties and behavior of DNA) has been shown to introduce DNA size-dependence to the electrophoretic mobility of the drag-tag-DNA conjugates, allowing separation in free solution, with no sieving media of any kind. For example, using a terminal drag-tag as a molecular modifier to single-stranded (ss) DNA molecules produced in the Sanger cycle sequencing reaction, free-solution electrophoretic DNA sequencing can be achieved (Sanger, et al., Proc. Natl. Acad. Sci. U.S.A. 1977, 74, 5463-5467), which is a striking achievement because Sanger-based DNA sequencing requires the size-based separation of DNA with single-base DNA chain length resolution. On the other hand, if a drag-tag is not used to modify the Sanger fragments, then the same mixture of ssDNA molecules produced in the Sanger reaction fails to show appreciable DNA size-dependence of electrophoretic mobility; without a drag-tag, DNA sequence cannot be ascertained by free-solution electrophoresis.
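The way a drag-tag introduces size dependence can be illustrated numerically. The sketch below assumes a first-order relation commonly used in the ELFSE literature for an uncharged drag-tag, mu = mu0 * N / (N + alpha), where N is the number of DNA bases, mu0 is the free-solution mobility of untagged DNA, and alpha is the drag of the tag expressed in equivalent DNA bases. This relation is not stated in the present text, and the function names and numerical values are hypothetical.

```python
def conjugate_mobility(n_bases, alpha, mu0=1.0):
    """Mobility of a drag-tag-DNA conjugate under the assumed model.

    n_bases: DNA chain length in bases (must be positive).
    alpha:   drag of an uncharged tag, in equivalent DNA bases (>= 0).
    mu0:     free-solution mobility of untagged DNA (arbitrary units).
    """
    if n_bases <= 0 or alpha < 0:
        raise ValueError("n_bases must be positive and alpha non-negative")
    return mu0 * n_bases / (n_bases + alpha)


def relative_mobility_gap(n_bases, alpha):
    """Fractional mobility difference between DNA of n and n + 1 bases."""
    mu_n = conjugate_mobility(n_bases, alpha)
    mu_next = conjugate_mobility(n_bases + 1, alpha)
    return (mu_next - mu_n) / mu_n


# Hypothetical values for illustration only.
for alpha in (0, 10, 30):
    gap = relative_mobility_gap(100, alpha)
    print(f"alpha={alpha:>2}: gap between 100 and 101 bases = {gap:.5f}")
```

Under this assumed model, setting alpha to zero (no drag-tag) gives a mobility that does not depend on N at all, while a larger alpha widens the mobility gap between fragments that differ by a single base. The gap also shrinks as N grows.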
Recent publications have shown that the larger the hydrodynamic drag provided by the drag-tag (i.e., generally, the larger the size of the drag-tag), the greater the length of the DNA sequencing fragments that can be resolved, and consequently, the longer the “read length” (zone of contiguous DNA base sequences ascertained) that can be obtained by FSCE. In addition to higher drag, longer read lengths are obtained if the drag-tag preparation used for DNA modification is substantially monodisperse in its molecular size, molecular structure, and chain length if the drag-tag is a polymer.
Indeed, an advantageous drag-tag for use in FSCE will be completely or substantially monodisperse, easily water-soluble, uncharged or possessing a low degree of positive electrostatic charge, and will show minimal adsorption to or non-specific interaction with the glass (fused silica) microchannel walls. Additionally, to be useful for FSCE, a drag-tag must be able to be uniquely and stably attached to DNA, preferably “end-on”, i.e., at one of the DNA's molecular termini (Meagher, et al., Anal. Chem. 2008, 80, 2842-2848). From this imposing list of needed attributes for an advantageous FSCE drag-tag, perhaps the most important property is complete monodispersity, such that each and every drag-tag molecule in a preparation that is conjugated to DNA molecules is identical in its chain length, amino acid sequence, and particular chemical structure, and hence is identical in its net, counterion-screened electrostatic charge and in the hydrodynamic drag it generates in free-solution electrophoresis. If a polydisperse preparation of molecules is used as drag-tags for FSCE-based DNA analysis, the resulting bioconjugates are similarly polydisperse, and this has deleterious effects on the usefulness of the data obtained in the bioanalytical separation. In this case, the peak pattern obtained by microchannel electrophoresis would be complex, because for any given DNA molecule in the nucleic acid mixture of interest, i.e., for DNA of any particular chain length, there will be multiple peaks in the electropherogram, instead of a single peak, as is most desirable and useful for bioanalytical applications. The drag-tag-DNA conjugate peaks for a particular DNA molecule may overlap with peaks corresponding to bioconjugates of DNA molecules of different sizes (this would certainly be the case for a DNA sequencing sample prepared by the Sanger reaction), which would make accurate DNA sizing difficult or impossible.
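The peak-overlap problem described above can be shown with a small worked example. This is only a sketch under a stated assumption: that the relative mobility of a conjugate with an uncharged drag-tag follows N / (N + alpha), a first-order relation from the ELFSE literature that is not given in this text, with N the DNA length in bases and alpha the tag's drag in equivalent bases. The tag drag values (30 and 33) and the function names are hypothetical.

```python
from fractions import Fraction


def relative_mobility(n_bases, alpha):
    """Exact relative mobility N / (N + alpha) under the assumed model."""
    return Fraction(n_bases) / (Fraction(n_bases) + Fraction(alpha))


def comigrating_conjugates(dna_sizes, tag_drags):
    """Group (dna_size, tag_drag) conjugates that share one mobility."""
    groups = {}
    for n in dna_sizes:
        for alpha in tag_drags:
            key = relative_mobility(n, alpha)
            groups.setdefault(key, []).append((n, alpha))
    return [members for members in groups.values() if len(members) > 1]


# A hypothetical tag preparation containing two species with different drag,
# conjugated to a ladder of DNA fragments of 100 to 110 bases.
overlaps = comigrating_conjugates(range(100, 111), [30, 33])
print(overlaps)

# A monodisperse tag preparation gives one distinct mobility per DNA size.
print(comigrating_conjugates(range(100, 111), [30]))
```

In this illustration, a 100-base fragment carrying the smaller tag has exactly the same model mobility as a 110-base fragment carrying the larger tag, so the two conjugates could not be distinguished by migration alone; with a single tag species, no two DNA sizes coincide.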
This requirement for total monodispersity eliminates from consideration all of the commonly available chemically synthesized polymers, microparticles, and nanoparticles, and makes such polymers or particles poor candidates for FSCE DNA sequencing drag-tags; none of these is completely and totally monodisperse (Meagher, et al., Electrophoresis 2005, 26, 331-350). Although solid-phase synthesis techniques can be used to generate monodisperse, sequence-specific polyamide molecules such as polypeptides and polypeptoids (i.e., poly-N-substituted glycines), solid-phase synthesis technology produces polyamides that are too small/too short in chain length to generate sufficient hydrodynamic drag for the separation of large ssDNA fragments (>120 bases in length) for FSCE sequencing (Haynes, et al., Bioconjugate Chem. 2005, 16, 929-938).
Natural proteins are very often much larger in size than polyamides produced by solid-phase synthetic approaches; however, natural proteins have other drawbacks that make them unsuitable as drag-tags for bioanalytical applications. For instance, in aqueous solution, most natural proteins are “folded” into compact, three-dimensional chain configurations (“conformations”), and typically present numerous positive and negative surface charges. Charged proteins could have deleterious electrostatic interactions with the DNA analytes, or with the glass microchannel walls of the electrophoresis chamber. For instance, proteins with a high density of positive charges could ionically bind to DNA molecules, or to the microchannel wall, or both. On the other hand, proteins with a high degree of negative charge will tend to electrophorese in the same direction as the DNA molecules themselves, and so might not substantially change the size-dependence of electrophoretic mobility. This is why a DNA modifier that is close to net-neutral in its charge will be most desirable for FSCE, if one is able to identify a substantially uncharged modifier that is also water-soluble and that also allows facile end-on attachment to DNA molecules. It should also be considered that natural proteins typically contain a variety of different chemically reactive groups as amino acid “side chains” (e.g., a primary amine group in lysine, a thiol in cysteine, a carboxylic acid in both glutamic acid and aspartic acid). The presence of these chemically reactive groups in a protein can make the unique, precise, chemo-selective attachment of natural proteins to DNA molecules, through a very particular site on the protein, difficult if not impossible.
In contrast, properly designed and properly prepared, sequence-engineered protein polymers are able to meet the many stringent requirements of a useful drag-tag, through careful design of the repetitive amino acid “macromonomer” sequence, in such a way as to reduce or eliminate the number of potentially problematic charged and reactive sites. As discussed previously, biosynthetically produced protein polymers also can, in principle, be produced with a much higher degree of homogeneity of their physical structure and molecular properties than chemically synthesized polymers.
As mentioned above, FSCE itself also can be used as a highly sensitive, fluorescence-based detection method to investigate the polydispersity of a given preparation of a protein polymer drag-tag, and this method is very important, in fact, for the assessment of the purity and homogeneity of a candidate drag-tag preparation for DNA sequencing applications. To accomplish this, a preparation of potential protein drag-tags is conjugated end-on to a monodisperse, fluorescently labeled oligonucleotide primer, and the obtained bioconjugates are analyzed by free-solution microchannel electrophoresis. If one is characterizing a candidate drag-tag preparation, it is preferable to observe only two peaks in this type of electropherogram: (1) a peak representing free (unconjugated) DNA, which passes the detector first given the absence of a “drag-tag”, and (2) a peak representing drag-tag-DNA conjugates, which elutes later because DNA's electromigration velocity is reduced as a result of the added hydrodynamic drag associated with the attached drag-tag molecule, which it “pulls” along with it as it moves in an applied electric field. This method also has been used to characterize the breadth of chain length distribution in a commercially obtained preparation of “monodisperse” (PDI or Polydispersity Index=1.01, considered herein to be very low) synthetic poly(ethylene glycol) (PEG), to which a monodisperse, fluorescently labeled DNA molecule was conjugated end-on via chemical methods (Vreeland, et al., Anal. Chem. 2001, 73, 1795-1803). An analysis by capillary electrophoresis revealed more than 110 different bioconjugate peaks, which were well-resolved from each other, in an overall Gaussian distribution of PEG-DNA bioconjugates comprising PEGs of differing chain lengths.
It was striking, in this example, that single-monomer differences in PEG structure, i.e., different numbers of -CH2CH2O- units, and even the difference of one such monomer unit, were enough to produce distinct peaks that were resolvable by FSCE, demonstrating the tremendous resolving power as well as the high sensitivity of this technique to provide useful electrophoretic mobility shifts for a sample of interest, based on small molecular differences. FSCE has also been used to characterize solid-phase polypeptoid synthesis products (Vreeland, et al., Bioconjugate Chem. 2002, 13, 663-670) and to assess and analytically profile the deamidation (chemical degradation) products of a family of protein polymers comprising a significant number of glutamine residues (because glutamine residues can be converted to glutamic acid residues, over time, via undesired chemical reactions in water) (Won, et al., Biomacromolecules 2004, 5, 1624-1624).
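Returning to the PEG example, a rough calculation suggests why a preparation with a PDI as low as 1.01 can still contain many distinct chain lengths. The sketch below uses the general identity PDI = 1 + (sigma / mean)^2 for a number distribution of chain lengths, neglecting end-group mass, together with a Gaussian shape as described above. The mean chain length of 100 monomer units and the detection threshold are illustrative assumptions, not values from the cited study, and the function names are hypothetical.

```python
import math


def chain_length_sigma(mean_units, pdi):
    """Standard deviation of chain length implied by a given PDI.

    Uses PDI = 1 + (sigma / mean)^2, which holds for any number
    distribution when end-group mass is neglected.
    """
    if pdi < 1.0:
        raise ValueError("PDI cannot be less than 1")
    return mean_units * math.sqrt(pdi - 1.0)


def count_species(mean_units, pdi, min_fraction=0.001):
    """Count integer chain lengths whose number fraction is at least
    min_fraction, assuming a Gaussian distribution of chain lengths."""
    sigma = chain_length_sigma(mean_units, pdi)
    if sigma == 0.0:
        return 1  # perfectly monodisperse: a single species
    lowest = max(1, int(mean_units - 6 * sigma))
    highest = int(mean_units + 6 * sigma) + 1
    weights = [math.exp(-0.5 * ((n - mean_units) / sigma) ** 2)
               for n in range(lowest, highest + 1)]
    total = sum(weights)
    return sum(1 for w in weights if w / total >= min_fraction)


# Hypothetical mean of 100 monomer units; PDI values for comparison.
for pdi in (1.0, 1.01, 1.05):
    print(f"PDI={pdi}: sigma={chain_length_sigma(100, pdi):.1f} units, "
          f"species above threshold={count_species(100, pdi)}")
```

The number of species counted depends on the assumed mean chain length and threshold, so this sketch is not expected to reproduce the peak count reported for the actual PEG sample; it only shows that a nominally narrow distribution still spans dozens of chain lengths, each of which FSCE can in principle resolve as a separate peak.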
The design, purification, and obtainment of sufficient, useful amounts of completely or substantially monodisperse protein polymers, which are suitable as drag-tags for FSCE-based DNA sequencing, was a challenging task requiring more than 12 years of steady molecular engineering work. The first reported family of protein polymer drag-tag designs, of various lengths and amino acid sequences, was found to be heterogeneous after purification from bacterial cultures, when assessed by FSCE using a monodisperse, fluorescently labeled oligonucleotide, despite the fact that these protein polymers had been produced in a simple biological system, E. coli, according to what was understood to be the most commonly used methods for heterologous protein expression in bacteria (Meagher, et al., Electrophoresis 2005, 26, 331-350; Won, et al., Biomacromolecules 2004, 5, 1624-1624; Won, et al., Electrophoresis 2005, 26, 2138-2148). However, recently, a small, random coil, substantially monodisperse protein polymer drag-tag comprising 127 amino acids was produced and tested as an FSCE drag-tag. This 127mer protein polymer was demonstrated to be useful for “short-read” Sanger DNA sequencing in free solution, providing a reproducibly obtainable read length of ~180 bases of contiguous DNA sequence (Meagher, et al., Anal. Chem. 2008, 80, 2842-2848). To obtain longer read lengths (>400 bases) by FSCE (as would be desired because a typical human exon, in an expressed gene, is at least 400 bases long), our originally developed methods and strategies for the preparation of biosynthetic polypeptide drag-tags were found to be insufficient to produce polypeptides with substantial and bioanalytically sufficient levels of monodispersity.