Large amounts of genetic information essential for the vital activities of living organisms are recorded as sequences of nucleobases in nucleic acids. Unlike proteins, which are composed of 20 types of amino acids, natural nucleic acids are limited in chemical and physical diversity to only 4 bases (2 base pairs). Such nucleic acids can self-replicate, using themselves as templates, and transmit genetic information from DNA to DNA, from DNA to RNA, and/or from RNA to protein. Natural nucleotides pair through two or three hydrogen bonds (A:T/U, G:C), and this pairing enables replication and transmission of the genetic information. In vitro, the natural four-letter (two-base-pair) genetic alphabet has been expanded with chemically synthesized unnatural nucleotides that form unnatural base pairs (UBPs).
To create an organism with an expanded genetic alphabet, the unnatural nucleoside triphosphates must first be available inside the cell. It has been suggested that this might be accomplished by passive diffusion of free nucleosides into the cytoplasm, followed by their conversion to the corresponding triphosphates via the nucleoside salvage pathway. Some unnatural nucleosides are phosphorylated by the nucleoside kinase from D. melanogaster, and monophosphate kinases appear to be more specific. However, over-expression of endogenous nucleoside diphosphate kinase in E. coli results in poor cellular growth.
Benner et al. have designed novel base pairs with hydrogen-bonding patterns different from those of the natural base pairs, e.g., the isoG:isoC and κ:X base pairs (Piccirilli et al., 1990; Piccirilli et al., 1991; Switzer et al., 1993). However, isoG forms a base pair with T through keto-enol tautomerism between the 1- and 2-positions; isoC and κ are not substrates of polymerases because of an amino group substitution at the 2-position; and nucleoside derivatives of isoC are chemically unstable.
Base pairs that have hydrogen-bonding patterns different from those of natural base pairing and that are capable of eliminating base pairing with natural bases by steric hindrance have also been developed. For example, Ohtsuki et al. (2001) and Hirao et al. (2002) designed 2-amino-6-dimethylaminopurine (X), 2-amino-6-thienylpurine (S), and pyridin-2-one (Y). However, the incorporation of Y opposite X showed low selectivity.
Kool et al. synthesized A and T analogs that lack hydrogen bonding: 4-methylbenzimidazole (Z) and 9-methyl-1H-imidazo[4,5-b]pyridine (Q) as A analogs, and 2,4-difluorotoluene (F) as a T analog. These base pairs were found to be incorporated into DNA in a complementary manner by the Klenow fragment of E. coli DNA polymerase I. Other base pairs, including A:F, Q:T, and Z:T, were also shown to be incorporated (Morales & Kool, 1999). Other examples include 7-(2-thienyl)-imidazo[4,5-b]pyridine (Ds) and pyrrole-2-carbaldehyde (Pa).
In an effort to develop an orthogonal third base pair, over 100 hydrophobic bases have been designed, synthesized as the triphosphates and phosphoramidites, and characterized in Floyd Romesberg's laboratory. A large number of hydrophobic base pairs have also been generated, including those based on pyrrolopyridine (PP) and C3-methylisocarbostyryl (MICS) (Wu et al., 2000). However, these bases paired with each other independently of shape fitting, resulting in PP:PP and MICS:MICS incorporation into DNA, and elongation did not substantially proceed after incorporation of such combinations formed without any shape fitting. For some UBPs, e.g., a 3-fluorobenzene (3FB) self-pair, neither hydrogen bonding nor a large aromatic surface area was found to be required for base pair stability in duplex DNA or for polymerase-mediated replication.
The development of a third, unnatural DNA base pair, and thus an expanded genetic alphabet, is a central goal of synthetic and chemical biology. It would increase the functional diversity of nucleic acids, provide tools for their site-specific labeling, increase the information potential of DNA, and lay the foundation of a semi-synthetic organism. Described herein is a newly developed class of UBPs formed between nucleotides bearing hydrophobic nucleobases, exemplified by the pair formed between d5SICS and dNaM (d5SICS-dNaM). This pair is efficiently amplified by PCR and transcribed, and its unique mechanism of replication has been thoroughly characterized through kinetic studies (Lavergne et al., Chem. Eur. J. (2012) 18:1231-1239; Seo et al., J. Am. Chem. Soc. (2009) 131:3246-3252) and structural studies (Betz et al., J. Am. Chem. Soc. (2013) 135:18637-18643; Malyshev et al., Proc. Natl. Acad. Sci. USA (2012) 109:12005-12010). Until now, no living organism capable of incorporating and propagating unnatural information has existed.
Having only four different base components (nucleotides) restricts the functions and potential of standard nucleic acids, as compared to the 20 different amino acids in natural proteins. The UBPs and the cells stably incorporating them described herein offer numerous advantages and can be applied to a broad range of biotechnologies. A third base pair that replicates in the range of the natural ones could significantly increase the information density in the same length of DNA and make DNA a more attractive alternative for information storage. More importantly, a third base pair could expand the coding capacity of DNA to include unnatural amino acids, providing access to proteins (including protein therapeutics) with unique properties not available using the 20 natural amino acids.
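The gain in information density and coding capacity can be estimated with simple arithmetic. The sketch below assumes an idealized case in which every position can hold any letter with equal probability, and it assumes triplet codons as in the natural genetic code; the function names are illustrative. The figures are upper bounds, not measured capacities.

```python
import math


def bits_per_position(alphabet_size: int) -> float:
    """Maximum information per sequence position, in bits."""
    return math.log2(alphabet_size)


def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of the given length."""
    return alphabet_size ** codon_length


if __name__ == "__main__":
    # 4 letters = natural alphabet; 6 letters = natural plus one UBP.
    for letters in (4, 6):
        print(
            f"{letters} letters: "
            f"{bits_per_position(letters):.3f} bits/position, "
            f"{codon_count(letters)} triplet codons"
        )
```

Under these assumptions, going from four to six letters raises the ceiling from 2 bits to about 2.585 bits per position, and the number of possible triplet codons from 64 to 216.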