(1) Field of the Invention
This invention is in the field of nucleic acid chemistry, more specifically the synthesis of nucleic acids, still more specifically the synthesis of large DNA molecules, and most specifically processes and procedures that create large (more than 1000 base pairs) double-stranded DNA via the assembly of large numbers of short single stranded DNA molecules having preselected sequences.
(2) Description of Related Art
Synthetic biology needs processes that enable low-cost and rapid assembly of many synthetic DNA fragments into large DNA assemblies. For example, in 2012, DARPA issued a small business grant solicitation seeking technology to assemble single-stranded synthetic fragments to give 20,000 bp ML-DNA constructs. A short while earlier, the Army Research Office issued a small business grant solicitation seeking companies to design software to allow 30,000 base pairs of single stranded DNA to self-assemble to form nanostructures.
Unfortunately, the realities behind the biophysics of DNA make these goals fanciful if the attempt is made with standard DNA. With just four nucleotides, the information density of standard DNA is too low to allow (without exquisite design) more than ca. a dozen single strands to self-assemble upon simple mixing. With more fragments containing only natural nucleotides, the vagaries of “strong” and “weak” G:C and A:T pairs, hairpins, off-target Watson-Crick hybridization, and non-Watson-Crick interactions (e.g. wobble and major groove binding) defeat self-assembly. These can be illustrated by the following problems:
Problem (A).
Different DNA base pairs do not contribute uniformly to duplex stability. The largest source of this non-uniformity in strand hybridization is a feature of standard DNA that joins A:T pairs by just two hydrogen bonds and G:C pairs by three. Thus, A:T pairs contribute to duplex stability consistently less than G:C pairs. This makes it challenging to design DNA fragments with different nucleotide compositions that hybridize to their complements with the same affinity.
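The non-uniformity described in Problem (A) can be illustrated with a rough calculation. The sketch below uses the Wallace rule of thumb for short oligonucleotides (about 2 °C per A or T and 4 °C per G or C), which is not part of this disclosure and is only a crude approximation; the function name `wallace_tm` and the example sequences are hypothetical and chosen only for illustration.

```python
def wallace_tm(seq: str) -> int:
    """Estimate a melting temperature (degrees C) with the Wallace rule.

    Assumption: the rule Tm ~ 2*(A+T) + 4*(G+C) is a rough guide for
    short oligonucleotides only; it ignores salt, concentration, and
    nearest-neighbor effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, and T")
    return 2 * at + 4 * gc


# Three hypothetical 14mers of equal length but different composition.
at_rich = "AT" * 7
gc_rich = "GC" * 7
mixed = "ATGC" * 3 + "AT"

for name, seq in [("AT-rich", at_rich), ("mixed", mixed), ("GC-rich", gc_rich)]:
    print(name, seq, wallace_tm(seq))
```

Under this approximation, fragments of identical length differ widely in estimated duplex stability depending only on composition, which is the design difficulty that Problem (A) describes.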
Problem (B).
DNA strands can interact in ways outside of those specified by the canonical Watson-Crick pair. In addition to wobble pairing (e.g. G:T pairs), DNA can form major groove interactions (e.g. G-quartets). These, illustrated in FIG. 1 and FIG. 2, can (in appropriate contexts) be stronger than Watson-Crick pairing and can defeat pairing between large numbers of single stranded DNA molecules designed solely by applying Watson-Crick rules.
Problem (C).
Intra-strand folding can defeat desired inter-strand interactions needed for hybridization, primer extension, and ligation. Hairpin structures formed by a single strand, for example, can easily disrupt the inter-strand hybridization that is intended for a multi-strand assembly (FIG. 3). The easy accessibility of hairpins can be illustrated by some simple mathematics. The 5′-nucleotide of a standard DNA molecule must be G, A, T, or C. Whatever it is, it can find a complementary C, T, A, or G (respectively) with a one-in-four probability at each base farther into the sequence. Within a random sequence 64 nucleotides in length, the final one, two, and three nucleotides will find perfect complements 16 times, 4 times, and once within that sequence, on average. These will form hairpins with stems that are joined by one, two, and three perfect base pairs, respectively. Stems with four or five pairs and loops of 2-5 nucleotides are adequate to disrupt hybridization. Therefore, hairpins must be avoided by design, and this design becomes difficult to manage as the number of fragments increases.
Problem (D).
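The arithmetic in Problem (C) can be sketched as follows. This is a minimal illustration, assuming the same approximation used in the text (each position matches with probability one in four, and edge effects are ignored). The function names, the `min_loop` parameter, and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_complement_sites(seq_len: int, k: int) -> float:
    """Approximate expected number of sites in a random sequence that
    perfectly complement a given k-nucleotide segment (edge effects
    ignored, as in the text)."""
    return seq_len / 4 ** k


def stem_sites(seq: str, k: int, min_loop: int = 2) -> list[int]:
    """Find upstream start positions that can pair with the final k
    nucleotides of seq, leaving at least min_loop unpaired nucleotides
    between the two halves of the stem (min_loop is an assumption)."""
    target = reverse_complement(seq[-k:])
    last_start = len(seq) - 2 * k - min_loop
    return [i for i in range(last_start + 1) if seq[i:i + k] == target]


# Expected counts for a random 64mer, for terminal segments of 1 to 5 nt.
for k in range(1, 6):
    print(k, expected_complement_sites(64, k))

# A hypothetical 10mer whose ends can form a three-pair stem.
print(stem_sites("GGGAAAACCC", 3))
```

For a 64mer, the expected counts fall by a factor of four with each added stem pair, so short stems are nearly unavoidable in a random sequence while four- and five-pair stems remain frequent enough to require screening across many fragments.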
Even if DNA had access only to Watson-Crick pairing, even if all nucleobase pairs contributed equally to duplex stability, and even if single strands never folded on themselves, the autonomous self-assembly problem would still not be trivial. With only four nucleotide letters to encode information, the information density of natural DNA is low. For a bacterial sized genome having a random sequence, each 10mer is expected to be present about once. Overlapping 10mers are more than adequate to support ligation, even if they include one or two mismatches, at the temperatures at which typical ligases operate. This low information density makes it essentially impossible to do reliable self-assembly from any more than a dozen or so fragments. Each complement is present at low concentration, making the rates at which they find each other low a priori. The rate of hybridization is slowed further as GACT DNA fragments find “off target” GACT fragments, bind to them, and dwell for a time before dissociating to seek their “on target” fragments.
Given these realities of the chemical structure of natural DNA, it is hardly surprising that Nature rarely does what synthetic biologists want to do: large-scale assembly by way of the hybridization of multiple single stranded fragments. Non-uniformity in the binding of sequences of natural nucleotides makes it essentially impossible to assemble thousands (or more) of nucleobase pairs by autonomous hybridization, even if the primary products have no errors at all. Therefore, in natural biology, large-scale DNA assemblies are carried forward carefully from generation to generation, with strand displacement at the core of polymerization and specifically targeted ligation events that do not allow the DNA to wander into multiple single strands.
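The information-density argument in Problem (D) can be checked with a short calculation. The sketch below assumes a random sequence of 1,000,000 nucleotides as a stand-in for a small bacterial genome and counts a single strand only; that length and the function names are hypothetical and serve only to illustrate the order of magnitude.

```python
from math import comb


def expected_occurrences(genome_len: int, k: int) -> float:
    """Expected number of exact occurrences of one given k-mer on a
    single strand of a random sequence of length genome_len."""
    return (genome_len - k + 1) / 4 ** k


def neighbors_within(k: int, max_mismatches: int) -> int:
    """Number of k-mers that differ from a given k-mer at no more than
    max_mismatches positions (the given k-mer itself is included)."""
    return sum(comb(k, j) * 3 ** j for j in range(max_mismatches + 1))


GENOME_LEN = 1_000_000  # assumed length, for illustration only
K = 10

exact = expected_occurrences(GENOME_LEN, K)
print("distinct 10mers:", 4 ** K)
print("expected exact matches:", round(exact, 2))
for m in (1, 2):
    near = neighbors_within(K, m) * exact
    print(f"expected sites with at most {m} mismatch(es):", round(near, 1))
```

Under these assumptions, an exact 10mer match is expected roughly once, while sites that tolerate one or two mismatches number in the tens to hundreds, consistent with the off-target binding described in the text.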
Thus, most large synthetic DNA (LS-DNA) molecules today are obtained via the “Gibson method” [Gibson 2011], rather than the spontaneous self-assembly of many single DNA strands prepared by synthesis. The Gibson method reproduces in vitro the natural Szostak process for recombination in vivo [Szostak et al. 1983]. It starts with pre-annealed duplexes, cuts them back with a 3′-exonuclease to generate sticky ends (without cutting back so far as to disrupt the duplex) and then uses the resulting sticky ends to assemble the duplexes with overhangs. Expert intervention is required at many steps in the process, creating costs.
While [Gibson 2011] speaks of single strand assembly, including single stranded assembly in yeast cells [Gibson 2009], it teaches that to “ensure that error-free molecules are obtained at a reasonable efficiency, only eight to twelve 60-base oligos are assembled at one time” [Gibson 2011]. This teaching, we presume, reflects the problems listed above, which are deeply embedded in the molecular structure of natural DNA. These problems drive the need for inventive processes that allow LS-DNA assembly from multiple single stranded synthetic DNA fragments.