1. Field of the Invention
The present invention relates to methods and systems for synthesizing polynucleotides and, in particular, to methods and systems for automatically determining synthesis designs that minimize incorrect results.
2. Background Information
While it is currently possible to achieve in vitro synthesis of genes and other long polynucleotides, the time and effort required for preparing these polynucleotides causes bottlenecks in current genetic research. With the recent genome mapping achieved by the Human Genome Project, the desire for such polynucleotides has increased dramatically. There is thus an increased desire to synthesize sequences of genes rather than generating sequences using traditional molecular biology methods for modifying or cloning genetic sequences. The efficient and cost-effective synthesis of genes is of paramount interest in projects that can otherwise take months to set up. Moreover, certain experiments become more feasible, such as those in which a researcher can cost-effectively and quickly test multiple variants of a gene to find one that works for the desired purpose. In addition, some gene sequences are far too long to synthesize using traditional techniques.
The following definitions and conventions for referring to gene synthesis terminology are used herein to facilitate description. Additional background information is available in Molecular Biology of the Cell, Third Edition, Alberts et al., Garland Publishing, Inc., New York and London, 1994, which is herein incorporated by reference in its entirety. A polynucleotide is formed from a plurality of nucleotides, each nucleotide formed from a phosphate, a sugar (deoxyribose), and an organic base, where the base is selected from the purine bases adenine (“A”) and guanine (“G”) and the pyrimidine bases thymine (“T”) and cytosine (“C”). Because the sugar is deoxyribose, the nucleotides may be referred to as deoxyribonucleotides, and the polynucleotide may be referred to as a deoxyribonucleic acid molecule, or DNA. A DNA molecule may be single-stranded or may hybridize to (also referred to as “anneal to” or “anneal with”) another DNA molecule to form a double-stranded structure, where the double-stranded structure may also be referred to as a DNA molecule.
The term “gene” as used herein refers to a double-stranded DNA molecule, where the two strands may be distinguished from one another by referring to one strand as the upper or top strand and the other strand as the lower or bottom strand. Thus, either of the terms “gene” or “DNA molecule” may be used to denote any long, double-stranded polynucleotide, where a long polynucleotide contains at least 100 nucleotides. In simplified terms, gene synthesis is the production of double-stranded DNA molecules from, typically, a text specification of a single-stranded target sequence. Each single-stranded DNA molecule (and thus nucleotide sequence) has two ends (termini), and is associated with a direction (or “handedness”). One of the two termini is referred to as the 3′ end while the other terminus is referred to as the 5′ end. A terminal hydroxyl group of a sugar is normally located at the 3′ end of the molecule, and a terminal phosphate group is normally located at the 5′ end of the molecule. A single strand sequence of DNA is also referred to as a “polynucleotide sequence” or, if the sequence is shorter than 100 nucleotides, then the polynucleotide sequence may optionally be referred to as an “oligonucleotide” or “oligo.” A textual representation of a 25-base pair double-stranded DNA molecule is written in this description as:
        5′ CCTGAGAGGACAGTCAATCACAGGA 3′ (top strand) (SEQ ID NO:1)
        3′ GGACTGTCCTGTCAGTTAGTGTCCT 5′ (bottom strand) (SEQ ID NO:2)
Under appropriate conditions, a base in a single-stranded DNA sequence typically forms hydrogen bonds to a second base in another single-stranded DNA sequence when the pair of bases (one from each strand) are complements of each other. (Specifically, an “A” bonds with a “T” and a “C” bonds with a “G.”) Thus a single strand of DNA binds to another strand of DNA to form a double-stranded DNA molecule when each pair of bases of the sequences (one from each strand) is exactly complementary. A double-stranded sequence of DNA is known as a “fragment” or “DNA fragment.” Thus, a DNA fragment is a duplex polymer formed from deoxyribonucleic acid molecules. A DNA fragment may be entirely double-stranded, or may contain both single-stranded and double-stranded regions. The DNA fragments may be, for example, portions of a gene or fragments that when joined form a larger portion of the gene or the entire gene.
Hybridization (annealing) refers to this process of joining oligos or polynucleotides (both as single-stranded DNA sequences) to generate fragments (double-stranded DNA molecules), and is written herein as:
        5′ GGACAGTCAA 3′ (SEQ ID NO:3) + 5′ TTGACTGTCC 3′ (SEQ ID NO:4) →
        5′ GGACAGTCAA 3′ (SEQ ID NO:3)
        3′ CCTGTCAGTT 5′ (SEQ ID NO:4)
Typically, hybridization of single-stranded DNA sequences occurs under low-temperature conditions. As the temperature is raised, the double-stranded structure “melts” to form two non-hybridized single-stranded structures.
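The hybridization notation above can be checked with a few lines of code. The following is a minimal sketch, assuming sequences are handled as plain text strings written 5′ to 3′; the function names reverse_complement and can_fully_anneal are hypothetical and chosen only for illustration. It does not model temperature or any other reaction condition.

```python
# Illustrative sketch only; the function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a 5'->3' sequence, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_fully_anneal(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' strands are exactly complementary over their full length."""
    return strand_b == reverse_complement(strand_a)


top = "GGACAGTCAA"     # SEQ ID NO:3, written 5'->3'
bottom = "TTGACTGTCC"  # SEQ ID NO:4, written 5'->3'

print(can_fully_anneal(top, bottom))  # True
# The bottom strand as drawn under the top strand, i.e. read 3'->5':
print(bottom[::-1])                   # CCTGTCAGTT
```

Reversing the second oligo before pairing reflects the antiparallel arrangement of the two strands: the 5′ end of one strand sits opposite the 3′ end of the other.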
Once annealed, the resulting double-stranded DNA sequence (or molecule) may have an overhang on one (left or right) or on both sides. For example, a double-stranded DNA sequence with a 5′ overhang on the left can be denoted as:
        5′ CCTGAGAGGACAGTCAATCACAGGA 3′ (SEQ ID NO:5)
        3′     TGTCCTGTCAGTTAGTGTCCT 5′ (SEQ ID NO:6)
Or, for example, a double-stranded DNA sequence with a 5′ overhang on both the left and right can be denoted as:
        5′ CCTGAGAGGACAGTCAATCAC 3′ (SEQ ID NO:7)
        3′     TGTCCTGTCAGTTAGTGTCCT 5′ (SEQ ID NO:6)
Other double-stranded DNA sequences may be formed with 3′ overhangs or combinations of 3′ and 5′ overhangs, or even with no overhangs, in which case the ends are referred to as “blunt” ends. The overhang length refers to the number of bases that form the particular overhang. So, for example, in either DNA sequence represented above, the overhang length of the left end is 4 (bases).
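The overhang terminology can be made concrete with a small data structure. The sketch below is illustrative only: the Duplex class, its offset field, and the overhangs function are hypothetical names, and the offset of 4 used for both examples is an assumption taken from the stated left overhang length of 4. The sketch only compares strand positions; it does not verify base pairing.

```python
from dataclasses import dataclass


@dataclass
class Duplex:
    """Hypothetical representation of a drawn double-stranded sequence."""
    top: str     # top strand, written 5'->3' from left to right
    bottom: str  # bottom strand, written 3'->5' from left to right (as drawn)
    offset: int  # column at which the bottom strand starts, relative to the top
                 # (positive: the top strand protrudes on the left)


def _describe(diff: int, if_top_longer: str, if_bottom_longer: str):
    """Classify one end from the signed difference in strand extent."""
    if diff == 0:
        return ("blunt", 0)
    return (if_top_longer if diff > 0 else if_bottom_longer, abs(diff))


def overhangs(d: Duplex):
    """Return ((kind, length), (kind, length)) for the left and right ends."""
    bottom_end = d.offset + len(d.bottom)
    # Left end: the top strand's left terminus is 5', the bottom strand's is 3'.
    left = _describe(d.offset, "5'", "3'")
    # Right end: the top strand's right terminus is 3', the bottom strand's is 5'.
    right = _describe(len(d.top) - bottom_end, "3'", "5'")
    return left, right


first = Duplex("CCTGAGAGGACAGTCAATCACAGGA", "TGTCCTGTCAGTTAGTGTCCT", 4)
second = Duplex("CCTGAGAGGACAGTCAATCAC", "TGTCCTGTCAGTTAGTGTCCT", 4)

print(overhangs(first))   # (("5'", 4), ('blunt', 0))
print(overhangs(second))  # (("5'", 4), ("5'", 4))
```

Storing the bottom strand in its drawn 3′ to 5′ orientation keeps the columns of the two strings aligned with the textual representation, so each overhang length is simply a difference of positions.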
DNA fragments can be further combined through a process referred to as ligation to form longer fragments, which can ultimately constitute a gene. Ligation is a higher temperature process whereby, with the assistance of an enzyme (referred to as a ligase), DNA fragments in solution are covalently joined together. The presence of overhanging ends facilitates this process, and ends that are reverse complements of each other tend to bond. For example, fragments:
        5′ CCTGAGAGGAC 3′ (SEQ ID NO:8)
        3′     TCTCCTGTCAG 5′ (SEQ ID NO:9)
and
        5′ AGTCAATCAC 3′ (SEQ ID NO:10)
        3′     TTAGTGGGAC 5′ (SEQ ID NO:11)
can be ligated to produce the new fragment:
        5′ CCTGAGAGGACAGTCAATCAC 3′ (SEQ ID NO:7)
        3′     TCTCCTGTCAGTTAGTGGGAC 5′ (SEQ ID NO:12)
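The ligation example above can be reproduced in a short sketch. This is an illustration under stated assumptions rather than a model of the enzymatic reaction: each fragment is given as a hypothetical tuple (top strand 5′ to 3′, bottom strand 3′ to 5′ as drawn, column offset of the bottom strand), the offset of 4 is assumed from the drawn overhangs, and the ligate function name is invented for this example. Only the case of a 5′ overhang on the right of the first fragment meeting a 5′ overhang on the left of the second is handled.

```python
# Illustrative sketch only; the tuple layout and function name are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def ligate(frag1, frag2):
    """Join two fragments whose facing overhangs are complementary.

    Each fragment is (top 5'->3', bottom 3'->5' as drawn, bottom offset).
    Returns the joined fragment in the same layout, or None if the facing
    overhangs differ in length or are not complementary column by column.
    """
    top1, bot1, off1 = frag1
    top2, bot2, off2 = frag2
    right_len = off1 + len(bot1) - len(top1)  # bottom-strand overhang of frag1
    left_len = off2                           # top-strand overhang of frag2
    if right_len <= 0 or right_len != left_len:
        return None
    right_overhang = bot1[-right_len:]  # drawn 3'->5'
    left_overhang = top2[:left_len]     # drawn 5'->3'
    if any(COMPLEMENT[t] != b for t, b in zip(left_overhang, right_overhang)):
        return None
    # The strands are simply concatenated; the left end is unchanged.
    return (top1 + top2, bot1 + bot2, off1)


frag_a = ("CCTGAGAGGAC", "TCTCCTGTCAG", 4)  # SEQ ID NO:8 / SEQ ID NO:9
frag_b = ("AGTCAATCAC", "TTAGTGGGAC", 4)    # SEQ ID NO:10 / SEQ ID NO:11

print(ligate(frag_a, frag_b))
# ('CCTGAGAGGACAGTCAATCAC', 'TCTCCTGTCAGTTAGTGGGAC', 4), i.e. SEQ ID NO:7 / SEQ ID NO:12
```

Because the overhangs are compared column by column in their drawn orientation, the check is equivalent to requiring that the two single-stranded ends be reverse complements of each other.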
According to the standard method of gene synthesis, which may be referred to as the shotgun method of gene synthesis, multiple single-stranded polynucleotide sequences are mixed together so that they can hybridize to form double-stranded fragments. These fragments have overhangs that are designed to be exactly complementary to overhangs present in other fragments, where the exactly complementary overhangs direct two fragments to hybridize together. The hybridized fragments are then exposed to a ligase to achieve ligation between the adjacent fragments. All of this occurs in a single biochemical reaction. Since many fragments are combined in a single ligation reaction, errors are common and may result in incorrect or undesired synthetic gene sequences.