This invention pertains generally to the field of biology and particularly to techniques and apparatus for the manufacture of DNA molecules of defined or desired sequences. The manufacture of DNA molecules also makes possible the synthesis of any desired peptides, proteins or assemblies of proteins and nucleic acids.
Using the techniques of recombinant DNA chemistry, it is now common for DNA sequences to be replicated and amplified from nature and for those sequences to then be disassembled into component parts which are then recombined or reassembled into new DNA sequences. While it is now both possible and common for short DNA sequences, referred to as oligonucleotides, to be directly synthesized from individual nucleosides, it has been thought to be generally impractical to directly construct segments or assemblies of DNA sequences larger than about 400 base pairs. As a consequence, larger segments of DNA are generally constructed from component parts and segments which can be purchased, cloned or synthesized individually and then assembled into the DNA molecule desired.
For example, if an expression vector is desired to express a new protein in a selected host, the scientist can often purchase a generic expression vector from a molecular biology supply company and then clone or synthesize the protein coding region for the gene sought to be expressed. The coding region must be ligated into the vector in the correct location and orientation so that the vector will be effective to express the desired protein in the host. The purchaser of the vector must also examine the sequence of the vector to make sure no other DNA component of the vector has properties that might be detrimental to the experiment the purchaser wishes to run. Thus, the difficulty in constructing any new desired larger DNA construct depends on what similar constructs, or what components of the construct, can be purchased or obtained from public sources, and on how much information is available about the sequences of those components.
A novel methodology to construct and assemble newly designed DNA sequences of indefinite length has been developed based on the use of DNA constructed in DNA microarrays. A DNA microarray is made up of a plurality of sets of single stranded DNA probes arranged on a substrate. The probes within each set are identical in nucleotide sequence but differ in sequence from the probes of the other sets. A technique has been described for the in situ synthesis of DNA microarrays that is adapted for the manufacturing of customized arrays. Published PCT patent application WO99/42813 and U.S. Pat. No. 6,375,903 describe a method for making such arrays in which light is selectively directed to the array being synthesized by a high density micromirror array under software control from a computer. Since the micromirror array is operated totally under software control, the making of complex and expensive photolithographic masks is avoided in its entirety. It has been previously proposed that such custom microarrays can be used to provide the single stranded DNA segments necessary and sufficient to assemble double stranded DNA molecules of indeterminate length. This process is set forth in PCT published patent application WO 02/095073, the disclosure of which is hereby incorporated by reference. In short, using that approach, short segments of single stranded DNA are made on the microarray and designed such that a portion of each probe is complementary to two other oligonucleotides in other sets on the array. In theory then, when the oligonucleotides are released from the substrate of the array, the DNA segments will self-assemble into the complete desired DNA molecule as each complementary segment hybridizes to its complement.
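The overlapping design described above can be illustrated with a minimal sketch. It is not the procedure of WO 02/095073; it only shows, under simplifying assumptions, how a target sequence might be divided so that each oligonucleotide on one strand bridges two oligonucleotides on the other. The function names `reverse_complement` and `design_oligos`, the `oligo_len` parameter and the half-length offset are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: names, oligo length and the half-length
# offset are assumptions, not details taken from the cited applications.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_oligos(target, oligo_len=40):
    """Split a target sequence into two staggered sets of oligonucleotides.

    Top-strand oligos tile the target end to end. Bottom-strand oligos
    are the reverse complements of windows shifted by half an oligo
    length, so each interior bottom oligo spans the junction between
    two adjacent top oligos and is complementary to part of each.
    """
    half = oligo_len // 2
    top = [target[i:i + oligo_len]
           for i in range(0, len(target), oligo_len)]
    bottom = [reverse_complement(target[i:i + oligo_len])
              for i in range(half, len(target), oligo_len)]
    return top, bottom


if __name__ == "__main__":
    top, bottom = design_oligos("ATGGCCATTGTAATGGGCCGC", oligo_len=8)
    for name, oligos in (("top", top), ("bottom", bottom)):
        for index, oligo in enumerate(oligos):
            print(name, index, oligo)
```

In this sketch, hybridization of the two sets would leave nicks between adjacent oligonucleotides on each strand; sealing them (for example by ligation) is outside the scope of the example.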
A complexity arises from this general approach to DNA synthesis in that no synthetic or biochemical process is ever completely efficient and accurate. Thus it is inevitable that there will be occasional deletion and substitution errors in the DNA segments made by this process. To facilitate the practical synthesis of longer DNA molecules of interest and of good quality, methods must be developed to purify the DNA sequences of interest from those artifacts that arise through various sorts of errors and inefficiencies in the probe synthesis and assembly process.
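The reason purification becomes more important as the desired molecule grows longer can be seen from a simple calculation. The sketch below assumes, purely for illustration, that each base is incorporated incorrectly with a fixed probability independent of every other base; the function name `error_free_fraction` and the example error rate are hypothetical and are not measured values from this disclosure.

```python
# Illustrative sketch: assumes independent, uniform per-base errors.
# The per-base error rate used below is a made-up example value.

def error_free_fraction(length, per_base_error):
    """Expected fraction of molecules of the given length with no errors."""
    if not 0.0 <= per_base_error <= 1.0:
        raise ValueError("per_base_error must be between 0 and 1")
    return (1.0 - per_base_error) ** length


if __name__ == "__main__":
    for length in (40, 400, 4000):
        fraction = error_free_fraction(length, per_base_error=0.01)
        print(f"{length} bases: {fraction:.3g} expected error-free")
```

Under this simplified model the error-free fraction falls geometrically with length, so even a modest per-base error rate leaves few fully correct copies of a long assembled molecule.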
One process for error correction has previously been proposed, a process referred to as coincidence filtering. That process is optimized for the detection and removal of rare single base pair errors in long DNA sequences.