With recent advances in bioscience, high-throughput, highly parallel DNA synthesis and analysis technologies have become increasingly important. Since the beginning of the twenty-first century, high-throughput, highly parallel DNA analysis has been driven by advances in next-generation sequencing technology. New assays have greatly reduced the time required for analysis and increased the amount of data that can be analyzed. Next-generation sequencing assays currently in use include Illumina, Roche 454, and Ion Torrent sequencing. In these assays, each target DNA library is attached to a solid support, and sequencing is performed through chemical reactions at that site. With the remarkable recent development and widening range of applications of gene synthesis technologies, high-throughput, highly parallel gene synthesis has also become increasingly important. An essential requirement for highly parallel gene synthesis is the cost-effective synthesis of error-free nucleic acid fragments. In conventional gene synthesis methods, chemically synthesized nucleic acid molecules are used for gene synthesis without further processing. However, because nucleic acid molecules are chemically synthesized with yields below 100%, the product is a mixture of error-free molecules and molecules containing synthesis errors. Likewise, genes assembled from chemically synthesized nucleic acid molecules are a mixture of error-free and error-containing genes. Thus, labor-intensive cloning and Sanger sequencing procedures are required to select the error-free genes.
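Why sub-100% synthesis yields produce mixtures can be made concrete with a simple model. The sketch below assumes, as a simplification not stated in the text, that each nucleotide addition succeeds independently with a fixed probability (`step_yield`, a hypothetical parameter). Under that assumption an oligonucleotide of n nucleotides is error-free with probability step_yield^n, and a gene assembled from several oligonucleotides is error-free only if all of them are. The function names are illustrative only.

```python
def error_free_fraction(step_yield: float, length: int) -> float:
    """Fraction of oligonucleotides in which all `length` additions are correct.

    Assumes (hypothetically) that every addition step succeeds independently
    with the same probability `step_yield`.
    """
    if not 0.0 <= step_yield <= 1.0:
        raise ValueError("step_yield must lie in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return step_yield ** length


def gene_error_free_fraction(step_yield: float, oligo_lengths: list[int]) -> float:
    """Fraction of assembled genes built only from error-free oligonucleotides."""
    fraction = 1.0
    for length in oligo_lengths:
        # The gene is error-free only if every constituent oligo is error-free.
        fraction *= error_free_fraction(step_yield, length)
    return fraction


if __name__ == "__main__":
    # Hypothetical example value: 99.5% per-step yield.
    print(f"60-nt oligo: {error_free_fraction(0.995, 60):.3f}")
    print(f"gene from ten 60-nt oligos: {gene_error_free_fraction(0.995, [60] * 10):.3f}")
```

Under this model, a hypothetical 99.5% per-step yield leaves only about 74% of 60-nt oligonucleotides error-free, and the error-free fraction of assembled genes shrinks multiplicatively with every additional oligonucleotide, which is why error-free genes must be selected afterwards.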
Recent next-generation sequencing assays allow millions of nucleic acid molecules to be sequenced cost-effectively in a single run. However, next-generation sequencing of nucleic acid fragments provides only their sequence information, and retrieving the fragments themselves after sequencing is very difficult. Several methods for retrieving desired nucleic acid fragments after next-generation sequencing have recently been developed. In the first method, after next-generation sequencing, each well of the analytical plate is mapped to the sequence read from it, and the beads carrying the desired nucleic acid fragments are picked from the plate and amplified (High-fidelity gene synthesis by retrieval of sequence-verified DNA identified using high-throughput pyrosequencing, 2010 Nature Biotechnology, Mark Matzas et al.). In the second method, organism-derived or artificially synthesized DNA libraries are tagged with barcode sequences, a portion of the DNA pool is analyzed by next-generation sequencing, and the desired nucleic acid fragments are selectively amplified from the remaining DNA pool using primers that include the corresponding barcode sequences (‘Shotgun DNA synthesis’ for the high-throughput construction of large DNA molecules, 2012 Nucleic Acids Research, Kim et al.; Accurate gene synthesis with tag-directed retrieval of sequence-verified DNA molecules, 2012, Schwartz J J et al.). These methods have the advantage that sequence-verified nucleic acid fragments identified by next-generation sequencing can be selectively retrieved.
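The selection step of the barcode-tagged method can be sketched as bookkeeping over sequencing reads. The sketch below assumes a hypothetical data model in which each read is a `(barcode, insert_sequence)` pair and each barcode marks the copies of one original molecule. A barcode is accepted for a target only if all of its reads agree on exactly the designed sequence, and the selected barcodes are the ones the retrieval primers would then include. The function `verified_barcodes` and the exact-match criterion are illustrative assumptions, not the procedure of the cited works.

```python
from collections import defaultdict


def verified_barcodes(reads, designs):
    """Pick one sequence-verified barcode per designed target.

    reads:   iterable of (barcode, insert_sequence) pairs from sequencing
    designs: dict mapping target name to intended insert sequence
    Returns a dict mapping target name to the selected barcode.
    """
    # Collect every distinct insert sequence observed for each barcode.
    inserts_by_barcode = defaultdict(set)
    for barcode, insert in reads:
        inserts_by_barcode[barcode].add(insert)

    # Reverse lookup: intended sequence -> target name.
    target_by_sequence = {seq: name for name, seq in designs.items()}

    selected = {}
    for barcode in sorted(inserts_by_barcode):  # deterministic order
        inserts = inserts_by_barcode[barcode]
        if len(inserts) != 1:
            continue  # reads disagree: treat this barcode as unverified
        (insert,) = inserts
        target = target_by_sequence.get(insert)
        if target is not None and target not in selected:
            selected[target] = barcode  # error-free copy found for this target
    return selected


if __name__ == "__main__":
    reads = [
        ("AAAC", "ATGGCT"), ("AAAC", "ATGGCT"),  # verified copy of fragment A
        ("CCGT", "ATGGTT"),                      # contains a synthesis error
        ("GGTA", "TTTAAA"),                      # verified copy of fragment B
        ("TTAC", "ATGGCT"),                      # redundant copy of fragment A
    ]
    designs = {"fragA": "ATGGCT", "fragB": "TTTAAA"}
    print(verified_barcodes(reads, designs))
```

Running the example prints `{'fragA': 'AAAC', 'fragB': 'GGTA'}`: the error-containing molecule tagged `CCGT` is skipped, and only one barcode per target is kept for primer design.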
However, the first method requires a costly system to retrieve sequence-verified oligonucleotides directly from a plate and is applicable only to 454 sequencing platforms, so its versatility is poor. The second method is severely limited in its ability to retrieve desired nucleic acid fragments using barcode sequences because organism-derived or artificially synthesized DNA libraries contain a very large population of molecules. For example, when hundreds of genes are to be synthesized simultaneously, the pool contains hundreds of millions to tens of billions of kinds of nucleic acid fragments, whether error-free or not. Because of this vast library population, it is difficult to selectively amplify from the pool a single desired nucleic acid fragment chosen by the experimenter, and the retrieval yield is also low. Moreover, because a pool such as a synthesized DNA library is so large, it may contain many nucleic acid fragments whose barcode sequences are similar to one another. As a result, undesired nucleic acids whose barcode sequences resemble those of the target nucleic acids are retrieved along with the targets.
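The barcode-similarity problem can be quantified with a rough estimate. Assuming, for illustration only, that barcodes are independent, uniformly random DNA strings of length L, a given barcode has sum over k = 0..d of C(L, k)·3^k neighbors within Hamming distance d among 4^L possible sequences, so a pool of N barcodes is expected to contain about (N - 1) times that fraction of similar barcodes. The barcode length, mismatch tolerance, and function names below are hypothetical.

```python
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def neighbors_within(length: int, max_distance: int) -> int:
    """Count DNA strings of `length` within `max_distance` substitutions of a
    fixed string, including the string itself (3 alternative bases per site)."""
    return sum(comb(length, k) * 3 ** k for k in range(max_distance + 1))


def expected_similar_barcodes(pool_size: int, length: int, max_distance: int) -> float:
    """Expected number of *other* barcodes in the pool lying within
    `max_distance` of a given target barcode, assuming every barcode is an
    independent, uniformly random DNA string of `length` nucleotides."""
    p_similar = neighbors_within(length, max_distance) / 4 ** length
    return (pool_size - 1) * p_similar


if __name__ == "__main__":
    # Hypothetical parameters: 20-nt barcodes, 10^10 molecules, <= 2 mismatches.
    print(round(expected_similar_barcodes(10**10, 20, 2), 1))
    print(hamming("ACGTACGTAC", "ACGTTCGTAA"))
```

With these hypothetical parameters the estimate is about 16: if retrieval primers tolerated up to two mismatches (an assumption), each target barcode would be expected to share its primer with roughly 16 undesired barcodes in a pool of ten billion molecules.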