1. Field of the Invention
The present embodiments relate to the design and implementation of methods and processes for clonal amplification of large DNA molecules by multiple displacement amplification using bar-coded primers, for sequencing and assembly of complex genomes, polyploid genomes and large segments from metagenome samples.
2. Description of the Related Art
Sequences obtained from overlapping long DNA molecules (10 kb or larger) are useful for assembly of complex genomes that contain a large number of repetitive sequences, and of homologous chromosomes in diploid as well as polyploid genomes. In addition, sequences of long DNA molecules from metagenome DNA samples will be useful for identification of full-length genes and even metabolic pathways, facilitating analysis of complex microbial communities. Hence, it is important to develop technologies for sequencing long DNA molecules.
Reads produced by the second generation short-read sequencing technologies are typically 100 to 500 bp long. So far, reads from the third generation sequencing platforms have reached only up to 3 kb. Although short reads derived from both new generations of sequencing platforms can be used to assemble contigs and even entire genomes, there are a number of limitations associated with these technologies. For example, short reads are unable to resolve repeats, which are the major obstacle to assembly of complex genomes. When genomes are assembled from overlapping short reads, haplotype information cannot be resolved. In metagenome samples, due to the high complexity and low sequence coverage, short reads often do not overlap and hence cannot be assembled. This makes it difficult to identify full-length genes and metabolic pathways from microbial communities.
Current methods for sequencing DNA longer than 3 kb require construction of plasmid, fosmid or BAC libraries. Briefly, DNA fragments of 2-200 kb are ligated into cloning vectors and transformed into E. coli, where the insert DNA is propagated. The DNA insert from each clone is sequenced by the Sanger method, and the resulting reads are assembled from their overlaps to obtain the sequence of the original insert DNA template. Alternatively, different clones can be pooled and sequenced by Illumina (or other second generation short-read sequencing platforms). Due to the low complexity of such a pooled library, sequences from different clones will not overlap. Only sequences from the same template may overlap and can be assembled into contigs. Because of the large capacity of second generation sequencers, multiple pools of clones can be converted into sequencing libraries using indexed adapters or linkers and sequenced together. In this way, a large number of clones can be sequenced. The disadvantages of the cloned library approach include the time required to make the libraries, low throughput, and the high cost of generating multiple indexed libraries for sequencing on second generation sequencing platforms.
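The pooled, indexed sequencing workflow described above depends on sorting reads back into their pools of origin before assembly. The following is a minimal illustrative sketch of that sorting step only, not the procedure of any particular platform. It assumes, for simplicity, that each read carries its pool index as a fixed-length prefix; the function name `demultiplex`, the index sequences, and the example reads are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_pool, index_len):
    """Group reads by the index sequence found at the start of each read.

    Assumption for illustration: the index is an in-line prefix of
    fixed length. Reads with an unrecognized index are discarded.
    """
    pools = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        pool = index_to_pool.get(index)
        if pool is None:
            continue  # index does not match any known pool
        pools[pool].append(insert)
    return dict(pools)


# Hypothetical indexes and reads, for illustration only.
index_to_pool = {"ACGT": "pool_1", "TGCA": "pool_2"}
reads = ["ACGTGGATTACA", "TGCACCTTAGGA", "ACGTTTACAGGC", "NNNNGGGGCCCC"]
pools = demultiplex(reads, index_to_pool, index_len=4)
```

Each resulting pool would then be assembled separately, so that only reads from clones in the same low-complexity pool are compared for overlaps.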