Nucleic acid sequences encode the necessary information for living things to function and reproduce. Determining such sequences is therefore a useful tool in pure research into how and where organisms live, as well as in applied sciences such as drug development. In medicine, sequencing tools are used for diagnosis and to develop treatments for a variety of pathologies, including cancer, infectious disease, heart disease, autoimmune disorders, multiple sclerosis, and obesity. In industry, sequencing is used to design improved enzymatic processes and synthetic organisms. In biology, such tools are used, for example, to study the health of ecosystems. Sequencing tools thus have a broad range of utility.
One focus of the sequencing industry has shifted to finding higher-throughput and/or lower-cost nucleic acid sequencing technologies, sometimes referred to as “next generation” sequencing (NGS) technologies. The goal of making sequencing higher throughput and/or less expensive is to make the technology more widely accessible. These goals can be reached by using sequencing platforms and methods that provide sample preparation for larger quantities of samples of significant complexity, sequencing of larger numbers of complex samples, and/or a high volume of information generation and analysis in a short period of time. Various methods, such as sequencing by synthesis, sequencing by hybridization, and sequencing by ligation, are evolving to meet these challenges.
Many NGS platforms are available for the high-throughput, massively parallel sequencing of nucleic acids. Many of these systems, such as the HiSeq and MiSeq systems produced by Illumina, use a sequencing-by-synthesis (SBS) approach, wherein a nucleotide sequence is determined using base-by-base detection and identification. Using this particular approach, identifying one base requires one cycle of the SBS chemistry process (which may involve four separate reactions separated by washes).
Currently, these technologies provide a maximum achievable read length of ~250 bases, which can be extended to ~400 bases if two high-quality paired-end reads (2×250 bases, with sufficient overlap for assembly) are acquired from the same template and assembled. Each SBS cycle takes approximately 4 minutes to complete; thus, in a paired-end approach to acquire ~400 bases of sequence information, the 500 cycles of SBS required to produce the two reads of ~250 bases take approximately 37 hours to complete. In addition, the performance and quality of most cyclic sequencing technologies decrease substantially after ~100 bases have been determined, introducing a degree of uncertainty into individual sequence reads longer than ~100 bases and into the longer sequence assemblies in which they are used. Due to these quality and time limitations of current NGS platforms, the ever-increasing demand for long, high-quality nucleotide sequences is saturating the output capabilities of the installed base of sequencing apparatuses. Consequently, technologies are needed that provide high-quality sequences of ~500 bases or more from a much shorter sequencing run-time of several hours rather than several days.
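The run-time and read-length figures in the preceding paragraph follow from simple arithmetic, which the minimal Python sketch below makes explicit. The function names are hypothetical and introduced only for illustration, and the 100-base overlap is an assumption chosen so that two 250-base reads assemble to the ~400 bases stated above. The sketch also assumes that run time is simply the number of cycles multiplied by the per-cycle time. Under that assumption, 500 cycles at exactly 4 minutes give roughly 33 hours, somewhat less than the approximately 37 hours stated above. The stated figure would correspond to an average of about 4.4 minutes per cycle, or to additional per-run overhead; the text does not specify which.

```python
def sbs_run_hours(read_length, reads_per_template=2, minutes_per_cycle=4.0):
    """Estimate SBS run time in hours, assuming one cycle per base and no
    overhead beyond the per-cycle time (a simplifying assumption)."""
    cycles = read_length * reads_per_template
    return cycles * minutes_per_cycle / 60.0


def assembled_length(read_length, overlap):
    """Length of the sequence assembled from two paired-end reads of equal
    length that overlap by `overlap` bases."""
    return 2 * read_length - overlap


def implied_minutes_per_cycle(total_hours, cycles):
    """Average per-cycle time implied by a total run time and cycle count."""
    return total_hours * 60.0 / cycles


if __name__ == "__main__":
    # Two 250-base reads: 500 cycles at a nominal 4 minutes per cycle.
    print(f"Estimated run time: {sbs_run_hours(250):.1f} h")
    # Assumed 100-base overlap, matching the ~400-base assembly in the text.
    print(f"Assembled length: {assembled_length(250, 100)} bases")
    # Per-cycle time implied by the stated ~37-hour run.
    print(f"Implied cycle time: {implied_minutes_per_cycle(37, 500):.2f} min")
```

By the same arithmetic, a several-hour run at about 4 minutes per cycle allows on the order of 100 cycles, which illustrates why reaching ~500 bases in that time calls for something other than simply running more cycles of the same chemistry.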