Long-read sequencing and long-range mapping technologies may be used to generate more accurate genome assemblies. Many genomic regions, and many kinds of structural variation, cannot be correctly assembled from short-read paired-end sequencing data, including the typical sequencing data produced by the industry leaders (such as Illumina and Thermo Fisher Life Technologies). However, many of these regions and variations can be correctly assembled using newer technologies that generate primary read lengths (or genomic maps) measured in the tens to hundreds of kilobases (kb), such as the technologies developed by Pacific Biosciences, Oxford Nanopore, 10× Genomics, Genomic Vision, Roche/Genia, and Bionano Genomics.
Although long-read technologies are still at an early stage of development, they may be used to complement short-read sequencing capabilities. For example, DNA extraction and library preparation technologies can be used to produce long, high-quality DNA libraries. However, a gap exists in the marketplace for commercial technologies that can produce long DNA fragments. Most commercial genomic DNA extraction kits yield a maximum DNA size of 20-50 kb. Because some current systems are capable of primary reads greater than 50 kb, existing DNA extraction kits limit the capabilities of these systems. Additionally, long DNA fragments may be used for optical mapping and synthetic long-read systems (e.g., 10× Genomics, Bionano Genomics), which require libraries generated from genomic DNA samples that are hundreds of kb in length. Therefore, there is a need for automated and reproducible methods for producing extremely long (e.g., hundreds to thousands of kb in length), high-quality genomic DNA samples. Despite this demonstrated need and efforts to produce such samples, a solution has not yet been developed.
In addition, library preparation is a multi-step process with two major stages: 1) DNA extraction from biological samples (biological fluids, particulates, cells, and tissue), and 2) library construction. As DNA sequencing becomes more useful in clinical studies, diagnostics, and therapy management, integrated workflows may streamline and automate the overall process. Again, although there is a demonstrated need, increased integration of the overall sample-to-library workflow has not yet been achieved.
Furthermore, next-generation sequencing (NGS) has great potential in molecular diagnostics for the detection and diagnosis of infectious disease. For example, in some clinical workflows, clinical samples (frequently whole blood or white blood cells (WBCs) from whole blood) may be subjected to NGS, followed by in-silico subtraction of all sequences that can be mapped to the human reference sequence. The remaining “non-human” sequences are then examined for identity with any known pathogen sequences. Although this process may be successful, it requires very efficient library construction methods and deep sequencing runs, both of which can be expensive. This expense has kept the in-silico subtraction method from widespread use.
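The in-silico subtraction workflow described above can be illustrated with a minimal sketch. It is not a description of any particular clinical pipeline: it assumes exact k-mer sharing as a crude stand-in for read mapping (a real workflow would use a dedicated aligner), and the function names, the parameter `k`, and the toy sequences in the usage note are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_kmer(read: str, index: Set[str], k: int) -> bool:
    """Assumed stand-in for mapping: True if the read shares at least one k-mer with the index."""
    return not kmers(read, k).isdisjoint(index)


def subtract_and_classify(
    reads: Iterable[str],
    human_ref: str,
    pathogen_refs: Dict[str, str],
    k: int = 8,
) -> Dict[str, List[str]]:
    """Discard reads that match the human reference, then report which
    remaining ("non-human") reads match each known pathogen sequence."""
    human_index = kmers(human_ref, k)
    pathogen_indexes = {name: kmers(seq, k) for name, seq in pathogen_refs.items()}
    hits: Dict[str, List[str]] = {name: [] for name in pathogen_refs}

    for read in reads:
        # Step 1: in-silico subtraction of reads that map to the human reference.
        if shares_kmer(read, human_index, k):
            continue
        # Step 2: examine the remaining reads for identity with known pathogens.
        for name, index in pathogen_indexes.items():
            if shares_kmer(read, index, k):
                hits[name].append(read)
    return hits
```

As a usage illustration, calling `subtract_and_classify` with a toy human reference, a dictionary holding one toy pathogen sequence, and a handful of short reads returns a dictionary that lists, for each pathogen name, the non-human reads that matched it. Reads that match neither reference are simply dropped. The sketch also makes the cost noted above concrete: every read must be sequenced and screened, even though most reads from a blood sample are expected to be human and are thrown away.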