Nucleic acid sequencing technology has advanced rapidly and substantially in recent years. Compared with gel-based separation methods, in which nested sets of terminated sequence extension products were interpreted visually by scientists, today's sequencing technologies produce enormous amounts of sequence data, allow illumination of never-before-sequenced genomes and genome regions, and provide throughput and costs that allow the widespread adoption of sequencing in routine biological research and diagnostics.
Genomic sequencing can be used to obtain information in a wide variety of biomedical contexts, including diagnostics, prognostics, biotechnology, and forensic biology. Sequencing may involve basic methods including Maxam-Gilbert sequencing and chain-termination methods, or de novo sequencing methods including shotgun sequencing and bridge PCR, or next-generation methods including polony sequencing, 454 pyrosequencing, Illumina sequencing, SOLiD sequencing, Ion Torrent semiconductor sequencing, HeliScope single molecule sequencing, SMRT® sequencing, and others. For most sequencing applications, a sample such as a nucleic acid sample is processed prior to introduction to a sequencing machine. A sample may be processed, for example, by amplification or by attaching a unique identifier. Often unique identifiers are used to identify the origin of a particular sample.
Despite the huge advances in sequencing technology, or perhaps illuminated by such advances, there exists a need to be able to create broad, diverse, and representative sequencing libraries from samples of nucleic acids. Further, as the applications of sequencing technologies expand, the need for library preparation methods that address widely divergent sample types also increases. For example, uniformly interrogating the entire genome, or at least the entire portion of the genome that is of interest, is a significant source of difficulty for molecular biologists. The lack of uniformity stems from numerous process inputs into all of the various sequencing technologies. For example, fragment size biases may make it more likely that a sequencing technology will sequence only short fragments of the genome. Likewise, specific sequence context may increase or decrease the likelihood that portions of the genome will be primed and sequenced, or amplified in pre-sequencing steps, leading to uneven sequence coverage in the resulting sequence data. Finally, a host of other characteristics of the sequences, e.g., secondary or tertiary structures, or of the sequencing technologies, e.g., long read vs. short read technologies, can lead to biased representation of the originating sequence within a sequencing library.
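By way of illustration only, uneven sequence coverage of the kind described above may be summarized numerically. The following is a minimal sketch, not a method of this disclosure: the function name `coverage_uniformity`, the `low_fraction` cutoff of 0.2, and the example depth values are all hypothetical and chosen purely for illustration. It reports the mean depth, the coefficient of variation of depth, and the fraction of positions falling below a chosen fraction of the mean.

```python
from statistics import mean, pstdev


def coverage_uniformity(depths, low_fraction=0.2):
    """Summarize how evenly reads cover a region.

    depths: per-position (or per-bin) read depths.
    low_fraction: hypothetical cutoff; positions with depth below
        low_fraction * mean depth are counted as under-covered.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    if avg == 0:
        # No reads at all: variation is undefined, everything is under-covered.
        return {"mean": 0.0, "cv": float("nan"), "under_covered": 1.0}
    # Coefficient of variation: population standard deviation over the mean.
    cv = pstdev(depths) / avg
    # Fraction of positions whose depth falls below the cutoff.
    under = sum(1 for d in depths if d < low_fraction * avg) / len(depths)
    return {"mean": avg, "cv": cv, "under_covered": under}


# Illustrative (made-up) depth profiles with the same mean depth.
even = coverage_uniformity([30, 31, 29, 30, 30])
biased = coverage_uniformity([60, 58, 2, 1, 29])
```

Under these assumptions, two libraries with the same mean depth can differ sharply: the second profile has a higher coefficient of variation and a larger under-covered fraction, which is the kind of biased representation discussed above.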
Given these challenges, the process of converting sample nucleic acids into sequenceable libraries has taken on significant complexity and time commitments, e.g., in fragmentation, separation, amplification, incorporation of sequencer-specific library components, and cleanup. Provided herein are methods and systems for preparing improved sequencing libraries, as well as the libraries so prepared, with the additional benefit of simplified workflows.