Nucleic acid hybridization has a significant role in biotechnology applications pertaining to identification, selection, and sequencing of nucleic acids. Sequencing applications with genomic nucleic acids as the target materials require the selection of nucleic acid targets of interest from a highly complex mixture. The quality of the sequencing efforts depends on the efficiency of the selection process, which, in turn, relies upon how well nucleic acid targets can be enriched relative to non-target sequences.
A variety of methods have been used to enrich for desired sequences from a complex pool of nucleic acids, such as genomic DNA or cDNA. These methods include the polymerase chain reaction (PCR), molecular inversion probes (MIPs), and sequence capture by hybrid formation (“hybrid capture”; see, for example, Mamanova, L., Coffey, A. J., Scott, C. E., Kozarewa, I., Turner, E. H., Kumar, A., Howard, E., Shendure, J. and Turner, D. J. (2010) “Target-enrichment strategies for next-generation sequencing,” Nat. Methods 7:111-118). Hybrid capture offers an advantage over the other methods in that it requires fewer enzymatic amplification or manipulation procedures of the target nucleic acid. As a result, the hybrid capture method introduces fewer errors into the final sequencing library. For this reason, the hybrid capture method is a preferred method for enriching for desired sequences from a complex pool of nucleic acids and is ideal for preparing templates in next generation sequencing (NGS) applications.
The NGS applications usually involve randomly breaking long genomic DNA or cDNA into smaller fragments having a size distribution of 200-500 bp in length, depending upon the NGS platform used. The DNA termini are enzymatically treated to facilitate ligation and universal DNA adaptors are ligated to the ends to provide the resultant NGS templates. The terminal adaptor sequences provide a universal site for primer hybridization so that clonal expansion of the desired DNA targets can be achieved and introduced into the automated sequencing processes used in NGS applications. The hybrid capture method is intended to reduce the complexity of the pool of random DNA fragments from, for example, 3×10⁹ bases (the human genome) to much smaller subsets of 10⁴ to 10⁸ bases that are enriched for specific sequences of interest. The efficiency of this process directly relates to the quality of capture and enrichment achieved for desired DNA sequences from the starting complex pool.
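The scale of the complexity reduction described above can be illustrated with simple arithmetic. The following Python sketch is only an illustration: the function name and the assumption that reads are drawn uniformly from the genome in the absence of enrichment are introduced here and are not part of any described method.

```python
# Illustrative arithmetic only; names are hypothetical.
GENOME_BASES = 3 * 10**9  # approximate size of the human genome, as stated above


def fold_reduction(target_bases, genome_bases=GENOME_BASES):
    """Fold reduction in complexity when going from the genome to the target subset."""
    return genome_bases / target_bases


def unenriched_on_target_fraction(target_bases, genome_bases=GENOME_BASES):
    """Expected on-target fraction with no enrichment, assuming uniform sampling."""
    return target_bases / genome_bases


# The two ends of the target-size range given in the text.
for target in (10**4, 10**8):
    print(
        f"target {target:.0e} bases: "
        f"{fold_reduction(target):,.0f}-fold reduction, "
        f"unenriched on-target fraction {unenriched_on_target_fraction(target):.2e}"
    )
```

Under these assumptions, a 10⁴-base target corresponds to a 300,000-fold reduction in complexity and a 10⁸-base target to a 30-fold reduction.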
The NGS applications typically use the hybrid capture method of enrichment in the following manner. A prepared pool of NGS templates is heat denatured and mixed with a pool of capture probe oligonucleotides (“baits”). The baits are designed to hybridize to the regions of interest within the target genome and are usually 60-200 bases in length and further are modified to contain a ligand that permits subsequent capture of these probes. One common capture method incorporates a biotin group (or groups) on the baits. After hybridization is complete to form the DNA template:bait hybrids, capture is performed with a component having affinity for only the bait. For example, streptavidin-magnetic beads can be used to bind the biotin moiety of biotinylated-baits that are hybridized to the desired DNA targets from the pool of NGS templates. Washing removes unbound nucleic acids, reducing the complexity of the retained material. The retained material is then eluted from the magnetic beads and introduced into automated sequencing processes.
Though DNA hybridization with the baits can be exquisitely specific, unwanted sequences remain in the enriched pool following completion of the hybrid capture method. The largest fraction of these unwanted sequences is present due to undesired hybridization events between NGS templates having no complementarity to the baits and NGS templates that do. Two types of undesired hybridizations arising in the hybrid capture method include the following sequences: (1) highly repetitive DNA elements that are found in endogenous genomic DNA; and (2) the terminal adaptor sequences that are engineered into each of the NGS templates of the pool.
The repetitive endogenous DNA elements, such as an Alu sequence or LINE sequence, present in one DNA fragment in the complex pool can hybridize to another similar element present in another unrelated DNA fragment. These fragments, which may originally derive from very different locations within the genome, become linked during the hybridization process of the hybrid capture method. If one of these DNA fragments represents a desired fragment that contains a binding site for a bait, the unwanted fragment will be captured along with the desired fragment. This class of unwanted NGS templates can be reduced by adding an excess of the repeat elements to the hybridization reaction. Most commonly, human Cot-1 DNA is added to the hybridization reaction, which binds Alu, LINE, and other repeat sites in the target and blocks the ability of NGS templates to interact with each other on that basis.
A more problematic class of unwanted NGS templates that are recovered during hybrid capture arises from interactions between terminal adaptor sequences that are engineered on each of the NGS templates of the pool. Because the pool of NGS templates typically will contain the identical terminal adaptor sequences on every DNA fragment, the adaptor sequences are present at a very high effective concentration in the hybridization solution. Consequently, unrelated NGS templates can anneal to each other through their termini, thereby resulting in a “daisy chain” of otherwise unrelated DNA fragments being linked together. If one of these linked fragments contains a binding site for a bait, the entire daisy chain is captured. In this way, capture of a single desired fragment can bring along a large number of undesired fragments, which reduces the overall efficiency of enrichment for the desired fragment. This class of unwanted capture event can be reduced by adding an excess of single-stranded adaptor sequences to the hybridization reaction. Yet the ability to effectively reduce the so-called daisy chain capture events with an excess of adaptor sequences is limited to an efficiency of about 50%-60% for capturing the desired fragment.
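The effect of daisy chaining on enrichment can be pictured with a toy model in which fragments are nodes and adaptor-mediated annealing events are links between them. The Python sketch below is a hypothetical illustration only: the fragment labels, the link list, and the function name are assumptions made for the example, and the model ignores hybridization kinetics entirely.

```python
from collections import defaultdict


def captured_fragments(links, bait_bound):
    """Return every fragment pulled down together with the bait-bound fragments.

    links      -- pairs of fragment labels annealed through their adaptors
    bait_bound -- labels of fragments that hybridize directly to a bait
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Walk outward from each bait-bound fragment along adaptor links.
    captured = set(bait_bound)
    stack = list(bait_bound)
    while stack:
        fragment = stack.pop()
        for other in neighbors[fragment]:
            if other not in captured:
                captured.add(other)
                stack.append(other)
    return captured


# Hypothetical pool: T1 is on-target; U1..U4 are unrelated fragments.
links = [("T1", "U1"), ("U1", "U2"), ("U3", "U4")]
bait_bound = {"T1"}

captured = captured_fragments(links, bait_bound)
on_target_fraction = len(bait_bound) / len(captured)
print(sorted(captured), on_target_fraction)
```

In this example the single desired fragment brings two unrelated fragments with it, so only one third of the captured material is on-target, while the U3:U4 pair, which is not linked to any bait-bound fragment, is washed away.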
In spite of the use of Cot-1 DNA and adaptor blocking oligonucleotides in the hybridization reaction, a significant number of contaminating unwanted DNA fragments remain in the sequencing pool after the hybrid capture step, largely because the blocking methods are not completely successful. Thus, there is a need to improve capture efficiency and to reduce contamination from undesired sequences so that one can devote resources to sequencing a greater fraction of targets of interest and fewer targets that are not of interest.
Thus, off-target nucleic acid interactions can limit the efficiency of the selection of target nucleic acids by hybridization (for example, solution hybridization) to a capture probe, for example, an oligonucleotide bait. Off-target selection can result, for example, in one or more of decreased yields of hybridization capture and/or artifactual hybrid capture, which in turn lead to inefficiencies in subsequent steps, for example, sequencing.
Off-target selection is typically increased when the stringency conditions of hybrid selection are reduced, for example, when selecting for a target:capture duplex having a lower nucleic acid melting temperature (for example, DNA:DNA duplexes as compared to RNA:DNA duplexes). Thus, capture of off-target sequence can be more of a problem in DNA:DNA hybridizations.
Typically, library members include a library insert, often a segment of sequence from a gene of interest, for example, a segment for sequencing. If a member is on-target, the library insert forms a duplex with the capture probe. Typically, library members also include one or more non-target sequences. These are typically not portions of a gene of interest but rather are adaptor sequences, amplification primers or tags, or bar code tags. The non-target sequence of the capture probe-hybridized library member can, by duplex formation with other sequences in the reaction mixture, lead to the selection of undesired sequences, for example, off-target library members. While not wishing to be bound by theory, concatenation between an on-target library member and off-target sequences can result in selection of off-target sequences.
Methods and compositions for minimizing selection of off-target nucleic acid, for example, minimizing the selection of library members that do not form a duplex with the capture probe, are disclosed herein. Methods and compositions are disclosed herein that reduce selection mediated by non-target sequences, for example, adaptor-mediated selection.