Next Generation Sequencing (NGS) has proved to be an invaluable tool in the diagnosis and treatment of numerous diseases, including cancer (Dancey, et al. Cell, 148:409-420 (2012); Dawson, et al. NEJM, 368:1199-1209 (2013)), cardiomyopathy (Meder, et al. Circ. Cardiovasc. Genet., 4:110-122 (2011); Norton, et al. Curr. Opin. Cardiol., 27:214-20 (2012)), inherited disorders (Boycott, et al. Nature Reviews Genetics, 14:681-691 (2013)), prenatal screening (Nepomnyashchaya, et al. Clin Chem Lab Med., 51:1141-54 (2013); Papageorgiou, et al. Genome Medicine, 4:46 (2012)), and neurological disorders (Nemeth, et al. Brain, 136:3106-3118 (2013)). However, although NGS enables the sequencing of entire human genomes within days, the cost of sequencing and the burden of data analysis severely inhibit the translation of whole genome sequencing to the clinic. As a result, enrichment of target sequences is desirable to facilitate molecular diagnostics that rely on NGS. Enrichment approaches include hybridization capture (Agilent (Santa Clara, Calif.), Roche/NimbleGen (Madison, Wis.), Illumina (San Diego, Calif.), Life Technologies (Grand Island, N.Y.)), multiplex PCR (Life Technologies, Illumina, Qiagen (Valencia, Calif.), Kailos Genetics (Huntsville, Ala.)), molecular inversion probes (Hiatt, et al. Genome Res., 23:843-54 (2013)), highly-parallel PCR (Fluidigm (San Francisco, Calif.), Raindance (Billerica, Mass.)), and single primer amplification methods (Enzymatics/ArcherDx (Beverly, Mass.), NuGen (San Carlos, Calif.)).
Current methods for enrichment include hybridization capture from prepared DNA libraries (Albert, et al. Nature Methods, 4:903-905 (2007); Okou, et al. Nature Methods, 4:907-909 (2007)). Hybridization capture requires an array of immobilized probes. In theory, fragmented nucleic acids in solution hybridize to these immobilized probes if they have a complementary sequence. These methods have the same disadvantages as solution hybridization, with the exception that both strands of a duplex can be captured. However, they have the further disadvantage that hybridization is less efficient when the probes are bound to a surface prior to hybridization. Additional disadvantages include a lengthy 2-3 day protocol, multiple steps that increase the cost of the tests, a requirement for large amounts of initial input DNA (1 μg-5 μg), a broad library size distribution, only 55%-65% specificity, 80%+/−200-500 base-pair (bp), and an inability to capture repeats or to handle nucleic acids containing repeat sequences within non-target sequences.
Current methods are not suited for specifying read start sites (the positions at which sequencing of nucleic acid molecules begins) because they rely on artificial sequence at the ends of the targets. Moreover, current methods are not suited for capturing both target strands. Present hybridization methods typically capture nucleic acid fragments larger than the average size of exons, which is less than 200 bp as described by Sakharkar, et al. In Silico Biology, 4:387-393 (2004). Because the read start sites cannot be specifically defined, this results in substantial non-target sequencing. A performance comparison of hybridization-based exome capture technologies has been reviewed by Clark, et al. Nature Biotechnology, 29:908-914 (2011).
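The fragment-size argument above can be made concrete with simple arithmetic. The following is a minimal sketch, not a model of any particular capture product: it assumes one exon per captured fragment, that the exon lies entirely within the fragment whenever the fragment is at least as long as the exon, and that every base of the fragment is sequenced. The function name `on_target_fraction` and the example lengths are hypothetical.

```python
def on_target_fraction(exon_len: int, fragment_len: int) -> float:
    """Fraction of a captured fragment's bases that fall within the target exon.

    Illustrative assumptions: one exon per fragment, and the exon is fully
    contained in the fragment when the fragment is at least as long as it.
    """
    if exon_len <= 0 or fragment_len <= 0:
        raise ValueError("lengths must be positive")
    # A fragment shorter than the exon is entirely on-target.
    return min(exon_len, fragment_len) / fragment_len


if __name__ == "__main__":
    exon_len = 150  # hypothetical exon, below the ~200 bp average cited above
    for fragment_len in (200, 300, 500):
        frac = on_target_fraction(exon_len, fragment_len)
        print(f"{fragment_len} bp fragment: {frac:.0%} on-target")
```

Under these assumptions, a 300 bp fragment carrying a 150 bp exon is half non-target sequence, and the on-target share falls further as the captured fragments grow.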
Multiplex PCR is an alternative to capture hybridization. Multiplex PCR methods are considerably faster and do not require library preparation prior to enrichment, but they have several drawbacks: limited scalability per reaction due to primer interactions; variable uniformity of amplification across targets, because sets of primers that amplify with different efficiencies introduce amplification bias; an inability to filter duplicates; and the inclusion, on the ends of the amplicons, of the primer sequences used to anneal to the targets. These sequences must be read through during sequencing, thereby increasing sequencing time and cost. Moreover, the sequence of the synthetic primers is contained in the sequence report in addition to the target sequence, generating unnecessary sequence complexity. Both molecular inversion probes and highly-parallel PCR resolve some of the issues encountered by multiplex PCR, but both methods are significantly more expensive. Molecular inversion probes require the synthesis of long oligonucleotides, and there are equipment costs associated with highly-parallel PCR methods. In addition, both methods also introduce synthetic primer sequences on the ends of the amplicons. Single primer methods introduce primer sequence at only one end of the amplicon, reducing the amount of primer sequence read by half, but sacrifice the additional selectivity provided by using two primers to enrich the correct target sequence. As a result, the need remains for a method of target enrichment that minimizes the sequencing of off-target or primer regions and that offers high scalability, specificity, and uniformity.
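The cost of reading through primer sequence can be illustrated with a short calculation. This is a sketch under stated assumptions rather than a description of any vendor's chemistry: the amplicon and primer lengths are hypothetical, all primers are taken to be the same length, and the whole amplicon is assumed to be sequenced. The function name `primer_fraction` is likewise illustrative.

```python
def primer_fraction(amplicon_len: int, primer_len: int, primers_per_amplicon: int) -> float:
    """Fraction of amplicon bases that are synthetic primer sequence.

    Illustrative assumptions: all primers have the same length and the
    entire amplicon, primers included, is sequenced.
    """
    if amplicon_len <= 0 or primer_len <= 0 or primers_per_amplicon < 0:
        raise ValueError("lengths must be positive and primer count non-negative")
    primer_bases = primer_len * primers_per_amplicon
    if primer_bases > amplicon_len:
        raise ValueError("primers cannot be longer than the amplicon")
    return primer_bases / amplicon_len


if __name__ == "__main__":
    amplicon_len, primer_len = 150, 25  # hypothetical lengths
    two = primer_fraction(amplicon_len, primer_len, 2)  # primers at both ends
    one = primer_fraction(amplicon_len, primer_len, 1)  # single primer method
    print(f"two primers: {two:.1%} of sequenced bases are primer")
    print(f"one primer:  {one:.1%} of sequenced bases are primer")
```

With these example lengths, primers at both ends account for a third of each amplicon, and a single primer halves that share, which is the trade-off against selectivity described above.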