For several decades, dideoxy DNA sequencing (“Sanger” sequencing, Sanger et al (1977), Proc Natl Acad Sci USA 74:5463-5467) has been the standard sequencing technology in most genetic laboratories. Throughput is limited when using Sanger sequencing, since the process is performed on single molecules of target sequence. The search for a faster, cheaper and more accurate sequencing method that also offers high throughput has led to the development of several new techniques currently denoted “massive parallel sequencing” (MPS) or “next-generation sequencing” (NGS).
The introduction of MPS methods has changed the paradigm of DNA sequencing (Mardis (2008), Trends Genet 24:133-141; Shendure & Ji (2008), Nat Biotechnol 26:1135-1145). MPS methods enable the parallel processing of hundreds of thousands to millions of DNA templates, resulting in high throughput and a low cost per base of the generated sequence. The introduction of MPS methods furthermore allows the user to rapidly sequence entire complex genomes such as the human genome.
Nevertheless, large-scale routine sequencing of the entire human genome is not yet feasible for diagnostic use because the cost and time are still too great. Routine sequencing of the human genome for diagnostic use requires at least 100-1000-fold coverage of each nucleotide, resulting in the need to sequence and process 300-3000 Gb of sequencing data per patient. With available MPS methods, this would require several sequencing runs per patient and cost tens of thousands of dollars. In addition to the economic burden, it would require massive data processing and data storage capabilities that would place a substantial burden on the informatics infrastructure of a genetic laboratory.
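The data volume quoted above follows from multiplying the coverage by the genome size. The short sketch below reproduces that arithmetic. The genome size of roughly 3 Gb is an assumption implied by the figures in the text, and the function name is illustrative only.

```python
# Assumed approximate size of the human genome in gigabases (Gb); this value
# is implied by the 300-3000 Gb figure above and is not stated explicitly.
HUMAN_GENOME_GB = 3.0


def required_sequence_gb(coverage_fold, genome_gb=HUMAN_GENOME_GB):
    """Return the gigabases of sequence needed to reach a given fold coverage."""
    return coverage_fold * genome_gb


# Lower and upper coverage bounds mentioned in the text.
for fold in (100, 1000):
    print(f"{fold}-fold coverage: {required_sequence_gb(fold):.0f} Gb per patient")
```

The same function can be applied to a smaller enriched target: a hypothetical 1 Mb (0.001 Gb) panel at 1000-fold coverage would need about 1 Gb of sequence.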
As a consequence, several methods aiming to simplify the process and enrich regions of interest for, e.g., sequencing (i.e. target enrichment methods) have been developed. Target enrichment methods are used to define genomic regions that can be selectively captured and enriched from a DNA sample before sequencing. Re-sequencing only those genomic regions that are enriched is much more time- and cost-effective than whole genome sequencing, and the resulting data are considerably less cumbersome to analyze and require less data storage capacity.
Known target enrichment methods include molecular inversion probes (MIP), on-array and in-solution hybrid capture, and polymerase chain reaction (PCR). Each is discussed separately below.
Molecular Inversion Probes (MIP):
MIP are modified padlock probes. When a probe is hybridized to a corresponding genomic target, there is a gap at one or more nucleotide positions. A design where the probe lacks a complementary nucleotide at, e.g., a single-nucleotide polymorphism (SNP) location can be used for detection and identification of the polymorphism. If the gap of the hybridized probe is more than a single nucleotide, the probes are in general termed “connector inversion probes” (CIP). The advantage of using this type of probe is that it can be used for SNP genotyping, copy number variation (CNV) analysis, and detection of allelic imbalances. The probes can be designed such that they contain tag sequences for identification of the probe, as well as primer sequences for sequencing the target region. As compared to other target enrichment techniques, molecular inversion probes demonstrate good specificity but may show some variability in performance between different probes within assays.
On-Array and in-Solution Hybrid Capture:
To capture genomic regions using an in-solution capture technique, a number of oligonucleotides (probes) are hybridized to fragmented genomic DNA in solution (as opposed to hybrid or array-based methods). The probes, which may be attached to paramagnetic beads, hybridize to the DNA of interest. Following hybridization, the beads carrying probes and complementary DNA fragments can be separated, and unbound DNA material is removed by washing the beads. Following removal from the beads, the DNA can be sequenced using Sanger sequencing or MPS methods. One advantage of this technique as compared to array-based techniques is the improved target enrichment due to the high probe/target ratio. This may, however, have implications for attempts to lower the costs of MPS.
Polymerase Chain Reaction (PCR):
PCR has been widely used for pre-sequencing sample preparation (Saiki et al (1988), Science 239:487-491), as it is highly compatible with a traditional Sanger sequencing-based approach. PCR is also compatible with any current MPS platform. However, in order to make full use of the high throughput enabled by MPS technology, a large number of PCR amplification products (“amplicons”) must be processed and sequenced together. Multiplex PCR allows the user to generate multiple different PCR amplicons from a single PCR reaction, and is particularly useful during target enrichment for MPS. Multiplex PCR may be difficult to perform, because the simultaneous use of multiple primer pairs frequently generates a high level of non-specific amplification, caused by interaction between primers (Cho et al (1999), Nat Genet 23:203-207; Wang et al (1998), Science 280:1077-1082). Various methods of overcoming non-specific amplification in multiplex PCR have been developed (Fredriksson et al (2007), Nucleic Acids Res 35:e47; Meuzelaar et al (2007), Nat Methods 4:835-837; Varley & Mitra (2008), Genome Res 18:1844-1850; U.S. Pat. No. 5,677,152).
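One simple way to picture the primer interaction mentioned above is to test whether the 3' ends of two primers are reverse-complementary, which would allow them to anneal to each other and be extended. The sketch below is a deliberately crude heuristic written for illustration; it is not one of the cited methods, and the primer names, sequences and the four-base threshold are hypothetical.

```python
from itertools import combinations_with_replacement

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_ends_pair(primer_a, primer_b, min_len=4):
    """True if the last min_len bases of both primers can base-pair antiparallel."""
    tail_a = primer_a.upper()[-min_len:]
    tail_b = primer_b.upper()[-min_len:]
    return tail_a == reverse_complement(tail_b)


def risky_pairs(primers, min_len=4):
    """List primer-name pairs (including self-pairs) with complementary 3' ends."""
    names = sorted(primers)
    return [
        (a, b)
        for a, b in combinations_with_replacement(names, 2)
        if three_prime_ends_pair(primers[a], primers[b], min_len)
    ]


# Hypothetical primer set, invented purely for this example.
example_primers = {
    "F1": "ATGCATGCAAGC",
    "R1": "TTGACCGTGCTT",
    "F2": "CCATGGTACCAA",
}
print(risky_pairs(example_primers))
```

The number of pairs to check grows roughly with the square of the number of primers, which gives some intuition for why non-specific products become more frequent as more primer pairs are combined in one reaction.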
For many MPS platforms, there is an upper limit to the length of a DNA fragment that can be sequenced in a single run. For this reason, many target DNA fragments must be divided into shorter amplicons, each represented by a specific primer pair, in order to obtain a continuous DNA sequence after data analysis. In order to maximize throughput, it is often also necessary to avoid PCR amplicons that are too short, as these reduce the total number of potential base reads in a sequencing run.
In order to obtain full sequencing coverage of a defined region within the human genome, amplicons may be designed to overlap using a primer tiling design. In order to obtain efficient PCR amplification and optimal amplicon lengths, and to avoid extensive non-specific amplification, it is currently necessary to divide multiplex PCR reactions into several separate reactions when using an overlapping primer design. In this way, the actual overlapping designs are separated into different PCR reactions. If a clinical sample needs to be divided into several different multiplex PCR reactions, the risk of sample mix-up and sample contamination is increased. The use of multiple PCR reactions for the analysis of a clinical sample can also be a problem when the amount of DNA available for the analysis is limited.
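The tiling and reaction-splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are half-open intervals on an arbitrary reference, amplicon length and overlap are fixed, and the function names, the two-pool default and the example numbers are all hypothetical. With two pools, neighbouring tiles are kept apart only when the overlap is less than half the amplicon length.

```python
def tile_amplicons(region_start, region_end, amplicon_len, overlap):
    """Cover [region_start, region_end) with overlapping (start, end) tiles."""
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be >= 0 and smaller than amplicon_len")
    if region_end <= region_start:
        raise ValueError("region must have positive length")
    step = amplicon_len - overlap  # distance between consecutive tile starts
    tiles = []
    start = region_start
    while True:
        end = min(start + amplicon_len, region_end)  # clip the last tile
        tiles.append((start, end))
        if end >= region_end:
            break
        start += step
    return tiles


def assign_pools(tiles, n_pools=2):
    """Distribute consecutive tiles round-robin so neighbours land in different pools."""
    pools = [[] for _ in range(n_pools)]
    for index, tile in enumerate(tiles):
        pools[index % n_pools].append(tile)
    return pools


# Hypothetical 500 bp region, 200 bp amplicons, 50 bp overlap.
example_tiles = tile_amplicons(0, 500, amplicon_len=200, overlap=50)
example_pools = assign_pools(example_tiles)
for pool_number, pool in enumerate(example_pools, start=1):
    print(f"pool {pool_number}: {pool}")
```

Each pool corresponds to one separate multiplex PCR reaction, so every additional pool is another aliquot of the sample to handle, which is the source of the mix-up, contamination and DNA-quantity concerns noted above.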
Digital PCR has been proposed as a suitable method for MPS library preparation. Digital PCR is based on clonal amplification of nucleic acids and requires highly specialized equipment in order to be performed efficiently. Digital PCR is also considered to be prone to error in the hands of inexperienced users (Sykes et al (1992), BioTechniques 13(3):444-9; Perkel (2015), BioTechniques 58(5):217-21; Pekin et al (2011), Lab on a Chip 11(13):2156-66). Sample preparation by digital PCR requires more available DNA for the analysis than a single-reaction PCR.
There is a need in the art for methods that improve multiplex PCR reactions in general. In particular, there is a need for methods that address the problem of non-specific and/or unwanted amplification in multiplex PCR reactions in several stages, for example during target amplification for genetic diagnostic applications, library preparation, re-sequencing and other situations.