A typical sequencing workflow proceeds via the following steps: a) fragmenting the target DNA to make fragments; b) ligating adaptors to the fragments; c) amplifying the adaptor-ligated fragments; and d) sequencing the amplification products. In some methods, the fragmenting and ligating steps are mediated by a transposase (see, e.g., Caruccio, Methods Mol Biol. 2011; 733:241-55). In other methods, the fragmenting is done mechanically (e.g., by sonication or shearing) or using a double-stranded DNA (“dsDNA”) fragmentase enzyme. In these latter methods (i.e., the mechanical or fragmentase methods), after the DNA is fragmented, the ends are polished and ligated to the adaptor.
Methods that rely on mechanical fragmentation and enzymes that generate dsDNA breaks (e.g., NEBNext dsDNA Fragmentase, New England Biolabs) require additional steps to prepare the fragments for ligation. For example, in both cases the ends of the fragments need to be polished using a polymerase and, in some cases, the polished fragments need to be treated with a kinase to ensure that all of the 5′ ends are phosphorylated. In addition, in many protocols the fragments are dA-tailed before ligation. These additional steps add significant technical complexity to the protocol.
Current fragmentation methods that rely on an enzyme (e.g., transposase or dsDNA fragmentase) are also limited because they are highly sensitive to the ratio of enzyme to DNA in the reaction. If too much DNA is included, then the template is poorly fragmented. If too little DNA is included, then the template is overly fragmented. This means that the amount of template DNA must be carefully quantified before fragmentation, and the reaction time and amount of enzyme should be carefully controlled to obtain a desired level of fragmentation. Finally, enzyme-based reactions are driven by an excess of enzyme and need to be quickly stopped to prevent excess fragmentation. This often involves addition of a stop buffer, which can adversely affect downstream processing steps or require extra cleanup steps. Further, when several samples are being processed, it can be difficult to ensure that fragmentation reaction time is uniform across the samples.
Moreover, transposase-mediated methods have a bias for certain insertion sites (see, e.g., Green et al., Mobile DNA 2012, 3:3) and, as such, the fragmentation is not truly random. Transposase-mediated methods also rely on conserved ‘arm’ sequences that contain sequence elements that are required for transposon-arm complex formation and the fragmentation/ligation reaction. This makes the adaptor sequences difficult to customize. While it is in theory possible to add molecular barcodes and the like 5′ of the arm sequences (outside of the transposase binding sites), this modification adds sequence freight to the resultant sequence reads, effectively shortening the useful length of the sequence reads.
Peters et al. (Nature. 2012; 487:190-5) developed a protocol called Controlled Random Enzymatic Fragmentation (CoRE) to randomly fragment DNA. This protocol involves incorporating dUTP into a product using a DNA polymerase, and then treating the product (which contains uracil) with uracil DNA glycosylase (UDG) and an endonuclease (endonuclease IV) to fragment the product. The ratio of dUTP/dTTP controls the average fragment length after endonuclease treatment. However, the Peters protocol requires multiple enzymatic steps, including treatment by: (1) shrimp alkaline phosphatase (SAP), (2) UDG, (3) Endo IV, (4) E. coli DNA polymerase I, (5) SAP and (6) ligase. Moreover, the ability of a polymerase to incorporate dUTP varies greatly from polymerase to polymerase and, indeed, there are many polymerases, including proofreading DNA polymerases, that cannot incorporate dUTP. Thus, the Peters method has limited utility.
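The dependence of fragment length on the dUTP/dTTP ratio can be illustrated with a back-of-the-envelope calculation. The following Python sketch is not part of the Peters protocol; it is a simplified model that assumes (i) thymine makes up a fixed fraction of the bases on a strand (0.25 by default), (ii) the polymerase incorporates dUTP in proportion to its share of the combined dUTP and dTTP pool, with no discrimination, and (iii) every incorporated uracil is cleaved. The function name and parameters are hypothetical. Under those assumptions, the mean spacing between cleavage sites on a single strand is the reciprocal of the per-base probability of a uracil.

```python
def mean_nick_spacing(dutp_to_dttp_ratio, thymine_fraction=0.25):
    """Estimate the mean distance (in nucleotides) between uracil sites
    on a single strand.

    Simplifying assumptions (not taken from the cited protocol):
      - thymine positions occur at a uniform frequency `thymine_fraction`;
      - dUTP is incorporated at thymine positions in proportion to its
        share of the dUTP + dTTP pool (no polymerase bias);
      - every uracil is subsequently cleaved.
    """
    if dutp_to_dttp_ratio <= 0:
        raise ValueError("dutp_to_dttp_ratio must be positive")
    if not 0 < thymine_fraction <= 1:
        raise ValueError("thymine_fraction must be in (0, 1]")

    # Share of thymine positions expected to receive uracil instead.
    uracil_share = dutp_to_dttp_ratio / (1.0 + dutp_to_dttp_ratio)

    # Per-base probability of a uracil; its reciprocal is the mean spacing.
    return 1.0 / (thymine_fraction * uracil_share)


if __name__ == "__main__":
    for ratio in (1 / 9, 1 / 99, 1 / 999):
        print(f"dUTP:dTTP = {ratio:.4f} -> ~{mean_nick_spacing(ratio):.0f} nt")
```

In this model, lowering the dUTP/dTTP ratio lengthens the expected spacing between cleavage sites, which is the qualitative behavior described above. Real fragment lengths would also depend on polymerase bias and on the downstream enzymatic steps, which the sketch ignores.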
Drmanac (US2011033854) describes a multi-step method for turning a “long” fragment into a population of smaller, blunt-ended fragments that can be ligated to adaptors, amplified and sequenced. The Drmanac method involves replacing nucleotides with nucleotide analogs, creating “gapped” nucleic acids (i.e., nucleic acids that contain nicks or gaps), and nick translating (i.e., copying) the gaps using the other strand as a template, to produce blunt-ended fragments. In some cases, Drmanac performs his method by incorporating dUTP and dmCTP into the “long” fragment using a polymerase, deglycosylating the uridines (using UDG), removing the deglycosylated nucleotides (using an endonuclease), making double-stranded breaks near the mCTPs (using McrBC), and then nick translating the digested product using a strand-displacing polymerase to produce blunt-ended fragments (see FIG. 2 and para. 82). Next, adaptors are ligated and the fragments can be amplified and sequenced. Variations of Drmanac's method include treatments by a kinase and/or an alkaline phosphatase, as well as polishing and A-tailing.
In addition to being difficult to implement because it is a technically complex, multi-step process, Drmanac's method also suffers from a number of acute problems, particularly for many sequencing applications.
For example, the Drmanac method is complicated by the fact that while the nick translation step can, in theory, be done using a proofreading polymerase (i.e., the type that would be more suitable for sequencing applications because it has a higher fidelity, e.g., Pfu), the use of such a polymerase would lead to degradation of non-protected 3′ overhangs. This would result in problems with ligation and, in addition, would lead to products that can self-ligate (e.g., circularize). Drmanac points out that one could alternatively use a non-proofreading polymerase (e.g., Taq). However, as would be apparent, the use of a non-proofreading polymerase can lead to a product that contains many errors. Further, with respect to the use of non-proofreading polymerases in his method, Drmanac himself states that the process can be hard to control, and will often generate a mixed population of 3′ ends, resulting in a low adaptor-to-insert ligation yield. Drmanac proposes to solve some of these problems using special adaptors, ddNTPs, A-tailing, and/or an alkaline phosphatase treatment, effectively adding even more steps to an already complicated method.
In addition, many, if not most, polymerases (including thermostable polymerases with the highest fidelity) either do not incorporate dUTP efficiently or do not recognize templates that contain dUTP, meaning that the initial step of Drmanac's method (i.e., producing a “long” fragment containing uridine and methylcytosine) can only be implemented using a restricted number of polymerases, many of which have a high error rate.
Finally, McrBC is a rather unusual restriction enzyme in that it cuts at an unspecified, variable position between two methylated cytosines, and the type of ends that it produces are unknown and may vary from site to site. As such, in order to implement Drmanac's method, those sites would have to be polished by filling in the 5′ overhangs and chewing back the 3′ overhangs, which further complicates Drmanac's method and, in addition, would lead to “gaps” in the sequence. Also, McrBC cleaves optimally when the spacing between adjacent methylcytosines is relatively short (i.e., from 55 bp to 103 bp). This makes McrBC less than efficient for producing fragments for most sequencing applications (which generally require longer fragments).
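The spacing constraint can be made concrete with a short Python sketch. It is illustrative only: the function name and the example coordinates are hypothetical, the spacing is taken as the simple difference between two position coordinates, and the 55 bp to 103 bp window stated above is treated as a hard cutoff, which is a simplification of what is described as an optimum.

```python
def pairs_in_optimal_window(mc_positions, min_spacing=55, max_spacing=103):
    """Return adjacent methylcytosine position pairs whose spacing falls
    inside the stated optimal window for McrBC (55 to 103 bp by default).

    `mc_positions` is an iterable of integer coordinates along one
    molecule. Treating the window as a hard cutoff is an assumption
    made for illustration.
    """
    # Sort and de-duplicate so that "adjacent" means nearest neighbors.
    positions = sorted(set(mc_positions))
    return [
        (left, right)
        for left, right in zip(positions, positions[1:])
        if min_spacing <= right - left <= max_spacing
    ]


if __name__ == "__main__":
    # Hypothetical methylcytosine coordinates on a single fragment.
    example = [10, 70, 500, 580, 700]
    print(pairs_in_optimal_window(example))
```

With the hypothetical coordinates shown, only the pairs spaced 60 bp and 80 bp apart fall inside the window. The pairs spaced 430 bp and 120 bp apart fall outside it, which illustrates why widely spaced methylcytosines, and therefore longer fragments, are disfavored.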
The method described herein lacks at least some of the deficiencies listed above and, depending on how the method is implemented, can have other advantages as described below.