Recent advances in high-throughput, next-generation sequencing technologies have enabled whole genome sequencing and new approaches to functional genomics, including comprehensive analysis of any transcriptome. One of these next-generation sequencing methods involves direct sequencing of complementary DNA (cDNA) generated from messenger and structural RNAs (RNA-Seq). RNA-Seq provides several key advantages over traditional sequencing methods. First, it allows high-resolution study of all expressed transcripts, annotating the 5′ and 3′ ends and splice junctions of each transcript. Second, RNA-Seq allows quantification of the relative number of transcripts in each cell. Third, RNA-Seq provides a way to measure and characterize RNA splicing by measuring the levels of each splice variant. Together, these advances have provided new insights into individual cell function.
One drawback of standard RNA-Seq is the lack of information on the direction of transcription. Standard cDNA libraries constructed for RNA-Seq consist of randomly primed double-stranded cDNA. Non-directional ligation of adaptors containing universal priming sites prior to sequencing leads to a loss of information as to which strand was present in the original RNA template. Although strand information can be inferred in some cases by subsequent analysis, for example, by using open reading frame (ORF) information in transcripts that encode a protein, or by assessing splice site information in eukaryotic genomes, direct information on the originating strand is highly desirable. For example, direct information on which strand was present in the original RNA sample is needed to assign the sense strand to a non-coding RNA, and to resolve overlapping transcripts.
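The splice-site inference mentioned above can be illustrated with a minimal sketch. It assumes the canonical GT-AG intron boundaries and a hypothetical helper, infer_strand_from_intron, that takes a genomic sequence and 0-based, end-exclusive intron coordinates derived from a spliced read alignment. Neither the function name nor the coordinate convention comes from any particular tool. Introns with non-canonical boundaries return None, which reflects why this kind of inference only works in some cases.

```python
from typing import Optional


def infer_strand_from_intron(genome: str, intron_start: int, intron_end: int) -> Optional[str]:
    """Guess the transcribed strand from the dinucleotides at the intron boundaries.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    A canonical intron reads GT...AG on the transcribed strand, so an intron
    transcribed from the minus strand appears as CT...AC on the plus strand.
    """
    first_two = genome[intron_start:intron_start + 2].upper()
    last_two = genome[intron_end - 2:intron_end].upper()
    if (first_two, last_two) == ("GT", "AG"):
        return "+"
    if (first_two, last_two) == ("CT", "AC"):
        return "-"
    # Non-canonical boundaries: strand cannot be inferred by this rule.
    return None


if __name__ == "__main__":
    # Toy sequences with an intron spanning positions 3 to 11.
    print(infer_strand_from_intron("AAAGTCCCCAGTTT", 3, 11))
    print(infer_strand_from_intron("AAACTCCCCACTTT", 3, 11))
```

Unspliced reads, and transcripts from genomes without introns, carry no such signal, so the rule cannot assign a strand to them.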
Several methods have recently been developed for strand-specific RNA-Seq. These methods can be divided into two main classes. The first class utilizes distinct adaptors in a known orientation relative to the 5′ and 3′ ends of the RNA transcript. The end result is a cDNA library in which the 5′ and 3′ ends of the original RNA are flanked by two distinct adaptors. A disadvantage of this approach is that only the ends of the cloned molecules preserve directional information. This can be problematic for strand-specific manipulations of long clones, and can lead to loss of directional information when the molecules are fragmented.
The second class of strand-specific RNA-Seq methods marks one strand of either the original RNA (for example, by bisulfite treatment) or the transcribed cDNA (for example, by incorporation of modified nucleotides), followed by degradation of the unmarked strand. Strand marking by bisulfite treatment of RNA is labor intensive and requires alignment of the sequencing reads to reference genomes that have all the cytosine bases converted to thymines on one of the two strands. The analysis is further complicated because base conversion during bisulfite treatment is imperfect, i.e. its efficiency is less than 100%.
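The alignment requirement described above can be made concrete with a small sketch. It is illustrative only and is not taken from any published pipeline: the function names are hypothetical, exact substring matching stands in for a real aligner, and the read is collapsed to a three-letter alphabet before matching, which is one simple way to tolerate the residual unconverted cytosines left by incomplete conversion.

```python
from typing import List, Tuple

# Translation table for complementing uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_c_to_t(seq: str) -> str:
    """In silico bisulfite conversion: replace every C with T."""
    return seq.upper().replace("C", "T")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def converted_references(reference: str) -> Tuple[str, str]:
    """Build one C-to-T converted reference per strand."""
    plus = convert_c_to_t(reference)
    minus = convert_c_to_t(reverse_complement(reference))
    return plus, minus


def assign_strand(read: str, reference: str) -> List[str]:
    """Return the strands whose converted reference contains the read.

    The read is also fully converted, so a cytosine that escaped
    conversion does not prevent the match (toy exact matching only).
    """
    collapsed = convert_c_to_t(read)
    plus, minus = converted_references(reference)
    hits = []
    if collapsed in plus:
        hits.append("+")
    if collapsed in minus:
        hits.append("-")
    return hits


if __name__ == "__main__":
    reference = "ACGTCCGATTGCA"
    # Plus-strand fragment GTCCGATT with one of its two cytosines left unconverted.
    print(assign_strand("GTCTGATT", reference))
```

Collapsing to three letters reduces sequence complexity, so short reads may match both converted references. This is part of what makes the analysis more involved than ordinary alignment.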
Strand marking by modification of the second strand of cDNA has become the preferred approach for directional cDNA cloning and sequencing (Levin et al., 2010). However, cDNA second strand marking approaches, such as the one described in WO 2011/003630, are not sufficient to preserve directionality information when using conventional blunt-end ligation and cDNA library construction strategies with duplex adaptors, where two universal sequencing sites are introduced by two separate adaptors. The marking approach described in WO 2011/000360 utilizes a four-step process, consisting of 1) incorporation of a cleavable nucleotide into one strand of the cDNA insert, 2) end repair of the cDNA insert, 3) non-directional ligation of adaptors containing universal sequencing sites and 4) selective hydrolysis of library fragments with undesired adaptor orientation. To preserve directionality information, the method requires that the 5′ and 3′ ends of the strand selected for amplification are marked differentially, which can be achieved, for example, by ligation of directional (i.e. polarity-specific) adaptors, or by use of a specialized forked adaptor where each strand of a double-stranded polynucleotide is covalently attached to two distinct universal sequencing sites, one sequencing site at each end of the strand. Application of the methodology described in WO 2011/000360 does not result in directional sequencing libraries when using conventional duplex adaptors because the marked strand, i.e. the strand with incorporated cleavable nucleotides, is not differentially labeled at its 5′ and 3′ ends.
There is a need for improved methods for directional cDNA sequencing from cDNA libraries constructed with conventional duplex adaptors. The invention described herein fulfills this need.