Sequencing of RNA viruses provides crucial insight into viral infection and evolution. However, whole genome sequencing of viruses can be particularly challenging for second-generation platforms due to genome size, genome structure, and the presence of large amounts of host nucleic acids. Most protocols rely on either gene-specific or global RNA amplification to produce sufficient template quantities for ligation-based sequencing library preparation, a process that can introduce errors that are then misinterpreted as viral quasi-species or major variants. Conversely, total RNA-seq, while agnostic to input, requires co-sequencing of host RNA, which reduces the depth of coverage over the virus of interest.
Amplification-free sequencing of RNA genomes poses a significant challenge for many library preparation methods, as material must first be converted into double-stranded DNA among an overwhelming pool of host DNA and RNA. The lowest input methods (e.g., Illumina® Nextera XT) require one nanogram of input, or roughly 4.63×10⁷ genome copies of a 10,000 nucleotide viral genome, assuming conversion into double-stranded cDNA is 100% efficient. Sequencing viral samples from the recent Ebola outbreak was the first published account of Nextera-based library preparation for sequencing of an RNA virus without genomic amplification, relying on depletion of host DNA and ribosomal RNA prior to random hexamer-primed cDNA synthesis. Other viral sequencing protocols utilizing transposon-mediated library preparation without genomic amplification have required a minimum of 1×10¹⁰ viral copies per mL of sample, which is unrealistic for most laboratory or clinical sample collection methods. The inefficiencies encountered in these and other protocols are most likely due to the use of inherently loss-prone nucleic acid isolation methods, such as silica columns and gel purifications, as well as the need to co-sequence non-viral host material.
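The conversion from one nanogram of input to a genome copy number, quoted earlier in this paragraph, can be sketched as a short calculation. This is an illustrative sketch only: the function name `genome_copies` and the per-base-pair molar masses are assumptions, not values stated in the text. With an assumed 1300 g/mol per base pair the sketch gives about 4.63e7 copies, matching the quoted figure; with the commonly used approximation of about 650 g/mol per base pair of double-stranded DNA it gives about twice that. The text does not state which convention was used.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def genome_copies(mass_ng, length_bp, g_per_mol_per_bp):
    """Estimate how many double-stranded molecules of length_bp make up mass_ng.

    g_per_mol_per_bp is an assumed average molar mass per base pair;
    it is a free parameter here, not a value taken from the source text.
    """
    grams = mass_ng * 1e-9                       # nanograms to grams
    molar_mass = length_bp * g_per_mol_per_bp    # g/mol for one full-length molecule
    return grams / molar_mass * AVOGADRO         # moles times molecules per mole


if __name__ == "__main__":
    # 1 ng of a 10,000 bp double-stranded cDNA, under two assumed molar masses
    for per_bp in (650.0, 1300.0):
        copies = genome_copies(1.0, 10_000, per_bp)
        print(f"{per_bp:.0f} g/mol per bp: {copies:.3g} copies")
```

Either way, the required input is on the order of 10⁷ to 10⁸ genome copies even at perfect conversion efficiency, which is the point the paragraph relies on.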
Therefore, there remains a need in the art for a method for limited-input whole genome sequencing of RNA viruses that operates without genomic amplification and without co-sequencing of non-viral host material, and that uses improved nucleic acid isolation methods, thereby eliminating potential sources of amplification-induced error and obviating the need for host ribosomal RNA depletion.