High-sensitivity detection of microorganisms, and of viruses in particular, has been a challenge in the field of biological molecule analysis, especially when aimed at detection of a plurality of microorganisms. Whether for pathological examination or for fundamental biology studies, several methods are commonly used for the detection of various classes of microorganisms.
In particular, researchers employ numerous approaches for viral detection and discovery, including metagenomic sequencing [ref. 5], microarrays [ref. 9, 10, 12, 13, 18, 21, 24 and 25], or multiplex degenerate PCR followed by other methods of characterization such as mass spectrometry [ref. 4, 11, 22] or amplicon sequencing [ref. 8].
PCR is a rapid and cost-effective technique suitable for this purpose. In some applications, however, PCR relies on specific primer sets and does not scale well for detection of many diverse targets, in particular many diverse microorganisms.
In the case of viruses, high levels of diversity and the lack of any universally conserved nucleotide or amino acid sequence regions make it a challenge to detect well-characterized viruses and extremely difficult to discover novel viruses by methods that involve sequence-specific amplification. In contrast, bacteria contain universally conserved 16S rRNA sequences from which conserved primers may be designed, allowing amplification of rRNA coding regions from novel, unsequenced bacteria [ref. 3, 26]. Bacterial rRNA sequences vary sufficiently for discrimination to the family, genus, and sometimes even the species level.
As a consequence, primer design for detection and/or identification of known or unknown microorganisms, especially viruses, can be challenging. Nevertheless, PCR-based techniques for viral amplification and identification at the species or strain level are common [ref. 6, 7, 19, 27]. These rely on careful primer design from the most conserved regions available, taking advantage of degenerate primers and/or replacement of variable positions with inosine bases, and on the design and multiplex optimization of minimal sets of signatures which must be used in combination to ensure detection of all known variants.
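By way of illustration only, the cost of degenerate primers can be made concrete with a short sketch. The following Python example is a hypothetical illustration, not a method described in the cited references: it uses the standard IUPAC nucleotide ambiguity codes to count and enumerate the distinct oligonucleotides represented by one degenerate primer. The function names `degeneracy` and `expand` are assumptions introduced for this example, and inosine substitution is not modeled.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (inosine is not modeled here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Return how many distinct sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every non-degenerate sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    # Hypothetical 18-base primer with three ambiguous positions (R, Y, N).
    example = "ACGTRACGTYACGTNACG"
    print(degeneracy(example))
    print(expand("AY"))
```

Because the degeneracy is the product of the number of options at each position, every additional ambiguous position multiplies the number of oligonucleotides in the primer pool, which is one reason heavily degenerate primers become harder to optimize in multiplex.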
However, multiplex primer design for many highly divergent targets is challenging, since usually no universally conserved primers exist to amplify fragments from all targets, and finding sets of primers likely to function well in multiplex adds to the complexity of finding conserved primer candidates.
Furthermore, currently available multiplex/degenerate primer prediction tools require multiple sequence alignment [ref. 34, 36-38, 41]. Multiple sequence alignments are often difficult to construct for many sequences, exhausting memory, available time, or both before an alignment is completed. Moreover, even if an alignment does complete, for some divergent target sets, such as RNA virus genomes of a single species or gene homologues across species, the alignment may be of suboptimal quality, or there may be so little nucleotide sequence conservation that multiple primers, possibly with degenerate positions, are required to amplify all targets.
In particular, for many organisms there are few or no conserved regions of sufficient size across all strains of a species to accommodate a pair of traditional-length primers (at least 18 bases each), particularly in important single-stranded RNA viruses including influenza A, HIV-1, Ebola, and foot-and-mouth disease viruses. PCR-based specific (non-random) amplification across all viral families using typical 18+ base primers would require many thousands of PCR primers to span known viral sequence diversity.
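The notion of a conserved region of sufficient size can be illustrated with a minimal, alignment-free sketch. The Python example below is a hypothetical illustration under simplifying assumptions (exact matching only, a single strand, no degenerate positions), and the function name `shared_kmers` is introduced for this example. It returns the subsequences of length k that occur in every input sequence; an empty result for k = 18 corresponds to the situation described above, in which no single non-degenerate primer site is conserved across all strains.

```python
def shared_kmers(sequences, k=18):
    """Return the set of length-k substrings present in every sequence.

    Exact matching on the given strand only; this is an illustrative
    simplification, not a primer design procedure.
    """
    if not sequences:
        return set()

    def kmers(seq):
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    common = kmers(sequences[0])
    for seq in sequences[1:]:
        common &= kmers(seq)
        if not common:
            break  # nothing is conserved across the sequences seen so far
    return common


if __name__ == "__main__":
    # Toy sequences (hypothetical), using a short k so the effect is visible.
    toy = ["AACCGGTT", "TTCCGGAA", "GCCGGC"]
    print(shared_kmers(toy, k=4))
    print(shared_kmers(toy, k=5))
```

In the toy input, a 4-base site is shared by all three sequences but no 5-base site is, mirroring on a small scale how the set of universally conserved sites shrinks as the required primer length grows.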
Additionally, with PCR-based specific amplification approaches, which require specific primers for each species or strain, discovery or detection of unanticipated species is unlikely.