A genetic variant is one or more nucleotides which differ from a reference DNA sequence for a given region. For example, a genetic variant may comprise a deletion, substitution or insertion of one or more nucleotides.
A DNA sample may be analysed for known genetic variants or to discover previously unknown genetic variants in a region of interest by determining the DNA sequence in the region of interest and comparing the determined sequence to the reference sequence.
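By way of illustration only, the comparison of a determined sequence with a reference sequence can be sketched as follows. The sketch assumes that the two sequences have already been aligned to equal length, with "-" marking a gap; the function name `list_differences` and the tuple layout are hypothetical and are not taken from any particular variant-calling tool.

```python
def list_differences(ref_aln, sample_aln):
    """List positions where an aligned sample differs from the reference.

    Assumption: both strings come from a prior pairwise alignment, have the
    same length, and use '-' for gap columns. Positions are 1-based
    reference coordinates; an insertion is reported at the last reference
    base preceding it.
    """
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned sequences must have equal length")

    differences = []
    ref_pos = 0  # number of reference bases consumed so far
    for ref_base, sample_base in zip(ref_aln, sample_aln):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == sample_base:
            continue  # identical column, no variant
        if ref_base == "-":
            differences.append((ref_pos, "insertion", "", sample_base))
        elif sample_base == "-":
            differences.append((ref_pos, "deletion", ref_base, ""))
        else:
            differences.append((ref_pos, "substitution", ref_base, sample_base))
    return differences


# Hypothetical aligned pair containing one variant of each type.
print(list_differences("ACGT-ACGT", "ACCTTAC-T"))
```

Each reported difference is only a candidate variant; as set out below, a difference may equally reflect an error introduced during sample processing or sequencing.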
DNA sequencing can be performed using a variety of techniques, such as the classic chain termination method, or one of several high-throughput, next generation sequencing (NGS) methodologies, reviewed by Metzker, M. L., Nat Rev Genet 2010 January; 11(1): 31-46.
Illumina sequencing, 454 pyrosequencing, Heliscope single molecule sequencing, single molecule real time (SMRT) sequencing and Ion semiconductor sequencing platforms are examples of DNA sequencing methods based on the “sequencing by synthesis” principle. In these methods, the sequence of a template strand of DNA is determined through the detection of signals emitted as nucleotide bases are incorporated into a newly-synthesised complementary strand.
DNA sequencing platforms have inherent error rates. For example, occasionally the polymerase used in the amplification reaction will incorporate the wrong nucleotide base in the complementary strand being synthesised, leading to an incorrect determination of the nucleotide at that position in the DNA template. The detection limit of NGS methods is defined by errors introduced at two stages: library preparation (which usually involves amplification by PCR) and sequencing itself.
This is problematic especially for the detection of genetic variants that will only be present in a DNA sample at low frequency, for example a frequency approaching or below the error rate of the sequencing method used. Under such circumstances, it is difficult or impossible to determine whether a genetic variant identified is real (i.e. actually present in the DNA template molecule) or an error.
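The difficulty can be illustrated numerically. The following is a minimal sketch under a deliberately simplified model, assumed here for illustration only: every read covering a position is taken to be erroneous independently and with the same probability. The function name `prob_errors_alone` and the example figures (1,000 reads, 1% error rate) are hypothetical.

```python
import math


def prob_errors_alone(depth, variant_reads, error_rate):
    """Probability of seeing at least `variant_reads` non-reference reads
    out of `depth` reads when every one of them is an error.

    Assumption: errors are independent and occur at one uniform rate, so
    the count of erroneous reads is binomially distributed. The upper tail
    is summed in log space to avoid overflow at high read depth.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    log_p = math.log(error_rate)
    log_q = math.log(1.0 - error_rate)
    tail = 0.0
    for k in range(variant_reads, depth + 1):
        log_pmf = (
            math.lgamma(depth + 1)
            - math.lgamma(k + 1)
            - math.lgamma(depth - k + 1)
            + k * log_p
            + (depth - k) * log_q
        )
        tail += math.exp(log_pmf)
    return min(tail, 1.0)


# Hypothetical position covered by 1,000 reads at a 1% error rate.
near_error_rate = prob_errors_alone(1000, 10, 0.01)  # variant seen in 1% of reads
well_above = prob_errors_alone(1000, 50, 0.01)       # variant seen in 5% of reads
print(near_error_rate, well_above)
```

Under these assumptions, a variant observed in 1% of reads is about as likely as not to arise from errors alone, whereas one observed in 5% of reads is very unlikely to. The uniform error rate is an idealisation.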
For Illumina sequencing, the background error rate varies between genetic variants and genomic locations, and has a large variance. Detecting mutations which are present in a DNA sample at a frequency of ~1% or lower is therefore problematic.
Existing methods of DNA sequencing and genetic variant identification have limitations with regard to the detection of rare, novel variants in multiple regions, especially in samples having small amounts of DNA.
Such methods are typically incapable of identifying mutations occurring at a frequency lower than or similar to the error rate of the method used (i.e. the background noise).
Digital PCR (dPCR; Vogelstein B., Kinzler K. W. Proc. Natl. Acad. Sci. U.S.A. 1999 96(16):9236-41; Sykes, P. J. et al., BioTechniques 1992 13(3): 444-9) is not useful for identification of novel (i.e. previously unidentified) genetic variants, as dPCR involves the use of primers and assays designed to detect particular variants. Moreover, dPCR has limited scope for analysing multiple regions of interest in parallel, especially where the DNA sample is limited.
Other complex methods exist for tagging single DNA molecules from a single pool of DNA, such as Safe-SeqS and single-molecule molecular inversion probes (Kinde I, et al., Proc Natl Acad Sci USA. 2011 108(23): 9530-5; Hiatt J B, et al., Genome Res. 2013 23(5):843-54.).
These methods are not suitable for simultaneous analysis of multiple genes (i.e. multiple regions of interest), or for use when the amount of DNA is limited.
Several studies have demonstrated non-invasive detection of cancer DNA (Dawson S J, et al., N Engl J Med. 2013 368(13):1199-209; Forshew T, et al., Sci Transl Med. 2012 4(136):136ra68; Murtaza M, et al. Nature. 2013 497(7447):108-12). However, major challenges persist in this field, such as (a) screening sufficient bases of the genome to detect relevant cancer mutations, (b) screening small quantities of fragmented DNA for such mutations, and (c) detecting low frequency mutant tumour DNA molecules amongst many ‘wild-type’ molecules.
For example, Forshew T, et al., Sci Transl Med. 2012 4(136):136ra68 describes screening of large regions of the genome for cancer mutations in blood, but the detection limit for this method was ~1% to 2% allele frequency (AF).