Copy number variations (CNVs) are gains or losses of genomic regions that range from 500 bases upward in size (often between five thousand and five million bases). Whole-genome studies have revealed large numbers of CNV regions in humans and a broad range of genetic diversity among the general population. CNVs have been the focus of many recent studies because of their roles in human genetic disorders. See, for example, Iafrate et al., 2004, Nat Genet 36: 949-951; Sebat et al., 2004, Science 305: 525-528; Redon et al., 2006, Nature 444: 444-454; Wong et al., 2007, Am J Hum Genet 80: 91-104; Ropers, 2007, Am J Hum Genet 81: 199-207; Lupski, 2007, Nat Genet 39: S43-S47, each of which is incorporated by reference. Aneuploidy, such as trisomy or whole chromosome deletion, is a limiting case of copy number variation and is associated with a variety of human diseases.
Comparative genomic hybridization (CGH) is one technique used to detect copy number changes and other genomic aberrations. In CGH, a test sample is typically compared to a reference sample to determine the existence of genomic aberrations. Nucleic acids from the test sample are typically labeled differently from nucleic acids from the reference sample, and nucleic acids from both samples are hybridized to a microarray of probes. Signals are then detected from the nucleic acids hybridized to the microarray. Deviations of the log ratio of the test and reference signals from an expected value (e.g., zero for diploid regions) are detected and may be used as an indication of copy number differences.
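By way of illustration only, the log-ratio comparison described above may be sketched as follows. This is a minimal sketch, not a description of any particular CGH analysis package: the function names, the use of a base-2 logarithm, and the 0.3 threshold are assumptions chosen for the example, and the signal values are invented for demonstration.

```python
import math


def log2_ratios(test_signals, reference_signals):
    """Return the per-probe log2(test / reference) ratio.

    Both inputs are sequences of positive signal intensities,
    one value per probe, in the same probe order.
    """
    if len(test_signals) != len(reference_signals):
        raise ValueError("test and reference must have the same number of probes")
    return [math.log2(t / r) for t, r in zip(test_signals, reference_signals)]


def flag_deviations(ratios, expected=0.0, threshold=0.3):
    """Label each probe by how far its log ratio deviates from `expected`.

    `threshold` is a hypothetical cutoff used only for this example;
    practical analyses tune it and usually segment neighboring probes.
    """
    calls = []
    for ratio in ratios:
        delta = ratio - expected
        if delta > threshold:
            calls.append("gain")
        elif delta < -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Illustrative intensities for four probes (made-up values).
test = [1000, 1500, 500, 1020]
reference = [1000, 1000, 1000, 1000]

ratios = log2_ratios(test, reference)
print(flag_deviations(ratios))  # ['neutral', 'gain', 'loss', 'neutral']
```

In this idealized example, a single-copy gain in a diploid region gives a test-to-reference ratio of 3/2 (log2 of about 0.58) and a single-copy loss gives a ratio of 1/2 (log2 of -1), while a small fluctuation such as 1020/1000 stays within the threshold and is treated as neutral.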
Currently available CGH techniques still have noteworthy limitations. For example, certain genome-wide artifacts commonly known as “GC waves” (which may be due to the guanine/cytosine (GC) content of the probes used in CGH) can cause the log ratio to deviate from its expected value, resulting in false positives. GC waves can add large-scale variability to the probe signal ratios and, by skewing the log ratio data away from expected values, can interfere with data analysis algorithms. The GC-wave artifact can increase the potential for false positive aberration calls in specific genomic regions and can also obscure true aberration calls (see Marioni et al., (2007), Genome Biology, 8:R228).
Fluorescent in-situ hybridization (FISH), real-time PCR, and digital PCR (e.g., droplet digital PCR, ddPCR) methods have been used to detect gene copy number changes; however, the degree to which one can multiplex (samples, regions of interest, and variant types) is limited using these technologies. Next generation sequencing can support a significantly higher degree of multiplexing (sample multiplexing, targeted region multiplexing, and variant type multiplexing, all in a single assay) than FISH, real-time PCR, and digital PCR technologies (see, e.g., Castle et al., (2010) BMC Genomics, 11:244; Wood et al., Nucleic Acids Research, Vol. 38, No. 14, e151; Conway et al., (2012) The Journal of Molecular Diagnostics, Vol. 14, No. 2, p 104-111).
Furthermore, FISH in particular lacks the fine resolution needed to distinguish closely spaced local variations. Sequencing methods such as whole genome sequencing have been used to detect copy number variations (Xi et al., PNAS 108: E1128 (2011); Zong et al., Science (2012) Vol. 338 no. 6114 pp. 1622-1626). However, such methods are time consuming and expensive, and they require extensive bioinformatics analysis to determine copy number variations.
Additional methods for detecting CNVs are needed.