Analysis of copy number variants (CNVs) on a genomic scale is useful for assessing cancer progression and identifying congenital genetic abnormalities. CNVs are typically identified by microarray hybridization, but they can also be detected by next-generation sequencing (NGS) (Alkan et al., 2009; Sudmant et al., 2010). Sequence-based detection generally relies on algorithms that count the number of reads mapping to specific genomic regions (read depth). Consequently, the resolution of sequence-based copy number methods depends largely on the number of independently mapped reads.
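The read-depth idea can be illustrated with a minimal sketch. It assumes that reads have already been aligned to a single chromosome and are represented only by their start positions, that the genome is divided into fixed-width bins, and that the median bin reflects a diploid baseline. The function names `bin_read_counts` and `estimate_copy_number` are hypothetical, and real pipelines also correct for mappability and GC content, which this sketch ignores.

```python
from statistics import median


def bin_read_counts(read_starts, chrom_length, bin_size):
    """Count mapped reads whose start position falls in each fixed-width bin."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions off the chromosome
            counts[pos // bin_size] += 1
    return counts


def estimate_copy_number(counts, ploidy=2):
    """Scale each bin so the median bin corresponds to `ploidy` copies.

    Assumes most bins are at the baseline copy number (hypothetical diploid default).
    """
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    return [ploidy * c / baseline for c in counts]


if __name__ == "__main__":
    # Toy data: a 40 bp "chromosome" in 10 bp bins, with bin 2 over-represented.
    starts = [1, 5, 12, 18, 21, 23, 25, 27, 33, 38]
    counts = bin_read_counts(starts, chrom_length=40, bin_size=10)
    print(counts)                        # [2, 2, 4, 2]
    print(estimate_copy_number(counts))  # [2.0, 2.0, 4.0, 2.0]
```

Each bin's estimate rests only on how many reads land in it, which is why the number of independently mapped reads, rather than the bases within each read, limits how small the bins can be made.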
The current trend in next-generation sequencing technologies is to increase the number of bases read per unit cost. This is accomplished both by increasing the total number of sequence reads per lane of a flow cell and by increasing the number of bases within each read. Because the accuracy of copy number determination is driven by the number of independent reads, increasing read length does not improve the resolution of copy number analysis. Most of the genome can be mapped uniquely with short reads, on the order of 25-30 base pairs (bp). Currently, high-throughput sequencers generate reads of ~150 bp, well beyond the length needed for unique mapping.
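The trade-off between read length and read count can be made concrete with a short calculation. Assuming a fixed sequencing yield, the number of reads is the yield divided by the read length. If reads are spread uniformly across the genome, the count in a bin is approximately Poisson-distributed, with a coefficient of variation of 1/sqrt(mean count). The genome size, yield, and target of 100 reads per bin (about 10% counting noise) used below are hypothetical illustration values, and the helper names are likewise hypothetical. The sketch ignores mappability and GC bias, and it only applies while reads remain long enough to map uniquely.

```python
import math


def reads_from_yield(total_bases, read_length):
    """Number of reads obtained when a fixed yield is split into reads of a given length."""
    return total_bases // read_length


def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count with the given mean."""
    return 1 / math.sqrt(mean_count)


def min_bin_size(genome_size, n_reads, min_mean_reads):
    """Smallest bin (bp) whose expected read count reaches min_mean_reads.

    Expected count per bin = n_reads * bin_size / genome_size (uniform coverage assumed).
    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-min_mean_reads * genome_size // n_reads)


if __name__ == "__main__":
    genome = 3_000_000_000  # hypothetical ~3 Gb genome
    yield_bp = 3_000_000_000  # hypothetical fixed yield (1x coverage)
    target = 100  # reads per bin -> CV of 0.1
    print(f"CV at {target} reads/bin: {poisson_cv(target)}")
    for length in (150, 30):
        n = reads_from_yield(yield_bp, length)
        print(f"{length} bp reads: {n:,} reads, min bin {min_bin_size(genome, n, target):,} bp")
```

With the same yield, 150 bp reads give 20,000,000 reads and a minimum bin of 15,000 bp, whereas 30 bp reads give 100,000,000 reads and a minimum bin of 3,000 bp. Under these assumptions, shorter reads yield a five-fold finer resolution for the same number of sequenced bases.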