An organism's genes, in the form of deoxyribonucleic acid (DNA), encode all of that organism's proteins and functional RNA. The organism's genes provide the information that is needed to build and maintain its cells and to reproduce. Each cell of the organism includes a full copy of the organism's genome, and each time a cell replicates itself, it creates a copy of the whole genome.
But copying DNA is sometimes imperfect, and errors introduced in copying are one source of genetic mutation. And while mutation makes evolution possible, many mutations are also known to be associated with disease or an increased risk of it.
One type of error in duplication is omission of a segment of DNA. Another type of error is making multiple copies of the same segment of DNA. Both kinds of errors can be referred to as copy number variation (CNV).
Associations can be found between CNV and certain health or medical conditions, including some diseases. But finding an association between a specific CNV and a specific condition requires first finding the CNV in the genomes of people who have the condition. Once the association is established, it can be used for diagnostic purposes, but that, too, requires finding the CNV in the genome of a person who is suspected of having the condition.
Some procedures are known for finding CNV in individuals. For example, in fluorescence in situ hybridization (FISH), a fluorescent probe includes an oligonucleotide that is designed to complement only one or more specific parts of a chromosome. Duplications or deletions can be found by identifying regions of greater or lesser fluorescence, respectively. But one drawback of FISH is its low resolution: it is unsuitable for detecting regions shorter than 5-10 Mb. FISH is described in, e.g., Kallioniemi et al., Comparative Genomic Hybridization: A Rapid New Method for Detecting and Mapping DNA Amplification In Tumors, 1993 SEMINARS IN CANCER BIO. 4(1):41-46.
Another technique for detecting CNV is array-Comparative Genomic Hybridization (aCGH). In this assay, DNA from a reference sample is labeled with a fluorophore, and DNA from a patient sample is labeled with a different fluorophore. The labeled samples are then used as probes that are cohybridized competitively onto nucleic acid targets. aCGH also suffers from limited resolution, being unable to detect CNV of regions smaller than about 40 kb.
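By way of illustration only, the competitive hybridization described above is commonly read out as a ratio of the two fluorescence signals at each target. The following minimal sketch assumes per-target intensities have already been measured and normalized. The function names and the threshold values of plus or minus 0.3 are hypothetical choices for this example and are not taken from any particular aCGH platform. In a diploid region, a single-copy gain would ideally give a ratio of 3/2 (log2 of about 0.58) and a single-copy loss a ratio of 1/2 (log2 of -1).

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Return log2(test / reference) for the signals at one array target."""
    return math.log2(test_intensity / reference_intensity)


def classify(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Label a log2 ratio as gain, loss, or neutral.

    The thresholds are hypothetical and chosen only for illustration.
    """
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "neutral"


def call_targets(test, reference):
    """Classify each target from paired test and reference intensities."""
    return [classify(log2_ratio(t, r)) for t, r in zip(test, reference)]
```

For example, calling call_targets with patient intensities [1000, 1500, 500] and reference intensities [1000, 1000, 1000] would label the three targets neutral, gain, and loss, respectively. Because each target spans a finite stretch of DNA, a readout of this kind cannot resolve events smaller than the targets themselves, which is consistent with the resolution limit noted above.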
Other approaches to measuring dosage may involve DNA sequencing, but directly sequencing from raw DNA is not practical. Sequencing large numbers of short segments of DNA (e.g., 150-200 bases) is practical but introduces new difficulties, such as the bias and/or sampling error resulting from DNA amplification and the problem of using short subsequences to detect abnormal copying or deletion of potentially long sequences.
It would therefore be advantageous to develop techniques for detecting CNV that could be based on sequencing short segments of DNA while overcoming the obstacles inherent in using this kind of sequencing.