It is well documented that whole-genome amplification, whether PCR-based or isothermal, generally introduces amplification bias, so that some regions are over-amplified while others are under-amplified. This bias makes copy number variation (CNV) difficult to determine and makes single nucleotide polymorphisms (SNPs) and single nucleotide variations (SNVs; i.e., mutations) challenging to identify or “call.”
Various computer programs have been employed to help solve these problems. However, due to the chaotic nature of amplification, it is often difficult to determine whether an observed CNV is genuine or an artifact of amplification. In addition, most computer programs operate on the assumption that each genome consists of 44 autosomes and two sex chromosomes. This assumption does not hold for all cells, and certainly not for cancer cells, which can exhibit vast differences in copy number among cells within a cancer cell line and within a cancer tissue.
In particular, karyotyping studies have found that some normal mammalian cells are of high ploidy. A single cell may contain 4, 6, 8, or even hundreds of full sets of chromosomes. In these cells, the copy number is changed uniformly throughout the genome. Therefore, the absolute copy number cannot be determined for these cells using conventional sequencing methods or PCR, as these methods all rely on at least one reference point on the chromosome, be it a gene, a segment, or a whole chromosome, and a uniform change shifts every such reference point by the same factor.
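The limitation described above can be illustrated with a minimal sketch. It assumes ideal, unbiased sequencing in which the share of reads from each chromosome is proportional to its copy number times its length. The function name `relative_depth` and the three approximate chromosome lengths are illustrative assumptions, not part of any described method.

```python
from fractions import Fraction

def relative_depth(copy_numbers, lengths_mb):
    """Expected share of reads per chromosome under ideal, unbiased sequencing."""
    dna = [c * l for c, l in zip(copy_numbers, lengths_mb)]
    total = sum(dna)
    return [Fraction(d, total) for d in dna]

# Approximate lengths (Mb) of human chromosomes 1-3; illustrative values only.
lengths_mb = [248, 242, 198]

diploid = relative_depth([2, 2, 2], lengths_mb)
octoploid = relative_depth([8, 8, 8], lengths_mb)    # uniform change in ploidy
single_gain = relative_depth([3, 2, 2], lengths_mb)  # gain of one chromosome only

print(diploid == octoploid)    # True: a uniform change leaves the profile unchanged
print(diploid == single_gain)  # False: a non-uniform change shifts the profile
```

Because the read shares are ratios, any method that normalizes against an internal reference sees the diploid and octoploid cells as identical.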
Karyotypes of tumor cells in tumor tissues or in established tumor cell lines tend to be even more complex and heterogeneous. Chromosome numbers tend to vary from fewer than 46 (hypoploid) to 92 (tetraploid). Adding to the complexity, tumor cells within an established tumor cell line, or within a tumor tissue, consist of a collection of cells with different chromosome numbers, and a particular chromosome, e.g., Chromosome 1, may exist as 1, 2, 5, 6, or 7 copies in one cell but be missing in another. Therefore, for such a tumor cell line, the average copy number of Chromosome 1 may be a fraction. Mutation “calls” in such situations are extremely challenging, as a mutation on one of seven copies of Chromosome 1 will be represented by only about 14% of reads (1/7).
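The arithmetic in this paragraph can be checked with a short sketch. The six-cell population below is hypothetical, built from the copy numbers mentioned in the text, and the helper names `mean_copy_number` and `variant_read_fraction` are assumptions made for illustration. The calculation also assumes that reads are sampled evenly from every chromosome copy, which amplification bias would violate in practice.

```python
from fractions import Fraction

def mean_copy_number(copies_per_cell):
    """Average copy number of one chromosome across a population of cells."""
    return Fraction(sum(copies_per_cell), len(copies_per_cell))

def variant_read_fraction(mutated_copies, total_copies):
    """Expected share of reads carrying a mutation, assuming even sampling of copies."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return Fraction(mutated_copies, total_copies)

# Hypothetical population: Chromosome 1 copies per cell (0 = missing).
chr1_copies = [1, 2, 5, 6, 7, 0]

mean_cn = mean_copy_number(chr1_copies)
vaf = variant_read_fraction(1, 7)

print(f"mean copy number: {float(mean_cn)}")        # 3.5, a fractional average
print(f"expected variant fraction: {float(vaf):.1%}")  # 14.3%
```

A variant caller that expects heterozygous mutations near 50% of reads would therefore need a different threshold for a cell carrying seven copies.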
Moreover, in the detection of rare mutations in cancer studies, even with the help of “deep sequencing,” the typical sequencing error rate of ~1% usually results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications, but they become extremely problematic in the identification of ultra-rare mutations in populations of cells, as well as in single cells, if the rare mutation occurred in a minor allele.
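To give a sense of scale, the following sketch estimates the expected number of miscalled bases for one hypothetical experiment. The genome size (about 3.1 Gb), the 30x mean depth, the 10,000x site depth, and the 0.1% variant fraction are all illustrative assumptions; only the ~1% error rate comes from the text. Errors are treated as independent and uniform across bases, which is a simplification.

```python
def expected_total_errors(genome_size_bp, mean_depth, error_rate):
    """Expected number of miscalled bases across the whole sequencing run."""
    return genome_size_bp * mean_depth * error_rate

def expected_reads_at_site(depth, variant_fraction, error_rate):
    """Expected true-variant reads and erroneous reads at a single position."""
    true_variant_reads = depth * variant_fraction
    erroneous_reads = depth * error_rate
    return true_variant_reads, erroneous_reads

GENOME_SIZE_BP = 3_100_000_000  # assumed human genome size
MEAN_DEPTH = 30                 # assumed whole-genome coverage
ERROR_RATE = 0.01               # ~1% per-base error rate, as stated in the text

total_errors = expected_total_errors(GENOME_SIZE_BP, MEAN_DEPTH, ERROR_RATE)

# Assumed deep-sequenced site: 10,000 reads, mutation present in 0.1% of templates.
variant_reads, error_reads = expected_reads_at_site(10_000, 0.001, ERROR_RATE)

print(f"{total_errors:.2e} expected miscalled bases genome-wide")
print(f"{variant_reads:.0f} variant reads vs {error_reads:.0f} erroneous reads at one site")
```

Under these assumptions the run contains on the order of hundreds of millions of errors, and at the deep-sequenced site the erroneous reads outnumber the true variant reads roughly tenfold, which is why a rare mutation is hard to separate from noise.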