Great interest in single-cell heterogeneity has led to recent endeavors toward robust single-cell genome sequencing based on whole-genome amplification (Navin et al., 2011; Fan et al., 2011; Lao et al., 2008; Hou et al., 2012; Cheng et al., 2011; Telenius et al., 1992; Zhang et al., 2006; Zhang et al., 1992). However, the methods used to date are generally hampered by relatively low coverage. Polymerase chain reaction (PCR) has been a gold standard for DNA amplification of specific regions. Because they rely on exponential amplification with random primers, PCR-based whole-genome amplification methods introduce strong sequence-dependent bias and hence are not ideal for uniform representation of the whole genome. Multiple Displacement Amplification (MDA) was developed to overcome these shortcomings of PCR (Dean et al., 2002; Dean et al., 2001), but MDA still exhibits considerable bias. For these reasons, whole-genome sequencing of single human cells that allows the accurate detection of single nucleotide variants (SNVs) has not been convincingly reported.
To achieve whole-genome SNV calling for a single cell with accuracy comparable to that of bulk sequencing, the main technological barrier is the amplification errors that are produced and propagated by the nonlinear amplification used in the current state of the art. In nonlinear amplification, errors made by the polymerase are copied whenever a newly synthesized product serves as a template in subsequent cycles. In regular PCR amplification, which starts from thousands or more templates, these errors cause no problem because each random error in a particular copy is diluted by the large number of other independent copies. Single-cell amplification is different, because only a single copy of each unique chromosome is available as the template. In nonlinear amplification, errors made in the first cycle are carried by half of the DNA products, and these errors continue to be copied at a similar proportion. In the final sequencing data, such errors cannot be discriminated from true heterozygous variants in the single cell. More importantly, this false-positive rate cannot be reduced simply by increasing sequencing depth. To overcome this technical problem, linear amplification is needed. When amplification is linear, all DNA products are copied directly from the original template. As a result, amplification errors are generated independently and are diluted among the linearly amplified products.
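The contrast between the two amplification modes can be illustrated with a small counting sketch. It is an idealized model rather than a description of any specific protocol: it assumes a single error-free starting template, perfect copying efficiency in every cycle, and exactly one polymerase error at the site of interest. The function names `nonlinear_error_fraction` and `linear_error_fraction` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def nonlinear_error_fraction(error_cycle: int, total_cycles: int) -> Fraction:
    """Fraction of molecules carrying one error when every molecule is
    copied in every cycle (idealized doubling, an assumption)."""
    total, mutant = 1, 0  # start from a single error-free template
    for cycle in range(1, total_cycles + 1):
        # Products serve as templates, so existing errors are copied too.
        mutant *= 2
        total *= 2
        if cycle == error_cycle:
            mutant += 1  # one newly made copy picks up a polymerase error
    return Fraction(mutant, total)


def linear_error_fraction(error_cycle: int, total_cycles: int) -> Fraction:
    """Fraction of molecules carrying one error when only the original
    template is copied in every cycle."""
    total, mutant = 1, 0  # start from a single error-free template
    for cycle in range(1, total_cycles + 1):
        total += 1  # one new copy, always made from the original template
        if cycle == error_cycle:
            mutant += 1  # the error stays confined to this single copy
    return Fraction(mutant, total)


if __name__ == "__main__":
    cycles = 20
    print("nonlinear:", nonlinear_error_fraction(1, cycles))
    print("linear:   ", linear_error_fraction(1, cycles))
```

Under these assumptions, an error made in the first of 20 cycles ends up in 1/2 of the nonlinearly amplified molecules, which mimics a heterozygous variant, but in only 1/21 of the linearly amplified molecules, where it is outvoted by the error-free copies.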