Genotyping is an important technique in genetic research for mapping a genome and localizing genes that are linked to inherited characteristics (e.g., genetic diseases). Genotyping a subject generally includes determining the alleles at one or more genomic loci based on sequencing data obtained from the subject's DNA. Diploid genomes (e.g., human genomes) may be classified as, for example, homozygous or heterozygous at a genomic locus depending on the number of different alleles they possess for that locus: heterozygous individuals have two different alleles for the locus, and homozygous individuals have two copies of the same allele. Accurate genotyping of samples is crucial when studies are conducted in the large populations needed to relate genotype to phenotype with high statistical confidence.
In genotyping analysis of diploid genomes by sequencing, the coverage (number of sequencing reads) for a particular genomic locus is used to establish the confidence of an allele call. However, confidence in allele calling is significantly reduced when bias is introduced during sample preparation, e.g., when the starting sample is available only in limited amounts and/or when one or more amplification reactions are employed to prepare the sample for sequencing. Thus, in samples having limited amounts of DNA, one may see much higher coverage (i.e., a much higher number of sequencing reads) for the allele on one chromosome than for the allele on the other chromosome due to amplification bias (e.g., amplification from only a few, or even one, polynucleotide molecule). In this case, coverage alone may be misleading when measuring confidence in an allele call.
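The effect described above can be illustrated with a minimal sketch. It assumes a simple binomial read model with a flat prior over the three diploid genotypes; the function name `genotype_posteriors`, the allele labels A and B, the `error_rate` parameter, and the read counts are all hypothetical and are not taken from any particular genotyping method.

```python
import math

def genotype_posteriors(a_reads, b_reads, error_rate=0.01):
    """Posterior probability of genotypes AA, AB and BB at one locus.

    Illustrative assumptions (not from the source text): every read is an
    independent draw, the per-read error rate is fixed and lies strictly
    between 0 and 1, and the prior over the three genotypes is flat.
    """
    # Probability that a single read shows allele A under each genotype.
    p_a = {"AA": 1.0 - error_rate, "AB": 0.5, "BB": error_rate}
    # Binomial log-likelihoods; the binomial coefficient is the same for
    # every genotype, so it cancels on normalization and is omitted.
    log_lik = {
        g: a_reads * math.log(p) + b_reads * math.log(1.0 - p)
        for g, p in p_a.items()
    }
    # Subtract the largest value before exponentiating to avoid underflow.
    top = max(log_lik.values())
    weights = {g: math.exp(v - top) for g, v in log_lik.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}

# Hypothetical counts: 100 reads at the locus in both cases.
balanced = genotype_posteriors(50, 50)  # both alleles sampled evenly
skewed = genotype_posteriors(98, 2)     # one allele dominates the reads
```

Under these assumptions the skewed counts yield a near-certain homozygous call. If those 100 reads were in fact amplified from only one or a few starting molecules of a heterozygous sample, the reads are not independent observations, and the apparent confidence overstates the evidence actually present in the sample.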
The present invention finds use in increasing the confidence in allele calling as well as in other applications based on nucleic acid sequence analysis, especially in the context of studying genotypes in a large population of samples.