Microsatellites, also called short tandem repeats (STRs), have multiple alleles that are defined by variation in the number of repeated motif units. Because they are multi-allelic, STRs have greater heterozygosity than single nucleotide polymorphisms (SNPs). STR polymorphisms result from insertions or deletions (indels) of motif units, which arise from slippage errors during DNA replication or from recombination events. The diversity of microsatellite alleles reflects STR mutation rates (10⁻² events per generation) that are far higher than the mutation rate reported for SNPs (10⁻⁸ events per generation). Owing to this allelic diversity, STR genotyping has proven useful for the genetic characterization of individuals, subpopulations, and populations. Moreover, genotyping approximately 20 STRs can identify an individual with high confidence, which has enabled their universal application to genetic identification in forensics.
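The heterozygosity and identification claims above follow from two standard population-genetics formulas: expected heterozygosity, H = 1 − Σ pᵢ², and the product rule for a random match probability across independent loci under Hardy-Weinberg equilibrium. The Python sketch below compares a biallelic SNP with a multi-allelic STR; every allele frequency in it is hypothetical and chosen only for illustration, not taken from any population database.

```python
from math import prod
from typing import Optional

# All allele frequencies here are hypothetical, for illustration only.
# Assumes Hardy-Weinberg equilibrium and independent (unlinked) loci.


def expected_heterozygosity(freqs: list[float]) -> float:
    """H = 1 - sum(p_i^2): chance that a random individual is heterozygous."""
    return 1.0 - sum(p * p for p in freqs)


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p^2 if homozygous, 2pq if heterozygous."""
    return p * p if q is None else 2.0 * p * q


def random_match_probability(genotype_freqs: list[float]) -> float:
    """Product rule: multiply genotype frequencies across independent loci."""
    return prod(genotype_freqs)


snp = [0.5, 0.5]  # best case for a biallelic SNP: H cannot exceed 0.5
str_locus = [0.05, 0.10, 0.20, 0.25, 0.20, 0.10, 0.05, 0.05]  # hypothetical 8 alleles

h_snp = expected_heterozygosity(snp)
h_str = expected_heterozygosity(str_locus)

# Profiles of 20 loci, each heterozygous with hypothetical allele frequencies.
rmp_snp = random_match_probability([genotype_frequency(0.5, 0.5)] * 20)
rmp_str = random_match_probability([genotype_frequency(0.20, 0.25)] * 20)

print(f"H(SNP) = {h_snp:.2f}, H(STR) = {h_str:.2f}")        # H(SNP) = 0.50, H(STR) = 0.83
print(f"RMP over 20 loci: SNP {rmp_snp:.1e}, STR {rmp_str:.1e}")  # SNP 9.5e-07, STR 1.0e-20
```

With these made-up frequencies, 20 heterozygous STR genotypes give a match probability near 10⁻²⁰, versus roughly 10⁻⁶ for 20 maximally informative SNPs. Real forensic calculations also apply population-substructure corrections, which the sketch omits.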
STR genotyping relies on multiplexed PCR amplification of microsatellite loci followed by size-based discrimination of the amplicons with capillary electrophoresis (CE). Forensic genetics uses CE-based genotyping in nearly all cases of genetic identification. However, this approach has several limitations. First, CE genotyping assays are restricted to thirty or fewer STR amplicons because of the inherent challenges of multiplexing PCR reactions. Second, CE has low analytical throughput, typically limited to tens of markers. Third, PCR amplification of microsatellites introduces indel artifacts, known as “stutter”, that can obscure true genotypes, particularly when alleles are close in size. Finally, current STR genotyping methods have difficulty resolving alleles in DNA mixtures composed of multiple individual genomes. In forensic genetic analysis, it is nearly impossible to distinguish an individual's DNA profile among multiple contributors, particularly when that contributor is present at a low ratio.
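The stutter and mixture problems described above can be illustrated with a toy model of CE peak heights. The Python sketch below assumes a single n-1 stutter product at a fixed ratio of its parent peak and a naive caller that discards any peak smaller than a set fraction of the peak one repeat unit above it. The stutter ratio, filter fraction, analytical threshold, and genotypes are all hypothetical values, not parameters of any real CE kit or analysis software.

```python
from collections import defaultdict

# Hypothetical parameters for illustration; not taken from any CE kit or software.
STUTTER_RATIO = 0.10          # n-1 stutter peak height / parent peak height
STUTTER_FILTER = 0.15         # caller drops an n-1 peak below 15% of the n peak
ANALYTICAL_THRESHOLD = 20.0   # minimum peak height (RFU) to consider
TOTAL_SIGNAL = 1000.0         # total allele signal (RFU) split across contributors


def simulate_peaks(contributors):
    """Sum peak heights by repeat count.

    contributors: list of (mixture_fraction, (allele_a, allele_b)), with alleles
    given as repeat counts. Each allele copy adds a parent peak plus one
    n-1 stutter peak.
    """
    peaks = defaultdict(float)
    for fraction, genotype in contributors:
        per_copy = fraction * TOTAL_SIGNAL / 2
        for allele in genotype:
            peaks[allele] += per_copy
            peaks[allele - 1] += STUTTER_RATIO * per_copy
    return dict(peaks)


def naive_call(peaks):
    """Return repeat counts called as alleles by a simple stutter filter."""
    called = []
    for repeats, height in sorted(peaks.items()):
        if height < ANALYTICAL_THRESHOLD:
            continue
        parent = peaks.get(repeats + 1, 0.0)
        if height < STUTTER_FILTER * parent:
            continue  # treated as stutter of the allele one repeat longer
        called.append(repeats)
    return called


single_source = naive_call(simulate_peaks([(1.0, (12, 13))]))
mixture = naive_call(simulate_peaks([(0.95, (12, 13)), (0.05, (11, 14))]))
print(single_source)  # [12, 13]
print(mixture)        # [12, 13, 14]: minor allele 11 is lost as "stutter"
```

In this example the minor contributor's allele 11, present at 5%, sits one repeat below the major allele 12 and is discarded as stutter. Lowering the filter enough to keep it risks calling stutter peaks as false alleles, because real stutter ratios vary by locus and allele.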
Next-generation sequencing (NGS) assays have also been developed for the analysis of STRs. These include whole genome sequencing (WGS), targeted sequencing with bait-hybridization capture oligonucleotides, and multiplexed amplicon sequencing methods, including molecular inversion probes. Regardless of the approach, current NGS methods for STR analysis have significant limitations. The repetitive motifs of STRs complicate conventional alignment methods and lead to mapping errors. Sequence reads that span an entire STR locus, from one flanking sequence to the other, are the most informative for accurate genotyping. However, many NGS approaches produce reads that truncate the STR sequence, resulting in ambiguous genotypes, because the true repeat count cannot be determined from a read that ends within the repeat.
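The distinction between spanning and truncated reads can be made concrete. The Python sketch below reports a repeat count only when a read contains both flanking sequences around an uninterrupted run of the motif, and returns None otherwise. The motif and flank sequences are hypothetical, and exact string matching is a simplification: real STR genotypers align reads with tolerance for sequencing errors and interrupted repeats.

```python
import re
from typing import Optional

# Hypothetical locus definition, for illustration only.
MOTIF = "GATA"
LEFT_FLANK = "TTCAGC"
RIGHT_FLANK = "CCTGAA"


def repeat_count_if_spanning(read: str, motif: str = MOTIF,
                             left: str = LEFT_FLANK,
                             right: str = RIGHT_FLANK) -> Optional[int]:
    """Return the number of motif repeats if the read spans the whole STR.

    A read is informative only if it contains the left flank, one or more
    uninterrupted copies of the motif, and the right flank. Otherwise the
    repeat count cannot be determined and None is returned.
    """
    pattern = re.escape(left) + "((?:" + re.escape(motif) + ")+)" + re.escape(right)
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(motif)


spanning = "AC" + LEFT_FLANK + MOTIF * 11 + RIGHT_FLANK + "TG"
truncated_3p = "AC" + LEFT_FLANK + MOTIF * 9      # read ends inside the repeat
truncated_5p = MOTIF * 7 + RIGHT_FLANK + "TG"     # read starts inside the repeat

print(repeat_count_if_spanning(spanning))      # 11
print(repeat_count_if_spanning(truncated_3p))  # None
print(repeat_count_if_spanning(truncated_5p))  # None
```

Because `truncated_3p` ends inside the repeat, it is consistent with any allele of nine or more units, which is the ambiguity described above.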
STR genotypes can also be determined from WGS data. However, at standard WGS coverage (e.g., 30× to 60×), the number of reads spanning an intact STR locus varies greatly, and only a fraction of the reads covering a locus contain the intact microsatellite. Lower coverage of intact loci translates into decreased sensitivity and specificity for detecting microsatellite genotypes. Consequently, accurate STR genotyping requires much higher sequencing coverage than is practical with WGS, particularly for genetic mixtures composed of different genomic DNA samples in varying ratios.
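Why standard WGS coverage yields few informative reads can be estimated with a simple model. If read start positions are uniformly distributed, a genome at coverage C sequenced with reads of length L has about C/L read starts per base, and a read fully contains a window of w bases (the STR plus flanking anchors) only if it starts in one of L − w + 1 positions. The Python sketch below applies this, plus a Poisson approximation, to a low-fraction mixture contributor. The read length, STR length, flank length, and 5% mixture fraction are hypothetical, and paired-end reads and coverage bias are ignored.

```python
import math


def expected_spanning_reads(coverage: float, read_length: int,
                            str_length: int, flank: int) -> float:
    """Expected number of reads fully containing the STR plus `flank` bp per side.

    Assumes single-end reads with uniformly distributed start positions:
    read starts per base = coverage / read_length, and a read spans the
    window only if it starts in one of (read_length - window + 1) positions.
    """
    window = str_length + 2 * flank
    if window > read_length:
        return 0.0  # no single read can span the locus
    return coverage * (read_length - window + 1) / read_length


def poisson_prob_fewer_than(k: int, mean: float) -> float:
    """P(X < k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))


# Hypothetical scenario: 30x WGS, 150 bp reads, 40 bp STR, 10 bp anchors.
total = expected_spanning_reads(30, 150, 40, 10)
minor = 0.05 * total                      # contributor at 5% of the mixture
p_missed = poisson_prob_fewer_than(1, minor)
longer = expected_spanning_reads(30, 150, 100, 10)

print(f"spanning reads: {total:.1f}")     # spanning reads: 18.2
print(f"from 5% contributor: {minor:.2f}; P(no read) = {p_missed:.2f}")
print(f"100 bp STR: {longer:.1f}")        # 100 bp STR: 6.2
```

Under these assumptions, a 5% contributor has roughly a 40% chance of producing no spanning read at all at this locus, and longer STRs leave even fewer informative reads.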
Targeted sequencing can improve STR coverage, but current methods have limitations. For example, targeting STRs with bait-hybridization enrichment requires randomly fragmented genomic DNA, which reduces the fraction of informative reads containing a complete microsatellite to less than 5%. Furthermore, enrichment of STR loci is complicated by their repetitive sequences, which can cause off-target hybridization. Finally, both sequencing library amplification and PCR-dependent multiplexed amplicon methods lead to a significant increase in stutter errors.
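The effect of additional amplification cycles on stutter can be sketched with a deterministic PCR model. The Python example below assumes 100% amplification efficiency, leaves template molecules unchanged, and gives each newly synthesized copy a fixed probability of a one-repeat contraction. Under these assumptions the fraction of molecules that never slipped is (1 − p/2)ⁿ after n cycles. The slippage probability and cycle counts are hypothetical, chosen only to compare target PCR alone with target PCR followed by library amplification.

```python
# Hypothetical model: perfect doubling each cycle; every new copy loses one
# repeat unit with probability slip_prob; template molecules are unchanged.


def amplify(initial_repeats: int, cycles: int, slip_prob: float) -> dict[int, float]:
    """Return the expected fraction of molecules at each repeat count."""
    counts = {initial_repeats: 1.0}
    for _ in range(cycles):
        new_counts = dict(counts)  # templates persist into the next cycle
        for repeats, n in counts.items():
            p = slip_prob if repeats > 1 else 0.0  # cannot contract below one unit
            new_counts[repeats] += n * (1 - p)     # faithful copies
            if p:
                # n-1 stutter copies
                new_counts[repeats - 1] = new_counts.get(repeats - 1, 0.0) + n * p
        counts = new_counts
    total = sum(counts.values())
    return {r: n / total for r, n in sorted(counts.items())}


SLIP = 0.01  # hypothetical per-copy slippage probability
target_only = amplify(12, 25, SLIP)        # hypothetical target PCR cycles
with_library = amplify(12, 25 + 12, SLIP)  # plus hypothetical library amplification

print(f"25 cycles: {1 - target_only[12]:.1%} stutter products")   # 25 cycles: 11.8% stutter products
print(f"37 cycles: {1 - with_library[12]:.1%} stutter products")  # 37 cycles: 16.9% stutter products
```

The absolute numbers depend entirely on the assumed slippage rate; the sketch only shows that each additional round of amplification compounds the stutter fraction.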