Field of the Invention
The present invention relates generally to human identification and bio-ancestry testing, and, more particularly, to improvements that enhance the sensitivity of detection during analysis of human DNA samples for human identity testing or for bio-ancestry studies.
Description of Related Art
Short tandem repeat (STR) loci are the primary genetic markers used in human identity testing. These markers are highly polymorphic and afford a high degree of sensitivity of detection such that relatively low quantities (250 pg to 1 ng) of template DNA can be analyzed (Andersen, J. F., et al., Further validation of a multiplex STR system for use in routine forensic identity testing, Forensic Science International, 78(1): 47-64 (1996); Brinkmann, B., et al., Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat, The American Journal of Human Genetics, 62(6): 1408-1415 (1998); Collins, P. J., et al., Developmental validation of a single-tube amplification of the 13 CODIS STR loci, D2S1338, D19S433, and amelogenin: The AmpFℓSTR® Identifiler® PCR Amplification Kit, Journal of Forensic Sciences, 49(6): 1265-1277 (2004); LaFountain, M. J., et al., TWGDAM Validation of the AmpFℓSTR Profiler Plus and AmpFℓSTR COfiler STR Multiplex Systems Using Capillary Electrophoresis, Journal of Forensic Sciences, 46(5): 1191-1198 (2001); Micka, K. A., et al., Validation of multiplex polymorphic STR amplification sets developed for personal identification applications, Journal of Forensic Sciences, 41: 582-590 (1996); Moretti, T., et al., Validation of short tandem repeats (STRs) for forensic usage: performance testing of fluorescent multiplex STR systems and analysis of authentic and simulated forensic samples, Journal of Forensic Sciences, 46(3): 647 (2001)).
Retrotransposable elements (REs), including long interspersed nuclear elements (LINEs), short interspersed nuclear elements (SINEs) and SVA elements, are another group of markers that can be useful for human identity testing. SINEs are a class of REs that are typically less than 500 nucleotides long, while LINEs are typically greater than 500 nucleotides long (A. F. A. Smit, The origin of interspersed repeats in the human genome, Current Opinion in Genetics & Development, 6(6): 743-748 (1996); Batzer, M. A., et al., Alu repeats and human genomic diversity, Nature Reviews Genetics, 3(5): 370-379 (2002); Batzer, M. A., et al., African origin of human-specific polymorphic Alu insertions, Proceedings of the National Academy of Sciences, 91(25): 12288 (1994); Feng, Q., et al., Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition, Cell, 87(5): 905-916 (1996); Houck, C. M., et al., A ubiquitous family of repeated DNA sequences in the human genome, Journal of Molecular Biology, 132(3): 289-306 (1979); Kazazian, H. H., et al., The impact of L1 retrotransposons on the human genome, Nature Genetics, 19(1): 19-24 (1998); Ostertag, E. M., et al., Biology of mammalian L1 retrotransposons, Annual Review of Genetics, 35(1): 501-538 (2001)). Full-length LINEs are ˜6 kb in length, contain an internal promoter for RNA polymerase II and two open reading frames (ORFs), and end in a polyA-tail. SINEs include Alu elements, primate-specific SINEs that have reached a copy number in excess of one million in the human genome. SINEs were originally defined by their interspersed nature and length (75-500 bp), but have been further characterized by their RNA polymerase III transcription. The third type of RE is the composite retrotransposon known as an SVA (SINE/VNTR/Alu) element (Wang, H., et al., SVA Elements: A Hominid-specific Retroposon Family, J. Mol. Biol. 354: 994-1007 (2005)).
SVAs are composite elements named after their main components, SINE, a variable number of tandem repeats (VNTR), and Alu. As a consequence of the VNTR region, full-length SVA elements can vary greatly in size. These markers have potential application to identity testing, kinship analyses, and evolutionary studies (see Smit; Batzer, et al. (2002); Batzer, et al. (1994); Feng, et al.; Houck, et al.; Kazazian et al.; and Ostertag, et al., references, cited supra). Insertion and null allele (INNUL) markers may include SINEs, LINEs and SVAs.
The structure of REs is described in FIG. 1. The Alu family of interspersed repeats is the most successful of the mobile genetic elements within primate genomes, having amplified to a copy number greater than 500,000 per haploid genome. Alu elements mobilize via an RNA polymerase III-derived intermediate in a process defined as retroposition. Alu repeats are approximately 300 bp in length and are ancestrally derived from the 7SL RNA gene. Each Alu element is dimeric in structure and is flanked by short intact direct repeats. These direct repeat sequences are formed when an Alu element inserts within staggered nicks in the genome. In addition, each Alu element has an oligo dA-rich region in the middle and at the 3′ end (FIG. 1). The amplification of Alu repeats to such large copy numbers has occurred over a period of 65 million years and the process is still active in the present-day genome (A. F. A. Smit, The origin of interspersed repeats in the human genome, Current Opinion in Genetics & Development, 6(6): 743-748 (1996); Zangenberg, et al., cited infra; Budowle, B., SNP typing strategies, Forensic Science International, 146: S139 (2004)).
Alu sequences within the human genome can be divided into subfamilies of related members based upon the presence of diagnostic mutations shared by subfamily members. These subfamilies are of different evolutionary ages, with the younger ones (Ya5, Ya8 and Yb8) being primarily restricted to the human genome (Houck, C. M., et al., A ubiquitous family of repeated DNA sequences in the human genome, Journal of Molecular Biology, 132(3): 289-306 (1979); Kazazian, H. H., et al., The impact of L1 retrotransposons on the human genome, Nature Genetics, 19(1): 19-24 (1998)). These subfamilies arose in a hierarchical manner over evolutionary time, with the members of each younger subfamily retaining the diagnostic mutations of the older subfamily that preceded them.
The Ya5/8 and the Yb8 subfamilies are independent derivatives of the Y subfamily of Alu repeats. The young subfamilies are present in relatively small copy numbers within the genome compared to the bulk of the Alu repeats, which primarily belong to the PS and AS subfamilies. For instance, the Y subfamily comprises approximately 100,000 members; the Ya5 subfamily, 1000 members; the Ya8 subfamily, 50 members; and the Yb8 subfamily, approximately 1000 members (Moretti, T., et al., Validation of short tandem repeats (STRs) for forensic usage: performance testing of fluorescent multiplex STR systems and analysis of authentic and simulated forensic samples, Journal of Forensic Sciences, 46(3): 647 (2001)).
The youngest subfamilies of Alu elements, Ya5, Ya8 and Yb8, first arose in the primate genomes approximately 5 million years ago (Batzer, M. A., et al., African origin of human-specific polymorphic Alu insertions, Proceedings of the National Academy of Sciences, 91(25): 12288 (1994); Feng, Q., et al., Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition, Cell, 87(5): 905-916 (1996)). Amplification of Alu elements within humans is still an ongoing process. As human population groups migrated and colonized different parts of the world, new Alu insertions arising in individuals of the newer populations were absent from the original population, and vice versa. In other words, several elements that belong to the young subfamilies are dimorphic for their presence/absence within different human population groups (Syvanen, A. C., et al., Identification of individuals by analysis of biallelic DNA markers, using PCR and solid-phase minisequencing, American Journal of Human Genetics, 52(1): 46-59 (1993); LaRue, B. L., et al., A validation study of the Qiagen Investigator DIPplex® kit; an INDEL-based assay for human identification, International Journal of Legal Medicine, 2012, 1-8).
Realizing the potential of these dimorphic Alu elements as genetic markers, investigators have identified the dimorphic Alu repeats from a larger background of fixed Alu elements. Using the Alu insertion PCR assay described in FIG. 2, each Alu element was tested against a panel of several human genomic DNA samples as templates for the levels of polymorphism. Each and every dimorphic Alu repeat has been thoroughly characterized for its respective allele frequency in as many as 50 different worldwide population groups (Syvanen, A. C., et al., Identification of individuals by analysis of biallelic DNA markers, using PCR and solid-phase minisequencing, American Journal of Human Genetics, 52(1): 46-59 (1993); LaRue, B. L., et al., referenced supra; Shriver, M. D., et al., Ethnic-affiliation estimation by use of population-specific DNA markers, American Journal of Human Genetics, 60(4): 957 (1997)).
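As an illustration of how the allele frequency of a dimorphic insertion might be estimated from a panel of genotyped samples, a minimal Python sketch follows. The function names, the population labels and the genotype counts are hypothetical and are not taken from the cited studies; the sketch simply counts insertion alleles (two per insertion homozygote, one per heterozygote) and divides by the total number of alleles typed.

```python
def insertion_allele_frequency(n_ins_ins, n_ins_null, n_null_null):
    """Estimate the insertion allele frequency from genotype counts.

    Each individual carries two alleles, so insertion homozygotes
    contribute two insertion alleles and heterozygotes contribute one.
    """
    total_individuals = n_ins_ins + n_ins_null + n_null_null
    if total_individuals == 0:
        raise ValueError("at least one genotyped individual is required")
    return (2 * n_ins_ins + n_ins_null) / (2 * total_individuals)


def is_dimorphic(frequency):
    """A marker is dimorphic in a sample when both alleles are observed."""
    return 0.0 < frequency < 1.0


# Hypothetical counts: (insertion/insertion, insertion/null, null/null).
panel = {
    "population_A": (10, 20, 10),
    "population_B": (0, 0, 40),
}

frequencies = {
    name: insertion_allele_frequency(*counts) for name, counts in panel.items()
}
```

With these illustrative counts, the marker would be dimorphic in population_A and fixed for the null allele in population_B, which is the kind of between-population difference the passage above describes.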
Ustyugova, S. V., et al. (Cell line fingerprinting using retroelement insertion polymorphism. BioTechniques, 38(4): 561-565 (2005)), demonstrated that REs could be used for cell line identification. Novick, et al. (Polymorphic human specific Alu insertions as markers for human identification. Electrophoresis, 16(1): 1596-1601 (1995)), and Mamedov, et al. (A new set of markers for human identification based on 32 polymorphic Alu insertions, European Journal of Human Genetics, 18(7): 808-814 (2010)), described sets of Alu elements (a type of SINE) for paternity testing. Both of these studies intimated that the systems could be applied to forensic analyses. REs have low mutation rates, which makes them appealing for kinship analyses compared with the less stable STRs. In addition, REs do not yield stutter artifacts, which arise from strand slippage during the PCR of STRs; their absence can reduce some of the interpretation issues associated with STRs in forensic mixture profiles (Andersen, J. F., et al., Further validation of a multiplex STR system for use in routine forensic identity testing, Forensic Science International, 78(1): 47-64 (1996); Brinkmann, B., et al., Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat, The American Journal of Human Genetics, 62(6): 1408-1415 (1998); Moretti, T., et al., Validation of short tandem repeats (STRs) for forensic usage: performance testing of fluorescent multiplex STR systems and analysis of authentic and simulated forensic samples, Journal of Forensic Sciences, 46(3): 647 (2001)).
Forensic samples often are compromised in quality and quantity. Degraded samples may contain fragments of DNA that are less than 250 bp in length, and the quantities may be limited to subnanogram levels of recoverable DNA (Burger, J., et al., DNA preservation: A microsatellite DNA study on ancient skeletal remains, Electrophoresis, 20(8): 1722-1728 (1999); Fondevila, M., et al., Challenging DNA: assessment of a range of genotyping approaches for highly degraded forensic samples, Forensic Science International: Genetics Supplement Series, 1(1): 26-28 (2008); Golenberg, E. M., et al., Effect of Highly Fragmented DNA on PCR, Nucleic Acids Research, 24(24): 5026-5033 (1996); Hughes-Stamm, S. R., et al., Assessment of DNA degradation and the genotyping success of highly degraded samples, International Journal of Legal Medicine, 125(3): 341-348 (2011)). REs can range in size from hundreds (SINEs) to several thousand (LINEs) bp in length (see Smit; Batzer, et al. (2002); Batzer, et al. (1994); Feng, et al.; Houck, et al.; Kazazian, et al.; and Ostertag, et al., references, cited supra). Previous attempts to use Alu sequences for identity testing capitalized on the size difference between insertion and null alleles by amplifying the entire region with the same forward and reverse primers (Novick, G. E., et al., Polymorphic human specific Alu insertions as markers for human identification, Electrophoresis, 16(1): 1596-1601 (1995)). The insertion allele would be 200-400 bp larger than the null allele, and could be detected electrophoretically based on size differences. While this approach is useful for paternity testing and some population studies where DNA quality is not compromised, the large size difference between amplicons of the null and insertion alleles will impact amplification efficiency during the PCR and is a limitation for forensic samples.
The limitation is differential amplification favoring the smaller amplicon (i.e., the null allele) and possible dropout of the insertion allele, which is exacerbated if the sample is highly degraded.
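The effect of the amplicon size difference on a degraded sample can be sketched in a few lines of Python. This is an illustrative model only, not a method described in the cited references: the function names, the 120 bp flanking amplicon, the 300 bp insertion and the 5 bp sizing tolerance are all assumptions chosen to fall within the ranges mentioned above, and real amplification efficiency declines gradually with amplicon size rather than failing at a hard cutoff.

```python
def expected_amplicons(flank_bp, insert_bp):
    """Return (null_size, insertion_size) for one primer pair flanking the site."""
    return flank_bp, flank_bp + insert_bp


def survives_degradation(amplicon_size, max_fragment_bp):
    """Simplified model: a template is amplifiable only if it fits in a fragment."""
    return amplicon_size <= max_fragment_bp


def call_genotype(observed_sizes, null_size, insertion_size, tolerance=5):
    """Call a genotype from fragment sizes observed after electrophoresis."""
    has_null = any(abs(s - null_size) <= tolerance for s in observed_sizes)
    has_insertion = any(abs(s - insertion_size) <= tolerance for s in observed_sizes)
    if has_null and has_insertion:
        return "insertion/null"
    if has_insertion:
        return "insertion/insertion"
    if has_null:
        return "null/null"
    return "no call"


# Hypothetical marker: 120 bp null amplicon, 300 bp Alu insertion.
null_size, insertion_size = expected_amplicons(flank_bp=120, insert_bp=300)

# Intact DNA from a heterozygous individual yields both amplicons.
intact = call_genotype([null_size, insertion_size], null_size, insertion_size)

# Degraded DNA with fragments no longer than 250 bp loses the larger amplicon.
degraded_sizes = [
    s for s in (null_size, insertion_size) if survives_degradation(s, 250)
]
degraded = call_genotype(degraded_sizes, null_size, insertion_size)
```

Under these assumptions the intact sample is typed as insertion/null, whereas the degraded sample from the same individual is typed as null/null because the 420 bp insertion amplicon no longer fits within the surviving fragments. This is the dropout of the insertion allele described above.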
The use of SINEs such as Alu repeats in determining human identity has been studied and reported (see Mamedov, et al., and Novick, et al., cited supra). Until now, however, due to the inherent size difference associated with INNULs, REs have not been practical for this purpose. Although REs make up over 40% of the human genome (Lander, E. S., et al., Initial sequencing and analysis of the human genome, Nature, 409(6822): 860-921 (2001)) and present myriad potential targets for human identity testing, these INNULs (i.e., insertion and null alleles, rather than INDELs, because one of the allele forms is not the result of a deletion) have received limited attention for use in forensic human identity testing (Zangenberg, et al., Multiplex PCR: Optimization Guidelines, in PCR Applications: Protocols for Functional Genomics, Academic Press, San Diego, Calif., 1999, p. 73-94).
Advantageously, a convenient way to design synthetic primers for PCR amplification of relatively short, repeating sequences, known as the mini-primer design, has been previously described in U.S. Pat. No. 7,794,983 B2, to Sinha, et al., which is hereby incorporated by reference. Using the mini-primer design, interspersed genetic elements containing characteristic direct repeat sequences (direct repeats) may be amplified and quantitated.
The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention, and, therefore, it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.