The process of human identification through DNA analysis is a common objective of forensic investigations. As used herein, “forensics” is the study of evidence, for example evidence discovered at a crime or accident scene, that is then used in a court of law. “Forensic science” is any science used to answer questions of interest to the legal system, in particular the criminal or civil justice system, providing impartial scientific evidence for use in the courts of law, for example, in criminal investigations and trials. Forensic science is a multidisciplinary subject, drawing principally from chemistry and biology, but also from physics, geology, psychology and social science, for example. The goal of one aspect of human forensics, forensic DNA typing, is to determine the identity or genotype of DNA acquired from a forensic sample, for example, evidence from a crime scene or a DNA sample from an individual. Typical sources of such DNA evidence include hair, bones, teeth, and body fluids such as saliva, semen, and blood. There is often a need for the rapid identification of a large number of humans, human remains and/or biological samples. Such remains or samples may be associated with war-related casualties, aircraft crashes, and acts of terrorism, for example.
Tandem DNA repeat regions, which are prevalent in the human genome and exhibit a high degree of variability among individuals, are used in a number of fields, including human forensics and identity testing, genetic mapping, and linkage analysis. Various types of DNA repeat regions exist within eukaryotic genomes and can be classified based on the length of their core repeat units. Short tandem repeats (STRs), also called simple sequence repeats (SSRs) or microsatellites, are repeat regions having core units 2 to 6 nucleotides in length. For a particular STR locus, individuals in a population differ in the number of these core repeat units.
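As a minimal illustration of what "number of core repeat units" means, the sketch below counts the longest uninterrupted run of a given motif in a sequence. The function name and the example sequences are hypothetical; real STR analysis must also handle interrupted and compound repeats, which this sketch ignores.

```python
def max_tandem_copies(seq: str, motif: str) -> int:
    """Return the largest number of consecutive, uninterrupted copies of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    # STR core units are defined above as 2 to 6 nucleotides long.
    if not 2 <= k <= 6:
        raise ValueError("STR core units are 2-6 nucleotides long")
    best = 0
    for start in range(len(seq) - k + 1):
        copies, pos = 0, start
        # Walk forward one motif length at a time while the motif keeps matching.
        while seq[pos:pos + k] == motif:
            copies += 1
            pos += k
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # Two hypothetical alleles of the same locus that differ only in repeat number.
    print(max_tandem_copies("GGTCTATCTATCTAGG", "TCTA"))
    print(max_tandem_copies("GGTCTATCTATCTATCTAGG", "TCTA"))
```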
STR typing involves the amplification of multiple STR loci, each of which displays a collection of alleles in the human population that differ in repeat number. Typically, the amplification products are analyzed by polyacrylamide gel or capillary electrophoresis with fluorescent detection, and different alleles are then discriminated on the basis of amplification product length. Because a typical STR typing analysis uses multiple STR loci that are not genetically linked, the product rule can be applied to estimate the probability of a random match to any STR profile for which population allele frequencies have been characterized at each locus (Holt C L, et al. (2000) Forensic Sci. Int. 112(2-3): 91-109; Holland M M, et al. (2003) Croat. Med. J. 44(3): 264-72). This yields extremely high discrimination power, with low random match probabilities within the human population. Because STR repeats are short and the number of repeats varies widely among individuals in a population, STR typing has become a standard in human forensics where sufficient nuclear DNA is available.
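The product rule described above can be sketched as follows: each locus contributes a Hardy-Weinberg genotype frequency (p² for a homozygote, 2pq for a heterozygote), and the per-locus frequencies are multiplied across unlinked loci. This is a minimal sketch; the allele frequencies are hypothetical, and it omits the population-substructure corrections used in casework.

```python
from math import prod
from typing import Optional, Sequence, Tuple


def genotype_frequency(p: float, q: Optional[float] = None) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    if q is None:
        return p * p
    return 2 * p * q


def random_match_probability(loci: Sequence[Tuple[float, Optional[float]]]) -> float:
    """Product rule: multiply per-locus genotype frequencies across unlinked loci."""
    return prod(genotype_frequency(p, q) for p, q in loci)


if __name__ == "__main__":
    # Hypothetical three-locus profile: (p, q) for heterozygotes, (p, None) for homozygotes.
    profile = [(0.10, 0.20), (0.10, None), (0.25, 0.05)]
    print(random_match_probability(profile))
```

Adding loci multiplies in further factors below 1, which is why a full multi-locus profile reaches very small random match probabilities.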
A number of tetranucleotide STRs and methods for STR-typing have been explored for application in human forensics. Commercial STR-typing kits are available that target different STR loci, including a common set of loci. The FBI Laboratory has established 13 nationally recognized core STR loci that are included in a national forensic DNA database known as the Combined DNA Index System (CODIS). The 13 CODIS core loci are CSF1PO, FGA, TH01, TPOX, VWA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, and D21S11. Sequence information for these loci is available from STRBase. The ranges of repeat-unit numbers for reported alleles at these CODIS13 loci are 6-16, 15-51.2, 3-14, 6-13, 10-24, 9-20, 7-16, 6-15, 8-19, 5-15, 5-15, 7-27, and 24-38, respectively (Butler, J M, 2001 Forensic DNA Typing Academic Press). When profiles with allele information for all 13 of these core STR loci are available, the average probability of a random match is lower than one in a trillion among unrelated individuals. STR typing by DNA sequencing is less desirable because it is time-consuming and labor-intensive.
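The reported allele ranges listed above can be encoded as a lookup table. In standard STR nomenclature, a designation such as "51.2" means 51 complete repeats plus 2 bases of an incomplete repeat. The sketch below parses such designations and converts them to a repeat-region length. It assumes uniform 4-bp repeat units (some loci, such as D21S11, have more complex repeat structures) and ignores flanking sequence, which is kit-specific; the function names are hypothetical.

```python
from typing import Dict, Tuple

# Reported allele ranges (repeat units) for the 13 CODIS core loci, as listed above.
CODIS13_RANGES: Dict[str, Tuple[float, float]] = {
    "CSF1PO": (6, 16), "FGA": (15, 51.2), "TH01": (3, 14), "TPOX": (6, 13),
    "VWA": (10, 24), "D3S1358": (9, 20), "D5S818": (7, 16), "D7S820": (6, 15),
    "D8S1179": (8, 19), "D13S317": (5, 15), "D16S539": (5, 15),
    "D18S51": (7, 27), "D21S11": (24, 38),
}


def parse_allele(designation: str) -> Tuple[int, int]:
    """Split a designation such as '9.3' into (complete repeats, extra partial-repeat bases)."""
    full, _, partial = designation.partition(".")
    return int(full), (int(partial) if partial else 0)


def repeat_region_bp(designation: str, unit_len: int = 4) -> int:
    """Repeat-region length in bp, assuming uniform repeat units and no flanking sequence."""
    full, partial = parse_allele(designation)
    return full * unit_len + partial


def in_reported_range(locus: str, designation: str) -> bool:
    """True if the allele falls within the reported range for the locus."""
    low, high = CODIS13_RANGES[locus]
    return low <= float(designation) <= high


if __name__ == "__main__":
    print(parse_allele("9.3"), repeat_region_bp("9.3"))
    print(in_reported_range("FGA", "51.2"), in_reported_range("TH01", "15"))
```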
STR typing has become the human forensic “gold standard”, as the combined information derived from the 13 distinct CODIS loci provides enough information to uniquely identify an individual's DNA signature to a statistical significance of 1 in 10^9. Standard or conventional STR-typing methods, which typically use amplification and electrophoretic size determination to resolve individual alleles, have certain limitations. At low template copy number, it is not uncommon to observe allele “drop out”, in which a heterozygous individual is typed as a homozygote because one of the alleles is not detected. Additionally, in cases of highly degraded or low-copy DNA samples, entire markers may drop out, leaving only a few STRs from which to derive a DNA profile. In certain situations, such as mass disaster victim identification, a large number of samples of varying DNA quantity and quality can exist, many of which produce only partial STR profiles. While a partial profile can in some cases be used to include or exclude a potential suspect or identity, conventional STR typing methods sometimes do not provide sufficient resolution at the available loci. Thus, there is a need within the forensics community to increase the resolution of STR-typing methods, so that additional information can be derived from degraded DNA samples that yield an incomplete set of STR markers, and from other samples in which the complete STR set cannot be detected.
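Allele drop-out can be illustrated with a simple threshold-based call: peaks below an analytical threshold are not reported, so a heterozygote with one weak peak is typed as a homozygote. The function name, the peak heights (in relative fluorescence units, RFU), and the threshold below are hypothetical; validated laboratory protocols set their own analytical and stochastic thresholds.

```python
from typing import Dict, List


def call_alleles(peaks: Dict[str, float], threshold: float) -> List[str]:
    """Report allele designations whose peak height meets the analytical threshold."""
    called = [allele for allele, height in peaks.items() if height >= threshold]
    # Sort numerically so that designations such as '9.3' order correctly.
    return sorted(called, key=float)


if __name__ == "__main__":
    # Hypothetical heterozygote (14, 17) from a low-template sample: the 17 peak is weak.
    peaks = {"14": 1200.0, "17": 40.0}
    print(call_alleles(peaks, threshold=50.0))  # only '14' is called: apparent homozygote
    print(call_alleles(peaks, threshold=30.0))  # both alleles called
```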
Techniques that could resolve sequence polymorphisms in alleles, and thus increase the observed allelic variation at several common STR loci, would be beneficial, provided they retain the advantages of amplification-based techniques, such as speed and the ability to automate the procedure for high-throughput typing. Thus, there is a need for STR typing methods that provide a higher level of resolution than standard techniques. Moreover, there is a need for an automated platform capable of high-throughput sample processing, to enable analysis of a large number of samples produced simultaneously or over a short period of time, as in the case of a mass disaster or war.
Mass spectrometry provides detailed information about the molecules being analyzed, with high mass accuracy. It is also a process that can be easily automated. Electrospray ionization mass spectrometry (ESI-MS) provides a platform capable of automated sample processing, and can resolve sequence polymorphisms between STR alleles (Ecker et al. (2006) JALA 11:341-51).
Matrix-assisted laser desorption-ionization time-of-flight mass spectrometry (MALDI TOF MS) has been employed to analyze STR, SNP, and Y-chromosome markers. (Butler, J.; Becker, C. H. Science and Technology Research Report to NIJ 2001, NCJ 188292, October; Monforte, J. A.; Becker, C. H. Nat Med 1997, 3, 360-362; Taranenko, N. I.; Golovlev, V. V.; Allman, S. L.; Taranenko, N. V.; Chen, C. H.; Hong, J.; Chang, L. Y. Rapid Commun Mass Spectrom 1998, 12, 413-418; Butler, J. M.; Li, J.; Shaler, T. A.; Monforte, J. A.; Becker, C. H. Int J Legal Med 1999, 112, 45-49; Ross, P. L.; Belgrader, P. Anal Chem 1997, 69, 3966-3972). To routinely obtain the necessary mass accuracy and resolution with MALDI TOF MS, the amplicon must be smaller than 100 bp, which often requires strategies such as enzymatic digestion and nested linear amplification. In the MALDI approach, PCR amplicons must be thoroughly desalted and co-crystallized with a suitable matrix prior to mass spectrometric analysis. The size reduction and clean-up schemes employed for STR and SNP analyses in the cited reports resulted in mass spectrometric analysis of only one strand of the PCR amplicon. When the mass of only one strand of the amplicon is measured, an unambiguous base composition may be difficult to determine, and only the length of the allele may be obtained. Even with the size reduction schemes, mass measurement errors of 12 to 60 Daltons (Da) are observed for products in the size range 15000 to 25000 Da. This corresponds to mass measurement errors of 800 to 2400 ppm. Because of the poor mass accuracy and resolution typical of MALDI, multiplexing of STRs is difficult and not routine, although one published report successfully multiplexed three STR loci. The issue of allelic balance has not been addressed for MALDI-TOF-MS-based assays.
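The ppm figures above follow from dividing the absolute error by the measured mass (12 Da / 15000 Da = 800 ppm; 60 Da / 25000 Da = 2400 ppm). The sketch below reproduces that arithmetic and compares such errors with the mass shifts caused by single-base substitutions, using approximate average DNA residue masses (an assumption; monoisotopic values differ slightly). The "shift exceeds error" test is a deliberately crude, hypothetical criterion, not a validated resolution rule.

```python
from itertools import combinations
from typing import Dict

# Approximate average residue masses (Da) of DNA nucleotides within a strand.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def ppm_error(error_da: float, mass_da: float) -> float:
    """Express an absolute mass error (Da) in parts per million of the measured mass."""
    return error_da / mass_da * 1e6


def substitution_shifts() -> Dict[str, float]:
    """Mass shift (Da) produced by each possible single-base substitution."""
    return {f"{x}<->{y}": abs(RESIDUE_MASS[x] - RESIDUE_MASS[y])
            for x, y in combinations("ACGT", 2)}


def resolvable(error_da: float) -> Dict[str, bool]:
    """Crude criterion: a substitution counts as resolvable if its shift exceeds the error."""
    return {name: shift > error_da for name, shift in substitution_shifts().items()}


if __name__ == "__main__":
    print(ppm_error(12, 15000), ppm_error(60, 25000))
    print(resolvable(12))
    print(resolvable(60))
```

Under this criterion, a 12 Da error already masks an A<->T substitution (about 9 Da), and a 60 Da error masks every single-base substitution (the largest, C<->G, is about 40 Da). This illustrates why low-accuracy, single-strand measurements tend to yield only allele length.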
U.S. Pat. Nos. 6,764,822 and 6,090,558 relate to methods for STR-typing using mass spectrometry (MS). Use of electrospray ionization (ESI)-MS to resolve STR alleles has been reported (Hannis and Muddiman, 2001, Rapid Commun. Mass Spectrom. 15(5): 348-50; Hannis et al., 2000, Advances in Nucleic Acid and Protein Analysis, Manipulation and Sequencing, 3926: 1017-2661). ESI-MS provides a platform capable of automated sample processing and analysis that can resolve sequence polymorphisms (Ecker et al. (2006) JALA 11:341-51).
Several groups have described detection of PCR products using high resolution electrospray ionization-Fourier transform-ion cyclotron resonance mass spectrometry (ESI-FT-ICR MS). Accurate measurement of exact mass, combined with knowledge of the number of at least one type of nucleotide, allowed calculation of the total base composition for PCR duplex products of approximately 100 base pairs. (Aaserud et al., J. Am. Soc. Mass Spec., 1996, 7, 1266-1269; Muddiman et al., Anal. Chem., 1997, 69, 1543-1549; Wunschel et al., Anal. Chem., 1998, 70, 1203-1207; Muddiman et al., Rev. Anal. Chem., 1998, 17, 1-68). ESI-FT-ICR MS may also be used to determine the mass of double-stranded, 500 base-pair PCR products via the average molecular mass (Hurst et al., Rapid Commun. Mass Spec. 1996, 10, 377-382).
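The base-composition reasoning cited above can be sketched as a brute-force search: enumerate every (A, C, G, T) count vector of a given length and keep those whose predicted mass matches a measured mass within a tolerance. Requiring the complementary strand to match as well shows how duplex information constrains the answer. This is a minimal sketch, not the method of the cited authors. It assumes average residue masses, a known strand length (in place of a known count of one nucleotide), 5'-OH termini (the -61.96 Da correction used in common oligonucleotide molecular-weight formulas), and hypothetical function names.

```python
from typing import List, Optional, Tuple

RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
TERMINAL_CORRECTION = -61.96  # assumed: linear strand with 5'-OH and 3'-OH termini

Composition = Tuple[int, int, int, int]  # counts of (A, C, G, T)


def strand_mass(comp: Composition) -> float:
    """Average mass (Da) of a single strand with the given base counts."""
    a, c, g, t = comp
    return (a * RESIDUE_MASS["A"] + c * RESIDUE_MASS["C"]
            + g * RESIDUE_MASS["G"] + t * RESIDUE_MASS["T"] + TERMINAL_CORRECTION)


def complement(comp: Composition) -> Composition:
    """Base counts of the complementary strand: A<->T and C<->G swap."""
    a, c, g, t = comp
    return (t, g, c, a)


def candidate_compositions(length: int, fwd_mass: float, tol: float,
                           rev_mass: Optional[float] = None) -> List[Composition]:
    """All compositions of the given length consistent with the measured mass(es)."""
    hits: List[Composition] = []
    for a in range(length + 1):
        for c in range(length + 1 - a):
            for g in range(length + 1 - a - c):
                comp = (a, c, g, length - a - c - g)
                if abs(strand_mass(comp) - fwd_mass) > tol:
                    continue
                # Optionally require the complementary strand to match its measured mass too.
                if rev_mass is not None and abs(strand_mass(complement(comp)) - rev_mass) > tol:
                    continue
                hits.append(comp)
    return hits


if __name__ == "__main__":
    true_comp = (8, 7, 6, 9)  # hypothetical 30-mer
    fwd, rev = strand_mass(true_comp), strand_mass(complement(true_comp))
    print(len(candidate_compositions(30, fwd, 1.0)))
    print(candidate_compositions(30, fwd, 1.0, rev_mass=rev))
```

Every candidate consistent with both strands is also consistent with one strand, so adding the second mass can only shrink the candidate list.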
There is an unmet need for methods and compositions for analysis of DNA forensic markers that approach the level of resolution afforded by sequencing, that are capable of scanning a substantial amount of the variation contained within an amplified fragment, and that are also rapid, amenable to automation, and informative without the burden of extensive manual data interpretation. Preferably, such a method would not require a priori knowledge of the potentially informative sites within a sample to carry out an analysis. Preferably, such methods would provide substantial resolving capability for forensic analyses of degraded DNA or of relatively low amounts of DNA, for example, by resolving sequence polymorphisms that allow discrimination of alleles of equal length based on small differences in sequence or base composition.