DNA identification has become one of the most important tools in forensic science since DNA typing methodologies were introduced around 1985, and it may be one of the most important discoveries in the field since the introduction of fingerprinting. With its extremely high capability to differentiate one individual from another (2), it has become widely used in courts around the country and worldwide.
In recent years, as DNA typing technology has improved and a national DNA database has become available, the popularity and effectiveness of DNA typing methodologies have increased. DNA analysts and law enforcement agencies have sought to interpret the information contained in DNA by applying the underlying molecular biology, genetics, and statistics, so that the results can serve justice at trial. Because the probability that two people share the same genotype at a set of prescribed DNA loci is very small, DNA typing is widely used in forensic identification, especially when unidentified criminals leave testable evidence such as semen, blood, and saliva at crime scenes. These stains provide the extracted DNA samples that are used for criminal identification.
DNA typing for forensic applications is based on applying statistical tools to fundamental principles of diagnosis and gene characteristic analysis (2). The DNA profile obtained from criminal evidence has a unique identity, and the characteristics of the DNA profile are analyzed using these methods. The objective of DNA typing is to identify the genotype of the individual who left the evidence. After the perpetrator's genotype is obtained from DNA analysis, forensic caseworkers can compare it with the genotype of a suspect or can search the local, state, and national CODIS databases for a matching DNA profile that may point to a suspect (2). DNA typing results of forensic samples should therefore be obtained as an early step in an investigation.
In many cases, especially rape cases, a DNA sample is extracted from a biological stain containing body fluids or tissues from more than one person, and the result is often a mixed DNA profile. This kind of DNA profile is essentially composed of one contributor's DNA profile superimposed on that of another (3). Much of the DNA evidence obtained from crime scenes is a mixture of more than one contributor's DNA. Generally, the genotype of the victim is known, but the genotype of the perpetrator cannot be obtained clearly and directly because DNA of another person is present in the sample. The genotype of each contributor to the DNA mixture must be deciphered before the investigation can proceed.
To date, the deconvolution of mixed DNA profiles contributed by multiple people has been one of the most challenging tasks facing forensic scientists. Part of the difficulty derives from the large number of possible genotype combinations that the multiple contributors can exhibit (4) in the mixed DNA profile. So far, no reliable analytical method has been published for resolving a DNA mixture into its components.
Early methods to resolve the genotype profiles of contributors in a sample used loci with four alleles to estimate the mass ratio between the two contributors (5). For a locus with four detected alleles, each contributor must have two different alleles, with no allele shared between the two contributors. Therefore, only one allele assignment structure is possible (two heterozygotes). For loci with only two or three alleles, more than one allele assignment structure is possible at each locus. To determine the genotype profile of an individual at two- or three-allele loci, an initial-guess mass ratio derived from the four-allele loci was used to estimate and evaluate all the possible allele assignment combinations that could be made by the contributors to the sample. The allele assignment combination at each two- or three-allele locus that best fit the observed relative allele peak areas was identified as the contributors' genotype profiles. This procedure was labor-intensive and yielded a conservative resolution result.
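The four-allele step described above can be illustrated with a minimal sketch. It assumes that peak area is proportional to the amount of DNA and that the two largest peaks belong to the same (major) contributor; neither assumption is stated in reference (5). The function name, allele labels, and peak areas are hypothetical.

```python
def initial_mass_ratio(peak_areas):
    """Estimate an initial-guess mass ratio from one four-allele locus.

    peak_areas maps allele label -> measured peak area.
    Assumption (illustrative only): the two largest peaks come from the
    major contributor and the two smallest from the minor contributor.
    Returns (major_fraction, major_alleles, minor_alleles).
    """
    if len(peak_areas) != 4:
        raise ValueError("this sketch only handles four-allele loci")

    # Rank alleles from largest to smallest peak area.
    ranked = sorted(peak_areas.items(), key=lambda item: item[1], reverse=True)
    major, minor = ranked[:2], ranked[2:]

    total = sum(peak_areas.values())
    major_fraction = sum(area for _, area in major) / total

    major_alleles = tuple(sorted(allele for allele, _ in major))
    minor_alleles = tuple(sorted(allele for allele, _ in minor))
    return major_fraction, major_alleles, minor_alleles


# Hypothetical peak areas at a single four-allele locus.
example_locus = {"14": 3000, "15": 1000, "17": 2800, "18": 1200}
fraction, major_alleles, minor_alleles = initial_mass_ratio(example_locus)
```

For these hypothetical areas the major contributor's share is 5800/8000 = 0.725, which would then serve as the starting mass ratio when evaluating the two- and three-allele loci.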
More recently, in 1998, the British group of P. Gill et al. of the Forensic Science Service (5) presented a novel method to resolve DNA mixtures using quantitative allele peak data. This method requires an iterative search for the mass ratio that best fits the allele peaks at each locus, for each genotype combination that the contributors could have at that locus. For each mass ratio used to fit each possible genotype profile, the residuals between the expected allele peak areas and the measured allele peak areas are calculated. The smallest residual at each locus is added to the minimum residuals similarly derived from the allele peak data available at the other loci. The genotype combinations that give the overall lowest minimum residual are selected as the best-fit genotype combinations for the loci. This method is limiting and artificial because a finite set of predetermined mass ratios is used to calculate the fitting residual. Further, this method is labor-intensive because iterations are involved in searching for the best-fit genotype combinations.
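The residual search described above can be sketched as follows for a two-person mixture. This is an illustrative reading of the description, not the published procedure of reference (5): it assumes each allele copy contributes half of its donor's share of the total peak area, uses a sum of squared differences as the residual, and restricts the grid to values of 0.5 and above so that contributor 1 is always the major contributor. All function names and data are hypothetical.

```python
from itertools import combinations_with_replacement, product


def candidate_genotype_pairs(alleles):
    """Yield every (g1, g2) pair that explains exactly the observed alleles."""
    genotypes = list(combinations_with_replacement(sorted(alleles), 2))
    for g1, g2 in product(genotypes, repeat=2):
        if set(g1) | set(g2) == set(alleles):
            yield g1, g2


def expected_proportions(g1, g2, m, alleles):
    """Expected share of total peak area per allele when contributor 1
    supplies fraction m of the DNA and contributor 2 supplies 1 - m."""
    return {a: m * g1.count(a) / 2 + (1 - m) * g2.count(a) / 2 for a in alleles}


def best_fit_at_locus(peak_areas, mass_ratios):
    """Return (residual, m, g1, g2) with the smallest squared residual.

    Returns None if no genotype pair can explain the alleles
    (for example, more than four alleles with two contributors).
    """
    total = sum(peak_areas.values())
    observed = {a: area / total for a, area in peak_areas.items()}
    best = None
    for g1, g2 in candidate_genotype_pairs(observed):
        for m in mass_ratios:  # finite, predetermined grid
            expected = expected_proportions(g1, g2, m, observed)
            residual = sum((observed[a] - expected[a]) ** 2 for a in observed)
            if best is None or residual < best[0]:
                best = (residual, m, g1, g2)
    return best


def sum_of_minimum_residuals(loci, mass_ratios):
    """Add up the smallest residual found at each locus."""
    return sum(best_fit_at_locus(areas, mass_ratios)[0] for areas in loci)


# Hypothetical three-allele locus and a coarse mass-ratio grid.
grid = [0.5, 0.6, 0.7, 0.8, 0.9]
locus = {"10": 3500, "12": 5000, "13": 1500}
residual, m, g1, g2 = best_fit_at_locus(locus, grid)
```

Because the fit is only evaluated at the grid points, a true mass ratio that falls between two grid values can never be recovered exactly, which is the limitation noted above.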
In 2001, Mark Perlin and Beata Szabady developed the Linear Mixture Analysis (LMA) method to resolve DNA mixtures using quantitative allele peak data (18). In this method, the quantitative allele peak data of all loci in a sample are integrated into a single matrix computation (18). This method imposes the same mass ratio on all loci analyzed in the mixture. That constraint conflicts with the observation that the best-fit mass ratio may vary from locus to locus in a sample, due to unequal DNA amplification and other nonidealities (24). Imposing a single set of weight fractions on all loci is therefore expected to be a limitation, because that set is unlikely to be optimal for every locus.
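The effect of imposing one mass ratio on all loci can be illustrated with a small least-squares sketch. It is not the LMA implementation of reference (18): it assumes two contributors with known candidate genotypes, models each normalized peak as d = m*a + (1 - m)*b, where a and b are the allele dosages of the two contributors, and solves for the single weight m in closed form. The function names and data are hypothetical.

```python
def allele_dosage(genotype, allele):
    """Fraction of a contributor's DNA carried by one allele (0, 0.5, or 1)."""
    return genotype.count(allele) / 2


def shared_mixture_weight(loci):
    """Least-squares estimate of one weight m shared by all loci.

    loci is a list of (peak_areas, g1, g2) tuples. Stacking every allele
    of every locus into one linear system d - b = m * (a - b) gives
    m = sum((a - b) * (d - b)) / sum((a - b) ** 2).
    """
    numerator = 0.0
    denominator = 0.0
    for peak_areas, g1, g2 in loci:
        total = sum(peak_areas.values())
        for allele, area in peak_areas.items():
            d = area / total              # observed share of peak area
            a = allele_dosage(g1, allele)  # contributor 1 dosage
            b = allele_dosage(g2, allele)  # contributor 2 dosage
            numerator += (a - b) * (d - b)
            denominator += (a - b) ** 2
    if denominator == 0:
        raise ValueError("identical genotypes: the weight is not identifiable")
    return numerator / denominator


# Hypothetical data: the first locus fits m = 0.7 on its own,
# the second locus fits m = 0.6 on its own.
loci = [
    ({"14": 3500, "17": 3500, "15": 1500, "18": 1500}, ("14", "17"), ("15", "18")),
    ({"10": 3000, "12": 5000, "13": 2000}, ("10", "12"), ("12", "13")),
]
shared_m = shared_mixture_weight(loci)
```

With these hypothetical data the shared estimate falls between the two per-locus values (about 0.667), so it is optimal for neither locus, which illustrates the limitation described above.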
There is a need in the art for an efficient and accurate method to resolve a sample mixture of DNA into the genotype of each individual whose DNA is contained within the mixture.