Characterization of nucleic acid variants is a problem of great importance in various fields of molecular biology, including genotyping and the identification of strains of bacteria and viruses, which are subject to evolutionary pressures via mechanisms including mutation, natural selection, genetic drift and recombination. Nucleic acid heterogeneity is a common feature of RNA viruses, for example. Populations of RNA viruses often exhibit high levels of heterogeneity due to mutations which enhance the ability of the viruses to adapt to growth conditions. Mixed populations of RNA virus quasispecies are known to exist in viral vaccines. It would be advantageous to have a method for monitoring the heterogeneity of viral vaccines. Likewise, new strains of bacterial species are also known to evolve rapidly.
Characterization and quantitation of newly-evolving bacteria and viruses such as the SARS coronavirus, for example, is typically the first step in containment of an epidemic or infectious disease outbreak. In addition to characterization of naturally occurring variants of bacteria and viruses, there is a need for characterization of genetically engineered bacterial or viral bio-weapons in forensic or bio-warfare investigations. Unfortunately, the process of sequencing entire bacterial or viral genomes or vaccine vector sequences is time-consuming and is not effective at resolving mixtures of nucleic acid variants.
Mitochondrial DNA is found in eukaryotes and differs from nuclear DNA in its location, its sequence, its quantity in the cell, and its mode of inheritance. The nucleus of the human cell contains two sets of 23 chromosomes, one paternal set and one maternal set. However, cells may contain hundreds to thousands of mitochondria, each of which may contain several copies of mitochondrial DNA. Nuclear DNA has many more bases than mitochondrial DNA, but mitochondrial DNA is present in many more copies than nuclear DNA. This characteristic of mitochondrial DNA is useful in situations where the amount of DNA in a sample is very limited. Typical sources of DNA recovered from crime scenes include hair, bones, teeth, and body fluids such as saliva, semen, and blood.
In humans, mitochondrial DNA is inherited strictly from the mother (Case, J. T. and Wallace, D. C., Somatic Cell Genetics, 1981, 7, 103-108; Giles, R. E. et al. Proc. Natl. Acad. Sci. 1980, 77, 6715-6719; Hutchison, C. A. et al. Nature, 1974, 251, 536-538). Thus, the mitochondrial DNA sequences obtained from maternally related individuals, such as a brother and a sister or a mother and a daughter, will exactly match each other in the absence of a mutation. This characteristic of mitochondrial DNA is advantageous in missing persons cases because reference mitochondrial DNA samples can be supplied by any maternal relative of the missing individual (Ginther, C. et al. Nature Genetics, 1992, 2, 135-138; Holland, M. M. et al. Journal of Forensic Sciences, 1993, 38, 542-553; Stoneking, M. et al. American Journal of Human Genetics, 1991, 48, 370-382).
The human mitochondrial DNA genome is approximately 16,569 bases in length and has two general regions: the coding region and the control region. The coding region is responsible for the production of various biological molecules involved in the process of energy production in the cell and includes about 37 genes (22 transfer RNAs, 2 ribosomal RNAs, and 13 peptides), with very little intergenic sequence and no introns. The control region is responsible for regulation of the mitochondrial DNA molecule. Two regions of mitochondrial DNA within the control region have been found to be highly polymorphic, or variable, within the human population (Greenberg, B. D. et al. Gene, 1983, 21, 33-49). These two regions are termed “hypervariable Region I” (HV1), which has an approximate length of 342 base pairs (bp), and “hypervariable Region II” (HV2), which has an approximate length of 268 bp. Forensic mitochondrial DNA examinations are performed using these two hypervariable regions because of the high degree of variability found among individuals.
There exists a need for rapid identification of humans wherein human remains and/or biological samples are analyzed. Such remains or samples may be associated with war-related casualties, aircraft crashes, and acts of terrorism, for example. Analysis of mitochondrial DNA enables a rule-in/rule-out identification process for persons for whom DNA profiles from a maternal relative are available. Human identification by analysis of mitochondrial DNA can also be applied to human remains and/or biological samples obtained from crime scenes.
The process of human identification is a common objective of forensic investigations. As used herein, “forensics” is the study of evidence discovered at a crime or accident scene and used in a court of law. “Forensic science” is any science used for the purposes of the law, in particular the criminal justice system, and therefore provides impartial scientific evidence for use in courts of law and in criminal investigations and trials. Forensic science is a multidisciplinary subject, drawing principally from chemistry and biology, but also from physics, geology, psychology and social science, for example.
Forensic scientists generally use the two hypervariable regions of human mitochondrial DNA for analysis. These hypervariable regions, or portions thereof, provide only one non-limiting example of a region of mitochondrial DNA useful for identification analysis.
A typical mitochondrial DNA analysis begins when total genomic and mitochondrial DNA is extracted from biological material, such as a tooth, blood sample, or hair. The polymerase chain reaction (PCR) is then used to amplify, or create many copies of, the two hypervariable portions of the non-coding region of the mitochondrial DNA molecule, using flanking primers. When enough PCR product has been amplified to provide all the necessary information about the two hypervariable regions, sequencing reactions are performed. Where possible, the sequences of both hypervariable regions are determined on both strands of the double-stranded DNA molecule, with sufficient redundancy to confirm the nucleotide substitutions that characterize that particular sample. The entire process is then repeated with a known sample, such as blood or saliva collected from a known individual. The sequences from both samples are compared to determine if they match. Finally, in the event of an inclusion or match, the Scientific Working Group on DNA Analysis Methods (SWGDAM) mitochondrial DNA database, which is maintained by the FBI, is searched for the mitochondrial sequence that has been observed for the samples. The analysts can then report the number of observations of this type based on the nucleotide positions that have been read. A written report can be provided to the submitting agency. This process is described in more detail in Holland, M. M. and Parsons, T. J., Forensic Science Review, 1999, 11, 25-51.
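The comparison and database-search steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the representation of a profile as a list of difference strings, and the toy database are hypothetical and do not reflect the actual SWGDAM database format or any laboratory's interpretation guidelines, which also account for heteroplasmy, ambiguous bases, and the range of positions read.

```python
def profiles_match(questioned, known):
    """Return True if two profiles list exactly the same differences.

    Each profile is a list of strings such as "263 G" (hypothetical format).
    Order is ignored; real casework rules are more nuanced than strict equality.
    """
    return set(questioned) == set(known)


def count_observations(profile, database):
    """Count how many database entries carry exactly the same profile."""
    target = frozenset(profile)
    return sum(1 for entry in database if frozenset(entry) == target)


# Toy data for illustration only; not taken from any real database.
questioned_sample = ["263 G", "73 G"]
known_sample = ["73 G", "263 G"]
toy_database = [
    ["73 G", "263 G"],
    ["263 G"],
    ["263 G", "73 G"],
    ["16189 C"],
]

if profiles_match(questioned_sample, known_sample):
    observations = count_observations(questioned_sample, toy_database)
    print(f"Match; profile observed {observations} time(s) in {len(toy_database)} entries")
else:
    print("Exclusion; profiles differ")
```

In this sketch the database is searched only after a match, mirroring the order of the steps described above.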
Approximately 610 bp of mitochondrial DNA are currently sequenced in forensic mitochondrial DNA analysis. Recording and comparing mitochondrial DNA sequences would be difficult and potentially confusing if all of the bases were listed. Thus, mitochondrial DNA sequence information is recorded by listing only the differences with respect to a reference DNA sequence. By convention, human mitochondrial DNA sequences are described using the first complete published mitochondrial DNA sequence as a reference (Anderson, S. et al., Nature, 1981, 290, 457-465). This sequence is commonly referred to as the Anderson sequence. It is also called the Cambridge reference sequence or the Oxford sequence. Each base pair in this sequence is assigned a number. Deviations from this reference sequence are recorded as the number of the position demonstrating a difference and a letter designation of the different base. For example, a transition from A to G at position 263 would be recorded as 263 G. If deletions or insertions of bases are present in the mitochondrial DNA, these differences are denoted as well.
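The recording convention described above can be illustrated with a minimal Python sketch. It assumes that the sample and reference fragments are already aligned to equal length and contain substitutions only; insertions and deletions, which the convention also covers, are not handled. The function name and the toy sequences are hypothetical and are not real mitochondrial sequence.

```python
def record_differences(reference, sample, start=1):
    """List substitutions in an aligned sample relative to a reference.

    `reference` and `sample` must be equal-length, pre-aligned strings.
    `start` is the reference position number of the first base in the fragment.
    Each difference is recorded as "<position> <sample base>".
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    differences = []
    for offset, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base != sample_base:
            differences.append(f"{start + offset} {sample_base}")
    return differences


# Toy fragment assumed to begin at reference position 259 (made-up bases).
reference_fragment = "ACGTACGT"
sample_fragment = "ACGTGCGT"
print(record_differences(reference_fragment, sample_fragment, start=259))
# prints ['263 G']
```

Listing only the differences keeps a record short: an identical fragment yields an empty list.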
In the United States, there are seven laboratories currently conducting forensic mitochondrial DNA examinations: the FBI Laboratory; Laboratory Corporation of America (LabCorp) in Research Triangle Park, N.C.; Mitotyping Technologies in State College, Pennsylvania; the Bode Technology Group (BTG) in Springfield, Va.; the Armed Forces DNA Identification Laboratory (AFDIL) in Rockville, Md.; BioSynthesis, Inc. in Lewisville, Tex.; and Reliagene in New Orleans, La.
Mitochondrial DNA analyses have been admitted in criminal proceedings from these laboratories in the following states as of April 1999: Alabama, Arkansas, Florida, Indiana, Illinois, Maryland, Michigan, New Mexico, North Carolina, Pennsylvania, South Carolina, Tennessee, Texas, and Washington. Mitochondrial DNA has also been admitted and used in criminal trials in Australia, the United Kingdom, and several other European countries.
Since 1996, the number of individuals performing mitochondrial DNA analysis at the FBI Laboratory has grown from 4 to 12, with more personnel expected in the near future. Over 150 mitochondrial DNA cases have been completed by the FBI Laboratory as of March 1999, and dozens more await analysis. Forensic courses are being taught by the FBI Laboratory personnel and other groups to educate forensic scientists in the procedures and interpretation of mitochondrial DNA sequencing. More and more individuals are learning about the value of mitochondrial DNA sequencing for obtaining useful information from evidentiary samples that are small, degraded, or both. Mitochondrial DNA sequencing is becoming known not only as an exclusionary tool but also as a complementary technique for use with other human identification procedures. Mitochondrial DNA analysis will continue to be a powerful tool for law enforcement officials in the years to come as other applications are developed, validated, and applied to forensic evidence.
Presently, the forensic analysis of mitochondrial DNA is rigorous and labor-intensive; each analyst can complete only 1-2 cases per month. Several molecular biological techniques are combined to obtain a mitochondrial DNA sequence from a sample. The steps of the mitochondrial DNA analysis process include primary visual analysis, sample preparation, DNA extraction, polymerase chain reaction (PCR) amplification, post-amplification quantification of the DNA, automated DNA sequencing, and data analysis. Another complicating factor in the forensic analysis of mitochondrial DNA is the occurrence of heteroplasmy, wherein the pool of mitochondrial DNAs in a given cell is heterogeneous due to mutations in individual mitochondrial DNAs. There are different forms of heteroplasmy found in mitochondrial DNA. For example, sequence heteroplasmy (also known as point heteroplasmy) is the occurrence of more than one base at a particular position or positions in the mitochondrial DNA sequence. Length heteroplasmy is the occurrence of more than one length of a stretch of the same base in a mitochondrial DNA sequence as a result of insertion of nucleotide residues.
Heteroplasmy is a problem for forensic investigators because a sample from a crime scene can differ from a sample from a suspect by one base pair, and this difference may be interpreted as sufficient evidence to eliminate that individual as a suspect. Hair samples from a single individual can contain heteroplasmic mutations at vastly different concentrations, and even the root and shaft of a single hair can differ. The detection methods currently available to molecular biologists cannot detect low levels of heteroplasmy. Furthermore, if present, length heteroplasmy will adversely affect sequencing runs by resulting in an out-of-frame sequence that cannot be interpreted.
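The definition of sequence (point) heteroplasmy given above can be made concrete with a short Python sketch. It assumes, purely for illustration, that a set of equal-length, pre-aligned copies of the same region is available; the function name, the `min_fraction` threshold, and the toy data are hypothetical, and the sketch does not represent any particular detection method. Length heteroplasmy is not covered.

```python
from collections import Counter


def point_heteroplasmy(aligned_copies, min_fraction=0.1, start=1):
    """Report positions at which more than one base is present.

    `aligned_copies` is a list of equal-length strings, one per DNA copy.
    A base counts as present if its fraction is at least `min_fraction`
    (an arbitrary illustrative threshold). Returns a dict mapping
    position number to a dict of base -> fraction.
    """
    if not aligned_copies:
        return {}
    length = len(aligned_copies[0])
    if any(len(copy) != length for copy in aligned_copies):
        raise ValueError("all copies must be aligned to equal length")
    heteroplasmic = {}
    for offset in range(length):
        counts = Counter(copy[offset] for copy in aligned_copies)
        total = sum(counts.values())
        present = {
            base: n / total
            for base, n in counts.items()
            if n / total >= min_fraction
        }
        if len(present) > 1:
            heteroplasmic[start + offset] = present
    return heteroplasmic


# Toy data: four made-up copies, one of which differs at the third position.
copies = ["ACGT", "ACGT", "ACAT", "ACGT"]
print(point_heteroplasmy(copies))
# prints {3: {'G': 0.75, 'A': 0.25}}
```

Raising `min_fraction` above 0.25 in this toy example would hide the minor variant, which illustrates why low levels of heteroplasmy are easy to miss.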
Mass spectrometry provides detailed information about the molecules being analyzed, including high mass accuracy. It is also a process that can be easily automated.
There is a need for a mitochondrial DNA forensic analysis which is both specific and rapid, and in which no nucleic acid sequencing is required. There is also a need for a method of rapid characterization and quantitation of nucleic acids which have variant positions relative to a reference sequence. These needs, as well as others, are addressed herein below.