1. Field of the Invention
The present invention is generally directed to the detection of short tandem repeat (STR) genetic markers in a genomic system. The present invention is more specifically directed to the simultaneous amplification of the thirteen specific and distinct polymorphic STR genetic loci of the Combined DNA Index System (CODIS) using the polymerase chain reaction (PCR) and the use of locus specific brackets (LSB) in electrophoretic calibration of their fragment lengths to determine in one, two or four PCR reactions and analytical channels the alleles of each locus contained within the multiplex system.
2. Background of the Invention
Due to their highly polymorphic nature, short tandem repeat (STR) loci are extremely useful as genetic markers. For example, the utilization of STRs has been fundamental to the identification and characterization of many disease genes, and to the development of such sophisticated technologies as linkage mapping and DNA genotyping.
STRs are short, tandemly repeated DNA sequences which are interspersed throughout the human genome at up to several hundred thousand loci (Koreth, et al., 1996; Fregeau, et al., 1997). They are also found in animals and plants where they are similarly useful as genetic markers (Orti, et al., 1997; Powell, et al., 1996). The repeat units of STRs are typically 2-7 base pairs in length. These loci are highly polymorphic with respect to the number of repeat units they contain and may vary in internal structure as well. Variation in the number of STR repeat units at a particular locus causes the length of the DNA at that locus to vary from allele to allele and from individual to individual. Thus, many allelic variants exist within the human population, and STRs provide a rich source of genetic markers.
While the alleles at a single STR locus may be the same for two different individuals in a population, especially if the individuals are genetically related, the probability that the alleles of two individuals will be identical at several different loci becomes smaller and smaller as the number of loci which are examined increases. If a sufficient number of loci are examined, the overall allelic pattern will be unique for each individual. As a result, and of particular importance in forensic analysis, by determining the alleles at a sufficiently large number of loci in two different DNA samples it is possible to establish with virtual certainty whether or not the two samples originally came from the same individual.
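The shrinking match probability described above can be illustrated with a minimal sketch. It assumes that the loci are statistically independent, so that per-locus genotype frequencies may simply be multiplied; that assumption, the function name and every frequency value below are hypothetical and are not taken from this specification.

```python
from math import prod


def random_match_probability(genotype_freqs):
    """Multiply per-locus genotype frequencies.

    Assumes the loci are independent (an assumption made for this sketch).
    """
    if not genotype_freqs:
        raise ValueError("at least one locus is required")
    for freq in genotype_freqs:
        if not 0.0 < freq <= 1.0:
            raise ValueError("each frequency must lie in (0, 1]")
    return prod(genotype_freqs)


# Hypothetical genotype frequency of 0.10 at every locus (illustrative only).
freqs = [0.10] * 13
for n_loci in (1, 4, 13):
    print(n_loci, random_match_probability(freqs[:n_loci]))
```

Each additional locus multiplies the combined probability by a factor no greater than one, which is why the chance of a coincidental match falls as more loci are examined.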
Characterization of the alleles at specific STR loci for purposes of individual identification usually begins with their PCR amplification from genomic DNA of the individual whose genome contains those loci. Although a particular repeat unit may be common to several different STR loci, identification of a particular STR locus may be effected via PCR amplification by utilizing primer pairs which hybridize to unique DNA sequences which flank the repeat region, i.e. unique sequences located 5′ and 3′ to the repeat units. Use of such unique primers makes it possible to simultaneously amplify many different STR loci in a single DNA sample, a technique referred to as multiplexing. The resulting PCR products (amplicons) from the various loci may then be separated by electrophoresis and identified by determining their lengths in comparison to known DNA standards.
While the process is in theory straightforward, several factors must be considered in order to ensure correct identification of the STR loci. For example, STR alleles are typically categorized by the number of repeat units they contain, which is convenient for entry into databases. In forensic applications, the preferred alleles are for the most part composed of regular repeat units of a size that is optimally resolved with current electrophoretic technology, usually four bases long (Edwards et al., 1991; Perez-Lezaun, et al., 1997). However, some of the tetrameric STR loci useful in forensic analysis contain non-integer alleles which differ in size by only 1 or 2 nucleotides (Puers, Science 272: 1755-1762, 1993). Therefore, an error of less than 0.5 nucleotide is necessary for accurate sizing of these alleles. This level of resolution has been reliably obtained only with instruments intended for automated DNA sequencing analysis. Such instruments are designed to analyze the length of DNA fragments produced by the Sanger chain termination chemistry employed in DNA sequencing (Connell, et al., 1987). This sequencing chemistry produces a set of DNA fragments each of which terminates with 1 of the 4 dideoxyribonucleotides. The fragments in the set produced differ in length by only one nucleotide and form a “ladder” of successively longer fragments which must be reliably resolved from one another by electrophoresis.
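The need for a sizing error of less than 0.5 nucleotide can be made concrete with a short sketch. It assigns a measured fragment length to the nearest reference allele and declines to make a call when the deviation reaches the tolerance. The function name, the allele labels and the reference lengths are hypothetical values chosen for illustration, not data from this specification.

```python
def call_allele(measured_nt, reference_lengths, tolerance_nt=0.5):
    """Return the label of the nearest reference allele.

    Returns None when the nearest reference length differs from the
    measurement by tolerance_nt or more.
    """
    label, ref_nt = min(
        reference_lengths.items(),
        key=lambda item: abs(item[1] - measured_nt),
    )
    return label if abs(ref_nt - measured_nt) < tolerance_nt else None


# Hypothetical tetrameric locus with a non-integer allele ("9.3") that is
# only one nucleotide shorter than allele "10".
reference_lengths = {"9": 195.0, "9.3": 198.0, "10": 199.0}

print(call_allele(198.3, reference_lengths))  # nearest is "9.3"
print(call_allele(198.5, reference_lengths))  # midway between two alleles: no call
```

A measurement lying exactly midway between two alleles that differ by one nucleotide cannot be assigned, which is why the error must stay below half a nucleotide.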
Electrophoresis instruments used to separate sequencing fragments utilize a slab gel or capillary format, but vary in their method of detection. For example, with the ALF™ and ALFexpress™ slab gel systems (Amersham Pharmacia Biotech, Piscataway, N.J.) all 4 dideoxyribonucleotide-terminating fragment types are labeled with the same fluorophore. The 4 fragment types must therefore be assigned to 4 different lanes for electrophoresis in order to distinguish among them. The ABI Prism 310® capillary and ABI Prism 377® slab gel systems (PE Applied Biosystems, Foster City, Calif.) allow electrophoresis of all fragments in the same channel or lane because different fluorophores are assigned to each of the 4 dideoxyribonucleotides. The newer Visible Genetics Microgene Clipper™ (Visible Genetics, Inc., Toronto, Canada) employs two fluorophores to identify two sets of fragments which are electrophoresed in two gel lanes. The Hitachi FMBIO® II Fluorescent Scanner employs 3 fluorophores. All of these instruments employ computerized measurement of the migration time of each fragment over a fixed distance or time (Hitachi) in order to “call” the nucleotide sequence of the DNA molecule under analysis. Since consecutive fragments differ from each other by only one nucleotide (their sequences being otherwise homologous) their relative mobilities are almost identical. Thus the small differences in length can be measured accurately and allow the alignment of fragments relative to each other when constructing the sequencing “ladder” which represents the oligonucleotide sequence.
Sequencing gel electrophoretic instruments therefore provide the resolution necessary to discriminate between DNA fragments which differ in length by only 1 nucleotide (Carrano et al., 1989). When these instruments were later adapted to STR analysis a problem arose, because there was no longer an entire series of similarly mobile fragments to be aligned in correct series, but rather only one or two alleles from each amplified locus, depending upon whether the subject was homo- or heterozygous. Now the lengths of these fragments had to be measured by means of calibration standards in order to assign them the correct allele number. The standards employed were no longer almost identical in sequence like the series of Sanger chain termination fragments, but were heterologous, usually produced by restriction enzymatic digestion of microbial DNA. Sequence differences and the ensuing electrophoretic mobility differences (Frank, R. et al., 1979) between the calibration standards and their target DNA caused a large and variable calibration error of up to 3 nt (AmpFlSTR® User's Manual, 1998). Manufacturers have corrected this error with either a heterologous or chemically compounded internal and external lane standard labeled with its own fluorophore distinct from that of their target alleles combined with an external lane allelic ladder (Schumm, J. W., 1997). Allelic ladders themselves are not co-electrophoresed as internal lane standards because by migrating in the same position as the sample alleles they could interfere with their measurement by obscuring small sample peaks, by spectral interference or by peak broadening.
LSB overcome many of the difficulties in calibration of STR measurement with markers of incompatible electrophoretic mobility. They are made through the deletion or addition of tandem repeat units within the polymorphic regions of STR containing genetic loci to produce bracketing variant alleles just shorter and longer than all alleles or common alleles of their locus. They differ from true alleles of their locus only by containing fewer or more repeat units in their polymorphic regions. Therefore, their electrophoretic mobility is in register with true alleles of their locus of origin even during changed operating conditions because true alleles and LSB are affected in almost the same way due to their comparable length and chemical structure.
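As a rough illustration of sizing a sample allele between a shorter and a longer bracket, the sketch below interpolates linearly between two bracket peaks of known length. Linear interpolation is an assumption made here for simplicity and is not stated to be the calibration procedure of the invention; the function name, migration times and fragment lengths are likewise hypothetical.

```python
def size_between_brackets(sample_time, short_bracket, long_bracket):
    """Estimate fragment length (nt) by linear interpolation.

    Each bracket is a (migration_time, known_length_nt) pair. Linear
    interpolation is an assumption made for this sketch.
    """
    (t_short, n_short), (t_long, n_long) = short_bracket, long_bracket
    if not t_short < t_long:
        raise ValueError("the short bracket must migrate before the long bracket")
    if not t_short <= sample_time <= t_long:
        raise ValueError("sample peak lies outside the bracketed interval")
    fraction = (sample_time - t_short) / (t_long - t_short)
    return n_short + (n_long - n_short) * fraction


# Hypothetical brackets: 180 nt at 30.0 min and 240 nt at 36.0 min.
print(size_between_brackets(33.0, (30.0, 180), (36.0, 240)))  # 210.0
```

Because the brackets and the true alleles share nearly the same sequence, a shift in running conditions moves all three peaks together, so an estimate of this kind stays in register with the alleles being measured.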
However, the resolution of PCR amplicons from multiplexed STR alleles presents additional challenges since typically the alleles of many loci in a sample are most efficiently analyzed together. Therefore, the potential length range of the PCR amplicons from a given STR locus must not overlap the length range of any other locus, the amplicons of which are to be co-electrophoresed in the same lane. Barring the use of some means of differential labeling, it is not possible to individually distinguish the loci of origin of overlapping alleles. Overlapping amplicons are most commonly identified by labeling them with different fluorophores, each of which specifically labels either a single locus or several loci which do not overlap and can therefore be identified solely by migration time (Sullivan et al., 1992).
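The non-overlap requirement can be checked mechanically. The following sketch reports every pair of loci that carry the same label and whose possible amplicon length ranges overlap; loci carrying different labels are allowed to overlap. The locus names, label names and length ranges are hypothetical placeholders, not values from this specification.

```python
from itertools import combinations


def overlapping_pairs(design):
    """Return pairs of same-label loci whose length ranges overlap.

    design maps a locus name to (label, min_length_nt, max_length_nt).
    """
    clashes = []
    for (name_a, (label_a, lo_a, hi_a)), (name_b, (label_b, lo_b, hi_b)) in combinations(
        design.items(), 2
    ):
        same_label = label_a == label_b
        ranges_overlap = lo_a <= hi_b and lo_b <= hi_a
        if same_label and ranges_overlap:
            clashes.append((name_a, name_b))
    return clashes


# Hypothetical design: locus_A and locus_B share a label and overlap in length.
design = {
    "locus_A": ("dye1", 100, 150),
    "locus_B": ("dye1", 140, 200),
    "locus_C": ("dye2", 140, 200),
    "locus_D": ("dye1", 210, 260),
}
print(overlapping_pairs(design))  # [('locus_A', 'locus_B')]
```

In this example locus_B and locus_C span the same lengths but remain distinguishable because they carry different labels.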
The electrophoresis system which is to be utilized must be equipped with a fluorometer capable of separating and detecting photon emissions of different wavelengths from the different fluorophores (Carrano et al., 1989). Several adequate electrophoresis systems are available. For example, Applied Biosystems, Inc. has developed a four fluorophore slab gel system. Three of the fluorophores are used to label each of three overlapping triplexes and the fourth is used to label an internal lane calibration standard (Ziegle et al., 1992). The four-fluorophore detector has been adapted to capillary electrophoresis systems (Demers et al., 1998). In order to correct for deviant mobility compared to the target alleles, the internal lane standard is calibrated against allelic ladders derived from each locus by co-electrophoresis with them in an external lane on the 377 Prism® slab gel instrument (Ziegle et al., 1992) or by another electrophoresis with them in the same capillary on the 310 Prism® instrument (Lazaruk et al., 1998). Visible Genetics, Inc. has developed an electrophoresis instrument (VGI Microgene Clipper™) capable of detecting fluorescence from two fluorophores which can therefore discriminate two STR multiplexes simultaneously in a single gel lane. Hitachi Genetic Systems (South San Francisco, Calif.) has developed a three fluorophore detector (FMBIO II™).
Another important constraint on the multiplexing of STR fragments is the limit to the maximum size of PCR amplicons which can be accurately resolved by electrophoresis with a standard size gel or capillary system within a reasonable period of time. Additionally, in forensic samples longer alleles are more subject to degradation (Walsh et al., 1992; Edwards et al., 1994). A desirable fragment size range for human identification testing is therefore up to about 400 bp (Gill et al., 1996). Depending upon the potential size range of the alleles to be amplified, the number of STR amplicons which can be multiplexed in a single gel lane will be limited by their length, even if the amplicons of the selected markers do not overlap (Kimpton et al., 1993; Schumm, U.S. Pat. No. 5,843,660).
The analysis of highly polymorphic STR genetic loci has become the preeminent method of forensic identification (Robertson, 1995). In the United States, the Federal Bureau of Investigation (FBI) has established a national database called the Combined DNA Index System (CODIS) to allow state and local crime laboratories to store and match forensic DNA test results. The thirteen STR loci selected for CODIS have met extensive evaluation and selection criteria including: 1) a high degree of polymorphism ensuring their power to discriminate between individuals, 2) a sufficient number of loci with a large power of exclusion to prevent false matching of crime samples, 3) reproducible amplification by PCR from forensic DNA samples, and 4) alleles accurately discriminated by electrophoresis. Since its inception, the CODIS system has proved its worth by providing crucial evidence leading to the conviction of many criminals.
Current processing of forensic samples for CODIS still requires two PCR reaction vessels and two electrophoretic lanes, be it with a slab gel or capillary device. For example, both Applied Biosystems, Inc. 310 and 377 Prism® machines can simultaneously resolve the emission spectra of up to four of the currently available fluorophores. But one color must be reserved for the internal lane calibration standard, leaving only three fluorophores to label the thirteen polymorphisms (Ziegle, 1992). The strategy for PCR amplification in this system has therefore been restricted to the development of three multiplex subsets, each labeled with its own fluorophore. With currently known primer pairs, labeling with only three fluorophores has not been enough to accommodate all thirteen CODIS loci in a manner which allows them to be co-amplified in a single reaction vessel producing fragment lengths of less than 400 bp for analysis in a single electrophoretic channel.
For several reasons it is desirable that all 13 CODIS loci be amplified in a single reaction vessel and analyzed in a single electrophoretic lane. The reasons include: 1) conservation of sample DNA; 2) avoidance of potential mix-ups due to split samples; 3) uniform PCR reaction conditions to maximize the opportunity for uniform, readable electrophoretic peak heights, and uniform extra nucleotide addition by DNA polymerase to increase accuracy of DNA fragment length measurements; 4) uniform electrophoretic conditions without any lane to lane variation; 5) uniform polynucleotide fragment detection conditions to increase accuracy; 6) enhanced sample throughput; 7) more opportunity for automation; and 8) lower cost.
It would be of benefit to have available multiplex configurations which are designed to amplify all CODIS alleles together in one vessel in order to produce a similar abundance of each, with amplicon lengths preferably of 100-400 bp or, more preferably, 100-350 bp. PCR amplicons of less than 100 bp may be difficult to distinguish from primer/dimer amplification artifacts. The system would have to amplify and detect all known or all common polymorphisms of the CODIS loci. In addition, the multiplex PCR groupings detected by the same label would need to produce non-overlapping amplicons within each subset, and multiplex subsets must be limited to those that can be amplified, differentially labeled and distinguished using technology which is currently available (Miscicka-Sliwka et al., 1997; Oldroyd et al., 1995). Further, it would be of benefit if the method utilized internal lane standards accurate in calibration which were fully compatible with and specific for the loci which were being amplified, and did not require separate labeling (Dau, U.S. Pat. No. 6,013,444, which is herein incorporated by reference).
The invention provides a method of determining the alleles present at a plurality of loci in a DNA-containing sample using multiplex PCR amplification of the loci. PCR amplification is carried out using primer pairs which are specific for each locus. The loci include the thirteen CODIS loci (FGA, vWA, TH01, TPOX, CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11) which have been selected by the Federal Bureau of Investigation as those to be utilized in forensic databases (Schumm et al., 1999) with or without amelogenin for sex determination (Sullivan, 1993). The DNA sequence of loci D3S1358 and D13S317 is described. The loci have been organized into several different “multiplex subsets” (A1, B1, C1, D1, A2, B2, C2, D2, A3, B3, C3, D3, A4, C4, D4, C5, D5, C6, D6 and C7) such that, within a given multiplex subset, the amplicons produced by PCR amplification of the loci in that subset do not overlap one another in length and are detected by means of the same label. Multiplex subsets are amplified in a single reaction vessel and detected in a single electrophoretic channel. Their amplicons may be assigned fragment lengths based upon calibration by LSB electrophoresed as internal and/or external channel standards. In addition to LSB, multiple markers composed of heterologous or chemically compounded DNA may also be applied as internal and/or external lane standards and alleles from the loci to be measured may also be applied as external lane standards.
The multiplex subsets have been further configured into groups of “compound multiplexes” which contain 2 or more multiplex subsets. The compound multiplexes can be amplified (and concomitantly labeled) in a single PCR reaction vessel, and analyzed in a single channel of, for example, an electrophoresis apparatus. While the amplicons produced by two different multiplex subsets may overlap in length, the overlapping amplicons may be distinguished from one another by differential labeling of the group of primers for each multiplex subset. Compound multiplexes are detected by analytical electrophoresis using multiple calibration for their multiplex subsets. Also provided are kits containing primer pairs and LSB for use in carrying out the methods of the present invention.
The invention further provides a method of determining the lengths of the alleles of a genetic locus by utilizing both internal and external lane calibration standards. Various combinations of internal and external lane calibration standards may be utilized, including but not limited to: LSB as internal lane standards, and LSB combined with at least one true allele as external lane standards; multiple markers (MM) as internal lane standards, and LSB combined with at least one true allele and MM as external lane standards; MM as internal lane standards, and MM and locus specific allelic ladders as external lane standards.
The invention further provides a method for the analysis of the data obtained via the use of both internal and external lane calibration standards, and software to carry out the analysis.