1. Field of the Invention
The present invention relates to the fields of genetic analysis and DNA sequencing, and is further generally directed to the detection of short tandem repeat (STR) genetic markers in a genomic system.
2. Related Art
Presented below is background information on certain aspects of the present invention as they may relate to technical features referred to in the detailed description, but not necessarily described in detail. That is, individual parts or methods used in the present invention may be described in greater detail in the materials discussed below, which materials may provide further guidance to those skilled in the art for making or using certain aspects of the present invention as claimed. The discussion below should not be construed as an admission as to the relevance of the information to any claims herein or the prior art effect of the material described.
Measurement of the length of DNA fragments plays a pivotal role in genetic mapping, disease diagnostics, paternity testing, and human identification. Traditionally, capillary electrophoresis is used for DNA length measurement and genotyping of short tandem repeats. This requires labeled primers, as well as allelic ladders as standards, to avoid typical run-to-run or instrument-to-instrument variations. However, a limiting factor of this approach in the analysis of heterozygous samples is unsynchronized polymerization in alleles of different lengths, which can lead to imbalanced heterozygote peak height ratios.
DNA sequencing has revolutionized the field of bioscience, and its use has been critical for a number of important medical discoveries. Sequencing technologies have evolved over the years. One of the first, the Sanger method, has provided an elegant sequencing method widely used for the last three decades. A more recently developed technology, pyrosequencing, is an alternative to the Sanger method for sequencing DNA fragments and has a number of advantages. Pyrosequencing is a sequencing-by-synthesis method that monitors polymerase activity by coupling it to two enzymes that generate a detectable light response: i) ATP sulfurylase, which converts the inorganic pyrophosphate (PPi) released by the polymerase during nucleotide incorporation into ATP, and ii) luciferase, which uses that ATP as a source of energy to generate light.
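The light response described above can be illustrated with a minimal, idealized sketch. It assumes that nucleotides are dispensed one at a time in a fixed cyclic order and that the light signal is exactly proportional to the number of bases incorporated per dispensation; real signals are noisy and only approximately proportional. The function name `simulate_pyrogram` and the dispensation order `"ACGT"` are illustrative assumptions, not part of any instrument's software.

```python
def simulate_pyrogram(synthesized_strand, dispensation_order="ACGT"):
    """Idealized pyrogram for the strand being synthesized.

    Returns a list of (nucleotide, signal) pairs, one per dispensation.
    The signal is the number of bases incorporated, which stands in for
    the PPi released and hence the light produced (an idealization).
    """
    unknown = set(synthesized_strand) - set(dispensation_order)
    if unknown:
        raise ValueError(f"bases never dispensed: {sorted(unknown)}")

    pyrogram = []
    position = 0
    while position < len(synthesized_strand):
        for nucleotide in dispensation_order:
            incorporated = 0
            # A homopolymer run is incorporated within a single dispensation.
            while (position < len(synthesized_strand)
                   and synthesized_strand[position] == nucleotide):
                incorporated += 1
                position += 1
            pyrogram.append((nucleotide, incorporated))
            if position >= len(synthesized_strand):
                break
    return pyrogram
```

For example, `simulate_pyrogram("AAG")` yields a signal of 2 for the first A dispensation, 0 for C, and 1 for G, reflecting that a run of two identical bases gives roughly twice the light of a single incorporation.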
The present invention is described as using pyrosequencing, but other suitable sequencing-by-synthesis platforms can be used. Commercial sequencing-by-synthesis platforms are available, such as the Genome Sequencer from Roche/454 Life Sciences, the Genome Analyzer from Illumina/Solexa, the SOLiD system from Applied Biosystems, platforms from Pacific Biosciences, and the HeliScope system from Helicos Biosciences. In some embodiments, the sequencing platforms used in the methods of the present invention have one or more of the following features: 1) four differently optically labeled nucleotides are utilized (e.g., Genome Analyzer); 2) sequencing-by-ligation is utilized (e.g., SOLiD); 3) pyrosequencing is utilized (e.g., Roche/454); and 4) four identically optically labeled nucleotides are utilized (e.g., Helicos).
Since STRs can comprise repeat units of 2-7 bases repeated 5-30 times, it is preferred to use a sequencing method that has a read length that can extend up to about 150 bases. For analyzing sequences longer than the read length of the sequencing-by-synthesis method used, computer-aided methods exist for assembling overlapping fragments of sequence; one is simply required to carry out deeper sequencing.
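The relationship between repeat structure and read length can be checked with simple arithmetic. The following sketch uses only the ranges stated above (units of 2-7 bases, 5-30 repeats, reads of about 150 bases); the function names and the optional `flank_bases` parameter are illustrative assumptions.

```python
def repeat_region_length(unit_length, repeat_count):
    """Length in bases of a repeat region: unit length times repeat count."""
    return unit_length * repeat_count


def fits_in_single_read(unit_length, repeat_count, read_length=150, flank_bases=0):
    """True if the repeat region plus any flanking bases fits in one read.

    read_length=150 follows the 'about 150 bases' figure in the text;
    flank_bases is a hypothetical allowance for sequence outside the repeat.
    """
    return repeat_region_length(unit_length, repeat_count) + flank_bases <= read_length


# Extremes of the stated ranges.
shortest_region = repeat_region_length(2, 5)    # 2 bases x 5 repeats
longest_region = repeat_region_length(7, 30)    # 7 bases x 30 repeats
```

Under these assumptions the repeat region spans 10 to 210 bases, so a tetranucleotide repeat with 30 units (120 bases) fits within a 150-base read, whereas the longest combinations exceed it and would call for assembly of overlapping reads.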
Sequencing technologies have become even more important recently, as many groups seek to conduct whole genome sequencing in order to compare different species as well as different humans; such sequencing has become the foundation of personalized medicine. One of the critical issues in DNA sequencing of whole genomes is the relatively short length of the DNA reads obtained with most approaches, including Sanger's. Short read lengths complicate the informatics needed to analyze and place those sequences in the context of the whole genome. Currently, DNA length determinations of short tandem repeats (STRs) for forensic DNA analysis and detection of mutated genes for clinical applications are performed using polymerase chain reaction (PCR) fragment size measurements. This approach is based on an electrophoretic technique. It involves the use of dye-labeled primers, requires very careful data analysis, and is limited by the occurrence of technology artifacts.
Pyrosequencing technology has also been used in a broad range of applications such as genotyping of microbes, SNP genotyping, mutation detection, and gene identification. This technique has the potential to provide a robust method for DNA length measurement that is time and cost competitive compared to other approaches. It is automatable and provides easily interpretable results. In pyrosequencing, there is no need for specific dye labeling of PCR products. The technique has also been shown to be capable of determining sequence variants within or near the repeat regions of short tandem repeats (STRs), in addition to fragment length differences. This is particularly useful in avoiding confusion during interpretation of results for human identity testing or relationship testing.
STRs are short, tandemly repeated DNA sequences which are interspersed throughout the human genome at up to several hundred thousand loci. They are also found in animals and plants, where they are similarly useful as genetic markers. The repeat units are typically 2-7 base pairs in length and are repeated 5-30 times. These loci are highly polymorphic with respect to the number of repeat units they contain and may vary in internal structure as well. Variation in the number of STR repeat units at a particular locus causes the length of the DNA at that locus to vary from allele to allele and from individual to individual. Thus, many allelic variants exist within the human population, and STRs provide a rich source of genetic markers.
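The dependence of allele length on repeat number can be illustrated by counting tandem copies of a repeat unit in a sequence. The sketch below is a simple scan offered under stated assumptions, not a genotyping method: the flanking sequences `"CCTG"` and `"GGCA"` and the repeat counts 11 and 13 are hypothetical values chosen only for illustration.

```python
def longest_tandem_run(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    step = len(unit)
    for start in range(len(sequence) - step + 1):
        count = 0
        cursor = start
        # Count consecutive copies of the unit beginning at this position.
        while sequence.startswith(unit, cursor):
            count += 1
            cursor += step
        best = max(best, count)
    return best


# Two hypothetical alleles that share flanks but differ in repeat number.
allele_short = "CCTG" + "AGAT" * 11 + "GGCA"
allele_long = "CCTG" + "AGAT" * 13 + "GGCA"
length_difference = len(allele_long) - len(allele_short)
```

Here the two alleles differ by two repeat units, so their lengths differ by 8 bases, which is the kind of length polymorphism that fragment sizing detects.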
Characterization of the alleles at specific STR loci for purposes of individual identification usually begins with their PCR amplification from genomic DNA of the individual whose genome contains those loci. Although a particular repeat unit may be common to several different STR loci, identification of a particular STR locus may be effected via PCR amplification by utilizing primer pairs which hybridize to unique DNA sequences which flank the repeat region, i.e., unique sequences located 5′ and 3′ to the repeat units. Use of such unique primers makes it possible to simultaneously amplify many different STR loci in a single DNA sample, a technique referred to as multiplexing. The resulting PCR products (amplicons) from the various loci may then be separated by electrophoresis and identified by determining their lengths in comparison to known DNA standards.
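Because the primers bind unique flanking sequences of fixed length, the amplicon length is the sum of the two flanks and the repeat region, and the repeat number can be recovered from a measured length. The following sketch works through that arithmetic; the flank lengths (60 and 50 bases) used in the usage note are hypothetical, and the function names are illustrative assumptions.

```python
def amplicon_length(five_prime_flank, three_prime_flank, unit_length, repeat_count):
    """Amplicon length in bases: both flanks (primers included) plus the repeat region."""
    return five_prime_flank + three_prime_flank + unit_length * repeat_count


def repeats_from_amplicon_length(measured_length, five_prime_flank,
                                 three_prime_flank, unit_length):
    """Return (whole_repeats, leftover_bases) implied by a measured amplicon length.

    A non-zero leftover suggests a partial repeat or a sizing error.
    """
    repeat_region = measured_length - five_prime_flank - three_prime_flank
    if repeat_region < 0:
        raise ValueError("measured length is shorter than the flanking sequence")
    return divmod(repeat_region, unit_length)
```

With hypothetical flanks of 60 and 50 bases and a tetranucleotide unit, an allele with 11 repeats gives a 154-base amplicon, and a measured length of 154 bases maps back to 11 whole repeats with no leftover bases.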
Common repeats used for typing and linkage analysis are “CA,” “GATA,” or “ACTT” sequences. That is, an STR may contain a given number of repeats of sequences containing CA, ACTT, or the like. Exemplified below are repeats of AGAT found in two human STRs. The analysis of individual humans or other organisms is based on how many repeats they have. They may be homozygous, or heterozygous, i.e., have one number of repeats on one chromosome in a pair and another number on the other chromosome in the pair, as explained further below.
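The distinction between homozygous and heterozygous loci can be stated compactly in terms of repeat counts. In the minimal sketch below, each locus is represented by the pair of repeat counts found on the two chromosomes; the locus names and repeat counts are hypothetical placeholders, not data from any sample.

```python
def classify_locus(allele_pair):
    """Classify a locus from the repeat counts on the two chromosomes of a pair."""
    first, second = allele_pair
    return "homozygous" if first == second else "heterozygous"


# Hypothetical profile: locus name -> (repeats on one chromosome, repeats on the other).
profile = {
    "locus_A": (11, 11),
    "locus_B": (9, 13),
}
zygosity = {locus: classify_locus(pair) for locus, pair in profile.items()}
```

In this illustration `locus_A` is classified as homozygous and `locus_B` as heterozygous.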
Specifically designed as amplification-based detection methods, STR and microsatellite-based DNA typing offer some practical advantages over typing methods based on larger repeat sequences. For example, PCR amplification using primers targeted to a specific STR sequence typically generates 50-to-500-bp-sized fragments without compromising allelic diversity. This allows for easier sizing of a wider range of alleles on a single electrophoretic separation, as compared with larger tandem repeat sequences that typically produce an order-of-magnitude greater range in fragment size diversity.
Specific Patents and Publications
U.S. Pat. No. 6,531,282 to Dau, et al., issued Mar. 11, 2003, entitled “Multiplex amplification and analysis of selected STR loci,” discloses means to identify the alleles present in a DNA-containing sample by providing subsets of loci for amplification by multiplex PCR. The loci include the thirteen CODIS short tandem repeat (STR) loci and amelogenin. The loci within each subset are grouped so that, upon PCR amplification, the amplicons produced within a given subset do not overlap.
U.S. 2005/0112569 A1 by Chung et al., entitled “Method of determining blood-relationship by typing STR alleles on the X chromosome and DNA typing kit using the same,” published May 26, 2005, discloses STR alleles including GATA172D05.