DNA typing is commonly used to identify the parentage of human children and to confirm the lineage of horses, dogs, other animals, and agricultural crops. DNA typing is also commonly employed to identify the source of blood, saliva, semen, and other tissue found at a crime scene or at other sites requiring identification of human remains. DNA typing is further employed in clinical settings, for example in the treatment of certain leukaemias, to determine the success or failure of bone marrow transplantation in the presence of particular cancerous tissues. DNA typing involves the analysis of alleles of genomic DNA with characteristics of interest, commonly referred to as “markers”. Most typing methods in use today are specifically designed to detect and analyse differences in the length and/or sequence of one or more regions of DNA markers known to appear in at least two different forms in the population. Such length and/or sequence variations are referred to as “polymorphisms”. Any region, i.e. “locus”, of DNA in which such variation occurs is referred to as a “polymorphic locus”.
When speaking about polymorphic DNA sequences, one must distinguish in particular between so-called repeated polymorphic DNA sequences and non-repetitive polymorphic elements. In the study of DNA sequences, two main types of repeated sequences can be distinguished: tandem repeats and interspersed repeats. Tandem repeats include satellite DNA. Satellite DNA consists of highly repetitive DNA and is so called because repetitions of a short DNA sequence tend to produce a different frequency of the nucleotides adenine, cytosine, guanine and thymine, and thus have a different density from bulk DNA, such that they form a second or “satellite” band when genomic DNA is separated on a density gradient. A minisatellite is a section of DNA that consists of a short series of bases, 10 to 100 bp, repeated in tandem. Minisatellites occur at more than 1,000 locations in the human genome. Some minisatellites contain a central (or core) sequence “GGGCAGGANG” (where N is the IUPAC code for any nucleotide, i.e. A, C, G or T), or more generally a strand bias. It has been shown that this sequence per se encourages chromosomes to swap DNA. In alternative models, the presence of a neighbouring, cis-acting meiotic double-strand break hotspot is the primary cause of minisatellite repeat copy number variation.
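To make the IUPAC notation of the core sequence “GGGCAGGANG” concrete, the following is a minimal sketch that translates the motif into a regular expression (N becoming any of A, C, G, T) and reports every start position on the given strand. The function names are hypothetical, only the A/C/G/T/N codes are supported, and the reverse-complement strand is not searched.

```python
import re

# IUPAC codes needed for the core motif; N stands for any of A, C, G, T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif (A/C/G/T/N only in this sketch) into a regex."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_core_motifs(seq: str, motif: str = "GGGCAGGANG") -> list[int]:
    """Return 0-based start positions of all, possibly overlapping, matches.

    The lookahead (?=...) lets overlapping occurrences be reported.
    Only the strand that is passed in is searched.
    """
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Two hypothetical core-like motifs: GGGCAGGATG and GGGCAGGACG.
    print(find_core_motifs("TTGGGCAGGATGCCGGGCAGGACGAA"))
```

Here both occurrences match because N accepts the differing ninth base (T in the first, C in the second).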
Interspersed repetitive DNA is found in all eukaryotic genomes. These sequences propagate themselves by RNA-mediated transposition and have been called retroposons. Such retroposons are substantially larger than the repetitive elements discussed above.
So-called short interspersed nuclear elements (SINEs) are a further class of repetitive DNA elements. A particular type of SINE is the so-called Alu sequence, which is about 300 base pairs in length. Therefore, these elements are likewise not particularly useful for a simple and straightforward profiling assay.
Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are polymorphic loci present in nuclear and organelle DNA that consist of repeating units 1 to 6 base pairs in length. They are used as molecular markers with wide-ranging applications in the field of genetics, including kinship and population studies. Microsatellites can also be used to study gene dosage (looking for duplications or deletions of a particular genetic region). One common example of a microsatellite is the (CA)n repeat, where n varies between alleles. These markers often show high levels of inter- and intraspecific polymorphism, particularly when the number of tandem repeats is 10 or greater. The repeated sequences are often simple, consisting of two, three or four nucleotides (di-, tri- and tetranucleotide repeats, respectively), and can be repeated 10 to 100 times. CA repeats are very frequent in human and other genomes and occur every few thousand base pairs. Because an STR locus often has very many alleles, genotypes within pedigrees are often fully informative, and the progenitor of a particular allele can often be identified. However, using these STRs in genetic assays has the fundamental drawback that, owing to the large variation within the population, one may find an extremely high number of alleles, and these alleles differ substantially in size when analyzed, e.g. in an electrophoretic gel system.
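Since alleles of a (CA)n microsatellite differ only in the repeat count n, an allele can be characterised by the length of the longest uninterrupted run of the repeat unit. The following is a minimal sketch of that idea under a simplifying assumption: a perfect, uninterrupted repeat, with no tolerance for sequencing errors or imperfect (interrupted) repeats. The function name is hypothetical.

```python
def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest n such that unit repeated n times occurs contiguously in seq.

    Every reading frame (offset 0 .. len(unit)-1) is scanned in steps of
    len(unit), so a run starting at any position is visited in its own frame.
    """
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for start in range(k):
        run = 0
        for i in range(start, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # the tandem run is broken; start counting again
    return best


if __name__ == "__main__":
    # Two hypothetical alleles of the same (CA)n locus with identical flanks.
    allele_a = "GG" + "CA" * 12 + "TTA"
    allele_b = "GG" + "CA" * 15 + "TTA"
    print(longest_tandem_run(allele_a, "CA"), longest_tandem_run(allele_b, "CA"))
```

The two hypothetical alleles differ by three repeats, i.e. 6 bp in length, which is the kind of size difference an electrophoretic system has to resolve.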
When using, e.g., RFLP (restriction fragment length polymorphism) analysis, the theoretical risk of a coincidental match is 1 in 100 billion (100,000,000,000). However, the rate of laboratory error is almost certainly higher than this, and actual laboratory procedures often do not reflect the theory under which the coincidence probabilities were computed. For example, the coincidence probabilities may be calculated on the basis that markers in two samples have bands in precisely the same location, whereas a laboratory worker may conclude that similar, but not precisely identical, band patterns result from identical genetic samples with some imperfection in the agarose gel. In this case, the laboratory worker increases the coincidence risk by expanding the criteria for declaring a match. STRs have the same problem, because many alleles exist for each locus, and this complexity may lead to ambiguous amplification products which are then incorrectly assigned.
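The effect of widening the match criteria can be illustrated with a minimal sketch: at a single locus, the chance that an unrelated allele is declared a “match” equals the summed population frequency of all allele sizes falling inside the accepted window around the observed band. The allele sizes and frequencies below are hypothetical and serve only to show the trend; real casework uses validated population databases and more elaborate match rules.

```python
def coincidental_match_probability(
    query_size_bp: int, allele_freqs: dict[int, float], tolerance_bp: int
) -> float:
    """Sum the frequencies of all alleles whose size lies within +/- tolerance_bp.

    tolerance_bp = 0 corresponds to requiring bands in precisely the same
    location; larger values model a looser visual match criterion.
    """
    return sum(
        freq
        for size, freq in allele_freqs.items()
        if abs(size - query_size_bp) <= tolerance_bp
    )


# Hypothetical allele size distribution at one locus (sizes in bp, frequencies sum to 1).
HYPOTHETICAL_FREQS = {100: 0.10, 104: 0.20, 108: 0.30, 112: 0.25, 116: 0.15}

if __name__ == "__main__":
    for tol in (0, 4, 8):
        p = coincidental_match_probability(108, HYPOTHETICAL_FREQS, tol)
        print(f"tolerance {tol} bp -> match probability {p:.2f}")
```

With these made-up numbers, accepting bands within 4 bp raises the single-locus coincidence probability from 0.30 to 0.75, and the inflation compounds across loci.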
Systems containing several loci are called multiplex systems, and many such systems, some containing more than 11 separate STR loci, have been developed and are commercially available. Although amplification protocols with STR loci can be designed to produce small products, generally from 60 to 500 base pairs (bp) in length, with the alleles of each locus often contained within a range of less than 100 bp, the substantial drawback of STR loci is that, owing to the high variability within the population at particular loci, certain alleles may have a high number of repeats and thus result in large amplification products. The design of these systems is limited, in part, by the difficulty of separating multiple loci in a single gel or capillary. This occurs because fragments of different sizes, especially longer fragments, are spatially compressed in gels or capillaries, the means commonly used by those skilled in the art to separate DNA fragments. Although the analysis of multi-allelic short tandem repeats (STRs) still has the largest impact on forensic genetics and casework, these systems are limited, especially for DNA evidence of low quality and quantity. For example, degraded DNA samples represent one of the major challenges for STR analysis, as amplicon sizes within multiplex assays often exceed 200 base pairs, and degraded samples are extremely difficult to amplify.
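The link between repeat number and amplicon length, and why long amplicons suffer in degraded samples, can be sketched as follows. Two assumptions are labelled here: an STR amplicon is modelled as a fixed flanking (primer-to-repeat) length plus unit length times repeat count, and degradation is modelled very crudely as independent random strand breaks with a hypothetical per-nucleotide break probability, so a template of length L survives intact with probability (1 - p)^L. All names and numbers are hypothetical.

```python
def amplicon_size(flank_bp: int, unit_bp: int, repeats: int) -> int:
    """PCR product length: fixed flanking sequence plus the repeat tract."""
    return flank_bp + unit_bp * repeats


def alleles_over_limit(
    flank_bp: int, unit_bp: int, repeat_range: range, limit_bp: int = 200
) -> list[int]:
    """Repeat counts whose amplicon would exceed limit_bp."""
    return [n for n in repeat_range if amplicon_size(flank_bp, unit_bp, n) > limit_bp]


def intact_fraction(length_bp: int, break_prob_per_bp: float) -> float:
    """Fraction of templates with no break over length_bp (simplified
    independent-break assumption; not a validated degradation model)."""
    return (1.0 - break_prob_per_bp) ** length_bp


if __name__ == "__main__":
    # Hypothetical tetranucleotide locus: 120 bp of flanks, alleles with 7..30 repeats.
    print(alleles_over_limit(120, 4, range(7, 31)))
    for n in (7, 30):
        size = amplicon_size(120, 4, n)
        print(n, size, round(intact_fraction(size, 0.01), 3))
```

Under these assumptions, every allele above 20 repeats crosses the 200 bp mark, and the longest allele has markedly fewer intact templates than the shortest one, which is the size-dependent dropout described above.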
At the same time, attention has focused on so-called single nucleotide polymorphisms (SNPs); however, SNPs are difficult to analyze because a given system must be able to resolve a difference at a single nucleotide position.
The present invention represents a significant improvement over existing technology, bringing increased power of discrimination, precision and throughput to DNA profiling for linkage analysis, criminal justice, paternity testing and other forensic, medical and genetic identification applications. This is due in particular to the fact that the present invention makes use of a different type of polymorphic marker, namely combinations of so-called frequent biallelic deletion-insertion polymorphisms (DIPs).
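Why a combination of individually weak biallelic markers can still give high discrimination power can be sketched with the standard product rule. This sketch assumes Hardy-Weinberg equilibrium at each locus and independence (linkage equilibrium) between loci; the allele frequencies are hypothetical and are not taken from the invention. For a biallelic locus with allele frequencies p and q = 1 - p, the genotype frequencies are p², 2pq and q², and the probability that two unrelated individuals share a genotype is the sum of their squares.

```python
import math
from collections.abc import Iterable


def locus_match_probability(p: float) -> float:
    """Probability that two unrelated individuals share a genotype at one
    biallelic locus, assuming Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)


def combined_match_probability(allele_freqs: Iterable[float]) -> float:
    """Product rule across loci, assuming the loci are independent."""
    return math.prod(locus_match_probability(p) for p in allele_freqs)


if __name__ == "__main__":
    # Hypothetical panel of 30 "frequent" biallelic markers, all with p = 0.5.
    panel = [0.5] * 30
    print(locus_match_probability(0.5))
    print(combined_match_probability(panel))
```

A single marker with p = 0.5 leaves a 0.375 chance of a coincidental genotype match, but 30 such independent markers reduce it to roughly 1.7 × 10^-13. This is also why markers with frequent minor alleles are preferred: the per-locus value is lowest when p is close to 0.5.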