Nucleic acid fragments with certain nucleotide sequences have been used for identification purposes. This methodology can be applied to all biological material containing nucleic acids with sequences sufficiently distinguishing the source of the material from a genetically distinct source.
In the past, the fragments were made visible with autoradiography or chemiluminescence on a chromatographic gel, and determining whether one fragment pattern was identical to another required visual comparison.
Presently, the detection of nucleic acid fragments is generally performed by using a set of markers for a number of particular sequences in nucleic acids (called “loci”) and by using several dyes simultaneously. This produces a pattern of colored bands, which can be converted into a pattern of peaks using an optical read-out system.
The markers are designed so that they detect only a limited number of specific nucleotide sequences, or so-called “alleles”, in the loci. On the basis of so-called “ladders”, an allele number can be attributed to each detected nucleotide sequence, or its size can be determined.
Depending on the set of markers used to determine a pattern of nucleotide sequences from nucleic acid fragments, tables summarizing the determined data can be constructed.
For such tables, the data produced by automated equipment for the detection of nucleotide patterns from nucleic acid fragments derived from a sample of biological material are translated into so-called “allele numbers”, depicting the number of times a certain allele is repeated in the detected nucleic acid fragment, the so-called Short Tandem Repeats or STRs. Such a table, in which the allele numbers are noted in pairs (one for each parent from whom an allele is inherited), is called a DNA profile, or genetic fingerprint.
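As an illustration of such a table, a DNA profile can be sketched as a mapping from marker name to a pair of allele numbers. The marker names below are of the kind found in forensic STR sets; the allele values, the variable names and the helper `same_profile` are invented for this sketch and do not come from any real sample or from the methodology described here.

```python
# Hypothetical DNA profile: marker -> pair of allele numbers (one per parent).
# All allele values are made up for illustration only.
profile = {
    "D3S1358": ("15", "16"),
    "vWA": ("17", "18"),
    "FGA": ("21", "24"),
    "TH01": ("6", "9.3"),
    "AMEL": ("X", "Y"),
}


def same_profile(a, b):
    """Return True if two profiles carry the same alleles at every marker.

    The order of the two alleles within a pair is ignored.
    """
    if a.keys() != b.keys():
        return False
    return all(sorted(a[m]) == sorted(b[m]) for m in a)


# Same alleles at TH01, written in the opposite order.
swapped = dict(profile, TH01=("9.3", "6"))
# One allele at FGA differs.
different = dict(profile, FGA=("21", "25"))

print(same_profile(profile, swapped))    # True
print(same_profile(profile, different))  # False
```

Allele numbers are kept as strings in this sketch because values such as 9.3 denote a partial repeat rather than a decimal quantity.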
Considering the number of possible alleles to be detected per marker, simple mathematics makes clear that the more markers are used, the less likely it is that a certain pattern of alleles found in the nucleic acids in biological material of a specific source will also be found in the nucleic acids in biological material from a genetically distinct source.
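This relationship can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the markers are statistically independent and that two genetically distinct sources show the same allele pair at any single marker with a fixed probability of 0.1. Both the independence assumption and the value 0.1 are hypothetical and are not figures taken from this text.

```python
def match_probability(per_marker_match, n_markers):
    """Probability that two distinct sources match at all n markers.

    Assumes independent markers that all share the same per-marker
    match probability (a simplification made for illustration).
    """
    return per_marker_match ** n_markers


# Marker counts of the sets mentioned in the text (gender marker excluded).
for n in (10, 13, 15):
    print(n, match_probability(0.1, n))
```

Under these assumptions each additional marker multiplies the chance of a coincidental match by the per-marker probability, so the chance shrinks geometrically with the number of markers.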
For forensic purposes, various marker sets are used, either containing 10 markers, plus the marker AMEL for gender, in the SGM-Plus set, or 13+1 markers in the CODIS set used by the FBI. Even more discriminatory power can be obtained when the 15+1 marker sets called PowerPlex 16 or Identifiler are used.
More recently, methods were published for forensic identification of individuals which are based on panels of SNPs (Single Nucleotide Polymorphisms).
Besides nucleic acid based systems for determining a unique distinguishing biological characteristic of an individual or an organism, other systems have been described that determine characteristics with such qualities on the basis of other biochemical parameters, such as the concentration values of a set of specific antibodies or HLA markers.
In forensics, a general format for storing and exchanging DNA profiles is XML. The size of an XML file with the information of such a profile is about 3,000 to 4,000 bytes.
Large databanks with DNA profiles have been set up all over the world, for forensic and other purposes. They have been filled with millions of such profiles and that number is rising quickly. Therefore, the burden on digital systems for the storage, the search and comparison as well as the sending and receiving of a multitude of such data is growing rapidly and might outgrow the capacity of the existing digital systems in the near future. The data files containing SNP-panels are even considerably larger than the forensic XML files.
Another disadvantage of a DNA profile or an SNP panel being depicted as a table, or as an XML file with allele numbers or SNP characters, is that such a multitude of symbols cannot be used to form a single linear barcode, two-dimensional barcode or matrix code that can be scanned with a hand-held scanner.
Yet another disadvantage of DNA profiles using allele numbers of STRs is that the total number of characters is variable, since the number of repeats is depicted with 1 to 8 characters per allele.
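The variable length can be seen in a minimal sketch. The two profiles below cover the same three markers, yet need a different number of characters to write out; the allele values and the helper `character_count` are invented for illustration.

```python
# Two hypothetical profiles over the same markers; allele values are invented.
profile_a = {"D3S1358": ("9", "9"), "vWA": ("8", "9"), "FGA": ("9", "7")}
profile_b = {"D3S1358": ("15", "16"), "vWA": ("17", "18"), "FGA": ("21.2", "24")}


def character_count(profile):
    """Count the characters needed to write out all allele numbers of a profile."""
    return sum(len(allele) for pair in profile.values() for allele in pair)


print(character_count(profile_a))  # 6
print(character_count(profile_b))  # 14
```

Because the count depends on the allele values themselves, a fixed-width field cannot be assumed when such profiles are stored or encoded.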
Considering the above, it is an object, amongst other objects, of the present invention to at least partially, or even completely, solve one or more of the above problems associated with storage and exchange of biological characteristics while maintaining the distinguishing, i.e., characterizing and identifying, capabilities thereof.