1. Field of the Invention
The present invention relates to methods of storing, aligning, and retrieving haplotype data, and more particularly, to methods in which haplotype data are ordered and aligned as they are stored in a database and are subsequently retrieved.
2. Description of the Related Art
Haplotype data, expressed as deoxyribonucleic acid (DNA) sequences, are sets of single nucleotide polymorphism (SNP) alleles located along a chromosome region. An SNP is a single-base DNA variation that distinguishes individuals, and SNPs occur at about one in every thousand bases of the human genome. For example, if 30% of human chromosomes carry adenine (A) and 70% carry guanine (G) at a given SNP, then A and G are called the variants, or alleles, of that SNP.
For example, assume that the haplotype data of three people have the following base sequences, respectively:
ATAGTCACGTACGTATTACG (SEQ ID NO.: 1);
ATCGTCACGAACGTATGACG (SEQ ID NO.: 2); and
ATCGTCACGAACGTATGACG (SEQ ID NO.: 3),
where C denotes cytosine, and T denotes thymine.
In this case, the SNPs are located at the third, tenth, and seventeenth positions, and the alleles at these positions are A/C, T/A, and T/G, respectively.
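The positions and alleles stated above can be checked with a short script. The following is only an illustrative sketch: the function names `snp_positions` and `haplotype_of` are hypothetical and do not appear in this document, and the sketch assumes the sequences are already aligned and of equal length.

```python
def snp_positions(sequences):
    """Map each 1-based position where the sequences differ to its alleles."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    snps = {}
    # zip(*sequences) walks the sequences column by column.
    for position, column in enumerate(zip(*sequences), start=1):
        # dict.fromkeys keeps unique bases in order of first appearance.
        alleles = list(dict.fromkeys(column))
        if len(alleles) > 1:
            snps[position] = alleles
    return snps


def haplotype_of(sequence, positions):
    """Concatenate the bases of one sequence at the given 1-based positions."""
    return "".join(sequence[p - 1] for p in positions)


sequences = [
    "ATAGTCACGTACGTATTACG",  # SEQ ID NO.: 1
    "ATCGTCACGAACGTATGACG",  # SEQ ID NO.: 2
    "ATCGTCACGAACGTATGACG",  # SEQ ID NO.: 3
]

snps = snp_positions(sequences)
haplotypes = [haplotype_of(s, sorted(snps)) for s in sequences]

print(snps)        # {3: ['A', 'C'], 10: ['T', 'A'], 17: ['T', 'G']}
print(haplotypes)  # ['ATT', 'CAG', 'CAG']
```

Reduced to its SNP alleles, each 20-base sequence becomes a three-character haplotype, which is the kind of record the storage discussion that follows is concerned with.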
Conventionally, such haplotype data are stored in a list-type data structure. However, determining whether a given haplotype exists in the database, or extracting the information associated with it once it is found, requires O(n) search time with a list, where n is the number of haplotype data entries in the database. Therefore, there is a need for methods of storing and aligning haplotype data that reduce the search time.
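The O(n) cost of the list structure can be seen in a minimal sketch. The records, the labels such as "record-0", and the function names below are hypothetical and chosen only for illustration. The sorted lookup is included purely as a generic point of comparison, showing how keeping entries ordered can reduce search time; it is not presented as the method of the invention.

```python
from bisect import bisect_left


def find_linear(records, target):
    """Scan an unordered list of (haplotype, info) pairs.

    Returns (info, comparisons); info is None when the target is absent.
    In the worst case every one of the n records is compared: O(n).
    """
    comparisons = 0
    for haplotype, info in records:
        comparisons += 1
        if haplotype == target:
            return info, comparisons
    return None, comparisons


def find_sorted(keys, infos, target):
    """Binary search over keys kept in sorted order: O(log n) comparisons."""
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        return infos[i]
    return None


# Hypothetical records: (haplotype, associated information).
records = [
    ("CAG", "record-0"),
    ("ATT", "record-1"),
    ("CTG", "record-2"),
    ("AAT", "record-3"),
]

print(find_linear(records, "AAT"))  # ('record-3', 4): found on the last comparison
print(find_linear(records, "GGG"))  # (None, 4): absent, all records compared

# Ordering the records once at storage time allows binary search afterwards.
ordered = sorted(records)
keys = [haplotype for haplotype, _ in ordered]
infos = [info for _, info in ordered]

print(find_sorted(keys, infos, "AAT"))  # record-3
print(find_sorted(keys, infos, "GGG"))  # None
```

With the unordered list, a query for an absent haplotype must touch every record, whereas the ordered arrangement needs only a logarithmic number of comparisons, at the cost of keeping the data ordered when it is stored.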