DNA
Living organisms are traditionally grouped into five kingdoms, one of which is the animal kingdom. The animal kingdom includes phyla (for example, Chordata, Arthropoda and Mollusca), each phylum includes classes (such as Mammalia, Aves, Reptilia and Amphibia), each class includes orders (the class Mammalia includes Primates, Carnivora, Rodentia and Perissodactyla), each order includes families (the order Carnivora includes Canidae, Felidae, Ursidae, Hyaenidae and Mustelidae), each family includes genera and each genus includes species.
Each member of each species carries genetic information. The deoxyribonucleic acid (DNA) molecule encodes that genetic information.
DNA is made of four types of nucleotides, distinguished by their bases: adenine (A), cytosine (C), guanine (G) and thymine (T).
A DNA molecule can be represented by a succession of letters: each of the letters A, C, G and T represents a nucleotide base, while other letters, such as N, may represent ambiguity. These letters indicate the order of nucleotides within the DNA molecule.
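The letter representation can be illustrated with a minimal Python sketch. The names `ALLOWED_LETTERS`, `is_valid_dna_string` and `count_letters` are hypothetical, and the sketch assumes that N is the only ambiguity letter in use; it checks that a string contains only the permitted letters and counts how often each one occurs.

```python
from collections import Counter

# Letters allowed in a DNA string: the four bases plus N for ambiguity.
# Assumption for illustration: N is the only ambiguity letter accepted.
ALLOWED_LETTERS = frozenset("ACGTN")


def is_valid_dna_string(sequence: str) -> bool:
    """Return True if every letter is one of A, C, G, T or N."""
    return all(letter in ALLOWED_LETTERS for letter in sequence)


def count_letters(sequence: str) -> Counter:
    """Count how often each letter occurs in the sequence."""
    return Counter(sequence)
```

For example, `is_valid_dna_string("ACGTNACG")` holds, while a string containing any other letter, such as `"ACGU"`, is rejected.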
Determining the order of the nucleotides in DNA extracted from living tissue is known as sequencing and is very complex. A complete genomic sequence of a member of a species (for example, one person) can be obtained by sequencing that person's DNA one or more times.
Scientists usually process, retrieve or review DNA fragments of the complete genomic sequence. A DNA fragment may be represented by a DNA fragment string that includes the letters A, C, G and T and, optionally, ambiguity letters such as N.
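A DNA fragment string can be pictured as a contiguous slice of the longer sequence string. The following Python sketch is only illustrative: the function name `extract_fragment` and the use of a 0-based start position are assumptions, not conventions taken from any particular genomic tool.

```python
def extract_fragment(genome: str, start: int, length: int) -> str:
    """Return the fragment of `length` letters beginning at 0-based `start`.

    Raises ValueError if the requested fragment does not lie entirely
    inside the sequence.
    """
    if start < 0 or length < 0 or start + length > len(genome):
        raise ValueError("fragment lies outside the sequence")
    return genome[start:start + length]
```

With the made-up sequence `"ACGTNNACGT"`, calling `extract_fragment("ACGTNNACGT", 2, 5)` yields the fragment string `"GTNNA"`, which contains two ambiguity letters.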
A sequence of letters that represents the complete genomic sequence of a person can be about 2 terabytes long.
The 1000 genomes project
The 1000 genomes project (www.1000.genomes.org) sequences the genomes of a large number of people to provide a comprehensive resource on human genetic variation. The goal of the 1000 genomes project is to find most genetic variants that have frequencies of at least 1% in the populations studied. This goal can be attained by sequencing many people.
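The 1% frequency threshold can be made concrete with a small Python sketch. It assumes, purely for illustration, that a variant's frequency is the number of sequenced genome copies carrying the variant divided by the total number of copies sequenced; the function names and the counts used below are hypothetical and are not taken from the project's data.

```python
def variant_frequency(observed_copies: int, total_copies: int) -> float:
    """Fraction of sequenced genome copies that carry the variant."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return observed_copies / total_copies


def meets_frequency_threshold(observed_copies: int, total_copies: int,
                              threshold: float = 0.01) -> bool:
    """Return True if the variant's frequency is at least `threshold` (1% by default)."""
    return variant_frequency(observed_copies, total_copies) >= threshold
```

Under these assumptions, a variant seen in 25 of 2000 sequenced copies has a frequency of 1.25% and meets the threshold, whereas one seen in 10 of 2000 copies (0.5%) does not. Rarer variants need more sequenced individuals before they are observed at all, which is why the goal depends on sequencing many people.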
The 1000 genomes project maintains a vast genetic information database that exceeds 200 terabytes. Because of its size, scientists are able to retrieve only very small portions of the database and do not benefit from the wealth of genetic data it contains.
The 1000 genomes project also releases genetic information database updates that are hard to manage.
There is a growing need to provide a system and method for allowing access to vast genetic information databases and for simplifying the updates of such databases.