Much progress has been made in recent years in the development of high-throughput DNA sequencing technology (Pettersson E, Lundeberg J, Ahmadian A (February 2009). “Generations of sequencing technologies”. Genomics 93 (2): 105-11. doi:10.1016/j.ygeno.2008.10.003. PMID 18992322; Staden, R (1979 Jun. 11). “A strategy of DNA sequencing employing computer programs”. Nucleic Acids Research 6 (7): 2601-10. doi:10.1093/nar/6.7.2601. PMID 461197; Church G M (January 2006). “Genomes for all”. Sci. Am. 294 (1): 46-54. doi:10.1038/scientificamerican0106-46. PMID 16468433). However, a comprehensive analysis of the entire genome is not currently commercially available or technologically possible. To date, whole genome sequencing is used only for research purposes (completegenomics.com/services/standard-sequencing/; illumina.com/services.ilmn), and a medically useful whole genome sequencing service simply does not exist.
While there are some reports of whole-genome medical sequencing services, such services use information from the whole genome for only a few disease-associated single nucleotide polymorphisms (SNPs) in a limited number of genes (illumina.com/services.ilmn). This is in part because, although ancestral-specific mutations useful for medical applications of whole genome sequencing have been identified in a variety of diseases (ncbi.nlm.nih.gov/omim), and Genome Wide Association Studies (GWAS) (Klein R J, Zeiss C, Chew E Y, Tsai J Y, Sackler R S, Haynes C, Henning A K, SanGiovanni J P, Mane S M, Mayne S T, Bracken M B, Ferris F L, Ott J, Barnstable C, Hoh J (April 2005). “Complement Factor H Polymorphism in Age-Related Macular Degeneration”. Science 308 (5720): 385-9. doi:10.1126/science.1109557. PMC 1512523. PMID 15761122) have generated a partial list of ancestral SNPs for research purposes, a comprehensive list of genome-wide ancestral SNPs has not been generated to date. Without a comprehensive list of SNPs, the development of whole genome sequencing as a medical diagnostic tool may not be possible.
Progress in the area of whole genome sequencing as an approved diagnostic tool has been impeded largely because medical sequencing methods developed to date generate a large number of false-positive and false-negative base calls inherent to the technology (Zhao J, Grant SF (February 2011). “Advances in Whole Genome Sequencing Technology”. Curr Pharm Biotechnol 23(2) 293-305. PMID 21050163). An additional layer of misinformation is introduced in whole genome sequencing by the current NIH-derived reference genome used as the standard template for sequencing (Scherer, Stewart (2008). A short guide to the human genome. CSHL Press. p. 135. ISBN 0-87969-791-1; Wheeler D A, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen Y J, Makhijani V, Roth G T, Gomes X, Tartaro K, Niazi F, Turcotte C L, Irzyk G P, Lupski J R, Chinault C, Song X Z, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny D M, Margulies M, Weinstock G M, Gibbs R A, Rothberg J M. (2008). “The complete genome of an individual by massively parallel DNA sequencing”. Nature 452 (7189): 872-6; Bibcode 2008Natur.452..872W. doi:10.1038/nature06884. PMID 18421352). In particular, all existing sequencing technologies use the same standard reference genome for the bioinformatic reconstruction/assembly of the whole genome from the small DNA fragments sequenced during the process of obtaining a medically usable completed whole genome. The current standard reference genome, which was generated some years ago by the National Institutes of Health (NIH) as a model for genomic structure and sequence assembly, is based on a single whole genome sequence generated from the composite DNA obtained originally from five different individuals (Editorial (October 2010). “E pluribus unum”. Nature Methods 331 (5): 331. doi:10.1038/nmeth0510-331).
As such, it is neither statistically representative nor accurate when comparing individuals from different ancestral backgrounds, and it may not provide a sound reference for interpreting genomic information.
Although some sequencing companies claim to have a very high accuracy rate for determining a whole genome sequence (Quail, Michael; Smith, Miriam E; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong (1 Jan. 2012). “A tale of three next generation sequencing platforms: comparison of Ion torrent, pacific biosciences and illumina MiSeq sequencers”. BMC Genomics 13 (1): 341. doi:10.1186/1471-2164-13-341; Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie (1 Jan. 2012). “Comparison of Next-Generation Sequencing Systems”. Journal of Biomedicine and Biotechnology 2012: 1-11. doi:10.1155/2012/251364), the reality is that, due to the large size of the genome (~3.2 billion base pairs coding for 20,000 to 25,000 distinct genes), even a small percentage of errors results in a large number of incorrectly called bases. A very low error rate is required for predictive medicine applications (Bentley D R (December 2006). “Whole-genome re-sequencing”. Curr. Opin. Genet. Dev. 16 (6): 545-552. doi:10.1016/j.gde.2006.10.009. PMID 17055251; Genetest.org). Recently, bioinformatic tools have been developed that correct genomic sequence based on familial sequence information for an individual family (familygenomics.systemsbiology.net/publications). Including familial information from three closely related individuals can improve DNA sequence accuracy by 70%, and using information from four or more family members can improve accuracy by 90% (Roach J C, Glussman G, Smit A F, Huff C D, Drmanac R, Jorde L B, Hood L, Galas D J (10 Apr. 2010) “Analysis of Genetic Inheritance in a Family Quartet by Whole Genome Sequencing”. Science 328: 636-9 doi:10.3410/f.2707961.2371060). However, such correction tools are time-consuming and add inefficiency and cost to the process of whole genome sequencing.
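The scale argument above can be illustrated with a short calculation. The following is a minimal sketch only: the genome size of approximately 3.2 billion base pairs is taken from the text, while the per-base error rates and the function name `expected_miscalls` are hypothetical values and names chosen for illustration, not figures reported for any particular sequencing platform.

```python
# Approximate human genome size in base pairs, as stated in the text.
GENOME_SIZE_BP = 3_200_000_000


def expected_miscalls(error_rate: float, genome_size: int = GENOME_SIZE_BP) -> int:
    """Return the expected number of incorrectly called bases.

    error_rate is the per-base probability of an incorrect call
    (for example, 0.0001 corresponds to 99.99% per-base accuracy).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return round(genome_size * error_rate)


if __name__ == "__main__":
    # Hypothetical error rates, used only to show how the count scales.
    for rate in (0.001, 0.0001, 0.00001):
        accuracy_pct = (1.0 - rate) * 100
        print(f"accuracy {accuracy_pct:.3f}% -> "
              f"about {expected_miscalls(rate):,} miscalled bases")
```

Under these assumptions, a hypothetical per-base error rate of 0.01% (99.99% accuracy) still corresponds to roughly 320,000 incorrectly called bases across the genome, which is why a very low error rate is required for predictive medicine applications.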
Accordingly, there is a need for the development of an ancestral-specific reference genome database that incorporates familial genome sequencing information to improve the accuracy of such ancestral-specific reference genomes. An ancestral-specific reference database can, in turn, be used as a tool, for example, for the diagnosis of a patient at risk for a genetic disease or disorder or for the prognosis of such a genetic disease or disorder.