1. Field of the Invention
The present invention relates in general terms to DNA genotypic data that is linked to clinical diagnosis, that is, to phenotypic data. More particularly, a method is presented for extracting genomic classifiers of disease risk from genomic data, as obtained from microarray or gene chip assays, in conjunction with their phenotypic correlates. This leads to methods of disease forecasting and individual patient disease risk prediction, as well as to devices which accomplish these goals.
2. Description of the Prior Art
There are almost three billion (coding and non-coding) DNA base pairs in the human genome, about 99.5% of which are shared by all Homo sapiens. Each somatic cell contains a maternal and a paternal contribution, so the overwhelming majority of base pairs are homozygous, while the remaining pairs appear as two alleles. These noteworthy deviant markers are termed single nucleotide polymorphisms (SNPs), which are heterozygous pairs or alleles. By definition, for an allele pair to be a SNP, the frequency of the rarer allele must be greater than 1% in the population. A SNP for which both alleles produce the same polypeptide sequence is said to be a silent (synonymous) mutation. If a different polypeptide results, it is said to be a replacement polymorphism. There is a general view that this subset of the genome accounts for human variation, and in particular carries the potential for acquiring diseases. Replacement polymorphisms, which result in polypeptide substitution, are thought to be responsible for over half the known diseases of mutagenic origin (Stenson et al., 2009).
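By way of illustration only, the two definitions above (the 1% rarer-allele threshold, and the silent versus replacement distinction) can be sketched in Python. The function names, the biallelic assumption, and the example codons are hypothetical choices made for this sketch and are not part of the disclosed method; the codon table is the standard genetic code.

```python
from collections import Counter

# Standard genetic code, laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def minor_allele_frequency(alleles):
    """Frequency of the rarer allele at one site (assumes a biallelic site)."""
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0  # only one allele observed: no polymorphism
    return min(counts.values()) / len(alleles)


def is_snp(alleles, threshold=0.01):
    """Apply the definition: the rarer allele frequency must exceed 1%."""
    return minor_allele_frequency(alleles) > threshold


def classify_coding_snp(codon_a, codon_b):
    """Label a coding SNP 'silent' if both codons encode the same amino acid,
    otherwise 'replacement'."""
    same = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "silent" if same else "replacement"


if __name__ == "__main__":
    # Hypothetical population sample: 98 copies of allele A, 2 copies of allele G.
    sample = ["A"] * 98 + ["G"] * 2
    print(is_snp(sample))
    print(classify_coding_snp("GAA", "GAG"))  # both codons encode glutamate
    print(classify_coding_snp("GAG", "GTG"))  # glutamate versus valine
```

In this sketch the threshold is applied to observed allele counts in a sample, which only approximates the population frequency referred to in the definition.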
DNA genotyping is performed by microarrays, also referred to as gene chips. A microarray is a collection of microscopic DNA spots, referred to as reporters or probes, attached to a solid surface. A single chip can contain many hundreds of thousands of probes. While polymerase chain reaction (PCR) microarrays, or gene chips, have facilitated acquisition of vast quantities of genomic data, disappointment has been expressed at the lack of DNA variant linkages to human diseases, particularly in the case of complex disorders (Chakravarti, 2011).
Patents exist that associate SNPs with genetic-based diseases. For example, U.S. Patent Publication No. US2004/0132015 discloses a process for detecting mutations in regions determined by a codon scanning algorithm. Also disclosed are a process for preparing a DNA chip using that process and a method for detecting mutations using the DNA chip. Because mutations associated with various genetic diseases can be detected and identified, the DNA chip using the codon scanning algorithm can be applied to the diagnosis of diseases associated with genetic mutations.
A computer algorithm for mathematical allele combination from a gene type device is disclosed in U.S. Pat. No. 7,272,506. The patent discloses an automated method for identifying allele values from a data file and analyzing polymorphic DNA. The method is used for distinguishing targeted polymorphic DNA sites without control samples.
U.S. Patent Publication No. US 2011/0014607 discloses methods for identifying imprinted genes. In some of the methods, a first data set is provided comprising a plurality of nucleic acid sequences corresponding to a plurality of genes known to be imprinted in a subject. A second data set includes a plurality of nucleic acid sequences corresponding to genes known not to be imprinted in a subject. One or more features are identified that, by themselves or in combination, are differentially present in or absent from the first data set as compared with the second data set. The one or more features are then applied to a test data set, comprising a plurality of genomic DNA sequences that correspond to one or more genes for which the imprinting status is unknown, to identify an imprinted gene in a subject. The '607 Publication also discloses a method for identifying a feature in the subject with respect to an imprinted gene, and methods for detecting the presence of, or susceptibility to, a medical condition associated with parent-of-origin dependent monoallelic expression in the subject.
An algorithm for quantifying polymorphisms in an electropherogram is disclosed in U.S. Patent Publication No. US 2011/0238318. This Publication discloses a method of quantifying a particular target site of a cell or organism by performing genomic sequencing, in which a DNA sample is extracted from the cell or organism and treated to convert cytosine to uracil, and a fragment of the treated DNA is amplified. The amplified fragment is then sequenced, and calculations are made from the resulting electropherogram to perform the sequence analysis.
U.S. Patent Publication No. US 2010/0285980 discloses a gene expression profile algorithm that provides a test for the likelihood of recurrence of colorectal cancer and of response to chemotherapy, and that involves analysis of the gene expression values of prognostic and/or predictive genes. A biological sample can be obtained from a cancer patient, and the measured expression levels are analyzed to provide such information. Methods of analysis are also disclosed for identifying genes that co-express with a validated biomarker and that may be substituted for that biomarker in an assay.