An individual's genes encode a variety of proteins. The expression of a gene as messenger ribonucleic acid (mRNA) and protein contributes to phenotypic traits (i.e., observable traits such as eye color and hair color) as well as to other traits. If a variant occurs in a specific gene, that variation may be reflected in the corresponding mRNA and protein, which can result in a different phenotype. Genetic factors therefore play a major role in many phenotypic traits. For example, normal variations (polymorphisms) in two genes, EDAR and FGFR2, have been associated with differences in hair thickness. Each variation in the nucleotides of a gene (or in the nucleotides that regulate expression of that gene) may be considered a genetic variant.
While the biological inheritance of physical traits has been studied for decades, associating specific phenotypes with specific genetic variants, or with combinations of variants, remains a complicated process. The sequencing data for a single human genome can occupy approximately eighty gigabytes (GB). Furthermore, the genome is estimated to contain roughly ten million single nucleotide polymorphisms (SNPs). The genome includes large stretches of non-coding regions (e.g., introns) in addition to coding regions (e.g., exons), and the non-coding regions may regulate how one or more coding regions are expressed. Thus, even variations in non-coding regions may affect phenotype, and because so many candidate variants must be considered, false positives may occur when associating a genetic variant with a specific phenotype. Hence, the process of correlating specific genetic variants with specific traits (e.g., specific phenotypes) can be fiendishly complex.
Further complicating the process, many traits of an individual cannot be identified without studying the individual closely, and some traits may be hard to quantify precisely (e.g., hair curl or personality). Other traits may be difficult to identify from the information currently available about the individual. For example, an individual who has constant headaches may be suffering from high blood pressure, high stress, allergies, or other conditions. Without more information, it would be impossible to determine which of that individual's genetic variants are correlated with (and/or contribute to) the reported traits or symptoms.
Mathematical models have been built that attempt to predict the traits of an individual based on that individual's genetic sequence. However, the accuracy, speed, and complexity of such models vary widely. Even models that are accurate for the general population may produce less accurate predictions when applied to members of certain subpopulations, due to genetic variation or other factors that may not have been captured in the original model. Furthermore, individuals may be unwilling to share the amount and type of genetic data desired as input for the models described above.
Hence, those seeking to identify generalizable and robust relationships between the traits of individuals and the genetic variants found in those individuals continue to look for enhanced methods of achieving these goals.