The genes of an individual code for a variety of proteins. Expression of a gene as messenger ribonucleic acid (mRNA) and protein contributes to a variety of phenotypic traits (i.e., observable traits such as eye color and hair color), as well as to other traits. If a variant occurs in a specific gene, that variation may be reflected in the resulting mRNA and protein, which can result in a different phenotype. Genetic factors therefore play a major role in many phenotypic traits. For example, normal variations (polymorphisms) in two genes, EDAR and FGFR2, have been associated with differences in hair thickness. Each variation in the nucleotides found in a gene (or in the nucleotides that regulate expression of that gene) may be referred to as a genetic variant.
While the biological inheritance of physical traits has been studied for well over a century, associating specific phenotypes with specific genetic variants, or with combinations of variants, remains a complicated process. The sequencing data for a single human genome can occupy approximately eighty gigabytes (GB). Furthermore, the human genome is estimated to contain roughly ten million single-nucleotide polymorphisms (SNPs). The genome includes large non-coding regions (e.g., introns) as well as coding regions (e.g., exons), and some non-coding regions may regulate how one or more coding regions are expressed. Thus, even variants in non-coding regions may affect phenotype, which enlarges the set of candidate variants that must be considered; when so many candidates are evaluated, false positives may occur when associating a genetic variant with a specific phenotype. Hence, the process of correlating specific genetic variants with specific traits (e.g., specific phenotypes) can be extremely complex.
Further compounding the process, many traits of an individual cannot be identified without studying that individual closely, and some traits may be hard to quantify precisely (e.g., hair curl, personality). Other traits may be hard to identify from the information currently known about the individual. For example, an individual who has constant headaches may be suffering from high blood pressure, high stress, allergies, or another condition. Without more information, it would be impossible to determine which of that individual's genetic variants are correlated with (and/or contribute to) the underlying trait or the reported symptoms.
Still further complicating this process, a single trait may be linked with multiple genetic variants, and a single genetic variant may be linked with multiple traits. Such many-to-many associations between traits and genetic variants remain hard to identify. Hence, there remains a need for enhanced systems and methods for identifying relationships between the traits of individuals and the genetic variants found in those individuals.