Genetic researchers often need to study specific genetic variants to understand their significance. For example, researchers may want to know whether a certain genetic variant of interest (VOI) (e.g., having the G/C alleles at location 150 on Chromosome 3) is correlated with a particular phenotype (e.g., having a particular disease). Interpreting specific genetic variants, and identifying cohorts that carry them, currently poses substantial challenges in genetic studies. This is especially true for variants of unknown significance (VUS) found in whole-genome sequence data. VUS are so named because their correlations with specific phenotypes (e.g., certain diseases) are unknown before they are studied. VUS are often too rare to be amenable to genome-wide association studies. They have therefore traditionally been interpreted with reference to the primary literature (especially for high-penetrance or Mendelian mutations) or by computational methods (e.g., Sorting Intolerant From Tolerant (SIFT), PolyPhen).
Some existing large personal genomic databases include individuals who actually carry the VOI. For example, 23andMe, Inc., a personal genetics service company, has accumulated a database containing data from over 250,000 individuals. Such databases typically represent an individual's genome with genotype data made up of genetic markers rather than full sequence data. The genotype data is usually obtained using chips whose probes assay only selected locations on the genome. As a result, the data is typically not a full sequence, and the VOI is not necessarily assayed directly. For example, an individual's genotype data may contain no information about that person's alleles at location 150 on Chromosome 3 because the chip used for the assay has no probe at that location. This makes it difficult to study the VOI using the information stored in these databases directly.