The following discussion is meant to aid in the understanding of the invention, but is not intended to, and is not admitted to, describe prior art to the invention.
Populations can vary considerably with respect to the number and frequency of genetic variants that they possess. Analogously, or consequently, individuals within such populations can vary considerably in terms of the composition of their genomes. Both of these facts contribute to the tremendous phenotypic variation exhibited among individuals within and between populations.
Various genetic systems and analyses have been used to assess the relationship between genes and phenotype (Lander & Schork, Science 265:2037-2048, 1994). Fundamental to such analyses is the assumption that the sample of individuals with (and/or without) a particular phenotype chosen for study is “homogeneous” with respect to the cause of the phenotype (i.e., the individuals in the sample have (or do not have) the phenotype for the same reason). When this is not the case, a study of the relationship between the phenotype and its determinant(s) is unlikely to be successful. Assessment of the “similarity” of individuals in a sample with respect to genetic background and molecular profile may thus provide a useful measure of this homogeneity (Curnow, J. Agricul., Biol., and Environ. Stat. 3:347-358, 1998). Such homogeneity assessment can be of value to any study that depends on the identification of individuals with certain features based on distinguishing genetic characteristics, such as forensic applications.
Analyses assessing the similarity in the genetic profiles of individuals have been pursued. For example, polymorphic microsatellites (primarily CA repeats) have been used to construct trees of human individuals that reflect their geographic origin (Bowcock et al., Nature 368:455-457, 1994), and to study the genetic variability within and between cattle breeds (Ciampolini et al., J. Anim. Sci. 73:3259-3268, 1995). RFLP genotypes have been used to construct trees of individuals of different ethnicities (Mountain and Cavalli-Sforza, Am. J. Hum. Genet. 61:705-718, 1997). Random amplified polymorphic DNA (RAPD) markers have been used to compute genetic similarity coefficients (Lamboy, PCR Methods and Applications 4:31-37, 1994), and to compare phenotype and genotype in plants (Jasienski et al., Heredity 78:176-181, 1997).
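As a purely illustrative sketch of what a genetic similarity coefficient can look like, the following computes the Jaccard coefficient between presence/absence marker profiles (for example, scored bands). The choice of the Jaccard coefficient, the function names, and the marker data are all assumptions made for illustration; they are not taken from the cited studies.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard coefficient between two presence/absence (1/0) marker profiles.

    Counts markers present in both profiles and divides by the number of
    markers present in at least one of them.
    """
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of markers")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Convention assumed here: two profiles with no markers present score 0.
    return shared / either if either else 0.0


def pairwise_similarity(profiles):
    """Similarity for every unordered pair of individuals, keyed by name pair."""
    return {
        (i, j): jaccard_similarity(profiles[i], profiles[j])
        for i, j in combinations(sorted(profiles), 2)
    }


# Hypothetical data: three individuals scored at five markers.
profiles = {
    "ind1": [1, 1, 0, 1, 0],
    "ind2": [1, 1, 0, 0, 0],
    "ind3": [0, 0, 1, 0, 1],
}
similarities = pairwise_similarity(profiles)
```

In this made-up example, ind1 and ind2 share two of the three markers present in either profile, whereas ind3 shares no markers with the other two, so a matrix of such coefficients could serve as input to a tree-building or clustering procedure.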
However, these analyses often rely on a priori knowledge of the groups to which the individuals belong. Many do not permit one to determine, in the absence of such knowledge, which populations have contributed to the genetic variation within a pool or sample of individuals, and to what degree. Yet in the large majority of cases, individuals sampled from a population represent an “admixture” of genes from several populations. The contributions of these populations are reflected in the genetic profiles of individuals and can therefore defy population segregation based on traditional markers such as skin color and/or self-reported ethnic affiliation. Therefore, methods of analysis are needed to accurately determine the existence of clusters of genetically similar individuals, absent phenotypic (ethnic, for example) information. As noted previously, knowledge of the homogeneity or heterogeneity of a population can be important under many circumstances, including forensics and population-based studies.
In forensics, DNA fingerprinting requires the computation of ‘match probabilities’ between the suspect and the DNA obtained on a victim. Match probabilities are often computed relative to a database of non-suspect DNA. The utility of the DNA contributed by non-suspects will be influenced by the amount of genetic heterogeneity among the non-suspects (Jin & Chakraborty, Heredity 74:274-285, 1995; Sawyer et al., Am. J. Hum. Genet. 59:272-274, 1996; Tomsey et al., J. Forensic Sci. 44:385-388, 1999). Thus, determining the heterogeneity of the non-suspect population sample (on its own and compared with the DNA obtained on a victim) is important for a meaningful control.
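The dependence of a match probability on the reference database can be illustrated with a minimal sketch. It assumes the simple product rule: Hardy-Weinberg genotype frequencies at each locus, multiplied across independent loci, with no correction for population substructure. The locus names, allele labels, and allele frequencies are hypothetical, and the calculation is not asserted to be the one used in the cited studies.

```python
def genotype_frequency(alleles, freqs):
    """Expected genotype frequency at one locus under Hardy-Weinberg proportions.

    alleles: pair of allele labels making up the genotype.
    freqs:   allele label -> frequency in the reference database.
    """
    a, b = alleles
    p, q = freqs[a], freqs[b]
    # Homozygote: p^2; heterozygote: 2pq.
    return p * p if a == b else 2 * p * q


def match_probability(profile, database):
    """Product of per-locus genotype frequencies (assumes independent loci)."""
    prob = 1.0
    for locus, alleles in profile.items():
        prob *= genotype_frequency(alleles, database[locus])
    return prob


# Hypothetical two-locus profile and two hypothetical reference databases.
profile = {"locusA": ("a1", "a2"), "locusB": ("b1", "b1")}
db_x = {"locusA": {"a1": 0.5, "a2": 0.5}, "locusB": {"b1": 0.5, "b2": 0.5}}
db_y = {"locusA": {"a1": 0.1, "a2": 0.9}, "locusB": {"b1": 0.2, "b2": 0.8}}

prob_x = match_probability(profile, db_x)
prob_y = match_probability(profile, db_y)
```

With these made-up frequencies, the same profile yields a match probability of 0.125 against the first database and 0.0072 against the second, which shows why the genetic composition of the non-suspect sample matters to the result.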
In addition, many population-based studies, such as large clinical trials, case-control studies of disease risk factors, and gene mapping studies, assume that the populations under study are relatively homogeneous genetically. When this assumption is erroneous, false inferences can result, for example about the efficacy of a compound or the role of a particular risk factor in disease pathogenesis. Assessing the heterogeneity of the population can help to avoid such misleading results.