Multiple experimental paradigms have been developed to identify and analyze quantitative trait loci (QTL) (see, e.g., Jansen (1996) Trends Plant Sci 1:89). A QTL is a region of the genome that codes for one or more proteins and that explains a significant proportion of the variability of a given phenotype, which may be controlled by multiple genes. Typically, these paradigms involve crossing one or more parental pairs, which can be, for example, a single pair derived from two inbred strains, or multiple related or unrelated parents of different inbred strains or lines, each of which exhibits different characteristics relative to the phenotypic trait of interest. The majority of published reports on QTL mapping in crop species have been based on the bi-parental cross. This experimental protocol typically involves deriving 100 to 300 segregating progeny from a single cross of two divergent inbred lines (e.g., lines selected to maximize the phenotypic and molecular marker differences between them). The parents and segregating progeny are genotyped at multiple marker loci and evaluated for one to several quantitative traits (e.g., disease resistance). QTL are then identified as statistically significant associations between genotypic values and phenotypic variability among the segregating progeny.
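The final step described above, testing for an association between marker genotype and phenotype among segregating progeny, can be sketched as follows. This is a minimal illustration only, assuming a backcross population in which each marker genotype is coded 0 or 1 and the trait is tested by simple linear regression at one marker at a time. The function names `single_marker_lod` and `simulate_backcross`, the simulated effect size, the population size, and the random seed are hypothetical choices made for the example and are not taken from the cited references. For simplicity, the simulated "linked" marker is assumed to coincide with the QTL itself.

```python
import math
import random


def single_marker_lod(genotypes, phenotypes):
    """Regress phenotype on a numeric marker code.

    Returns (slope, r_squared, lod), where the LOD score compares the
    one-marker regression model against the no-QTL (mean only) model.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(genotypes, phenotypes))
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)  # no-QTL model
    if sxx == 0 or rss_null == 0:
        return 0.0, 0.0, 0.0  # marker or trait does not vary: no evidence
    slope = sxy / sxx
    rss_marker = rss_null - sxy ** 2 / sxx  # residual SS after regression
    if rss_marker <= 0:
        return slope, 1.0, math.inf  # perfect fit
    r_squared = 1.0 - rss_marker / rss_null
    lod = (n / 2.0) * math.log10(rss_null / rss_marker)
    return slope, r_squared, lod


def simulate_backcross(n_progeny=200, effect=1.0, seed=1):
    """Hypothetical data: one marker at the QTL, one unlinked marker."""
    rng = random.Random(seed)
    linked = [rng.randint(0, 1) for _ in range(n_progeny)]
    unlinked = [rng.randint(0, 1) for _ in range(n_progeny)]
    # Trait = QTL effect plus standard normal environmental noise.
    trait = [effect * g + rng.gauss(0.0, 1.0) for g in linked]
    return linked, unlinked, trait


if __name__ == "__main__":
    linked, unlinked, trait = simulate_backcross()
    for name, marker in (("linked", linked), ("unlinked", unlinked)):
        slope, r2, lod = single_marker_lod(marker, trait)
        print(f"{name}: slope={slope:.2f} R^2={r2:.2f} LOD={lod:.2f}")
```

In practice, the significance threshold for such a LOD score would need to account for the number of markers tested, for example by permutation; that step is omitted from this sketch.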
Numerous statistical methods for determining whether markers are genetically linked to a QTL (or to another marker) are known to those of skill in the art. These include, e.g., standard linear models, such as ANOVA or regression mapping (Haley and Knott (1992) Heredity 69:315), and maximum likelihood methods, such as expectation-maximization algorithms (e.g., Lander and Botstein (1989) Genetics 121:185-199; Jansen (1992) Theor. Appl. Genet. 85:252-260; Jansen (1993) Biometrics 49:227-231; Jansen (1994) In J. W. van Ooijen and J. Jansen (eds.), Biometrics in Plant breeding: applications of molecular markers, pp. 116-124, CPRO-DLO Netherlands; Jansen (1996) Genetics 142:305-311; and Jansen and Stam (1994) Genetics 136:1447-1455). Exemplary statistical methods include single point marker analysis, interval mapping (Lander and Botstein (1989) Genetics 121:185), composite interval mapping, penalized regression analysis, complex pedigree analysis, MCMC analysis, MQM analysis (Jansen (1994) Genetics 138:871), HAPLO-IM+ analysis, HAPLO-MQM analysis, HAPLO-MQM+ analysis, Bayesian MCMC, ridge regression, identity-by-descent analysis, and Haseman-Elston regression.
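To illustrate how interval mapping differs from single point marker analysis, the sketch below scans positions between two flanking markers using a regression approximation in the spirit of regression mapping. It is an assumption-laden simplification: it uses a backcross (genotypes coded 0 or 1) rather than the F2 design treated by Haley and Knott, assumes no crossover interference (Haldane's map function), and regresses the trait on the conditional probability of the QTL genotype given the flanking markers. The names `haldane_r`, `qtl_probability`, and `interval_scan`, and the `steps` parameter, are hypothetical and introduced only for this example.

```python
import math


def haldane_r(distance_morgans):
    """Recombination fraction for a map distance (Haldane's map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def qtl_probability(left, right, r_left, r_right):
    """P(QTL genotype = 1 | flanking marker genotypes) in a backcross.

    r_left is the recombination fraction between the left marker and the
    putative QTL; r_right is that between the QTL and the right marker.
    """
    r_flank = r_left + r_right - 2.0 * r_left * r_right  # no interference
    if left == right:
        # Probability that the QTL genotype matches both flanking markers.
        match = (1.0 - r_left) * (1.0 - r_right) / (1.0 - r_flank)
    else:
        # Probability that the QTL genotype matches the left marker.
        match = (1.0 - r_left) * r_right / r_flank
    return match if left == 1 else 1.0 - match


def interval_scan(left, right, trait, interval_morgans, steps=10):
    """Return [(position, lod)] at evenly spaced positions in the interval.

    Assumes interval_morgans > 0, a trait that varies, and an imperfect fit.
    """
    if interval_morgans <= 0:
        raise ValueError("flanking markers must be a positive distance apart")
    n = len(trait)
    mean_y = sum(trait) / n
    rss_null = sum((y - mean_y) ** 2 for y in trait)  # no-QTL model
    results = []
    for k in range(steps + 1):
        pos = interval_morgans * k / steps
        r_left = haldane_r(pos)
        r_right = haldane_r(interval_morgans - pos)
        # Expected QTL genotype for each individual at this position.
        x = [qtl_probability(a, b, r_left, r_right)
             for a, b in zip(left, right)]
        mean_x = sum(x) / n
        sxx = sum((v - mean_x) ** 2 for v in x)
        sxy = sum((v - mean_x) * (y - mean_y) for v, y in zip(x, trait))
        rss = rss_null - sxy ** 2 / sxx  # residual SS after regression
        results.append((pos, (n / 2.0) * math.log10(rss_null / rss)))
    return results
```

At the two ends of the interval, the expected QTL genotype reduces to the genotype of the adjacent marker, so the scan coincides with single point marker analysis there and interpolates between the markers elsewhere. Maximum likelihood interval mapping would instead fit a mixture model at each position; that approach is not shown here.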
Complex trait dissection in many species has largely relied on two main approaches, linkage analysis and association mapping (Andersson and Georges 2004, Nat. Rev. Genet. 5: 202-212; Flint et al. 2005, Nat. Rev. Genet. 6: 271-286; Hirschhorn and Daly 2005, Nat. Rev. Genet. 6: 95-108). While methods for linkage analysis using designed mapping populations have long been employed (Doerge 2002, Nat. Rev. Genet. 3: 43-52), methods for association mapping with population-based samples were developed more recently to overcome hidden population structure or cryptic relatedness within collected samples (Falush et al. 2003, Genetics 164: 1567-1587; Yu et al. 2006, Nat. Genet. 38: 203-208). Statistical methods for a joint linkage and linkage-disequilibrium mapping strategy have been studied for natural populations (Wu and Zeng 2001, Genetics 157: 899-909; Wu et al. 2002, Genetics 160: 779-792), and crossing an inbred to a heterogeneous stock has also been examined (Mott and Flint 2002, Genetics 160: 1609-1618). For a general complex pedigree, fine mapping that combines linkage and linkage-disequilibrium information at previously mapped QTL regions has identified candidate gene polymorphisms (Meuwissen et al. 2002, Genetics 161: 373-379; Blott et al. 2003, Genetics 163: 253-266). Previous studies of genetic designs with multiple line crosses have shown improved power and mapping resolution over a single population (Rebai and Goffinet 1993, Theor. Appl. Genet. 86: 1014-1022; Xu 1998, Genetics 148: 517-524; Rebai and Goffinet 2000, Genet. Res. 75: 243-247; Yi and Xu 2002, Genetica 114: 217-230; Jansen et al. 2003, Crop Sci. 43: 829-834; Li et al. 2005, Genetics 169: 1699-1709; Verhoeven et al. 2006, Heredity 96: 139-149). These studies, however, mainly exploited the linkage information of multiple line crosses.
In the case of humans, the use of genetics to identify genes and pathways associated with traits follows a standard paradigm. First, a genome-wide linkage study is performed using hundreds of genetic markers in family-based data to identify broad regions linked to the trait. This linkage analysis identifies regions controlling the trait, thereby narrowing attention from the 30,000-plus genes to perhaps as few as 500 to 1000 genes in a particular region of the genome that is linked to the trait. However, the regions identified using linkage analysis are still far too broad to identify candidate genes associated with the trait. Therefore, such linkage studies are typically followed up by fine mapping the regions of linkage using higher-density markers in the linkage region, increasing the number of families in the analysis, and identifying alternative populations for study. These efforts further restrict attention to narrower regions of the genome, on the order of 100 genes in a particular region linked to the trait. Even with the more narrowly defined linkage region, the number of genes to validate is still unreasonably large. Therefore, research at this stage focuses on identifying candidate genes based on the putative function of known or predicted genes in the region and the potential relevance of that function to the trait. This approach is problematic because it is limited to what is currently known about genes. Often, such knowledge is limited and subject to interpretation. As a result, researchers are often led astray and do not identify the genes affecting the trait.