Breeding has advanced from selection for economically important traits in plants and animals based on phenotypic records of an individual and its relatives to the application of molecular genetics to identify genomic regions that contain valuable genetic traits. Inclusion of genetic markers in breeding programs has accelerated the genetic accumulation of valuable traits into a germplasm compared to that achieved based on phenotypic data only. Herein, “germplasm” includes breeding germplasm, breeding populations, collections of elite inbred lines, populations of random mating individuals, and biparental crosses. Genetic marker alleles (an “allele” is an alternative sequence at a locus) are used to identify plants that contain a desired genotype at one marker locus, at several loci, or across a haplotype, and that are expected to transfer the desired genotype, along with a desired phenotype, to their progeny. This process has been widely referenced and has served to greatly economize plant breeding by accelerating the fixation of advantageous alleles and by eliminating the need for phenotyping every generation.
Recent years have seen tremendous advances in the application of marker-assisted breeding techniques, in both the development of markers and the association of markers with phenotypes, or quantitative trait loci (QTL) mapping. Examples of DNA markers include Restriction Fragment Length Polymorphisms (RFLP), Amplified Fragment Length Polymorphisms (AFLP), Simple Sequence Repeats (SSR), Single Nucleotide Polymorphisms (SNP), Insertion/Deletion Polymorphisms (Indels), Variable Number Tandem Repeats (VNTR), Random Amplified Polymorphic DNA (RAPD), and others known to those skilled in the art. Marker discovery and development in crops provides the initial framework for applications to marker-assisted breeding activities (U.S. Pat. No. 5,437,697; U.S. patent application Ser. Nos. 11/204,780, 11/216,545, 11/218,305, and 11/504,538). The resulting “genetic map” is the representation of the relative position of characterized loci (DNA markers or any other locus for which alleles can be identified) along the chromosomes. The measure of distance on this map is relative to the frequency of crossover events between non-sister chromatids of homologous chromosomes at meiosis. As a set, polyallelic markers serve as a useful tool for fingerprinting plants to assess the degree of identity of lines or varieties (U.S. Pat. No. 6,207,367). These markers form the basis for determining associations with phenotype and can be used to drive genetic gain. The implementation of marker-assisted selection is dependent on the ability to detect underlying genetic differences between individuals.
Because of allelic differences in these molecular markers, QTL can be identified by statistical evaluation of the genotypes and phenotypes of segregating populations. Processes to map QTL are well-described (WO 90/04651; U.S. Pat. Nos. 5,492,547, 5,981,832, 6,455,758; reviewed in Flint-Garcia et al. 2003 Ann. Rev. Plant Biol. 54:357-374). Using markers to infer phenotype in these cases makes a breeding program more economical by substituting genotyping for costly, time-intensive phenotyping. Further, breeding programs can be designed to explicitly drive the frequency of specific, favorable phenotypes by targeting particular genotypes (U.S. Pat. No. 6,399,855). Fidelity of these associations may be monitored continuously to ensure maintained predictive ability and, thus, informed breeding decisions (US Patent Application 2005/0015827).
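By way of illustration only, the statistical evaluation of genotypes and phenotypes described above can be sketched as a single-marker test that compares the mean phenotype of two marker genotype classes in a segregating population. The function name `single_marker_t`, the genotype labels "A" and "B", and the choice of a Welch t statistic are assumptions made for this sketch; they are not taken from the cited references, and practical QTL mapping generally uses more elaborate interval or regression methods.

```python
from math import sqrt
from statistics import mean, variance


def single_marker_t(genotypes, phenotypes, class_a="A", class_b="B"):
    """Compare phenotype means of two marker genotype classes.

    Returns (mean difference, Welch t statistic). The labels "A" and "B"
    are hypothetical codes for the two homozygous marker classes.
    """
    group_a = [y for g, y in zip(genotypes, phenotypes) if g == class_a]
    group_b = [y for g, y in zip(genotypes, phenotypes) if g == class_b]
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each genotype class needs at least two individuals")

    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference, allowing unequal class variances.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("no phenotypic variation within the genotype classes")
    return diff, diff / se
```

A large absolute t statistic suggests that the marker is linked to a locus affecting the trait; the significance threshold and any correction for testing many markers are left to the practitioner.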
This process has evolved to the application of markers as a tool for the selection of “new and superior plants” via introgression of preferred genomic regions as determined by statistical analyses (U.S. Pat. No. 6,219,964). Marker-assisted introgression involves the transfer of a chromosomal region, defined by one or more markers, from one germplasm to a second germplasm. The initial step in that process is the localization of the genomic region or transgene by gene mapping, which is the process of determining the position of a gene or genomic region relative to other genes and genetic markers through linkage analysis. The basic principle for linkage mapping is that the closer together two genes are on a chromosome, the more likely they are to be inherited together. Briefly, a cross is generally made between two genetically compatible but divergent parents relative to the traits of interest. Genetic markers can then be used to follow the segregation of these traits in the progeny from the cross, often a backcross (BC1), F2, or recombinant inbred population.
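The linkage principle stated above can be made concrete with a minimal sketch that estimates the recombination fraction between two loci from backcross progeny counts and converts it to a map distance. The function names and the example counts are hypothetical, and the Haldane mapping function is used here as one common choice (it assumes no crossover interference); the source does not specify a mapping function.

```python
from math import log


def recombination_fraction(parental, recombinant):
    """Fraction of progeny carrying a recombinant marker combination."""
    total = parental + recombinant
    if total <= 0:
        raise ValueError("at least one progeny must be scored")
    return recombinant / total


def haldane_cm(r):
    """Haldane map distance in centimorgans for recombination fraction r.

    Valid for 0 <= r < 0.5; r = 0.5 corresponds to unlinked loci.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * log(1.0 - 2.0 * r)


# Hypothetical BC1 population: 180 parental-type and 20 recombinant progeny.
r = recombination_fraction(180, 20)
distance = haldane_cm(r)
```

The closer two loci are, the fewer recombinant progeny are observed, and the smaller the estimated distance between them.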
It is well recognized that common QTL mapping procedures provide low-resolution placement of inferred QTL on the genetic map (e.g., Buntjer et al. 2005 Trends Plant Sci. 10:466-471; Morgante et al. 2003 Curr. Op. Biotech. 14:214-219). This is attributable to two basic underlying facts. First, QTL identification is a low-power activity, requiring that information from a large number of progeny be leveraged to achieve significant confidence that any observed differences in the expression of a quantitative trait amongst classes of progeny are due to linkage of a trait locus to the genetic marker that provided the basis for differentiating classes of progeny. Second, the progeny generation usually employed in QTL mapping is of relatively recent derivation from the F1 generation, the point where genetic mechanisms could first act to allow linked alleles to begin the slow approach to linkage equilibrium. The consequence of these two facts is that an identified QTL can be placed with reasonable confidence only within a segment of DNA as large as 20-30 cM.
Further, other limitations of traditional QTL mapping research include the fact that inferences are restricted to the particular parents of the mapping population and the genes or gene combinations of these parental varieties. There has long been interest in extrapolating the QTL inferences beyond the original mapping population in an attempt to leverage the genetic insight to broad sets of germplasm, including elite and unimproved germplasm sources. However, there are a number of biological reasons why such broad inferences are likely to be invalid (Paterson 1995 Genome Res. 5:321-333; Slate 2005 Mol. Ecol. 14:363-379; Breseghello et al. 2006 Crop Sci. 46:1323-1330), with the major limitation being the lack of knowledge of identity by descent at a specific genomic region (Buntjer et al. 2005 Trends Plant Sci. 10:466-471).
It has long been recognized that genes and genomic sequences may be identical by state (i.e., identical by independent origins) or identical by descent (i.e., through historical inheritance from a common progenitor), a distinction that has tremendous bearing on studies of linkage disequilibrium and, ultimately, mapping studies (Nordberg et al. 2002 Trends Gen. 18:83-90). Historically, genetic markers were not suited to distinguishing identity by state from identity by descent. However, newer classes of markers, such as SNPs (single nucleotide polymorphisms), are more diagnostic of origin. The likelihood that a particular SNP allele is derived from independent origins in the extant populations of a particular species is very low. Polymorphisms occurring in linked genes are randomly assorted at a slow but predictable rate, described by the decay of linkage disequilibrium or, alternatively, the approach to linkage equilibrium. A consequence of this well-established principle is that long stretches of coding DNA, defined by a specific combination of polymorphisms, are highly distinctive and extremely unlikely to exist in duplicate except through linkage disequilibrium, which is indicative of recent co-ancestry from a common progenitor. The probability that a particular genomic region, as defined by some combination of alleles, indicates absolute identity of the entire intervening genetic sequence is dependent on the number of linked polymorphisms in this genomic region, barring the occurrence of recent mutations in the interval. Herein, such genomic regions are referred to as haplotype windows. Each haplotype within that window is defined by specific combinations of alleles; the greater the number of alleles, the greater the number of potential haplotypes, and the greater the certainty that identity by state is a result of identity by descent at that region.
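The slow but predictable decay of linkage disequilibrium mentioned above can be illustrated with the standard two-locus result that, under random mating, disequilibrium D shrinks by a factor of (1 - r) per generation, where r is the recombination fraction between the loci. The sketch below is illustrative only: the function names are hypothetical, and the random-mating assumption will not hold in many breeding populations, particularly those derived by selfing.

```python
from math import ceil, log


def ld_after(d0, r, generations):
    """Expected disequilibrium after a number of random-mating generations."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must satisfy 0 <= r <= 0.5")
    return d0 * (1.0 - r) ** generations


def generations_to_fraction(r, fraction):
    """Whole generations needed for D to fall to `fraction` of its start."""
    if not 0.0 < r <= 0.5 or not 0.0 < fraction < 1.0:
        raise ValueError("need 0 < r <= 0.5 and 0 < fraction < 1")
    return ceil(log(fraction) / log(1.0 - r))


# Tightly linked loci (r = 0.01) retain most of their association after
# ten generations, whereas unlinked loci (r = 0.5) lose half of it in one.
tight = ld_after(0.25, 0.01, 10)
loose = ld_after(0.25, 0.5, 1)
```

Because tightly linked polymorphisms retain their association for many generations, a shared combination of alleles across a window is more plausibly explained by common ancestry than by independent origin.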
During the development of new lines, ancestral haplotypes are maintained through the process and are typically thought of as ‘linkage blocks’ that are inherited as a unit through a pedigree. Further, if a specific haplotype has a known effect, or phenotype, it is possible to extrapolate its effect in other lines with the same haplotype, as determined using one or more diagnostic markers for that haplotype window.
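As a hedged illustration of the extrapolation described above, the following sketch assigns each line the known effect of the haplotype it carries in a given window, as read from diagnostic markers. All names here (the marker identifiers, line names, and the `known_effects` table) are hypothetical, and the sketch assumes fully inbred lines with one allele call per marker.

```python
def haplotype_in_window(genotype, window_markers):
    """Concatenate allele calls for the window's markers, in map order."""
    return "".join(genotype[marker] for marker in window_markers)


def predicted_effects(lines, window_markers, known_effects):
    """Map each line to the known effect of its haplotype, if any.

    Lines carrying a haplotype with no recorded effect map to None
    rather than to a guessed value.
    """
    predictions = {}
    for name, genotype in lines.items():
        haplotype = haplotype_in_window(genotype, window_markers)
        predictions[name] = known_effects.get(haplotype)
    return predictions


# Hypothetical window of three diagnostic markers and two scored haplotypes.
window = ["m1", "m2", "m3"]
known_effects = {"AGT": 1.5, "CGT": -0.4}
lines = {
    "LineA": {"m1": "A", "m2": "G", "m3": "T"},
    "LineB": {"m1": "C", "m2": "G", "m3": "T"},
    "LineC": {"m1": "C", "m2": "A", "m3": "T"},
}
effects_by_line = predicted_effects(lines, window, known_effects)
```

Returning None for an unseen haplotype keeps the prediction limited to lines that actually share a characterized haplotype.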
Analyses that define haplotype blocks from a plurality of markers have been described publicly, and the methodology is well known to those skilled in the art (e.g., U.S. Pat. Nos. 6,844,154; 6,909,971; 6,920,398; 6,969,589; 7,041,447). In human populations, statistical analyses, such as association studies, have been employed to determine haplotype-phenotype associations, which are useful for informing clinical decisions (Li et al. 2006 BMC Bioinformatics 7:258; U.S. Pat. Nos. 6,931,326; 6,969,589). In mice, the resolution of haplotype structure (Frazer et al. 2004 Genome Res. 14:1493-1500; Wiltshire et al. 2003 Proc. Natl. Acad. Sci. 100:3380-3385) has also enabled enhanced QTL mapping for inbred lines (Pletcher et al. 2004 PLoS Biol. 2:e393; McClurg et al. 2006 BMC Bioinformatics 7:61).
The present invention allows researchers to address the biological limitations of known methods of QTL mapping and incorporates pedigree information, thereby enabling an improved approach to predictive breeding based on an improved approach to traditional QTL mapping coupled with high-density fingerprinting. This combination of information couples deductive inferences about linkage between marker alleles and phenotype with the ability to reliably predict where the same parental linkages exist elsewhere in the germplasm pool. Thus, the present invention provides a means to predict across a broad group of germplasm, comprising multiple populations, where the prior inferences of genotype-phenotype associations are applicable. Further, the present invention allows such inferences to be made for multiple traits, a key feature lacking in previous inventions.
In another aspect, there is a need in the art of plant breeding to identify haplotypes beyond the context of specific traits or regions. In the present invention, haplotype windows are defined across the genome in order to enable comparisons between two or more haplotypes within and between windows, wherein the haplotypes are associated with one or more traits to establish an estimated effect. As a result, haplotypes associated with improved performance with respect to a phenotypic trait or multiple traits are targeted for selection, and it is then possible to select for these genomic regions simultaneously. Assessing haplotypes at a genome level generates a greater density of haplotypes and facilitates the identification of preferred haplotypes that might be overlooked with smaller-scale haplotype analyses. Herein, the traits may be nontransgenic or transgenic in nature.
The present invention allows one skilled in the art to estimate haplotype effects using associations, based on historical data or de novo mapping, between genetic markers and one or more phenotypic traits. In conjunction with haplotype frequencies, haplotype effect estimates can also be used to calculate haplotype breeding values for a group of haplotypes. In the context of a specified set of haplotypes, a calculated set of breeding values can be used to rank haplotypes both within and between windows. In the context of evaluating the effect of substituting a specific region in the genome, either by introgression or a transgenic event, haplotype breeding values provide for comparing haplotypes across windows for substitution effects. Both rankings of haplotype effects and breeding values allow one skilled in the art to make selections for the purpose of germplasm improvement activities.
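By way of a non-limiting sketch, haplotype effect estimates and haplotype frequencies can be combined into breeding values and then ranked. The definition used here, in which a haplotype's breeding value is its estimated effect expressed as a deviation from the frequency-weighted mean effect of all haplotypes in the same window, is an assumption adopted for illustration and is not stated in the text above; the function names and example values are likewise hypothetical.

```python
def breeding_values(effects, frequencies):
    """Effect of each haplotype minus the frequency-weighted window mean.

    `effects` and `frequencies` are dicts keyed by haplotype identifier;
    frequencies within one window are expected to sum to 1.
    """
    if set(effects) != set(frequencies):
        raise ValueError("effects and frequencies must cover the same haplotypes")
    if abs(sum(frequencies.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    window_mean = sum(frequencies[h] * effects[h] for h in effects)
    return {h: effects[h] - window_mean for h in effects}


def rank_haplotypes(values):
    """Haplotype identifiers ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical window with three haplotypes.
effects = {"h1": 2.0, "h2": 0.0, "h3": -1.0}
frequencies = {"h1": 0.5, "h2": 0.25, "h3": 0.25}
values = breeding_values(effects, frequencies)
ranking = rank_haplotypes(values)
```

Under this assumed definition, values from different windows share a common scale (deviation from the current population mean), which is what permits a comparison of candidate substitutions across windows.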