This invention is in the field of plant breeding and molecular biology. More specifically, the invention concerns a method to identify and use genetic markers that are diagnostic of plant genes conferring agronomic fitness to crop plants.
All crop species are grown for the purpose of harvesting some product of commercial significance. Enhancement of productivity or "yield" of that product is almost invariably a major goal of any plant breeding program. Yield is the culmination of many distinguishable agronomic traits such as emergence vigor, vegetative vigor, disease resistance, seed set, standability, and threshability. Hence, the terms "yield" and "agronomics" are often used interchangeably. Yield is a quantitative (non-discrete) trait that is influenced by many genetic and environmental factors. The greatest barrier to progress in selection for quantitative traits, especially for yield, is the lack of repeatability of phenotypic traits in different environments. Although genetic differences in yield potential undoubtedly exist among individuals, environmental effects make it difficult to identify genetically superior individuals. Hence, identifying individuals with the most favorable genotype is one of the most difficult and challenging aspects of plant breeding.
The breeder uses two main strategies to reduce the effect of environment on selection of genetically superior crop plants. By comparing individuals in enough different environments, one can obtain an average measure of phenotype; or, by developing methods to bypass environmental effects, one can obtain a direct measure of genotype. Methods to directly assay genotype are obviously preferred and exemplify the true art and science of plant breeding.
Although there is much speculation, the exact biochemical nature of genes affecting yield is largely unknown. This has made it very difficult to identify the exact quantitative trait loci (QTL's) that affect yield. However, it is possible to identify and monitor segregation of discrete (qualitative) genetic markers that are closely linked to QTL's. A "genetic marker" is any qualitatively inherited phenotype that can be used to monitor the segregation of alleles that are genetically linked to the marker. Genetic markers can, therefore, be used as a direct measure of genotype at a linked locus (e.g., a QTL) that may otherwise be difficult to score. Genetic markers include visual traits such as flower color, enzyme variants such as isozymes, blood groups (in animals), and molecular markers such as restriction fragment length polymorphisms (RFLP's) or randomly amplified polymorphic DNA (RAPD's).
In order for a QTL to be identified or mapped to a specific chromosome location, the geneticist must first demonstrate that the quantitative trait of interest is highly correlated with a genetic marker. This correlation is the basis for the assumption of genetic linkage between the marker and the QTL. The conventional approach to mapping QTL's involves making a cross between two plants that are genetically different for one or more characters of interest, and obtaining segregating progeny (commonly F2, backcross, or recombinant inbred lines) from the hybrid. A number of progeny (usually >100) are evaluated for the character of interest and for their genotypes at marker loci at regular intervals (10-20 cM) throughout the genome. A search is then made for associations between the segregating markers and the character of interest. If such associations are found, they should be due to linkage of the marker to a gene(s) affecting the character.
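The search for marker-trait associations described above can be illustrated with a minimal single-marker sketch. It is not taken from any of the cited studies: the function name, the genotype coding ("AA", "AB", "BB"), and the choice of a Welch t-statistic between the two homozygous classes are all assumptions made for illustration, and the data in the usage lines are invented.

```python
from math import sqrt
from statistics import mean, variance


def marker_trait_association(genotypes, phenotypes):
    """Compare trait means of the two homozygous marker classes.

    genotypes  -- marker genotype per progeny, coded "AA", "AB" or "BB"
                  (hypothetical coding, assumed for this sketch)
    phenotypes -- trait measurement per progeny, in the same order
    Returns (difference in class means, Welch t-statistic).
    """
    aa = [p for g, p in zip(genotypes, phenotypes) if g == "AA"]
    bb = [p for g, p in zip(genotypes, phenotypes) if g == "BB"]
    if len(aa) < 2 or len(bb) < 2:
        raise ValueError("need at least two progeny in each homozygous class")
    diff = mean(aa) - mean(bb)
    # Standard error of the difference, allowing unequal class variances.
    se = sqrt(variance(aa) / len(aa) + variance(bb) / len(bb))
    if se == 0:
        raise ValueError("no within-class variation; t-statistic undefined")
    return diff, diff / se


# Invented example: seven progeny scored at one marker locus.
example_genotypes = ["AA", "AA", "AA", "BB", "BB", "BB", "AB"]
example_phenotypes = [10, 12, 11, 7, 8, 9, 9.5]
diff, t_stat = marker_trait_association(example_genotypes, example_phenotypes)
```

A large absolute t-statistic at a marker suggests, but does not prove, linkage to a gene affecting the trait. In a genome-wide scan the same comparison would be repeated at every marker locus, with a correction for multiple testing.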
A key assumption of such conventional QTL analysis is that the quantitative trait phenotype in question can be measured with as little error and ambiguity as possible. However, individual measurements for traits such as yield are typically confounded with experimental error and environmental effects. Conventional mapping of QTL's for yield, therefore, requires costly and time-consuming replicated yield testing of each segregating progeny over many environments so that each individual is assigned an average measure of phenotype that is reliable. Only then can meaningful correlations be made between yield genes and qualitative markers. Another major weakness of conventional QTL analysis is the fact that conclusions can only be made about genetic variation that exists within the segregating population that is being studied. This is extremely limiting for a trait such as yield since no sub-population will contain the myriad of yield genes available to the plant breeder. These two weaknesses are exemplified by two previous attempts to find genetic markers for yield genes.
Grant et al. (International Patent Application Number WO 89/07647, 1989) applied conventional QTL analysis to identify molecular markers that were diagnostic of yield and other specific agronomic traits that contribute to yield in maize. Segregants from the cross B73 × Mo17 were evaluated for quantitative traits based on evaluation of F3 topcrosses and bulk F4 progenies derived from F2 plants. To determine phenotype, each F3 topcross or F4 bulk progeny was grown in two replications at each of four environments. Because genotype-by-environment interactions were observed for all traits, correlations between probes and quantitative traits had to be determined for each location separately. This means that correlations could only be based on two data points per segregant, and while statistically significant correlations between traits and markers were reported, there is no evidence that selection based on these markers is effective. Based on their limited phenotypic data, especially for yield, it is highly questionable whether meaningful correlations have been established.
In an earlier attempt to find genetic markers for grain yield in maize, Stuber et al. (Genetics 95: 225-236 (1980) and Crop Science 22: 737-740 (1982)) measured the frequency of alleles at 20 isozyme loci in two open-pollinated populations before and after recurrent selection for yield. They showed that changes in allele frequency at 8 such loci were associated with changes in grain yield that resulted from traditional selection based on yield. Such converse selection based on "favorable" isozyme alleles resulted in only slight yield gains, however, when compared to selection based on yield per se. When results were averaged over environments, marker-assisted selection resulted in yield progress of only 2 to 3% while selection based on yield per se resulted in approximately a 30% yield increase. These experiments exemplify the problems associated with obtaining reliable yield data, the limitation of conclusions to the two varieties of maize being studied, and the difficulty of finding markers that are diagnostic of yield. Such results actually denigrate the assumption that significant yield progress can be accomplished through marker-assisted selection. The accuracy of Stuber et al.'s statistical methods is highly dependent on the practice of randomly mating selected individuals during each cycle of recurrent selection. In practice, it is difficult to enforce a mating system that is truly random. This is a serious limitation of conventional population genetic studies.
A key feature of the current invention is a population genetic study that employs genetic markers to measure allele frequency differences between modern-day elite lines and their earliest known ancestors. Since Applicants' statistical analyses are calibrated with known pedigrees, the invention can be used to study changes in allele frequency in populations developed through non-random matings (the predominant type of mating used to breed crop plants). The invention completely eliminates the need to collect exhaustively replicated yield data or other quantitative data from segregating populations. Instead of relying on data collected from specific populations in specific environments, the current invention takes advantage of yield progress that has occurred during the entire period that a crop has been domesticated. Indirectly, the invention relies on an extremely large pool of yield data that has already been collected through the past efforts of many plant breeders. Such data represent the performance of many different genotypes (allele combinations) over many different environments. Alleles that confer high yield over many environments have been favored by selection during the historical domestication of any crop plant. The observed frequency of favorable alleles in a collection of modern elite lines must, therefore, be greater than the frequency expected from random segregation of alleles from ancestors. The current invention takes advantage of differences between observed and expected allele frequency to identify alleles that affect yield, thereby enabling the selection of high yielding progeny without exhaustive field testing. The invention also provides the opportunity to locate and clone alleles affecting yield in a positive manner. These alleles can then be used to transform existing crop plants to create new elite lines.
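The comparison of observed and expected allele frequency can be illustrated with a short sketch. It is not Applicants' statistical procedure, which this section does not detail. It assumes, for illustration only, that every line is fully inbred (one allele per locus), that the pedigree gives each ancestor's expected fractional genome contribution to each elite line, and that without selection an elite line carries an ancestor's allele with probability equal to that contribution. All function names, line names, and data are hypothetical.

```python
def expected_allele_frequency(contributions, ancestor_alleles, allele):
    """Expected frequency of `allele` among elite lines without selection.

    contributions    -- {elite_line: {ancestor: expected genome fraction}},
                        derived from the pedigree (assumed input format)
    ancestor_alleles -- {ancestor: allele carried at the marker locus}
    """
    per_line = []
    for line, parts in contributions.items():
        if abs(sum(parts.values()) - 1.0) > 1e-9:
            raise ValueError(f"contributions for {line} must sum to 1")
        # Probability that this line inherited `allele` by descent alone.
        per_line.append(
            sum(f for anc, f in parts.items() if ancestor_alleles[anc] == allele)
        )
    return sum(per_line) / len(per_line)


def observed_allele_frequency(elite_alleles, allele):
    """Fraction of elite lines that actually carry `allele`."""
    carriers = sum(1 for a in elite_alleles.values() if a == allele)
    return carriers / len(elite_alleles)


# Invented example: three ancestors, four elite lines, one marker locus.
ancestor_alleles = {"P1": "a", "P2": "b", "P3": "b"}
contributions = {
    "E1": {"P1": 0.5, "P2": 0.5},
    "E2": {"P1": 0.25, "P2": 0.25, "P3": 0.5},
    "E3": {"P1": 0.5, "P3": 0.5},
    "E4": {"P1": 0.25, "P2": 0.75},
}
elite_alleles = {"E1": "a", "E2": "a", "E3": "a", "E4": "b"}

expected = expected_allele_frequency(contributions, ancestor_alleles, "a")
observed = observed_allele_frequency(elite_alleles, "a")
excess = observed - expected  # positive excess hints at favorable selection
```

In this invented data set, allele "a" is found in three of the four elite lines although pedigree alone predicts it in fewer than half. A real analysis would need a significance test across many loci before treating such an excess as evidence of selection.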