This invention relates to a process of predicting the value of a phenotypic numerically representable trait in a plant. A process of the present invention uses a quantitative assessment of the distribution of the numerically representable trait and a genotypic database in a first population to define an association between genotype and phenotype and predict the value of the phenotype in the same or a different plant.
Phenotypic traits of agronomic interest tend to be quantitative and continuously distributed. For most such traits, the quantitative distributions, resulting when values of individual plants are graphed against their relative frequency, fit that expected from segregation of alleles at a large number of loci, with each locus (a position on a chromosome) contributing a relatively small amount to the value of the phenotype. This is the polygenic model of inheritance. Another assumption of the basic model is that heritable gene action is additive; that is, each allele contributes some predictably inheritable amount to the total quantitative value of the phenotype. Environmental and unpredictable heritable factors are then superimposed as variation on the genotypic sources of variation to yield the phenotypic variance component of the continuous distribution of the trait. Distributions may be described by moments, notably the mean and variance (or the standard deviation which is the square root of the variance).
Before the polygenic model was developed, the observation of continuous distributions of traits initially posed an obstacle to the universal application of Mendelian theory. This apparent conflict was resolved by proposing the polygenic model for the inheritance of traits that could be described quantitatively (e.g., corn ear length, number of kernels, plant height, yield in bushels). The model proposed an underlying segregation of single genetic entities, thereby being consistent with Mendelian theory. However, because the effects of these individual genes were aggregated in the expression of the quantitative phenotypes, their individual effects could not be teased out (Johannsen, 1909; Nilsson-Ehle, 1909; East, 1916; Fisher, 1918). Environmental variation further smoothed the distribution, masking boundaries between distinct genotypic classes.
The polygenic model has been used in attempts to enhance selection efficacy in plant breeding programs. Through observation and careful measurement of various parent-offspring distributions, both in plants and animals, and through expression of the genetic relationships as mathematical correlations, a complex mathematical theory emerged (Fisher, 1918; Falconer, 1960; Wright, 1968, 1977).
A basic tenet of this theory is to express the phenotypic distribution in terms of its variance and to dissect that variance into its causative components. By studying the variance in offspring distributions where the offspring result from various types of crosses, and by determining the correlation between phenotypic distributions of different pedigree relationships (parent-offspring, offspring of the same cross, subsequent generations, e.g., F2-F3), it was determined that the phenotypic variance (VP) had as basic components genotypic variance (VG) and environmental variance (VE). In a simple case, the variance of plants of the same genotype grown in different environments provides an estimate of the effects of environment. Factors contributing to the environmental variance include year of growth and differences in the soil composition of plots of land.
In turn, each of these components could be further subdivided, for example, by separating VG into additive (VA), dominance (VD) and epistatic (VI) components. The components of the variance could be estimated by breeding experiments. These values were then used to predict results of other breeding crosses. Response to selection was found to be a function of the heritability of the trait, the selection differential and the intensity of selection.
The heritability ($h^2$) of a trait is broadly defined as
$$h^2 = \frac{V_G}{V_P},$$
or more narrowly,
$$h^2 = \frac{V_A}{V_P},$$
and is a predictor of the degree to which values of traits may be transmitted from parents to offspring.
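The two definitions of heritability can be illustrated with a minimal sketch. It assumes the simple decomposition described above (VP = VG + VE and VG = VA + VD + VI, with no genotype-by-environment term); the function name and the variance values are hypothetical and chosen only for illustration.

```python
def heritabilities(v_a, v_d, v_i, v_e):
    """Return (broad-sense, narrow-sense) heritability from variance components."""
    v_g = v_a + v_d + v_i  # genotypic variance: VG = VA + VD + VI
    v_p = v_g + v_e        # phenotypic variance: VP = VG + VE (assumed, no GxE term)
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return v_g / v_p, v_a / v_p


# Hypothetical variance components, not taken from any experiment.
broad, narrow = heritabilities(v_a=4.0, v_d=1.0, v_i=1.0, v_e=6.0)
print(f"broad-sense h2 = {broad:.3f}, narrow-sense h2 = {narrow:.3f}")
```

Because VA is only one part of VG, the narrow-sense value can never exceed the broad-sense value under this decomposition.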
The intensity of selection is defined as the percent of the distribution from which the parents of the next generation are derived. The selection differential is defined as the difference between the trait in the parental population versus that of the selected parents. The cost effectiveness of selection is determined by the amount of time (in generations) required to achieve a significant change in the distribution of the trait under selection, the number of parents selected for breeding, and the response to selection. The response to selection is the difference between parental and offspring means after selection, e.g., after a generation of selective breeding. The basic mathematical formula to predict gain from selection is as follows:
expected gain = (selection differential) × (heritability)
where selection differential is the mean value of the phenotypic trait in the selected individuals minus the overall parental population mean, and heritability is the proportion of the phenotypic variance that is due to additive genetic variance.
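As a worked illustration of this formula, the following sketch computes the selection differential from a list of phenotypic values and multiplies it by a heritability. It assumes that higher trait values are desirable and that the top fraction of the population is selected; the function names, the yield values, and the heritability of 0.25 are hypothetical.

```python
from statistics import mean


def selection_differential(population, fraction_selected):
    """Mean of the selected (top) individuals minus the overall population mean."""
    n_selected = max(1, round(len(population) * fraction_selected))
    selected = sorted(population, reverse=True)[:n_selected]
    return mean(selected) - mean(population)


def expected_gain(population, fraction_selected, h2):
    """Expected gain = (selection differential) x (heritability)."""
    return selection_differential(population, fraction_selected) * h2


# Hypothetical phenotypic values for ten plants.
yields = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
s = selection_differential(yields, 0.2)   # top two plants versus all ten
gain = expected_gain(yields, 0.2, h2=0.25)
print(f"S = {s}, expected gain = {gain}")
```

With these toy numbers the two selected plants average 27 against a population mean of 19, so S is 8 and the expected gain is 2 trait units per generation.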
A change of the population mean brought about by selection, i.e., the response to selection, depends on the heritability of a trait and on the intensity of selection (expressed as the selection differential). The selection differential in turn depends on the proportion of the population selected and on the standard deviation of the phenotypic trait. The shaded areas of the distributions of FIG. 2A, FIG. 2B and FIG. 2C are the proportions selected, and S is the selection differential.
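The dependence of the selection differential on the proportion selected and on the phenotypic standard deviation can be sketched under one added assumption: that the trait is normally distributed and the top fraction is retained by truncation. Under that assumption, standard quantitative genetics gives S = i × σP, where i is the standardized selection intensity (a textbook quantity, distinct from the percentage definition of intensity given earlier). The function names below are hypothetical.

```python
from statistics import NormalDist


def standardized_intensity(fraction_selected):
    """Standardized intensity i for truncation selection on a normal trait."""
    if not 0.0 < fraction_selected < 1.0:
        raise ValueError("fraction_selected must lie strictly between 0 and 1")
    std_normal = NormalDist()
    # Truncation point above which the selected fraction lies.
    truncation_point = std_normal.inv_cdf(1.0 - fraction_selected)
    # Mean of the selected tail, in standard deviation units.
    return std_normal.pdf(truncation_point) / fraction_selected


def selection_differential_normal(fraction_selected, phenotypic_sd):
    """S = i * sigma_P under the normality assumption."""
    return standardized_intensity(fraction_selected) * phenotypic_sd


for p in (0.50, 0.20, 0.05):
    print(p, round(selection_differential_normal(p, phenotypic_sd=1.0), 3))
```

Selecting a smaller proportion, or working with a more variable trait, both enlarge S in this sketch.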
Despite concerted attempts to improve commercially important phenotypic traits in plants, the rate of improvement of those traits has been only a few to several percent of the mean per year for the past several decades. In many previously described breeding programs, plants are selected as parents of the next generation on the basis of one or more phenotypic traits (e.g., yield in bushels per acre, number of kernel rows per ear of corn, percentage of grain oil).
A problem associated with selection based on phenotype is the effect of the environment on that phenotype. For various crop plants, it has been established that roughly half of improvement is due to improved husbandry practices, i.e., environmental effects rather than genetic changes effected by selection (Lande and Thompson, 1990). For example, over the past 60 years, increases in yield due to genetic improvement have averaged only about one bushel/acre/year (Hallauer, et al., 1988, p. 466). Only a small proportion of hybrid plants produced commercially ever show enough improvement to be worth marketing. Environmental variables which need to be taken into account include soil type and the amount and distribution of rainfall. One of the important and influential environmental conditions is the temperature range of the climate in which the plants are grown. The time period needed by the plants to reach maturity (growth period) is under genotypic control. For optimum growth, the genotypically based growth period of the plant must fit within the environmental range. For example, if the plant does not fulfill its reproductive potential before the temperature drops below a threshold, the plant will not produce seed or offspring in that environment. Comparison of plants for various traits is typically made among plants of similar or identical maturity.
Another problem of phenotype selection is polygenic control. Most phenotypic traits of commercial interest are under polygenic, rather than single locus, control. This means that expression of alleles at many loci contributes to the phenotype of interest. Polygenically controlled traits, therefore, are not solely determined by any particular locus. Consequently, selecting on the basis of phenotype is a superficial and inefficient strategy. Complex genetic phenomena lurk underneath the phenotypic facade. A tortuous, rather than a direct, path links the phenotype and genotype. At best, previous methods for predicting progeny performance were based only on filial relationships, displayed diminished effectiveness when applied to contiguous generations, and were available only on a population basis, not for individual plants.
The basic concept of selection described above has been applied in specific breeding schemes. Success has been a function of how well inheritance of a trait fits the assumptions of the polygenic model, and of the factors discussed above. An important application of these polygenic models was in selective breeding programs aimed at channeling the values of the phenotype toward one end or the other of its distribution. Selection entails choosing a sample of potential parents, the sample being based on the value of the plants for the traits being selected.
An example of a plant for which selective breeding has been practiced is corn (Sprague and Eberhart, 1977). Until this century, only crude mass selection was practiced; each ear of corn was harvested separately, and the most desirable ears were used to plant the ensuing crop. Information on the effectiveness of this method was almost completely lacking, although variation among races and varieties of corn existing at the turn of the century suggested some effect (Sprague and Eberhart, 1977). One of the most commercially important traits, "yield," was found to be the least amenable to change. Unexpectedly, this is a trait for which a process of the present invention has proved to be efficacious.
The next progression in selective breeding was "ear-to-row selection," in which the progeny of selected ears were separately evaluated by field planting and assessing the phenotypic distribution of the resulting plants. If a trait of interest were controlled by a few genes, genetic effects were not masked by environmental variation, and a large number of plants could be grown for evaluation of phenotypic traits, then selective breeding to concentrate desirable genes in a population subsample would be relatively straightforward. In reality, there were many problems attending selection of most traits of commercial interest. Examples of such problems were: 1) control by a large number of genes (polygenic); 2) genetic effects were masked by the environment; 3) a complicated system of genetic interactions existed; and 4) the methods of isolating and evaluating lines were not wholly adequate.
Many recurrent selection methods and techniques have been proposed to improve breeding populations. Their general theme is to repeatedly select plants based on their apparently superior phenotypes and to interbreed these to form a new improved population. The assumption is that the frequency of alleles underlying the superior phenotypes will increase due to selection. Because a large number of loci are believed to control important commercially desirable phenotypic traits (e.g., yield), classical selection leads to very slow changes in the mean and genetic variance of such a trait. Gene frequencies change gradually because each locus has only a small effect on the phenotypic trait as a whole.
Gain from selection can theoretically be enhanced by increasing the selection intensity (which means selecting a small elite percentage of the population as parents). However, there are risks in this approach. Key genetic factors may be excluded if the sample is too small, or deleterious effects of inbreeding may appear.
The trials and tribulations of breeders who used some of the selection methods to improve yield in corn are presented by Sprague and Eberhart (1977). Superiority (heterosis) of hybrid plants resulting from some crosses was noted even before the modern genetic theory provided insights regarding this phenomenon. Heterosis likely resulted from partial to complete dominance, overdominance, epistasis, or some combination of these phenomena. If partial to complete dominance predominates, a possibility exists for eventual development of stable, high-yielding homozygous genotypes. If overdominance or certain types of epistasis predominate, however, the highest yielding genotypes must be heterozygotes.
In one aspect, the present invention provides a process for predicting the value of a numerically representable phenotypic trait in a plant of a given species. That process comprises the steps of:
(a) forming a phenotypic trait database in a first plant population of the species by quantitatively assessing the distribution of a numerically representable phenotypic trait in the first plant population;
(b) forming a first genotypic database in the first plant population by genotyping members of the first plant population for one or more inherited genetic markers;
(c) evaluating the phenotypic trait database in conjunction with the first genotypic database to define an association between the numerically representable phenotypic trait and the inherited genetic marker(s);
(d) forming a second genotypic database in a second plant population of the species by genotyping members of the second plant population for the inherited genetic marker; and
(e) predicting the values of the numerically representable phenotypic trait of all members of the second plant population using the association and the second genotypic database.
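Steps (a) through (e) can be sketched in code under stated assumptions. The sketch below takes the association of step (c) to be an ordinary least-squares regression of the trait on marker scores, and assumes each codominant marker is scored 0, 1 or 2 according to the number of copies of one parental allele. The function names and all data values are hypothetical, and the toy phenotypes are noise-free so that the result is easy to check by hand; real field data would not fit exactly.

```python
def fit_marker_regression(genotypes, phenotypes):
    """Ordinary least squares of trait value on marker scores.

    Returns [intercept, b1, ..., bk], found by solving the normal equations.
    """
    rows = [[1.0] + [float(g) for g in geno] for geno in genotypes]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, phenotypes))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("marker columns are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict_trait(coefficients, genotype):
    """Apply the regression equation to one genotyped plant."""
    return coefficients[0] + sum(b * g for b, g in zip(coefficients[1:], genotype))


# Steps (a)-(b): hypothetical first population, two markers scored 0/1/2.
first_genotypes = [[0, 0], [1, 0], [2, 0], [0, 1], [1, 2], [2, 2], [0, 2]]
first_yields = [100.0, 105.0, 110.0, 98.0, 101.0, 106.0, 96.0]

# Step (c): define the association.
coefficients = fit_marker_regression(first_genotypes, first_yields)

# Steps (d)-(e): genotype a second population and predict without phenotyping it.
second_genotypes = [[2, 1], [0, 2]]
predictions = [predict_trait(coefficients, g) for g in second_genotypes]
print(coefficients, predictions)
```

The point of the design is that the second population contributes only genotypes; its trait values come entirely from the equation estimated in the first population.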
In a preferred embodiment, the association is a regression equation where the numerically representable phenotypic trait is a dependent variable and the inherited genetic markers are independent variables. In another embodiment, the first and the second plant populations are both derived from the same seminal F1 hybrid, the second plant population being at the same or an advanced generation as the first plant.
Preferably, quantitatively assessing comprises testcrossing to obtain progeny and quantitatively assessing the distribution in combining ability of the first plant population or making direct observations on progeny derived by artificial or natural self-pollination of the first plant.
A preferred numerically representable phenotypic trait is yield, stalk strength, root strength, disease resistance, insect resistance, grain oil content, grain protein content, grain starch content, or grain moisture content, and a preferred plant species is Zea mays or Glycine max. The inherited genetic marker is preferably inherited in codominant fashion.
In another aspect, the present invention provides a process for predicting the value of a numerically representable phenotypic trait in a plant of a given species, which process comprises the steps of:
(a) initiating a population of plant lines from a seminal F1 hybrid;
(b) evaluating a first population of plant lines at a generation subsequent to that of the seminal F1 hybrid to quantitatively assess the distribution of the numerically representable phenotypic trait of the first plant population;
(c) genotyping members of the first plant population for one or more inherited genetic markers;
(d) estimating the regression in the first plant population of the numerically representable phenotypic trait on at least one of the inherited genetic markers to develop a regression equation;
(e) genotyping members of a second plant population of the species, the second plant population being derived from the same or a different F1 seminal hybrid, the second plant population at the same or an advanced generation relative to the first plant; and
(f) predicting in the second plant population the values of the numerically representable phenotypic trait of all members using the regression equation.
In particular embodiments, evaluating comprises testcrossing to obtain progeny and thereby evaluating the distribution of combining ability of the first plant population or making direct observations on progeny of members of the first plant population derived by artificial or natural self-pollination; the plant line is initiated from the seminal F1 hybrid by self-fertilization; the regression equation defines the numerically representable phenotypic trait as a dependent variable and the inherited genetic marker as an independent variable; the numerically representable phenotypic trait is yield, stalk strength, root strength, disease resistance, insect resistance, grain oil content, grain protein content, grain starch content, or grain moisture content; the inherited genetic marker is inherited in codominant fashion; the species is Zea mays or Glycine max; and the first and the second plant are both derived from the same seminal F1 hybrid, the second plant being at the same or an advanced generation as the first plant.
In still yet another aspect, the present invention provides a process of predicting the value of a numerically representable phenotypic trait in a plant of a given species, comprising the steps of:
(a) providing a progenitor plant population by self-pollinating hybrid plants;
(b) testcrossing members of a first plant progenitor population to determine the distribution of combining ability in the first population as expressed values of the numerically representable phenotypic trait in members of the first plant population;
(c) genotyping members of the first plant population to determine a distribution of first genetic marker fingerprints;
(d) expressing an association by a regression equation between the distributions of the numerically representable phenotypic trait and the first genetic marker fingerprint converted to a numerical score;
(e) genotyping a member of a second population of the same or a different progenitor population, the second plant being at a generation at or advanced beyond the generation of the first population to determine a second genetic marker fingerprint; and
(f) using the regression equation and the second genetic marker fingerprint converted to a numerical score to predict the value of the numerically representable phenotypic trait in the second plant.
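The conversion of a genetic marker fingerprint to a numerical score, referred to in steps (d) and (f), could take several forms. One possible scheme, shown here purely as an assumption, counts at each codominant marker the number of copies of the allele carried by a chosen reference parent, giving a score of 0, 1 or 2 per marker. The two-letter genotype strings, the allele labels, and the function name are hypothetical.

```python
def fingerprint_to_scores(fingerprint, reference_alleles):
    """Count copies of the reference allele at each marker (0, 1 or 2)."""
    if len(fingerprint) != len(reference_alleles):
        raise ValueError("one reference allele is required per marker")
    return [
        sum(1 for allele in genotype if allele == ref)
        for genotype, ref in zip(fingerprint, reference_alleles)
    ]


# Hypothetical three-marker fingerprint of one plant.
fingerprint = ["AA", "AB", "BB"]
reference_alleles = ["A", "A", "B"]
print(fingerprint_to_scores(fingerprint, reference_alleles))
```

A codominant marker is what makes the heterozygote (score 1) distinguishable from either homozygote, so the three classes can enter a regression as distinct numerical values.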
In a still further aspect, the present invention provides a process for predicting the value of a numerically representable phenotypic trait in a plant of a given species, which process comprises the steps of:
(a) determining an association between at least one genetic marker and at least one numerically representable phenotypic trait in a first plant population at a first descendant generation of a first seminal F1 hybrid plant;
(b) genotyping a second plant from a second descendant generation of the same or a different F1 hybrid plant, the second generation at or beyond the first; and
(c) predicting in the second plant the value of the numerically representable trait by applying an equation that defines an association between at least one of the genetic markers and the numerically representable phenotypic trait in the first plant population.
The present invention further provides a process of predicting the value of a phenotypic trait in a plant of a given species, comprising:
(a) identifying in a first plant population of the species one or more genetic markers that are linked to quantitative trait loci that affect the phenotypic trait;
(b) conducting single locus or interval mapping analysis to estimate in the first plant population additive effects ascribable to each locus or interval based on numerical values of the phenotypic trait;
(c) summing the additive effects;
(d) combining the summed additive effects with testcross or directly observed values to produce an index of genotypic merit;
(e) quantifying the genetic markers in members of a second plant population of the species, the second plant population at an advanced generation relative to the first plant population; and
(f) predicting in members of the second plant population the values of the phenotypic trait using the index.
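Steps (c) and (d) can be sketched as follows. The steps above do not specify how the summed additive effects are combined with testcross or directly observed values, so the weighted sum used here, and its weights, are assumptions made only for illustration. The sketch also assumes markers are scored 0, 1 or 2 and that each additive effect is expressed per allele copy. All names and numbers are hypothetical.

```python
def summed_additive_effects(marker_scores, additive_effects):
    """Step (c): sum per-locus additive effects weighted by marker score."""
    return sum(a * g for a, g in zip(additive_effects, marker_scores))


def merit_index(observed_value, marker_scores, additive_effects,
                weight_observed=0.4, weight_markers=0.6):
    """Step (d): combine observed value and marker score (assumed weighted sum)."""
    marker_part = summed_additive_effects(marker_scores, additive_effects)
    return weight_observed * observed_value + weight_markers * marker_part


# Hypothetical per-allele additive effects at three loci, and one line's scores.
effects = [1.5, -0.5, 2.0]
line_scores = [2, 1, 0]
marker_part = summed_additive_effects(line_scores, effects)
index = merit_index(10.0, line_scores, effects)
print(marker_part, index)
```

In practice the weights would be chosen from the heritability of the trait and the proportion of additive variance explained by the markers; the fixed values here stand in for that calculation.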
In another aspect, the present invention provides a process of breeding plants comprising:
(a) determining the distribution of allelic variation in a first population of parent lines of a first hybrid plant population;
(b) determining an association in the first hybrid plant population between the genetic markers of the first population of parent lines and one of the phenotypic traits expressed by members of the first hybrid population;
(c) genotyping members of a second population of parent lines for the genetic markers; and
(d) selecting members of the second population of parent lines for breeding based on a predicted value of performance for one of the phenotypic traits using a regression equation with the genotypes of members of the second population of parent lines as prediction variables.
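Step (d) amounts to ranking candidate parent lines by predicted performance and keeping the best. The sketch below assumes a regression equation has already been estimated in the first hybrid population and is supplied as a list of coefficients (intercept first); the coefficients, line names, and marker scores are hypothetical.

```python
def select_parents(genotypes_by_line, coefficients, n_select):
    """Rank parent lines by predicted trait value and return the top n_select."""
    predicted = {
        line: coefficients[0]
        + sum(b * g for b, g in zip(coefficients[1:], scores))
        for line, scores in genotypes_by_line.items()
    }
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:n_select], predicted


# Hypothetical regression coefficients: intercept, marker 1, marker 2.
coefficients = [100.0, 5.0, -2.0]
# Hypothetical second population of parent lines, markers scored 0/1/2.
candidates = {"P1": [2, 0], "P2": [0, 2], "P3": [1, 1], "P4": [2, 2]}

chosen, predicted = select_parents(candidates, coefficients, n_select=2)
print(chosen, predicted)
```

Because selection relies on genotypes alone, candidate lines can be screened before any hybrid derived from them is grown and measured.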