Soybean, a legume, has become the world's primary source of seed oil and seed protein. In addition, its utilization is being expanded to the industrial, manufacturing and pharmaceutical sectors. Soybean productivity is a vital agricultural and economic consideration. Improving soybean tolerance to diverse and/or adverse growth conditions is crucial for maximizing yields.
Iron Deficiency Chlorosis
Iron-deficiency chlorosis (IDC; alternatively, FEC) reduces soybean yields, particularly on calcareous or other high pH soils. IDC develops in soybean due to a lack of chlorophyll in the leaves of affected plants, manifesting as yellowing of the leaves. Iron is required for the synthesis of chlorophyll and, although iron is sufficiently present in most soils, it is often in an insoluble form that cannot be used by the plant. Iron deficiency occurs in soils due to high pH, high salt content, cool temperatures or other environmental factors that decrease iron solubility. Studies have shown that even mild IDC symptoms are an indication that yield is being negatively affected (Fehr (1982) Journal of Plant Nutrition, 611-621).
Iron is found in soil mainly as oxyhydroxide polymers (FeOOH) that are extremely insoluble (10^-17 M) at neutral pH. Since the optimal concentration of soluble Fe for plant growth is approximately 10^-6 M, plants have evolved two different strategies to mine the iron they need from soil (Fox and Guerinot 1998 “Molecular biology of cation transport in plants,” Annu. Rev. Plant Physiol. Plant Mol. Biol. 49:669-96).
So-called “Strategy I” is used by all plants except grasses. This strategy involves a two-step process. In the first step, the oxidized iron Fe(III) is reduced to the more soluble Fe(II) by a membrane-bound ferric chelate reductase located in root epidermal cells. This reductase activity is inducible and necessary for iron uptake under iron-deficient conditions (Yi and Guerinot (1996) Plant Journal 10:835-844). A gene, FRO2, that encodes such a ferric chelate reductase enzyme has been identified and sequenced in Arabidopsis (Robinson et al, Nature 397:694-697, 1999). Following the reduction step, a separate transport protein is required to move the reduced iron across the root plasma membrane. A gene, IRT1 (iron regulated transporter), which codes for the transport protein has also been found in Arabidopsis (Eide et al, PNAS 93:5624-5628). This same transport protein has been shown to transport manganese, zinc, and cobalt as well (Korshunova et al, Plant Mol. Biology 40:37-44, 1999). In addition to this two-step process, Strategy I plants also acidify the soil by exuding protons from the roots via the conversion of ATP to ADP within the roots. This lowers the pH in the rhizosphere and makes the iron oxides more soluble.
While iron availability can, to an extent, be modulated environmentally (e.g., by modifying soil pH or adding soluble iron, applying foliar iron treatments, or applying iron to seed), these approaches can cause unwanted side effects in the soybean or the environment and also add to soybean production costs. Some treatments, such as iron treatment of seed, display inconsistent results in different cultivars or field environments. Despite these difficulties, most producers currently rely on the use of seed, foliar, or soil treatments to reduce IDC (Weirsma (2002) “Iron Deficiency Chlorosis (IDC) In Soybean,” Cropping Issues in Northwest Minnesota 1(7):1-2; Goos and Germain (2001) “Solubility of Twelve Iron Fertilizer Products in Alkaline Soils” Communications in Soil Science and Plant Analysis 32:2317-2323).
For some time, soybean producers have sought to develop IDC tolerant plants as a cost-effective alternative or supplement to standard foliar, soil and/or seed treatments (e.g., Hintz et al. (1987) “Population development for the selection of high-yielding soybean cultivars with resistance to iron deficiency chlorosis,” Crop Sci. 28:369-370). Recent studies also suggest that cultivar selection is more reliable and universally applicable than foliar sprays or iron seed treatment methods, though environmental and cultivar selection methods can also be used effectively in combination. See also, Goos and Johnson (2000) “A Comparison of Three Methods for Reducing Iron-Deficiency Chlorosis in Soybean” Agronomy Journal 92:1135-1139; and Goos and Johnson “Seed Treatment, Seeding Rate, and Cultivar Effects on Iron Deficiency Chlorosis of Soybean” Journal of Plant Nutrition 24 (8) 1255-1268.
The advent of molecular genetic markers has facilitated mapping and selection of agriculturally important traits in soybean. Markers tightly linked to disease tolerance genes are an asset in the rapid identification of tolerant soybean lines on the basis of genotype by the use of marker assisted selection (MAS). Introgressing disease tolerance genes into a desired cultivar would also be facilitated by using suitable DNA markers.
Soybean cultivar improvement for IDC tolerance can be performed using classical breeding methods, or, more preferably, using marker assisted selection (MAS). Genetic markers for IDC tolerance/susceptibility have been identified (e.g., Lin et al. (2000) “Molecular characterization of iron deficiency chlorosis in soybean” Journal of Plant Nutrition 23:1929-1939). Recent work suggests that marker assisted selection is particularly beneficial when selecting plants for IDC tolerance, because the strength of environmental effects on chlorosis expression impedes progress in improving IDC resistance. See also, Charlson et al., “Associating SSR Markers with Soybean Resistance to Iron Chlorosis,” Journal of Plant Nutrition, vol. 26, nos. 10 & 11; 2267-2276 (2003).
Molecular Markers and Marker Assisted Selection
A genetic map is a graphical representation of a genome (or a portion of a genome such as a single chromosome) where the distances between landmarks on the chromosome are measured by the recombination frequencies between the landmarks. A genetic landmark can be any of a variety of known polymorphic markers, including, but not limited to, molecular markers such as SSR markers, RFLP markers, or SNP markers. Furthermore, SSR markers can be derived from genomic or expressed nucleic acids (e.g., ESTs). The nature of these physical landmarks and the methods used to detect them vary, but all of these markers are physically distinguishable from each other (as well as from the plurality of alleles of any one particular marker) on the basis of polynucleotide length and/or sequence.
Although specific DNA sequences which encode proteins are generally well-conserved across a species, other regions of DNA (typically non-coding) tend to accumulate polymorphism, and therefore, can be variable between individuals of the same species. Such regions provide the basis for numerous molecular genetic markers. In general, any differentially inherited polymorphic trait (including nucleic acid polymorphism) that segregates among progeny is a potential marker. The genomic variability can be of any origin, for example, insertions, deletions, duplications, repetitive elements, point mutations, recombination events, or the presence and sequence of transposable elements. A large number of soybean molecular markers are known in the art, and are published or available from various sources, such as the SOYBASE internet resource. Similarly, numerous methods for detecting molecular markers are also well-established.
The primary motivation for developing molecular marker technologies from the point of view of plant breeders has been the possibility to increase breeding efficiency through marker assisted selection (MAS). A molecular marker allele that demonstrates linkage disequilibrium with a desired phenotypic trait (e.g., a quantitative trait locus, or QTL, such as resistance to a particular disease) provides a useful tool for the selection of a desired trait in a plant population. The key components to the implementation of this approach are: (i) the creation of a dense genetic map of molecular markers, (ii) the detection of QTL based on statistical associations between marker and phenotypic variability, (iii) the definition of a set of desirable marker alleles based on the results of the QTL analysis, and (iv) the use and/or extrapolation of this information to the current set of breeding germplasm to enable marker-based selection decisions to be made.
The availability of integrated linkage maps of the soybean genome containing increasing densities of public soybean markers has facilitated soybean genetic mapping and MAS. See, e.g., Cregan et al. (1999) “An Integrated Genetic Linkage Map of the Soybean Genome” Crop Sci. 39:1464-1490; Song et al., “A New Integrated Genetic Linkage Map of the Soybean,” Theor. Appl. Genet., 109:122-128 (2004); Diwan and Cregan (1997) “Automated sizing of fluorescent-labeled simple sequence repeat (SSR) markers to assay genetic variation in Soybean,” Theor. Appl. Genet., 95:220-225; the SOYBASE resources on the world wide web, including the Shoemaker Lab Home Page and other resources that can be accessed through SOYBASE; and see the Soybean Genomics and Improvements Laboratory (SGIL) on the world wide web.
Two types of markers are frequently used in marker assisted selection protocols, namely simple sequence repeat (SSR, also known as microsatellite) markers, and single nucleotide polymorphism (SNP) markers. The term SSR refers generally to any type of molecular heterogeneity that results in length variability, and most typically is a short (up to several hundred base pairs) segment of DNA that consists of multiple tandem repeats of a two or three base-pair sequence. These repeated sequences result in highly polymorphic DNA regions of variable length due to poor replication fidelity, e.g., caused by polymerase slippage. SSRs appear to be randomly dispersed through the genome and are generally flanked by conserved regions. SSR markers can also be derived from RNA sequences (in the form of a cDNA, a partial cDNA or an EST) as well as genomic material.
The characteristics of SSR heterogeneity make SSRs well suited for use as molecular genetic markers; namely, SSR genomic variability is inherited, multiallelic, codominant and reproducibly detectable. The proliferation of increasingly sophisticated amplification-based detection techniques (e.g., PCR-based) provides a variety of sensitive methods for the detection of nucleotide sequence heterogeneity. Primers (or other types of probes) are designed to hybridize to conserved regions that flank the SSR domain, resulting in the amplification of the variable SSR region. The different sized amplicons generated from an SSR region have characteristic and reproducible sizes. The different sized SSR amplicons observed from two homologous chromosomes in an individual, or from different individuals in the plant population, are generally termed “marker alleles.” As long as there exist at least two SSR alleles that produce PCR products of different sizes, the SSR can be employed as a marker.
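As an illustration of how amplicon sizes translate into marker alleles, the following is a minimal sketch only. The allele names, expected sizes, and the 1 bp tolerance are hypothetical values chosen for the example, and nearest-size binning is one simple way to assign alleles, not a method taken from the cited references.

```python
from typing import Dict, Iterable, Tuple


def call_ssr_genotype(
    amplicon_sizes_bp: Iterable[float],
    known_alleles_bp: Dict[str, float],
    tolerance_bp: float = 1.0,  # hypothetical sizing tolerance
) -> Tuple[str, ...]:
    """Assign each observed amplicon size to the nearest known SSR allele."""
    calls = set()
    for size in amplicon_sizes_bp:
        # Find the known allele whose expected size is closest to this amplicon.
        name, expected = min(
            known_alleles_bp.items(), key=lambda item: abs(item[1] - size)
        )
        if abs(expected - size) > tolerance_bp:
            raise ValueError(f"no known allele within {tolerance_bp} bp of {size}")
        calls.add(name)
    # One distinct allele: homozygous. Two distinct alleles: heterozygous.
    return tuple(sorted(calls))


# Hypothetical marker with two alleles that differ by three dinucleotide repeats.
example_alleles = {"A": 150.0, "B": 156.0}
homozygote = call_ssr_genotype([150.3], example_alleles)
heterozygote = call_ssr_genotype([149.8, 156.2], example_alleles)
```

Because SSR markers are codominant, the heterozygote is distinguishable from either homozygote: it yields both amplicon sizes in the same sample.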
Soybean markers that rely on single nucleotide polymorphisms (SNPs) are also well known in the art. Various techniques have been developed for the detection of SNPs, including allele specific hybridization (ASH; see, e.g., Coryell et al., (1999) “Allele specific hybridization markers for soybean,” Theor. Appl. Genet., 98:690-696). Additional types of molecular markers are also widely used, including but not limited to expressed sequence tags (ESTs) and SSR markers derived from EST sequences, restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD) and isozyme markers. A wide range of protocols are known to one of skill in the art for detecting this variability, and these protocols are frequently specific for the type of polymorphism they are designed to detect. Examples include PCR amplification, single-strand conformation polymorphism (SSCP) analysis, and self-sustained sequence replication (3SR; see Chan and Fox, “NASBA and other transcription-based amplification methods for research and diagnostic microbiology,” Reviews in Medical Microbiology 10:185-196 [1999]).
Linkage of one molecular marker to another molecular marker is measured as a recombination frequency. In general, the closer two loci (e.g., two SSR markers) are on the genetic map, the closer they lie to each other on the physical map. A relative genetic distance (determined by crossing over frequencies, measured in centimorgans; cM) is generally proportional to the physical distance (measured in base pairs, e.g., kilobase pairs [kb] or megabase pairs [Mbp]) by which two linked loci are separated from each other on a chromosome. A lack of precise proportionality between cM and physical distance can result from variation in recombination frequencies for different chromosomal regions, e.g., some chromosomal regions are recombinational “hot spots,” while other regions do not show any recombination, or only demonstrate rare recombination events. In general, the closer one marker is to another marker, whether measured in terms of recombination or physical distance, the more strongly they are linked. In some aspects, the closer a molecular marker is to a gene that encodes a polypeptide that imparts a particular phenotype (e.g., disease tolerance), whether measured in terms of recombination or physical distance, the better that marker serves to tag the desired phenotypic trait.
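The relationship between recombination frequency and genetic distance can be sketched as follows. The text above does not name a mapping function; the Haldane and Kosambi functions used here are standard choices supplied for illustration only, and the progeny counts are hypothetical.

```python
import math


def recombination_frequency(n_recombinant: int, n_total: int) -> float:
    """Fraction of scored progeny that are recombinant between two loci."""
    if n_total <= 0 or not 0 <= n_recombinant <= n_total:
        raise ValueError("counts must satisfy 0 <= n_recombinant <= n_total, n_total > 0")
    return n_recombinant / n_total


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for some crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical example: 20 recombinants among 200 scored progeny.
r = recombination_frequency(20, 200)
distance_haldane = haldane_cm(r)
distance_kosambi = kosambi_cm(r)
```

For small recombination frequencies both functions give a distance close to 100 x r cM. They diverge as r approaches 0.5, where the loci behave as unlinked.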
Genetic mapping variability can also be observed between different populations of the same crop species, including soybean. In spite of this variability in the genetic map that may occur between populations, genetic map and marker information derived from one population generally remains useful across multiple populations in identification of plants with desired traits, counter-selection of plants with undesirable traits and in guiding MAS.
QTL Mapping
It is the goal of the plant breeder to select plants and enrich the plant population for individuals that have desired traits, for example, pathogen tolerance, leading ultimately to increased agricultural productivity. It has been recognized for quite some time that specific chromosomal loci (or intervals) can be mapped in an organism's genome that correlate with particular quantitative phenotypes. Such loci are termed quantitative trait loci, or QTL. The plant breeder can advantageously use molecular markers to identify desired individuals by identifying marker alleles that show a statistically significant probability of co-segregation with a desired phenotype (e.g., pathogenic infection tolerance), manifested as linkage disequilibrium. By identifying a molecular marker or clusters of molecular markers that co-segregate with a quantitative trait, the breeder is thus identifying a QTL. By identifying and selecting a marker allele (or desired alleles from multiple markers) that associates with the desired phenotype, the plant breeder is able to rapidly select a desired phenotype by selecting for the proper molecular marker allele (a process called marker-assisted selection, or MAS). The more molecular markers that are placed on the genetic map, the more potentially useful that map becomes for conducting MAS.
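The selection step described above can be sketched as a simple filter over genotyped plants. This is a minimal illustration only: the plant identifiers, marker names, and allele labels are hypothetical, and requiring the favorable allele at every marker is just one possible selection rule.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # the two alleles observed at one marker locus


def select_plants(
    genotypes: Dict[str, Dict[str, Genotype]],
    favorable_alleles: Dict[str, str],
    require_homozygous: bool = True,
) -> List[str]:
    """Return plants carrying the favorable allele at every listed marker."""
    needed = 2 if require_homozygous else 1
    selected = []
    for plant, calls in genotypes.items():
        keep = True
        for marker, allele in favorable_alleles.items():
            alleles = calls.get(marker)
            # A missing marker call is treated as a failure to qualify.
            if alleles is None or alleles.count(allele) < needed:
                keep = False
                break
        if keep:
            selected.append(plant)
    return selected


# Hypothetical genotype calls at two markers linked to a tolerance QTL.
example_genotypes = {
    "P1": {"M1": ("A", "A"), "M2": ("T", "T")},
    "P2": {"M1": ("A", "B"), "M2": ("T", "T")},
    "P3": {"M1": ("B", "B"), "M2": ("T", "C")},
}
example_favorable = {"M1": "A", "M2": "T"}
fixed_lines = select_plants(example_genotypes, example_favorable)
carriers = select_plants(example_genotypes, example_favorable, require_homozygous=False)
```

The same filter can be inverted for counter-selection by listing the unfavorable allele and discarding the plants it returns.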
Multiple experimental paradigms have been developed to identify and analyze QTL (see, e.g., Jansen (1996) Trends Plant Sci 1:89). The majority of published reports on QTL mapping in crop species have been based on the use of the bi-parental cross (Lynch and Walsh (1997) Genetics and Analysis of Quantitative Traits, Sinauer Associates, Sunderland). Typically, these paradigms involve crossing one or more parental pairs, which can be, for example, a single pair derived from two inbred strains, or multiple related or unrelated parents of different inbred strains or lines, which each exhibit different characteristics relative to the phenotypic trait of interest. Typically, this experimental protocol involves deriving 100 to 300 segregating progeny from a single cross of two divergent inbred lines (e.g., selected to maximize phenotypic and molecular marker differences between the lines). The parents and segregating progeny are genotyped for multiple marker loci and evaluated for one to several quantitative traits (e.g., disease resistance). QTL are then identified as significant statistical associations between genotypic values and phenotypic variability among the segregating progeny. The strength of this experimental protocol comes from the utilization of the inbred cross, because the resulting F1 parents all have the same linkage phase. Thus, after selfing of the F1 plants, all segregating progeny (F2) are informative and linkage disequilibrium is maximized, the linkage phase is known, there are only two QTL alleles, and, except for backcross progeny, the frequency of each QTL allele is 0.5.
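The association step in such a bi-parental design can be illustrated with a single-marker analysis: progeny are grouped by their genotype at one marker and a one-way ANOVA F statistic is computed on the trait values. The sketch below is a minimal pure-Python version. The genotype labels and phenotype scores are hypothetical, and a real analysis would also convert F to a significance level and correct for testing many markers.

```python
from typing import Dict, Sequence


def single_marker_f_statistic(
    phenotypes_by_genotype: Dict[str, Sequence[float]],
) -> float:
    """One-way ANOVA F statistic for trait values grouped by marker genotype."""
    groups = [list(v) for v in phenotypes_by_genotype.values() if len(v) > 0]
    k = len(groups)                   # number of genotype classes
    n = sum(len(g) for g in groups)   # number of progeny scored
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and n > k")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by differences between genotype class means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation within each genotype class.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    if ss_within == 0:
        raise ValueError("no within-class variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical chlorosis scores for F2 progeny grouped by marker genotype.
example_scores = {
    "AA": [1.0, 2.0, 3.0],
    "AB": [2.0, 3.0, 4.0],
    "BB": [3.0, 4.0, 5.0],
}
f_value = single_marker_f_statistic(example_scores)
```

A large F value indicates that the marker genotype classes differ in mean phenotype, which is the kind of statistical association from which a linked QTL is inferred.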
Numerous statistical methods for determining whether markers are genetically linked to a QTL (or to another marker) are known to those of skill in the art and include, e.g., standard linear models, such as ANOVA or regression mapping (Haley and Knott (1992) Heredity 69:315), and maximum likelihood methods such as expectation-maximization algorithms (e.g., Lander and Botstein (1989) “Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps,” Genetics 121:185-199; Jansen (1992) “A general mixture model for mapping quantitative trait loci by using molecular markers,” Theor. Appl. Genet., 85:252-260; Jansen (1993) “Maximum likelihood in a generalized linear finite mixture model by using the EM algorithm,” Biometrics 49:227-231; Jansen (1994) “Mapping of quantitative trait loci by using genetic markers: an overview of biometrical models,” In J. W. van Ooijen and J. Jansen (eds.), Biometrics in Plant breeding: applications of molecular markers, pp. 116-124, CPRO-DLO Netherlands; Jansen (1996) “A general Monte Carlo method for mapping multiple quantitative trait loci,” Genetics 142:305-311; and Jansen and Stam (1994) “High Resolution of quantitative trait into multiple loci via interval mapping,” Genetics 136:1447-1455). Exemplary statistical methods include single point marker analysis, interval mapping (Lander and Botstein (1989) Genetics 121:185), composite interval mapping, penalized regression analysis, complex pedigree analysis, MCMC analysis, MQM analysis (Jansen (1994) Genetics 138:871), HAPLO-IM+ analysis, HAPLO-MQM analysis, HAPLO-MQM+ analysis, Bayesian MCMC, ridge regression, identity-by-descent analysis, and Haseman-Elston regression, any of which is suitable in the context of the present invention. Additional details regarding alternative statistical methods applicable to complex breeding populations which can be used to identify and localize QTLs are described in: U.S. Ser. No. 09/216,089 by Beavis et al. “QTL MAPPING IN PLANT BREEDING POPULATIONS” and PCT/US00/34971 by Jansen et al. “MQM MAPPING USING HAPLOTYPED PUTATIVE QTLS ALLELES: A SIMPLE APPROACH FOR MAPPING QTLS IN PLANT BREEDING POPULATIONS.” These approaches are computationally intensive and are usually performed with the assistance of a computer based system and specialized software. Appropriate statistical packages are available from a variety of public and commercial sources, and are known to those of skill in the art.
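As one concrete instance of the regression-based approaches listed above, the following sketch computes a LOD score for a single marker by regressing the phenotype on allele dosage (0, 1 or 2 copies of one allele). It is an illustration under stated assumptions rather than the procedure of any cited reference: the additive coding, the normal-error model behind the formula LOD = (n/2) log10(RSS_null / RSS_marker), and the data values are all assumptions made for the example.

```python
import math
from typing import Sequence


def single_marker_lod(
    allele_dosage: Sequence[float], phenotype: Sequence[float]
) -> float:
    """LOD score comparing a marker regression with a mean-only model."""
    n = len(phenotype)
    if n != len(allele_dosage) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 observations")
    mean_x = sum(allele_dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in allele_dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(allele_dosage, phenotype))
    # Residual sum of squares with no marker effect (mean-only model).
    rss_null = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    # Residual sum of squares after fitting the least-squares marker slope.
    rss_marker = rss_null - sxy ** 2 / sxx
    if rss_marker <= 0:
        raise ValueError("perfect fit; LOD is unbounded")
    return (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical F2 data: dosage of one parental allele and a trait score.
example_dosage = [0, 0, 0, 1, 1, 1, 2, 2, 2]
example_trait = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0]
lod = single_marker_lod(example_dosage, example_trait)
```

Repeating this calculation marker by marker along a linkage map and looking for peaks in the LOD profile is the single point analysis mentioned above. Interval and composite interval mapping refine it by modeling positions between markers.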
There is a need in the art for improved soybean strains that are tolerant to iron-deficient growth conditions. There is a need in the art for methods that identify soybean plants or populations (germplasm) that display tolerance to iron-deficient growth conditions. What is needed in the art is to identify molecular genetic markers that co-segregate with low-iron tolerance loci (e.g., tolerance QTL) in order to facilitate MAS, and also to facilitate gene discovery and cloning of gene alleles that impart tolerance to low-iron growth conditions. Such markers can be used to select individual plants and plant populations that show favorable marker alleles in soybean populations and then employed to select the tolerant phenotype, or alternatively, be used to counterselect plants or plant populations that show a low-iron susceptibility phenotype. The present invention provides these and other advantages.