Soybean, a legume, has become the world's primary source of seed oil and seed protein. In addition, its utilization is being expanded to the industrial, manufacturing and pharmaceutical sectors. Soybean productivity is a vital agricultural and economic consideration. Unfortunately, soybean is host to one of the widest ranges of infectious pathogens of all crops. More than a hundred different pathogens are known to affect soybean plants, some of which pose significant economic threats. Improving soybean disease tolerance to these many pathogens is crucial to preventing yield losses.
Phytophthora Root Rot
One of the most destructive fungal diseases of soybean [Glycine max (L.) Merr.] is Phytophthora root rot (PRR). This disease is caused by infection with the pathogen Phytophthora sojae Kaufmann and Gerdemann (synonymous with P. megasperma Drechs. f. sp. glycinea T. Kuan & D. C. Erwin). The disease is principally characterized by root rot, but can also cause pre-emergence death. The fungal infection can kill plants at all stages of growth and can reduce stands. Infected plants that survive are stunted and less vigorous, with reduced yields. Complete discussions of Phytophthora infection, as well as of historical and more recent changes in Phytophthora taxonomy, are available in various sources; see, e.g., the world wide web at phytid.org maintained by CABI Bioscience (Egham, UK), and see also Erwin and Ribeiro (1996), Phytophthora infestans (Mont.) de Bary (1876). Phytophthora Diseases Worldwide, p. 346-353, APS Press, St. Paul, Minn.
A common method of protecting soybean plants from Phytophthora root rot infections utilizes the selection of specific resistance genes. Plant breeders manipulate specific resistance genes in an attempt to produce plants that are resistant to infection, or limit the extent of the infection. Unfortunately, as breeders evolve increasingly resistant strains, the host range of the pathogen similarly evolves to adapt to the changing genetic constitution of the host. Thus, resistant soybean strains produced by plant breeders are effective only for a finite period and eventually fail.
An alternative approach is to identify plants that show tolerance to a particular pathogen. Tolerance can be described as the relative ability of a plant to survive infection without showing severe symptoms such as death, stunting, loss of vigor or yield loss. Tolerance includes any mechanism other than whole-plant immunity or resistance that reduces the expression of symptoms indicative of infection. Infected plants that exhibit tolerance will yield nearly as well as uninfected plants and also prevent the evolution of host-adapted pathogenic Phytophthora races capable of reducing soybean yield in previously resistant plants.
The development of molecular genetic markers has facilitated mapping and selection of agriculturally important traits in soybean. Markers tightly linked to disease tolerance genes are an asset in the rapid identification of tolerant soybean lines on the basis of genotype by the use of marker assisted selection (MAS). Introgressing disease tolerance genes into a desired cultivar would also be facilitated by using suitable DNA markers.
Molecular Markers and Marker Assisted Selection
A genetic map is a graphical representation of a genome (or a portion of a genome such as a single chromosome) where the distances between landmarks on the chromosome are measured by the recombination frequencies between the landmarks. A genetic landmark can be any of a variety of known polymorphic markers, for example but not limited to, molecular markers such as SSR markers, RFLP markers, or SNP markers. Furthermore, SSR markers can be derived from genomic or expressed nucleic acids (e.g., ESTs). The nature of these physical landmarks and the methods used to detect them vary, but all of these markers are physically distinguishable from each other (as well as from the plurality of alleles of any one particular marker) on the basis of polynucleotide length and/or sequence.
Although specific DNA sequences which encode proteins are generally well-conserved across a species, other regions of DNA (typically non-coding) tend to accumulate polymorphism, and therefore, can be variable between individuals of the same species. Such regions provide the basis for numerous molecular genetic markers. In general, any differentially inherited polymorphic trait (including nucleic acid polymorphism) that segregates among progeny is a potential marker. The genomic variability can be of any origin, for example, insertions, deletions, duplications, repetitive elements, point mutations, recombination events, or the presence and sequence of transposable elements. A large number of soybean molecular markers are known in the art, and are published or available from various sources, such as the SOYBASE internet resource. Similarly, numerous methods for detecting molecular markers are also well-established.
The primary motivation for developing molecular marker technologies from the point of view of plant breeders has been the possibility to increase breeding efficiency through marker assisted selection (MAS). A molecular marker allele that demonstrates linkage disequilibrium with a desired phenotypic trait (e.g., a quantitative trait locus, or QTL, such as resistance to a particular disease) provides a useful tool for the selection of a desired trait in a plant population. The key components to the implementation of this approach are: (i) the creation of a dense genetic map of molecular markers, (ii) the detection of QTL based on statistical associations between marker and phenotypic variability, (iii) the definition of a set of desirable marker alleles based on the results of the QTL analysis, and (iv) the use and/or extrapolation of this information to the current set of breeding germplasm to enable marker-based selection decisions to be made.
The availability of integrated linkage maps of the soybean genome containing increasing densities of public soybean markers has facilitated soybean genetic mapping and MAS. See, e.g., Cregan et al. (1999) “An Integrated Genetic Linkage Map of the Soybean Genome” Crop Sci. 39:1464-1490; Song et al., “A New Integrated Genetic Linkage Map of the Soybean,” Theor. Appl. Genet., 109:122-128 (2004); Diwan and Cregan (1997) “Automated sizing of fluorescent-labeled simple sequence repeat (SSR) markers to assay genetic variation in Soybean,” Theor. Appl. Genet., 95:220-225; the SOYBASE resources on the world wide web, including the Shoemaker Lab Home Page and other resources that can be accessed through SOYBASE; and see the Soybean Genomics and Improvements Laboratory (SGIL) website on the world wide web, and see especially the Cregan Lab webpage.
Two types of markers are frequently used in marker assisted selection protocols, namely simple sequence repeat (SSR, also known as microsatellite) markers, and single nucleotide polymorphism (SNP) markers. The term SSR refers generally to any type of molecular heterogeneity that results in length variability, and most typically is a short (up to several hundred base pairs) segment of DNA that consists of multiple tandem repeats of a two or three base-pair sequence. These repeated sequences result in highly polymorphic DNA regions of variable length due to poor replication fidelity, e.g., caused by polymerase slippage. SSRs appear to be randomly dispersed through the genome and are generally flanked by conserved regions. SSR markers can also be derived from RNA sequences (in the form of a cDNA, a partial cDNA or an EST) as well as genomic material.
The characteristics of SSR heterogeneity make them well suited for use as molecular genetic markers; namely, SSR genomic variability is inherited, multiallelic, codominant and reproducibly detectable. The proliferation of increasingly sophisticated amplification-based detection techniques (e.g., PCR-based) provides a variety of sensitive methods for the detection of nucleotide sequence heterogeneity. Primers (or other types of probes) are designed to hybridize to conserved regions that flank the SSR domain, resulting in the amplification of the variable SSR region. The different sized amplicons generated from an SSR region have characteristic and reproducible sizes. The different sized SSR amplicons observed from two homologous chromosomes in an individual, or from different individuals in the plant population, are generally termed “marker alleles.” As long as there exist at least two SSR alleles that produce PCR products of different sizes, the SSR can be employed as a marker.
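The allele-calling idea described above can be illustrated with a minimal sketch. The function names, the 1 bp binning tolerance, and the amplicon sizes are hypothetical and chosen only for illustration; they do not come from any particular genotyping protocol.

```python
def call_ssr_genotype(amplicon_sizes_bp, tolerance_bp=1):
    """Collapse the amplicon sizes (bp) observed for one plant into marker alleles.

    Sizes within `tolerance_bp` of the previously accepted allele are treated
    as the same allele (an assumed binning rule for sizing noise).
    """
    alleles = []
    for size in sorted(amplicon_sizes_bp):
        if not alleles or size - alleles[-1] > tolerance_bp:
            alleles.append(size)
    return tuple(alleles)


def is_informative(genotypes):
    """A marker is usable only if at least two distinct alleles are observed."""
    distinct = {allele for genotype in genotypes for allele in genotype}
    return len(distinct) >= 2


# Hypothetical amplicon sizes for three plants at one SSR locus.
plants = {
    "plant_1": [152, 152],  # one size class: homozygous
    "plant_2": [152, 158],  # two size classes: heterozygous (codominant readout)
    "plant_3": [158, 159],  # 1 bp apart: binned as a single allele
}
genotypes = {name: call_ssr_genotype(sizes) for name, sizes in plants.items()}
```

Because SSR markers are codominant, the heterozygote in this toy data set shows both size classes, which is what lets both parental alleles be scored in the same plant.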
Soybean markers that rely on single nucleotide polymorphisms (SNPs) are also well known in the art. Various techniques have been developed for the detection of SNPs, including allele specific hybridization (ASH; see, e.g., Coryell et al., (1999) “Allele specific hybridization markers for soybean,” Theor. Appl. Genet., 98:690-696). Additional types of molecular markers are also widely used, including but not limited to expressed sequence tags (ESTs) and SSR markers derived from EST sequences, restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD) and isozyme markers. A wide range of protocols are known to one of skill in the art for detecting this variability, and these protocols are frequently specific for the type of polymorphism they are designed to detect. Examples include PCR amplification, single-strand conformation polymorphism (SSCP) analysis and self-sustained sequence replication (3SR; see Chan and Fox, “NASBA and other transcription-based amplification methods for research and diagnostic microbiology,” Reviews in Medical Microbiology 10:185-196 [1999]).
Linkage of one molecular marker to another molecular marker is measured as a recombination frequency. In general, the closer two loci (e.g., two SSR markers) are on the genetic map, the closer they lie to each other on the physical map. A relative genetic distance (determined by crossing over frequencies, measured in centimorgans; cM) is generally proportional to the physical distance (measured in base pairs, e.g., kilobase pairs [kb] or megabasepairs [Mbp]) by which two linked loci are separated from each other on a chromosome. A lack of precise proportionality between cM and physical distance can result from variation in recombination frequencies for different chromosomal regions, e.g., some chromosomal regions are recombinational “hot spots,” while other regions do not show any recombination, or only demonstrate rare recombination events. In general, the closer one marker is to another marker, whether measured in terms of recombination or physical distance, the more strongly they are linked. In some aspects, the closer a molecular marker is to a gene that encodes a polypeptide that imparts a particular phenotype (disease tolerance), whether measured in terms of recombination or physical distance, the better that marker serves to tag the desired phenotypic trait.
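As a rough illustration of how an observed recombination frequency relates to map distance in cM, the sketch below uses the standard Haldane and Kosambi mapping functions. These functions are not named in the text above and are introduced here only as commonly used conversions; the progeny counts are hypothetical.

```python
import math


def recombination_fraction(recombinants, total):
    """Fraction of scored progeny that are recombinant between two loci."""
    if total <= 0:
        raise ValueError("total must be positive")
    return recombinants / total


def haldane_cm(r):
    """Haldane map distance in cM (assumes no crossover interference); needs r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM (allows for interference); needs r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical example: 10 recombinants among 200 scored progeny.
r = recombination_fraction(10, 200)
print(f"r = {r:.3f}, Haldane = {haldane_cm(r):.2f} cM, Kosambi = {kosambi_cm(r):.2f} cM")
```

For tightly linked markers both functions give a distance close to 100 × r, so a recombination frequency of 1% corresponds to roughly 1 cM; the functions diverge as r approaches 0.5.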
Genetic mapping variability can also be observed between different populations of the same crop species, including soybean. In spite of this variability in the genetic map that may occur between populations, genetic map and marker information derived from one population generally remains useful across multiple populations in identification of plants with desired traits, counter-selection of plants with undesirable traits and in guiding MAS.
QTL Mapping
It is the goal of the plant breeder to select plants and enrich the plant population for individuals that have desired traits, for example, pathogen tolerance, leading ultimately to increased agricultural productivity. It has been recognized for quite some time that specific chromosomal loci (or intervals) can be mapped in an organism's genome that correlate with particular quantitative phenotypes. Such loci are termed quantitative trait loci, or QTL. The plant breeder can advantageously use molecular markers to identify desired individuals by identifying marker alleles that show a statistically significant probability of co-segregation with a desired phenotype (e.g., pathogenic infection tolerance), manifested as linkage disequilibrium. By identifying a molecular marker or clusters of molecular markers that co-segregate with a quantitative trait, the breeder is thus identifying a QTL. By identifying and selecting a marker allele (or desired alleles from multiple markers) that associates with the desired phenotype, the plant breeder is able to rapidly select a desired phenotype by selecting for the proper molecular marker allele (a process called marker-assisted selection, or MAS). The more molecular markers that are placed on the genetic map, the more potentially useful that map becomes for conducting MAS.
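The selection step of MAS can be sketched as a simple filter over genotype calls. Everything in this example is hypothetical: the marker names, the allele sizes, the plant identifiers, and the rule of requiring a minimum number of copies of the favorable allele at every marker.

```python
def select_plants(genotypes, favorable, min_copies=2):
    """Return the IDs of plants carrying at least `min_copies` copies of the
    favorable allele at every marker listed in `favorable`."""
    selected = []
    for plant_id, calls in genotypes.items():
        if all(
            calls.get(marker, ()).count(allele) >= min_copies
            for marker, allele in favorable.items()
        ):
            selected.append(plant_id)
    return selected


# Hypothetical diploid genotype calls: marker -> pair of allele sizes (bp).
genotypes = {
    "P1": {"marker_A": (152, 152), "marker_B": (201, 201)},
    "P2": {"marker_A": (152, 158), "marker_B": (201, 201)},
    "P3": {"marker_A": (158, 158), "marker_B": (201, 207)},
}
# Hypothetical alleles assumed to be associated with the desired phenotype.
favorable = {"marker_A": 152, "marker_B": 201}

homozygous_favorable = select_plants(genotypes, favorable)              # strict rule
carriers = select_plants(genotypes, favorable, min_copies=1)            # relaxed rule
```

The same filter run with the unfavorable alleles would serve for counter-selection, i.e., identifying plants to discard.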
Multiple experimental paradigms have been developed to identify and analyze QTL (see, e.g., Jansen (1996) Trends Plant Sci 1:89). The majority of published reports on QTL mapping in crop species have been based on the use of the bi-parental cross (Lynch and Walsh (1997) Genetics and Analysis of Quantitative Traits, Sinauer Associates, Sunderland). Typically, these paradigms involve crossing one or more parental pairs, which can be, for example, a single pair derived from two inbred strains, or multiple related or unrelated parents of different inbred strains or lines, which each exhibit different characteristics relative to the phenotypic trait of interest. Typically, this experimental protocol involves deriving 100 to 300 segregating progeny from a single cross of two divergent inbred lines (e.g., selected to maximize phenotypic and molecular marker differences between the lines). The parents and segregating progeny are genotyped for multiple marker loci and evaluated for one to several quantitative traits (e.g., disease resistance). QTL are then identified as significant statistical associations between genotypic values and phenotypic variability among the segregating progeny. The strength of this experimental protocol comes from the utilization of the inbred cross, because the resulting F1 parents all have the same linkage phase. Thus, after selfing of the F1 plants, all segregating progeny (F2) are informative and linkage disequilibrium is maximized, the linkage phase is known, there are only two QTL alleles, and, except for backcross progeny, the frequency of each QTL allele is 0.5.
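The statement that each QTL allele has a frequency of 0.5 in the F2 can be checked by enumerating the gametes of a selfed F1. The sketch below assumes a single locus with two alleles, labeled "A" and "B" purely for illustration, and ordinary Mendelian segregation.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def f2_genotype_frequencies(parent1_allele="A", parent2_allele="B"):
    """Expected F2 genotype frequencies at one locus after selfing the F1.

    Crossing two inbred (homozygous) parents gives a uniformly heterozygous
    F1, so every F1 plant contributes both alleles with equal probability.
    """
    f1_gametes = (parent1_allele, parent2_allele)
    counts = Counter("".join(sorted(pair)) for pair in product(f1_gametes, repeat=2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def allele_frequency(genotype_freqs, allele):
    """Population frequency of `allele` given diploid genotype frequencies."""
    return sum(freq * Fraction(g.count(allele), 2) for g, freq in genotype_freqs.items())


freqs = f2_genotype_frequencies()
print(freqs)                          # expected 1:2:1 segregation
print(allele_frequency(freqs, "A"))   # frequency of allele A in the F2
```

The enumeration yields the familiar 1:2:1 genotype ratio, from which the allele frequency of 1/2 follows directly.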
Numerous statistical methods for determining whether markers are genetically linked to a QTL (or to another marker) are known to those of skill in the art and include, e.g., standard linear models, such as ANOVA or regression mapping (Haley and Knott (1992) Heredity 69:315), and maximum likelihood methods such as expectation-maximization algorithms (e.g., Lander and Botstein (1989) “Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps,” Genetics 121:185-199; Jansen (1992) “A general mixture model for mapping quantitative trait loci by using molecular markers,” Theor. Appl. Genet., 85:252-260; Jansen (1993) “Maximum likelihood in a generalized linear finite mixture model by using the EM algorithm,” Biometrics 49:227-231; Jansen (1994) “Mapping of quantitative trait loci by using genetic markers: an overview of biometrical models,” In J. W. van Ooijen and J. Jansen (eds.), Biometrics in Plant breeding: applications of molecular markers, pp. 116-124, CPRO-DLO Netherlands; Jansen (1996) “A general Monte Carlo method for mapping multiple quantitative trait loci,” Genetics 142:305-311; and Jansen and Stam (1994) “High Resolution of quantitative traits into multiple loci via interval mapping,” Genetics 136:1447-1455). Exemplary statistical methods include single point marker analysis, interval mapping (Lander and Botstein (1989) Genetics 121:185), composite interval mapping, penalized regression analysis, complex pedigree analysis, MCMC analysis, MQM analysis (Jansen (1994) Genetics 138:871), HAPLO-IM+ analysis, HAPLO-MQM analysis, HAPLO-MQM+ analysis, Bayesian MCMC, ridge regression, identity-by-descent analysis, and Haseman-Elston regression, any of which is suitable in the context of the present invention. Additional details regarding alternative statistical methods applicable to complex breeding populations, which can be used to identify and localize QTLs, are described in: U.S. Ser. No. 09/216,089 by Beavis et al. “QTL MAPPING IN PLANT BREEDING POPULATIONS” and PCT/US00/34971 by Jansen et al. “MQM MAPPING USING HAPLOTYPED PUTATIVE QTLS ALLELES: A SIMPLE APPROACH FOR MAPPING QTLS IN PLANT BREEDING POPULATIONS.” All of these approaches are computationally intensive and are usually performed with the assistance of a computer based system and specialized software. Appropriate statistical packages are available from a variety of public and commercial sources, and are known to those of skill in the art.
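The simplest of the methods listed above, single point marker analysis by ANOVA, can be sketched with the standard library alone. The example below computes a one-way ANOVA F statistic for phenotype scores grouped by marker genotype. The tolerance scores and genotype labels are made-up illustrative values, and the significance test (comparing F against an F distribution with k - 1 and n - k degrees of freedom) is deliberately left out.

```python
from statistics import mean


def single_marker_f(phenotypes_by_genotype):
    """One-way ANOVA F statistic testing whether mean phenotype differs
    among the genotype classes at a single marker."""
    groups = [values for values in phenotypes_by_genotype.values() if values]
    k = len(groups)                       # number of genotype classes
    n = sum(len(g) for g in groups)       # number of scored progeny
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical tolerance scores for F2 progeny, grouped by marker genotype.
linked_marker = {"AA": [7, 8, 9], "AB": [5, 6, 7], "BB": [3, 4, 5]}
unlinked_marker = {"AA": [5, 6, 7], "AB": [5, 6, 7], "BB": [5, 6, 7]}

f_linked = single_marker_f(linked_marker)
f_unlinked = single_marker_f(unlinked_marker)
```

A large F statistic at a marker suggests that the marker co-segregates with a locus affecting the trait; interval mapping and the other listed methods refine this idea by also modeling positions between markers.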
There is a need in the art for improved soybean strains that are tolerant to Phytophthora infection. There is a need in the art for methods that identify soybean plants or populations (germplasm) that display tolerance to Phytophthora infection. What is needed in the art is to identify molecular genetic markers that are linked to Phytophthora tolerance loci (e.g., tolerance QTL) in order to facilitate MAS, and also to facilitate gene discovery and cloning of gene alleles that impart Phytophthora infection tolerance. Such markers can be used to select individual plants and plant populations that show favorable marker alleles in soybean populations and then employed to select the tolerant phenotype, or alternatively, be used to counterselect plants or plant populations that show a Phytophthora infection susceptibility phenotype. The present invention provides these and other advantages.