Soybean, a legume, has become the world's primary source of seed oil and seed protein. In addition, its utilization is being expanded to the industrial, manufacturing and pharmaceutical sectors. Soybean productivity is a vital agricultural and economic consideration. Unfortunately, soybean is host to one of the widest ranges of infectious pathogens of all crops. More than a hundred different pathogens are known to affect soybean plants, some of which pose significant economic threats. Improving soybean disease tolerance to these many pathogens is crucial to preventing yield losses.
Charcoal Rot and Drought
Charcoal rot is caused by the fungus Macrophomina phaseolina. The fungus has a particularly wide geographic distribution and is found throughout the world. M. phaseolina is most severe between 35° North and 35° South latitude (Wyllie, (1976) ‘Macrophomina phaseolina—Charcoal Rot’ P482-484 In L. D. Hill (ed.) World Soybean Research Proc of the World Soybean Res. Conf., Champaign, Ill. Interstate, Danville, Ill.). The fungus is highly variable and has a wide host range, infecting over 500 crop and weed species. Known major crop hosts include alfalfa, maize, cotton, grain sorghum, peanut and soybean.
Symptoms of charcoal rot on soybean can appear during any growth stage. Infected seeds germinate, but the seedlings usually die within a few days. The fungus also invades seedlings, which may or may not exhibit symptoms but serve as latent sources of inoculum later in the growing season. The most common charcoal rot symptoms appear later in the season. Initially, diseased plants exhibit smaller leaflet size, reduced height, and wilting. Ultimately, M. phaseolina can reduce plant height, root volume, and root weight by more than 50%. These deleterious effects on roots are most evident during the pod formation and seed filling stages, when demand for water is high. Affected plants mature several weeks earlier than normal and seed weight, number, and quality are reduced (Smith and Wyllie (1999) ‘Charcoal rot’ In G. L. Hartman (ed.) Compendium of soybean diseases. 4th ed. APS Press, St. Paul, Minn.).
High ambient temperatures and low water availability exacerbate charcoal rot symptoms in soybean. Thus, charcoal rot is primarily known as a dry weather or drought induced disease. Symptoms caused by M. phaseolina are often attributed to drought stress.
In localized areas, yield losses can be as high as 90%. In the period from 1996-2005, charcoal rot was the third leading cause of soybean yield loss in the U.S. Average annual losses were 29 MM bushels, resulting in approximately $188 MM annual income loss. Only soybean cyst nematode and Phytophthora root rot caused greater economic loss during that period (Wrather and Koenning (2006) ‘Soybean Disease Loss Estimates for the United States, 1996-2006’. University of Missouri—Columbia Agriculture Experiment Station. November 2006 published online (http://) at: aes.missouri.edu/delta/research/soyloss.stm Dec. 5, 2007).
Complete or vertical resistance to M. phaseolina has not been identified in soybean, which strongly suggests that a single gene conferring resistance does not exist. In most field and greenhouse evaluations, the great majority of soybean cultivars have been found to be either highly or moderately susceptible to M. phaseolina. Only a few cultivars have been identified as possessing partial or horizontal resistance (Smith and Carville (1997) ‘Field screening of commercial and experimental soybean cultivars for their reaction to Macrophomina phaseolina’ Plant Dis 81:804-809).
An alternative approach to identifying complete resistance is to identify plants that show phenotypic tolerance to a particular pathogen. Tolerance can be described as the relative ability of a plant to survive infection without showing severe symptoms such as death, stunting, loss of vigor or yield loss. Tolerance includes any mechanism other than whole-plant immunity or resistance that reduces the expression of symptoms indicative of infection. Infected plants that exhibit tolerance will yield nearly as well as uninfected plants. However, phenotypic selection requires pathogenic infection, which has many disadvantages.
The development of molecular genetic markers has facilitated mapping and selection of agriculturally important traits in soybean. Markers tightly linked to disease tolerance genes are an asset in the rapid identification of tolerant soybean lines on the basis of genotype by the use of marker assisted selection (MAS). Introgressing disease tolerance genes into a desired cultivar would also be facilitated by using suitable DNA markers.
Molecular Markers and Marker Assisted Selection
A genetic map is a graphical representation of a genome (or a portion of a genome such as a single chromosome) where the distances between landmarks on the chromosome are measured by the recombination frequencies between the landmarks. A genetic landmark can be any of a variety of known polymorphic markers, for example but not limited to, molecular markers such as SSR markers, RFLP markers, or SNP markers. Furthermore, SSR markers can be derived from genomic or expressed nucleic acids (e.g., ESTs). The nature of these physical landmarks and the methods used to detect them vary, but all of these markers are physically distinguishable from each other (as well as from the plurality of alleles of any one particular marker) on the basis of polynucleotide length and/or sequence.
Although specific DNA sequences which encode proteins are generally well-conserved across a species, other regions of DNA (typically non-coding) tend to accumulate polymorphism, and therefore, can be variable between individuals of the same species. Such regions provide the basis for numerous molecular genetic markers. In general, any differentially inherited polymorphic trait (including nucleic acid polymorphism) that segregates among progeny is a potential marker. The genomic variability can be of any origin, for example, insertions, deletions, duplications, repetitive elements, point mutations, recombination events or the presence and sequence of transposable elements. A large number of soybean molecular markers are known in the art, and are published or available from various sources, such as the SOYBASE internet resource. Similarly, numerous methods for detecting molecular markers are also well-established.
The primary motivation for developing molecular marker technologies from the point of view of plant breeders has been the possibility to increase breeding efficiency through marker assisted selection (MAS). A molecular marker allele that demonstrates linkage disequilibrium with a desired phenotypic trait (e.g., a quantitative trait locus, or QTL, such as resistance to a particular disease) provides a useful tool for the selection of a desired trait in a plant population. The key components to the implementation of this approach are: (i) the creation of a dense genetic map of molecular markers, (ii) the detection of QTL based on statistical associations between marker and phenotypic variability, (iii) the definition of a set of desirable marker alleles based on the results of the QTL analysis, and (iv) the use and/or extrapolation of this information to the current set of breeding germplasm to enable marker-based selection decisions to be made.
The availability of integrated linkage maps of the soybean genome containing increasing densities of public soybean markers has facilitated soybean genetic mapping and MAS. See, e.g., Cregan, et al., (1999) “An Integrated Genetic Linkage Map of the Soybean Genome” Crop Sci 39:1464-1490; Song, et al., (2004) “A New Integrated Genetic Linkage Map of the Soybean” Theor Appl Genet 109:122-128; Diwan and Cregan (1997) “Automated sizing of fluorescent-labeled simple sequence repeat (SSR) markers to assay genetic variation in Soybean” Theor Appl Genet 95:220-225; the SOYBASE resources on the world wide web, including the Shoemaker Lab Home Page and other resources that can be accessed through SOYBASE; and see, the Soybean Genomics and Improvements Laboratory (SGIL) website on the world wide web, and see especially the Cregan Lab webpage.
Two types of markers are frequently used in marker assisted selection protocols, namely simple sequence repeat (SSR, also known as microsatellite) markers, and single nucleotide polymorphism (SNP) markers. The term SSR refers generally to any type of molecular heterogeneity that results in length variability, and most typically is a short (up to several hundred base pairs) segment of DNA that consists of multiple tandem repeats of a two or three base-pair sequence. These repeated sequences result in highly polymorphic DNA regions of variable length due to poor replication fidelity, e.g., caused by polymerase slippage. SSRs appear to be randomly dispersed through the genome and are generally flanked by conserved regions. SSR markers can also be derived from RNA sequences (in the form of a cDNA, a partial cDNA or an EST) as well as genomic material.
The characteristics of SSR heterogeneity make them well suited for use as molecular genetic markers; namely, SSR genomic variability is inherited, multiallelic, codominant and reproducibly detectable. The proliferation of increasingly sophisticated amplification-based detection techniques (e.g., PCR-based) provides a variety of sensitive methods for the detection of nucleotide sequence heterogeneity. Primers (or other types of probes) are designed to hybridize to conserved regions that flank the SSR domain, resulting in the amplification of the variable SSR region. The different sized amplicons generated from an SSR region have characteristic and reproducible sizes. The different sized SSR amplicons observed from two homologous chromosomes in an individual, or from different individuals in the plant population, are generally termed “marker alleles.” As long as there exist at least two SSR alleles that produce PCR products of different sizes, the SSR can be employed as a marker.
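The relationship between repeat number and amplicon size described above can be illustrated with a minimal sketch. The flanking sequences, the AT motif and the repeat counts below are hypothetical values chosen only for illustration; they do not correspond to any real soybean SSR locus or primer pair.

```python
import re

# Hypothetical conserved flanking regions (the primer binding sites would lie here).
LEFT_FLANK = "GCTAGGCTCC"
RIGHT_FLANK = "GGTGACCGTA"


def make_amplicon(repeat_count: int, motif: str = "AT") -> str:
    """Build the amplified region for an allele with `repeat_count` tandem repeats."""
    return LEFT_FLANK + motif * repeat_count + RIGHT_FLANK


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Two hypothetical marker alleles that differ only in repeat number.
allele_a = make_amplicon(12)
allele_b = make_amplicon(15)

for name, amplicon in (("allele A", allele_a), ("allele B", allele_b)):
    repeats = longest_repeat_run(amplicon, "AT")
    print(f"{name}: {repeats} repeats, amplicon length {len(amplicon)} bp")
```

Because the flanking regions are identical, the two amplicons differ in length only by the number of added repeat units, which is what allows the alleles to be distinguished by fragment size.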
Soybean markers that rely on single nucleotide polymorphisms (SNPs) are also well known in the art. Various techniques have been developed for the detection of SNPs, including allele specific hybridization (ASH; see, e.g., Coryell, et al. (1999) “Allele specific hybridization markers for soybean,” Theor Appl Genet 98:690-696). Additional types of molecular markers are also widely used, including but not limited to expressed sequence tags (ESTs) and SSR markers derived from EST sequences, restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD) and isozyme markers. A wide range of protocols are known to one of skill in the art for detecting this variability, and these protocols are frequently specific for the type of polymorphism they are designed to detect. Examples include PCR amplification, single-strand conformation polymorphism (SSCP) analysis and self-sustained sequence replication (3SR; see, Chan and Fox, (1999) “NASBA and other transcription-based amplification methods for research and diagnostic microbiology,” Reviews in Medical Microbiology 10:185-196).
Linkage of one molecular marker to another molecular marker is measured as a recombination frequency. In general, the closer two loci (e.g., two SSR markers) are on the genetic map, the closer they lie to each other on the physical map. A relative genetic distance (determined by crossing over frequencies, measured in centimorgans; cM) is generally proportional to the physical distance (measured in base pairs, e.g., kilobase pairs [kb] or megabase pairs [Mbp]) that two linked loci are separated from each other on a chromosome. A lack of precise proportionality between cM and physical distance can result from variation in recombination frequencies for different chromosomal regions, e.g., some chromosomal regions are recombinational “hot spots,” while other regions do not show any recombination, or only demonstrate rare recombination events. In general, the closer one marker is to another marker, whether measured in terms of recombination or physical distance, the more strongly they are linked. In some aspects, the closer a molecular marker is to a gene that encodes a polypeptide that imparts a particular phenotype (disease tolerance), whether measured in terms of recombination or physical distance, the better that marker serves to tag the desired phenotypic trait.
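As an illustration of how an observed recombination frequency can be converted into a map distance in cM, the following sketch applies the Haldane and Kosambi mapping functions. These two functions are standard in genetic mapping, but the text above does not state which mapping function, if any, was used, so their use here is an assumption made for illustration only.

```python
import math


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for some crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Hypothetical recombination frequencies between two linked markers.
for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

For tightly linked markers, both functions give a distance close to 100 times the recombination frequency; the estimates diverge as the recombination frequency approaches 0.5.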
Genetic mapping variability can also be observed between different populations of the same crop species, including soybean. In spite of this variability in the genetic map that may occur between populations, genetic map and marker information derived from one population generally remains useful across multiple populations in identification of plants with desired traits, counter-selection of plants with undesirable traits and in guiding MAS.
QTL Mapping
It is the goal of the plant breeder to select plants and enrich the plant population for individuals that have desired traits, for example, pathogen tolerance, leading ultimately to increased agricultural productivity. It has been recognized for quite some time that specific chromosomal loci (or intervals) can be mapped in an organism's genome that correlate with particular quantitative phenotypes. Such loci are termed quantitative trait loci, or QTL. The plant breeder can advantageously use molecular markers to identify desired individuals by identifying marker alleles that show a statistically significant probability of co-segregation with a desired phenotype (e.g., pathogenic infection tolerance), manifested as linkage disequilibrium. By identifying a molecular marker or clusters of molecular markers that co-segregate with a quantitative trait, the breeder is thus identifying a QTL. By identifying and selecting a marker allele (or desired alleles from multiple markers) that associates with the desired phenotype, the plant breeder is able to rapidly select a desired phenotype by selecting for the proper molecular marker allele (a process called marker-assisted selection, or MAS). The more molecular markers that are placed on the genetic map, the more potentially useful that map becomes for conducting MAS.
Multiple experimental paradigms have been developed to identify and analyze QTL (see, e.g., Jansen, (1996) Trends Plant Sci 1:89). In this study we utilized “Intergroup Allele Frequency Distribution” analysis using GeneFlow™ version 7.0 software. An intergroup allele frequency distribution analysis provides a method for finding non-random distributions of alleles between two phenotypic groups.
During processing, a contingency table of allele frequencies is constructed and from this a G-statistic and probability are calculated (the G statistic is adjusted by using Williams' correction factor). The probability value is adjusted to take into account the fact that multiple tests are being done (thus, there is some expected rate of false positives). The adjusted probability is proportional to the probability that the observed allele distribution differences between the two classes would occur by chance alone. The lower that probability value, the greater the likelihood that the Charcoal Rot infection phenotype and the marker will co-segregate. A more complete discussion of the derivation of the probability values can be found in the GeneFlow™ version 7.0 software documentation. See, also, Sokal and Rohlf, (1981), Biometry: The Principles and Practice of Statistics in Biological Research, 2nd ed., San Francisco, W.H. Freeman and Co.
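The calculation described above can be sketched as follows. This is a minimal illustration of a G-test of independence with Williams' correction on a phenotype-by-allele contingency table, not a reproduction of the GeneFlow™ implementation, whose internals are not described here. The allele counts are hypothetical, the function names are introduced only for this sketch, and every row and column total is assumed to be greater than zero. No adjustment for multiple tests is applied in this sketch.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5   # closed form for df = 1
    else:
        q, a = math.exp(-half), 1.0              # closed form for df = 2
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def g_test_williams(table):
    """Return (adjusted G, degrees of freedom, probability) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # empty cells contribute nothing to G
                expected = row_totals[i] * col_totals[j] / n
                g += 2.0 * observed * math.log(observed / expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # Williams' correction factor for an r x c table
    q = 1.0 + ((n * sum(1.0 / t for t in row_totals) - 1.0)
               * (n * sum(1.0 / t for t in col_totals) - 1.0)) / (6.0 * n * df)
    g_adj = g / q
    return g_adj, df, chi2_sf(g_adj, df)


# Hypothetical counts: rows are phenotypic groups, columns are alleles at one marker.
counts = [
    [30, 10],  # tolerant group
    [12, 28],  # susceptible group
]
g_adj, df, prob = g_test_williams(counts)
print(f"adjusted G = {g_adj:.3f}, df = {df}, probability = {prob:.2e}")
```

A small probability indicates that the alleles are distributed non-randomly between the two phenotypic groups at that marker.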
The underlying logic is that markers with significantly different allele distributions between the tolerant and susceptible groups (i.e., non random distributions) might be associated with the trait and can be used to separate them for purposes of marker assisted selection of soybean lines with previously uncharacterized tolerance or susceptibility. The present analysis examined one marker locus at a time and determined if the allele distribution within the tolerant group is significantly different from the allele distribution within the susceptible group. A statistically different allele distribution is an indication that the marker is linked to a locus that is associated with reaction to the trait of interest. In this analysis, unadjusted probabilities less than one are considered significant (the marker and the phenotype show linkage disequilibrium), and adjusted probabilities less than approximately 0.05 are considered highly significant. Allele classes represented by less than 5 observations across both groups were not included in the statistical analysis.
In addition, in this study we utilized “Trait Allele Frequency Analysis” using GeneFlow™ version 7.0 software. The Trait Allele Correlation report requires the selection of accessions, markers and a single trait. For each allele at each selected marker, the report shows the effect of having 0, 1 or 2 doses of that allele on the trait of interest. For each dosage comparison it calculates a t-statistic, probability and adjusted probability by comparing the means of two different dosage classes. The adjusted probability gives a better indication of the experiment-wise significance given the number of alleles being tested, and is calculated as P_adj = 1 − (1 − Prob)^n, where n is the number of tests being done in this analysis (see, Experimental Design: Procedures for the Behavioral Sciences).
A more complete discussion of the derivation of the probability values can be found in the GeneFlow™ version 7.0 software documentation. See also, Sokal and Rohlf, (1995) Biometry, 3rd ed., San Francisco, W.H. Freeman and Co.
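The adjusted probability formula given above for the Trait Allele Correlation report, P_adj = 1 − (1 − Prob)^n, can be illustrated with a short sketch. The function name and the example probabilities are hypothetical and are used only to show how the adjustment grows with the number of tests.

```python
def adjusted_probability(prob: float, n_tests: int) -> float:
    """Experiment-wise probability: P_adj = 1 - (1 - Prob)**n."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - prob) ** n_tests


# Hypothetical unadjusted probability from a single dosage comparison.
unadjusted = 0.01
for n in (1, 10, 50):
    print(f"n = {n:>2}: P_adj = {adjusted_probability(unadjusted, n):.4f}")
```

With a single test the adjusted and unadjusted probabilities coincide; as the number of tests increases, the same unadjusted probability corresponds to a larger adjusted probability, reflecting the expected rate of false positives.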
There is a need in the art for improved soybean strains that are tolerant to Charcoal Rot and its causative agents, namely Macrophomina phaseolina infection and low-available water growth conditions. There is a need in the art for methods that identify soybean plants or populations (germplasm) that display tolerance to Charcoal Rot Drought Complex. What is needed in the art is to identify molecular genetic markers that are linked to Charcoal Rot Drought Complex tolerance loci in order to facilitate MAS. Such markers can be used to select individual plants and plant populations that show favorable marker alleles in soybean populations and then employed to select the tolerant phenotype, or alternatively, be used to counterselect plants or plant populations that show a Charcoal Rot Drought Complex susceptibility phenotype. The present invention provides these and other advantages.