Soybean, a legume, has experienced increasing importance in the world economy, and has become the world's primary source of seed oil and seed protein. Both people and livestock rely on soybeans as a food source. In addition, its utilization is being expanded to the industrial, manufacturing and pharmaceutical sectors. Soybean productivity, well-being and improvement are vital agricultural considerations.
Soybean is host to one of the widest ranges of infectious pathogens of all crops. Finding resistance to these many pathogens is crucial to preventing devastating yield losses. More than a hundred different pathogens are known to affect soybeans, and all parts of the plant are susceptible to disease. Of these documented pathogens, approximately 35 pose significant economic threats. It is rare to find a soybean field that is pathogen-free, and in most instances, plants are infected with multiple diseases.
Efforts to improve the soybean crop have benefited greatly from the evolution of plant genomics and, more specifically, from genetic linkage maps and molecular marker technology. Plant genetic variability that can be detected at the molecular level has been a great benefit for crop improvement research. It has also permitted the direct manipulation of specific genes through cloning and transformation techniques.
Genetic Linkage Maps
A genetic map, also termed a linkage map, is a representation of a genome that shows the positions of specific DNA markers relative to each other. The construction of linkage maps is based solely on the ability to identify genetic markers. Any differentially inherited polymorphic trait that segregates among progeny is a potential marker. Linked markers are markers that are relatively close to each other on the genetic map and, as a result, are co-inherited with a characteristic non-random frequency (a frequency greater than 50%). The closer they lie to each other on the genetic map, the lower the likelihood that a crossing-over event will separate them, and the greater the likelihood that the two markers will be co-inherited. This is the underlying principle used in all linkage determinations and by all computational programs that construct genetic linkage maps. A variety of programs for the analysis of mapping data are available, and include, for example, Mapmaker, MapManager, MultiMap and LINKAGE.
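By way of illustration only, the co-inheritance principle described above can be sketched in a few lines of Python. The gamete counts and the allele labels (A/a, B/b) are hypothetical and are not taken from any mapping population; the function name is likewise an assumption of this sketch. Mapping programs such as those named above apply statistical tests (for example, LOD scores) rather than the bare threshold used here.

```python
def recombination_fraction(progeny, parental_types):
    """Fraction of progeny whose two-marker haplotype is non-parental."""
    recombinants = sum(1 for hap in progeny if hap not in parental_types)
    return recombinants / len(progeny)

# Hypothetical gametes scored at two markers: (allele at marker 1, allele at marker 2).
parental_types = {("A", "B"), ("a", "b")}
progeny = (
    [("A", "B")] * 44 + [("a", "b")] * 46   # parental combinations
    + [("A", "b")] * 6 + [("a", "B")] * 4   # recombinant combinations
)

r = recombination_fraction(progeny, parental_types)
co_inheritance = 1 - r  # frequency with which the parental alleles travel together

print(f"recombination fraction:   {r:.2f}")
print(f"co-inheritance frequency: {co_inheritance:.2f}")
# Simplified rule of thumb: unlinked markers recombine about half the time.
print("linked" if r < 0.5 else "unlinked")
```

In this hypothetical data set, 10 of the 100 gametes are recombinant, so the two markers are co-inherited 90% of the time, well above the 50% expected for unlinked markers.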
In general, the closer two markers are on the genetic map, the closer they lie to each other on the physical map. Relative genetic distance (determined by crossing-over frequencies and measured in centimorgans [cM]) is generally proportional to the physical distance (measured in base pairs [bp], e.g., kilobase pairs [kb] or megabase pairs [Mbp]) by which two linked markers are separated on a linkage group (a chromosome).
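As a minimal sketch of how a crossing-over frequency is expressed in centimorgans, the following Python fragment applies the two standard mapping functions, Haldane and Kosambi. The present text does not state which mapping function is used, so the choice of function here is an assumption, and the recombination fractions shown are arbitrary illustrative values.

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM, assuming no crossover interference."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM, which allows for partial interference."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

# r is the recombination fraction between two markers (must be below 0.5).
for r in (0.01, 0.10, 0.30):
    print(f"r={r:.2f}  Haldane={haldane_cm(r):6.2f} cM  Kosambi={kosambi_cm(r):6.2f} cM")
```

For small recombination fractions, both functions give approximately 1 cM per 1% recombination; they diverge as the markers lie farther apart and multiple crossovers become more likely.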
Genetic linkage maps that produce tightly linked markers are beneficial to marker-assisted selection (MAS) breeding programs. In this technique, researchers employ markers (typically molecular markers) to improve the efficiency of selecting gene alleles that impart a beneficial trait, e.g., disease resistance. In any genetic analysis, including MAS, a genetic map that contains more markers is more useful than a genetic map that contains fewer markers. If a map can be saturated with a sufficient number of linked markers for traits of interest, then gene (trait) mapping and gene cloning, e.g., positional cloning, are facilitated.
Molecular Markers
As noted above, plant genetic variability that can be detected at the molecular level has been a great benefit for crop improvement research. The molecular markers that detect this variability can be categorized into two broad classes, namely, restriction fragment length polymorphisms (RFLPs) and microsatellites.
RFLP markers are hybridization-based molecular markers. DNA carrying an RFLP yields fragments of different sizes when cleaved by restriction enzymes, because of variation in the DNA primary structure. These fragments are then resolved and detected using various gel-based assays, including Southern blotting with radioactively or non-radioactively labeled probes. RFLP genetic analysis is hindered by technical considerations including probe design, restriction enzyme choice and the molecular weight of the segregating bands. Adding to these limitations, RFLP techniques detect a low level of polymorphism, require larger amounts of genetic material, and yield poorer genetic resolution than methods that detect other types of heterogeneity (e.g., SSR-type microsatellite heterogeneity). RFLP analysis is labor-intensive and time-consuming, and its cost can become prohibitive when compared to other methods.
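The origin of an RFLP can be illustrated with a simple in silico digest. The sketch below is an assumption-laden toy: the two allele sequences are invented, the enzyme is taken to be EcoRI (recognition site GAATTC, cutting after the first G), and every fragment is listed, whereas a Southern blot would reveal only the fragments that hybridize to the probe.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the cut position within the site (1 for EcoRI: G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)  # fragment after the last cut
    return fragments

# Hypothetical alleles: in allele_2 a single base change (GAATTC -> GAGTTC)
# destroys the middle restriction site.
allele_1 = "T" * 20 + "GAATTC" + "A" * 30 + "GAATTC" + "C" * 40 + "GAATTC" + "T" * 10
allele_2 = "T" * 20 + "GAATTC" + "A" * 30 + "GAGTTC" + "C" * 40 + "GAATTC" + "T" * 10

print("allele 1 fragments (bp):", digest(allele_1))
print("allele 2 fragments (bp):", digest(allele_2))
```

The loss of one site in the second allele merges two fragments into a single larger one, which is the kind of size difference that would be resolved on a gel.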
The term simple sequence repeat (SSR), or microsatellite, refers generally to a short segment of DNA (typically up to several hundred base pairs) that consists of multiple tandem repeats of a two or three base-pair sequence. These repeated sequences result in DNA regions of variable length. The repetitive sequences demonstrate poor replication fidelity due to polymerase slippage, which results in highly polymorphic regions. Microsatellites appear to be randomly dispersed through the genome and are generally flanked by conserved regions. This genomic variability is inherited and reproducibly detectable. These characteristics make SSRs well suited to amplification as PCR products, leading to their extensive development as molecular markers.
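As a rough illustration of the definition above, tandem repeats of a two or three base-pair motif can be located in a sequence with a regular expression. The minimum repeat count of five and the example sequence are assumptions chosen for this sketch only; dedicated SSR-finding tools apply more careful criteria.

```python
import re

def find_ssrs(sequence, min_repeats=5):
    """Return (start, motif, repeat_count) for each di- or trinucleotide SSR."""
    # A 2- or 3-base motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,3}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Hypothetical sequence containing an (AT)8 repeat and a (CAG)6 repeat.
sequence = "GGCC" + "AT" * 8 + "GTCA" + "CAG" * 6 + "TTGA"
for start, motif, count in find_ssrs(sequence):
    print(f"position {start}: ({motif}){count}")
```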
SSLP-type heterogeneity is generally heterogeneity caused by small insertions or deletions that result in changes in the length of the polymorphic region. SSR-type heterogeneity can be regarded as a subset of SSLP heterogeneity, which can encompass any molecular event that alters the base-pair length at a specific location in the DNA (resulting in polymorphism). The SSLP-type polymorphic region is identified and amplified with primers, similar to SSR methods. Thus, the SSLP scheme includes, but is not limited to, SSR-type polymorphism. As used herein, reference to SSLP polymorphisms generally includes SSR-type polymorphisms.
The characteristics of microsatellite heterogeneity make microsatellites well suited for use as molecular genetic markers. The use of amplification-based detection techniques such as PCR has led to their extensive development as molecular markers. Microsatellite markers are generated as PCR amplicons that span polymorphic regions containing repeats or deletions/insertions, where the PCR primers lie in conserved domains that flank the microsatellite repeats. The length of the PCR product reflects the length of the microsatellite region, so that each allele yields a PCR product of characteristic and reproducible size. Useful polymorphic microsatellite regions can include any mutational event that alters the length of the amplified sequence.
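The relationship between repeat number and amplicon size can be sketched as a simplified in silico PCR. The flanking sequences, primers and repeat counts below are hypothetical, and the sketch assumes perfect primer matches on a single strand; it is not a model of real primer annealing.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def amplicon_length(template, forward_primer, reverse_primer):
    """Length (bp) of the product bounded by the two primers, or None."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(site) - start

# Hypothetical conserved flanks on either side of an (AT)n microsatellite.
upstream_flank = "GCTAGGTCCATG"
downstream_flank = "CCTGAAGTCGCA"
forward_primer = upstream_flank
reverse_primer = reverse_complement(downstream_flank)

def make_allele(n_repeats):
    return "TTAC" + upstream_flank + "AT" * n_repeats + downstream_flank + "GGA"

for n in (10, 14):
    size = amplicon_length(make_allele(n), forward_primer, reverse_primer)
    print(f"(AT){n} allele -> {size} bp amplicon")
```

Because the primers sit in the conserved flanks, the same primer pair amplifies both alleles, and the four extra repeats appear as an 8 bp difference in product size.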
The proliferation of increasingly sophisticated amplification-based detection techniques provides a variety of sensitive methods for the detection of genetic variation at the nucleotide level. Primers or probes are designed for high levels of sequence specificity, which allows precise DNA regions of interest to be targeted. These types of molecular markers offer the potential for high throughput, increased efficiency and reduced expense.
The nature of polymorphism in SSRs gives SSR-based markers several distinct advantages over hybridization-based methods such as RFLP analysis. Most significantly, an SSR marker can detect multiple alleles, manifested as differently sized PCR amplicons. As long as there exist at least two gene alleles that produce PCR products of two different sizes, the SSR can be employed as a marker. The ability to visualize both parental bands in progeny also allows heterozygosity to be monitored, which is not possible when scoring is based on the presence or absence of a marker alone (as with most random amplified polymorphic DNA (RAPD) marker analyses).
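The scoring advantage described above can be illustrated with a short sketch that contrasts size-based (co-dominant) scoring with presence/absence (dominant) scoring. The parental band sizes, the 1 bp sizing tolerance and the function names are all hypothetical choices made for this example.

```python
def score_codominant(bands, size_parent_1, size_parent_2, tolerance=1):
    """Call a genotype from the amplicon sizes (bp) observed in one individual."""
    has_1 = any(abs(b - size_parent_1) <= tolerance for b in bands)
    has_2 = any(abs(b - size_parent_2) <= tolerance for b in bands)
    if has_1 and has_2:
        return "heterozygous"
    if has_1:
        return "homozygous parent 1"
    if has_2:
        return "homozygous parent 2"
    return "no call"

def score_dominant(band_present):
    """Presence/absence scoring cannot separate the two band-carrying classes."""
    if band_present:
        return "band present (homozygous or heterozygous)"
    return "band absent"

# Hypothetical parental alleles giving 44 bp and 52 bp amplicons.
progeny_bands = {"plant_1": [44], "plant_2": [44, 52], "plant_3": [52]}
for plant, bands in progeny_bands.items():
    print(plant, "->", score_codominant(bands, 44, 52))

print("dominant marker, band seen:", score_dominant(True))
```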
Expressed Sequence Tags (ESTs) are cDNA clones that correspond to expressed mRNA. These sequences are termed tags because typically only a few hundred nucleotides are sequenced from the cDNA, for identification purposes only. Work on the human and mouse genomes has demonstrated the usefulness of ESTs in genetic linkage map construction and map-based cloning; however, their application in plant systems has been limited, due in part to the scarcity of plant EST databases.
Predicting the presence or absence of a particular gene allele (e.g., a disease resistance allele) is one of the most desired qualities in molecular markers. The closer a marker is to a gene allele, the better it serves to tag the desired allele. The ability to include more markers in the soybean genetic map will greatly improve the ability to detect and select for desired traits (e.g., disease resistance). SSR/SSLP markers derived from ESTs offer an opportunity to improve soybean genetic maps. Furthermore, since the sequences that are being mapped are derived from functional sequences, it is possible that an EST marker that maps very close to or on top of a desired phenotypic trait is in fact derived from the gene that encodes that desired trait, thereby permitting and providing a basis for cloning of the genomic locus and expressed allele that imparts that desired trait.
There is a need in the art for improved soybean genetic maps to facilitate the study of disease-resistance genetic loci. There is a need for soybean molecular markers to construct genetic maps with improved resolution, especially in the vicinity of known disease-resistance loci. There is a need in the art for soybean molecular markers that are in close proximity to disease-resistance loci in order to facilitate marker-assisted selection (MAS) and genetic analysis of those genetic loci, and also to facilitate gene discovery and cloning of the gene alleles that impart the disease resistance. The present invention provides compositions and methods that meet these needs and provide other advantages.