The primary objectives of plant breeding are to select an optimal pair of parents to make a cross and then to select one or more superior progeny resulting from that cross. In hybrid crops, a third objective is to identify a tester with which to produce high-performing hybrid seed. Traditional plant breeding has relied on visual observations and performance data on the plants or lines in order to make selections that meet one or more of the aforementioned objectives.
In recent years, molecular breeding has demonstrated promise for improving the breeding process and enhancing the rate of genetic gain. In molecular breeding, molecular markers provide a basis for parental, progeny or tester selections; this process may also be used in conjunction with phenotype-based selection. Inclusion of genetic markers in breeding programs has accelerated the identification and accumulation of valuable traits in germplasm pools compared to that achieved based only on phenotypic data. Herein, “germplasm” includes breeding germplasm, breeding populations, collections of elite inbred lines, populations of random mating individuals, and biparental crosses.
For molecular breeding to be effective, the differences in marker genotypes must be heritably associated with one or more phenotypic or performance traits. These associations are established by correlating the marker genotypes with trait values in lines or populations segregating for one or more traits. Genetic marker alleles (an “allele” is an alternative sequence at a locus) are used to identify plants that contain a desired genotype at one or more loci, and that are expected to transfer the desired genotype, along with a desired phenotype for one or more traits, to their progeny. Markers that are highly correlated with a phenotype are assumed to be genetically linked to the trait; such a marker can then be used as a basis for selection decisions in lieu of evaluating the trait per se. Markers that are not correlated will be inherited independently of the trait and are not useful for selections, but can be valuable in comparing similarities and/or measuring genetic distances among varieties and lines. Ideally, the marker will represent the actual genomic variation responsible for a trait and will therefore always segregate with the trait, although the correlations can be masked by phenomena such as environmental interactions or epistatic effects.
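By way of illustration only, the correlation of marker genotypes with trait values described above may be sketched as follows. The example assumes that each line's genotype is coded as an allele dosage (0, 1 or 2 copies of a given allele) and uses a plain Pearson correlation; the function name, the dosage coding and all data values are hypothetical and are not drawn from any actual population.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a monomorphic marker or constant trait has no correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for eight lines: allele dosage at two markers and one trait.
dosage_marker_1 = [0, 0, 1, 1, 2, 2, 0, 2]   # assumed to track the trait
dosage_marker_2 = [0, 2, 1, 0, 2, 1, 2, 0]   # assumed to be unrelated to the trait
trait_values = [4.1, 3.9, 5.0, 5.2, 6.1, 5.9, 4.0, 6.0]

r_1 = pearson_r(dosage_marker_1, trait_values)
r_2 = pearson_r(dosage_marker_2, trait_values)
print(f"marker 1: r = {r_1:.2f}; marker 2: r = {r_2:.2f}")
```

In this toy data set, the first marker would be a candidate basis for selection, whereas the second would be of use only for diversity or genetic distance comparisons. A correlation coefficient alone does not establish linkage; it is shown here only as the simplest form of the association step.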
Initial marker platforms for molecular breeding did not require a priori knowledge of underlying sequence. These markers were based on restriction fragment length polymorphisms (RFLPs). Random or directed DNA probes were used in Southern hybridization protocols to identify target fragments whose size varied depending on the location of, and distance between, a pair of restriction enzyme recognition sites. These differences in size could be correlated to traits in test populations. The DNA probes were then used as markers that could detect the underlying restriction fragment length polymorphisms and in turn be used to predict a correlated trait. Other types of markers have been used that likewise do not require a priori knowledge of the underlying sequence and include but are not limited to fingerprinting using amplified fragment length polymorphisms (AFLPs) or universal PCR primers (i.e. RICE primers).
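By way of illustration only, the dependence of fragment size on the position of restriction sites may be sketched with a toy in silico digest. The example assumes the EcoRI recognition site GAATTC, cut after the first base, and two short hypothetical alleles that differ by a single base within the second site; real fragments are far longer, and the function and sequence names are assumptions made for the example.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that stay on
    the upstream fragment (1 for EcoRI, which cuts between G and AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    boundaries = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


# Hypothetical alleles: allele_b has lost the second site (GAATTC -> GAGTTC).
allele_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAGTTC" + "TTTT"

print(digest(allele_a))  # [5, 14, 9]
print(digest(allele_b))  # [5, 23]
```

A probe hybridizing to the region between the two sites would reveal a 14-base fragment for the first allele and a 23-base fragment for the second, which is the kind of size difference an RFLP marker reports.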
More recently, markers have been developed based on knowledge of the underlying sequence. For example, microsatellite or simple sequence repeat (SSR) markers rely on PCR and gel electrophoresis to elucidate variation in the length of DNA repeat sequences. The differences in repeat length, as revealed by the markers, can correlate to associated traits if the target repeat is genetically linked to the trait.
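By way of illustration only, the variation in repeat length that an SSR marker reveals may be sketched as a count of tandem repeat units in the amplified region. The example assumes a dinucleotide CA repeat and two hypothetical amplicons with identical flanking sequence; the function name and all sequences are assumptions made for the example.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return the number of units in the longest uninterrupted tandem run of motif."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical amplicons from two lines, differing only in CA repeat number.
amplicon_line_1 = "GGTC" + "CA" * 12 + "TTGA"
amplicon_line_2 = "GGTC" + "CA" * 15 + "TTGA"

units_1 = longest_repeat_run(amplicon_line_1, "CA")
units_2 = longest_repeat_run(amplicon_line_2, "CA")
size_difference = len(amplicon_line_2) - len(amplicon_line_1)
print(units_1, units_2, size_difference)  # 12 15 6
```

The six-base difference in amplicon length is what would be resolved by gel electrophoresis; the repeat count itself is not observed directly by such a marker.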
However, traditional marker platforms are suboptimal because they are not suited for automation or high throughput techniques. In addition, traditional marker platforms are susceptible to false marker-trait associations, wherein an identical genotype in two lines may reflect not a common parent but a convergent sequence (identity by state rather than identity by descent), which is problematic for tracking specific marker alleles across multiple generations.
Another type of variation useful as a traditional marker is the single nucleotide polymorphism (SNP). SNPs are single base changes which differ between two lines and will segregate with a trait to which they are genetically linked. SNPs can be detected by a variety of commercially available marker technologies. Markers based on SNPs have gained in popularity due to the ease and accuracy of detection, compatibility with information systems and low cost. However, SNP markers are still an indirect tool for querying underlying sequence, and a SNP marker is restricted to detecting only two alleles, not the four possible nucleotides that might be found at any given nucleotide position.
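By way of illustration only, the two-allele limitation of a SNP marker may be contrasted with direct sequence information as follows. The example assumes a hypothetical assay designed for the alleles A and G at one position and assumes that any other nucleotide yields no call; the behavior of an actual assay on a third allele depends on its chemistry, and the function name and data are assumptions made for the example.

```python
def biallelic_call(base, allele_x="A", allele_y="G"):
    """Return 'X' or 'Y' for the two alleles the assay targets, else None (no call)."""
    if base == allele_x:
        return "X"
    if base == allele_y:
        return "Y"
    return None  # assumed outcome for a nucleotide the assay was not designed for


# Hypothetical nucleotides found at one position in four lines by direct sequencing.
sequenced_bases = ["A", "G", "T", "A"]

marker_calls = [biallelic_call(base) for base in sequenced_bases]
alleles_seen_by_marker = {call for call in marker_calls if call is not None}
alleles_seen_by_sequence = set(sequenced_bases)

print(marker_calls)                   # ['X', 'Y', None, 'X']
print(len(alleles_seen_by_marker))    # 2
print(len(alleles_seen_by_sequence))  # 3
```

In this sketch the third allele, T, is visible only in the sequence data.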
Thus, there is a need in the art for methods to quickly and accurately determine direct sequence information from at least one plant genome for the purpose of facilitating plant breeding activities such as line development, germplasm diversity analyses, rare allele mining, purity testing, quality assurance, introgression of specific genomic regions, stacking of genomic regions, prediction of line performance, and prediction of hybrid performance.