An ultimate goal of many plant scientists is to identify and discover the function of each gene in plants. The use of molecular biology techniques allows for the manipulation of genomes directed to this objective. Such a genome project can be arbitrarily divided into three phases. Phase I involves mapping the genome by genetic and physical methods. Phase II involves cloning and sequencing all, or most, of the genes. Phase III involves determining the function of each gene, before or after the sequence of the entire genome or that of the cDNAs is known. For convenience, Phase III can be further divided into two major steps. Step one is to construct an insertional-mutant library, with the goal of disrupting each gene separately. Step one also includes determining the DNA sequence that flanks the inserted plasmid, and the chromosomal location of the inserted plasmid, in each mutant plant. Step two involves the determination of the function of each gene by examining the phenotypic, physiological, or biochemical changes of each mutant line of the saturation gene-disruption library.
Rice is used as an example in part because, as the major staple food for over two billion people, it is one of the most important food crops in the world. Rice production must increase by 50% by the year 2030 to feed the projected population. Understanding how rice genes function will help to increase rice yields. Rice is also a convenient model system for studying gene function, because it has a relatively small genome and was the earliest cereal plant to undergo transformation and regeneration procedures. Moreover, due to synteny of genes with other cereal plants, any information obtained on rice genes will likely be applicable to other important cereal crops, such as maize, wheat, and barley.
After about 10 years of effort by many scientists, physical mapping of the rice genome was virtually completed several years ago. In April 2000, the Monsanto Company (St. Louis, Mo.) announced that most of the rice genome sequence had been determined. Additional rice genomic sequences were released in April 2002 (Yu et al., “A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica),” Science 296:79–92 (2002); Goff et al., “A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. japonica),” Science 296:92–100 (2002)). Thus, the work in Phases I and II is essentially concluded. Small-scale Phase III work started several years ago, but progress has been slow because the current methods of generating specific mutant lines are time-consuming and imprecise.
A significant amount of genomic work has been carried out in Arabidopsis, a model system, because of its relatively small genome and short generation time. Several partial gene-disruption libraries have already been made. One type of library uses T-DNA to disrupt genes in the Arabidopsis genome; one such library includes some 8,000 T-DNA gene-disrupted “tagged” mutants (Feldmann et al., “A Dwarf Mutant of Arabidopsis Generated by T-DNA Insertion Mutagenesis,” Science 243:1351–1354 (1989)). A major disadvantage of T-DNA tagging, and of similar approaches, is that each T-DNA tagged mutant requires its own transformation event. Since transformation of Arabidopsis is efficient, it is now possible to obtain 100,000 T-DNA tagged mutants by brute force (Krysan et al., “T-DNA As an Insertional Mutagen in Arabidopsis,” Plant Cell 11:2283–2290 (1999)). Transformation of rice, on the other hand, is much less efficient. It is not yet practical to obtain anywhere close to 200,000 T-DNA tagged rice mutants.
A second type of library makes use of an endogenous transposon, such as Mu in maize (Bensen et al., “Cloning and Characterization of the Maize An1 Gene,” Plant Cell 7:75–84 (1995)) or the tos17 retrotransposon in rice (Hirochika et al., “Retrotransposons of Rice Involved in Mutations Induced by Tissue Culture,” Proc. Natl. Acad. Sci. USA 93:7783–7788 (1996)). Although a large number of insertional mutants can be obtained, a major disadvantage of this method is that it is difficult to obtain the desired revertants, especially if a large number of insertions are present in each plant.
A third type of library involves transferring mobile genomic sequences, known as transposable elements or transposons, from one plant to other plants. Transposable elements are either autonomous or nonautonomous. Autonomous elements carry the gene(s) encoding the enzymes required for transposition and therefore have the ability to excise and transpose. Nonautonomous elements do not transpose spontaneously. They become mobile only when an autonomous member of the same family is present elsewhere in the genome. One well-characterized plant transposon family is the maize Activator (“Ac”) and Dissociation (“Ds”) family of transposable elements. The family comprises the autonomous element Ac and the nonautonomous element Ds. Ds elements are not capable of autonomous transposition, but can be trans-activated to transpose by Ac (Hehl et al., “Induced Transposition of Ds by a Stable Ac in Crosses of Transgenic Tobacco Plants,” Mol. Gen. Genet. 217:53–59 (1989)). Thus, transposable elements, such as Ac/Ds of maize, can be transferred to other plants to generate a relatively small number of anchor plants (such as 500), which are then used to produce a much larger number of secondary insertional-mutant plant lines. The major advantage of this method is that one needs a relatively small number of anchor plant lines (such as several thousand) to generate a large population of secondary mutant plant lines (such as 200,000) after transposition (Hehl et al., “Induced Transposition of Ds by a Stable Ac in Crosses of Transgenic Tobacco Plants,” Mol. Gen. Genet. 217:53–59 (1989); Bancroft et al., “Transposition Pattern of the Maize Element Ds in Arabidopsis Thaliana,” Genetics 134:1211–1229 (1993)).
From published reports, it is known that over 70% of the insertional mutants in Arabidopsis have no readily visible phenotype, which makes identification of transposition sites difficult, if not impossible. The Ac/Ds system was improved by using enhancer- and gene-trap plasmids (Sundaresan et al., “Patterns of Gene Action in Plant Development Revealed by Enhancer Trap and Gene Trap Transposable Elements,” Genes & Develop. 9:1797–1810 (1995)), which allow disrupted genes with no phenotype to be detected by expression of a reporter gene (such as Gus). As of 1999, this type of library included fewer than 15,000 Ac/Ds-tagged rice plant lines (Chin et al., “Molecular Analysis of Rice Plants Harboring An Ac/Ds Transposable Element-Mediated Gene Trapping System,” Plant J. 19:615–623 (1999)). Therefore, many more plant lines, in both rice and Arabidopsis, are still needed to produce a saturation library. One advantage of this type of insertional-mutant library is that it includes both gene tagging and knockout features. Another advantage of Ac/Ds-tagged plants is that revertants can be obtained relatively easily.
Another type of insertion-mutant library is known as the activation-trap library. Activation tagging uses T-DNA vectors that contain multimerized transcriptional enhancers from the cauliflower mosaic virus (“CaMV”) 35S gene. After insertion in the plant genome, the enhancers in the T-DNA can cause transcriptional activation of nearby genes in the plant genome. Thus, use of the activation-trap feature results in over-expression of the nearby genes, which may result in changes in phenotype (Weigel et al., “Activation Tagging in Arabidopsis,” Plant Physiology 122:1003–1013 (2000)).
Each of these three types of libraries (gene-trap, enhancer-trap, and activation-trap) suffers from the same problem as libraries made with T-DNA tagging or with an endogenous transposon: all of them are constructed by a random “shotgun”-type approach. In any random approach, large amounts of time are wasted analyzing a high percentage of redundant plant lines. The general practice of most scientists is to generate and then analyze a tenfold excess of randomly generated plant lines, which by calculation covers approximately 99% of the genome. For example, to achieve a 99% probability of tagging all the genes in the rice genome, 400,000 tagged plant lines are needed. The laboratory of Shimamoto obtained around 500 tagged mutant rice lines in 1993 (Shimamoto et al., “Trans-Activation and Stable Integration of the Maize Transposable Element Ds Cotransfected with the Ac Transposase Gene in Transgenic Rice Plants,” Mol. Gen. Genet. 239:354–360 (1993)), and close to 8,000 by 1999 (Enoki et al., “Ac as a Tool for the Functional Genomics of Rice,” Plant J. 19:605–613 (1999)). At least three publications show that after Ac/Ds-containing plasmids are integrated into the rice genome, transposition does occur, and that the frequency of transposition in rice is relatively high, in the range of 3–15% (Shimamoto et al., “Trans-Activation and Stable Integration of the Maize Transposable Element Ds Cotransfected with the Ac Transposase Gene in Transgenic Rice Plants,” Mol. Gen. Genet. 239:354–360 (1993); Enoki et al., “Ac as a Tool for the Functional Genomics of Rice,” Plant J. 19:605–613 (1999); Chin et al., “Molecular Analysis of Rice Plants Harboring An Ac/Ds Transposable Element-Mediated Gene Trapping System,” Plant J. 19:615–623 (1999)).
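The redundancy inherent in a random approach can be illustrated with a simple probability model. The sketch below is only an illustration under stated assumptions: insertions are taken to be independent and uniformly random, and each insertion is assumed to land in a given gene with a fixed probability hit_prob. The function names and the parameter hit_prob are hypothetical and do not come from the cited publications. The sketch is not intended to reproduce the 400,000-line figure given above, which depends on gene-size and genome-size assumptions that are not stated here.

```python
import math


def coverage(n_lines: int, hit_prob: float) -> float:
    """Probability that a given gene is hit at least once by n_lines
    independent random insertions (assumed model, not from the source)."""
    return 1.0 - (1.0 - hit_prob) ** n_lines


def lines_needed(p_target: float, hit_prob: float) -> int:
    """Smallest number of random insertion lines for which a given gene
    is hit at least once with probability p_target."""
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must be strictly between 0 and 1")
    if not 0.0 < hit_prob < 1.0:
        raise ValueError("hit_prob must be strictly between 0 and 1")
    # Solve 1 - (1 - hit_prob)**n >= p_target for n.
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - hit_prob))


if __name__ == "__main__":
    # Hypothetical example: 25,000 equally likely targets, one hit per line.
    genes = 25_000
    n = lines_needed(0.99, 1.0 / genes)
    print(f"lines for 99% chance of hitting a given gene: {n}")
    print(f"coverage with a tenfold excess: {coverage(10 * genes, 1.0 / genes):.6f}")
```

Under this simplified model, the number of lines required grows with the logarithm of the allowed miss probability, so most of the lines in a random library are redundant hits on genes that are already tagged.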
Production of still another type of insertion mutant has been reported by applying a poly A-trap approach in differentiated mouse embryonic stem cells and mouse embryos (Salminen et al., “Efficient Poly A Trap Approach Allows the Capture of Genes Specifically Active in Differentiated Embryonic Stem Cell and in Mouse Embryos,” Develop. Dynamics 212:328–333 (1998)). In this approach, only expressed genes are trapped, because special vectors have been constructed that allow the trapping of only expressed genes in mouse embryonic stem (ES) cells. One plasmid included the neomycin phosphotransferase gene (neo) for selection, but lacked the polyadenylation (poly A) sequence of neo. Thus, neo-resistant ES cells were obtained only when the plasmid was integrated next to a poly A sequence of an endogenous gene. These transformed cells then represent those that trapped the expressed genes. In the next step, selected ES cell clones were introduced into mouse embryos by microinjection or aggregation of cells, and chimeric mice were generated. The advantage of producing a poly A trap library is that one need not generate a very large number (such as one million) of insertion mutants, since the number of genes in most eukaryotes is in the range of 25,000 to 80,000. The disadvantage of this approach is that the efficiency of poly A trapping is relatively low, and not all expressed genes can be trapped. Moreover, the mutant library produced by this method is not indexed, and it is not known what percentage of the expressed genes are actually trapped.
Even though some methods are already available for studying the functions of individual genes in a genome, the existing methods are very time-consuming and labor-intensive because of the large number (>200,000) of mutant lines that need to be screened following gene disruption. It has been estimated that the amount of work needed for Phase III research is on the order of ten times the combined efforts of Phase I and II work.
Phase III research includes two major steps. Step one involves generating a well-spaced saturation gene-disruption library, followed by determination of the DNA sequences flanking each insertion. Step two involves examining all insertion-mutant lines for phenotypic, physiological, or biochemical changes. Using the current methods, the time and effort needed for Step two analysis of a saturation gene-disruption plant library are far greater than those required for Step one. This is because identifying the function of specific genes, for example, the 25,000 genes in Arabidopsis, may require the generation and then analysis of 250,000 randomly produced plant lines due to redundancy. For each plant line, at least five plants are usually needed. Thus, improvements in the current methods are needed to make both steps of Phase III work faster and less labor-intensive. What is needed to improve Step one is a method which systematically tags all genes in a given plant genome to produce an indexed insertion-mutant library. This, in turn, can eliminate the need for extreme redundancy in screening for phenotypic, physiological, or biochemical changes in Step two, thereby drastically reducing the time and labor required for subsequent gene identification. The present invention is directed to improving Step one so that the work involved in Step two can be greatly reduced, thereby overcoming these and other deficiencies in the current art.
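The scale of the Step two workload described above can be made concrete with simple arithmetic. The sketch below uses the figures given in the text (25,000 genes, 250,000 randomly produced lines, five plants per line). The indexed case assumes, purely for illustration, one mutant line per gene; that one-to-one ratio and the function name plants_to_screen are assumptions and are not taken from the text.

```python
def plants_to_screen(n_lines: int, plants_per_line: int = 5) -> int:
    """Total number of plants to examine for a library of n_lines lines."""
    return n_lines * plants_per_line


GENES = 25_000          # gene count used in the text for Arabidopsis
RANDOM_LINES = 250_000  # tenfold excess of randomly produced lines

# Random library: every one of the redundant lines must be screened.
random_total = plants_to_screen(RANDOM_LINES)

# Indexed library: ASSUMES one line per gene (illustrative only).
indexed_total = plants_to_screen(GENES)

if __name__ == "__main__":
    print(f"random library:  {random_total:,} plants")
    print(f"indexed library: {indexed_total:,} plants")
    print(f"reduction:       {random_total // indexed_total}-fold")
```

Under these assumptions, the reduction in Step two screening equals the redundancy factor of the random library.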