The present invention relates generally to the field of molecular biology and viral genetics. Specifically, the present invention relates to a method for using viral vectors to house populations of nucleic acid sequence variants and to select one or more sequences that exhibit the desired traits using plant hosts.
Nature has developed many strategies for generating genetic diversity over billions of years of evolution. These strategies include random mutagenesis, recombination and selection. Many methods are now available in the laboratory to mimic these processes in order to efficiently generate beneficial mutations and select molecules with desired traits. For recent reviews, see Minshull et al., Curr. Opin. Chem. Biol. 3:284-290 (1999); Giver et al., ibid 2:335-338 (1998); and Patten et al., ibid 8:724-733 (1997).
The generation of genetic diversity through in vitro recombination methods is often referred to as “molecular breeding” or “directed evolution” (Minshull et al., supra and Kuchner et al., Trends Biotechnol. 15:523-530 (1997)). DNA shuffling is a method for generating, in vitro, recombinant genes from a set of parent genes (Stemmer, Nature, 370:389-391 (1994); Stemmer, Proc. Natl. Acad. Sci. USA 91:10747-10751 (1994); Crameri et al., Nat. Biotechnol. 14:315-319 (1996); Crameri et al., Nature Medicine 2:100-103 (1996); Stemmer, Sexual PCR and Assembly PCR in The Encyclopedia of Molecular Biology, VCH Publishers, New York, pp. 447-457 (1996); Crameri et al., Nat. Biotechnol. 15:436-438 (1997); Zhang et al., Proc. Natl. Acad. Sci. USA 94:4504-4509 (1997); Crameri et al., Nature 391:288-291 (1998); Christians et al., Nat. Biotechnol. 17:259-264 (1999), U.S. Pat. Nos. 5,830,721, 5,811,238, 5,605,793, 5,834,252, and 5,837,458; and PCT publications WO 98/13487, WO 98/27230, and WO 98/31837). Typically, the parental genes are randomly fragmented by DNase I. The purified fragments are then reassembled by repeated cycles of overlap extension into full-length genes that contain novel combinations of the parental mutations.
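The outcome of DNA shuffling, a full-length gene built as a mosaic of parental segments, can be illustrated with a toy model. The sketch below is not a simulation of DNase I digestion or of overlap extension chemistry. It assumes pre-aligned parents of equal length, and the function name `shuffle_parents` and its parameters are hypothetical names chosen for illustration.

```python
import random


def shuffle_parents(parents, min_len=3, max_len=6, seed=0):
    """Toy model of DNA shuffling on pre-aligned, equal-length parents.

    Each stretch of the chimera is copied from a randomly chosen parent,
    mimicking the reassembly of random fragments into a full-length gene.
    """
    rng = random.Random(seed)
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")

    pieces = []
    pos = 0
    while pos < length:
        size = rng.randint(min_len, max_len)   # assumed fragment size range
        donor = rng.choice(parents)            # fragment origin
        pieces.append(donor[pos:pos + size])   # slice is clipped at the end
        pos += size
    return "".join(pieces)


# Two artificial parents, so the origin of each segment is visible.
parents = ["A" * 20, "C" * 20]
child = shuffle_parents(parents)
print(child)
```

Using homopolymer parents makes each crossover point visible in the printed chimera; real parental genes would differ at only a few positions.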
Other in vitro recombination methods have also been developed to generate a population of nucleic acid sequences, for example, random priming recombination (RPR) and the staggered extension process (StEP) (Moore et al., Nat. Biotechnol. 14:458-467 (1996); Zhao et al., Proc. Natl. Acad. Sci. USA 94:7997-8000 (1997); Arnold, Acc. Chem. Res. 31:125-131 (1998); Shao et al., Nucleic Acids Res. 26:681-683 (1998); Zhao et al., Nat. Biotechnol. 16:258-261 (1998); Arnold, Proc. Natl. Acad. Sci. USA 95:2035-2036 (1998); Giver et al., Proc. Natl. Acad. Sci. USA 95:12809-12813 (1998); and Zhao et al., Protein Eng. 12:47-53 (1999)). In the RPR method, short random primers are annealed to the template and extended by polymerase. The resulting fragments, the length of which can be controlled by altering the conditions of the annealing and extension reaction, are then separated from the initial template and unextended primers. These fragments are assembled into full length genes by cycles of overlap extension. The StEP method uses template switching during synthesis to form the desired chimeric genes. The templates are mixed with one or more primers and subjected to repeated cycles of denaturation and short annealing/extension steps. Because the growing fragments can anneal to different templates, the resulting full length sequences contain sequence information from different parents.
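The template switching that underlies StEP can be sketched in the same spirit. The example below is a simplified illustration under stated assumptions: templates are aligned and of equal length, each cycle extends the growing strand by a short random step, and the strand anneals to a randomly chosen template at every cycle. The name `step_recombine` and the parameter `max_extension` are hypothetical.

```python
import random


def step_recombine(templates, max_extension=4, seed=1):
    """Toy model of the staggered extension process (StEP).

    Returns the full-length product and the number of cycles used.
    """
    rng = random.Random(seed)
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("templates must be aligned and of equal length")

    product = ""
    cycles = 0
    while len(product) < length:
        template = rng.choice(templates)       # anneal to any template
        step = rng.randint(1, max_extension)   # short extension step
        start = len(product)
        product += template[start:start + step]
        cycles += 1                            # one denature/anneal/extend cycle
    return product, cycles


templates = ["ACGTACGTACGT", "ACCTACGAACGT"]
product, cycles = step_recombine(templates)
print(product, cycles)
```

Every position of the product comes from one of the templates at that same position, which is the sense in which the full-length sequence carries information from different parents.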
DNA shuffling and other in vitro recombination methods have been applied to prokaryotic or cell-based systems to select sequences encoding desired protein activities. However, the ability to introduce sequence variants throughout an organism in a rapid and high-throughput manner has not been demonstrated. Virus vectors are ideal for shuttling libraries of sequence variants throughout an organism, such as a plant, for selection of optimized functions. No other tool, whether a transient or a stable expression method, can match the ability of viral vectors to develop optimized functions using plant hosts.
Viruses are a unique class of infectious agents whose distinctive features are their simple organization and their mechanism of replication. Their hosts include a wide variety of plants and animals. A complete viral particle, or virion, may be regarded mainly as a block of genetic material (either DNA or RNA) capable of autonomous replication, surrounded by a protein coat and sometimes by an additional membranous envelope. The coat protects the virus from the environment and serves as a vehicle for transmission from one host cell to another.
Foreign genes can be expressed in plant hosts either by permanent insertion into the genome or by transient expression using virus-based vectors. Each approach has its own distinct advantages. Transformation for permanent expression needs to be done only once, whereas each generation of plants needs to be inoculated with the transient expression vector. Virus-based expression systems, in which the foreign mRNA is greatly amplified by virus replication, can produce very high levels of proteins in leaves and other tissues. Viral vector-produced protein can also be directed to specific subcellular locations, such as endomembrane, cytosol, or organelles, or it can be attached to macromolecules, such as virions, which aids purification of the protein. For the production of some products, including products for the human health industry, plants provide an optimal system because of reduced capital costs and the greater potential for large-scale production compared with microbial or animal systems.
In this invention, we describe the use of viral expression vectors to bear populations of sequence variants. Plant hosts are used to select those sequences with desired properties, which may be further characterized.
The present invention is a method for selecting desired traits in a plant host by the use of viral vectors to express libraries of nucleic acid sequence variants. This in vitro evolution method is used to improve virus-specific, protein-specific, or host-specific functions. Libraries of sequence variants may be generated by in vitro mutagenesis and/or recombination methods, such as chemical treatment, oligonucleotide mediated mutagenesis, PCR mutagenesis, DNA shuffling, random priming recombination (RPR), restriction enzyme fragment induced template switching (REFITS), and the staggered extension process (StEP), among others. Libraries of sequence variants may be random, semi-random or known sequences. In preferred embodiments, RNA viral vectors may be used as the genetic backbones to bear libraries containing variants of nucleic acid sequences and to be applied to plant hosts such that the desired traits in the RNA or protein products can be determined, selected and improved. The template nucleic acid sequences for generating sequence variants may be of viral origin, such as sequences encoding coat protein, movement protein, promoter, internal initiation sites, packaging signals, 5′ and 3′ NTRs, or ribosomal sequences, or any other structural and non-structural components of viral nucleic acid sequences. The template nucleic acid sequences for generating sequence variants may also be derived from genes, regulatory sequences, or fragments thereof from bacteria, fungi, plants, animals or other sources. These non-native sequences may be inserted in viral vectors to express foreign proteins, regulate transcription or translation, increase the genetic stability of foreign sequences in viral vectors, etc.
After a plant host is infected with a library containing populations of sequence variants, one or more desired traits are screened and selected. The desired traits may include biochemical or phenotypic traits. Phenotypic traits may include, but are not limited to, host range, viral infectivity, tolerance to herbicides, tolerance to extremes of heat or cold, drought, salinity or osmotic stress; resistance to pests (insects, nematodes or arachnids) or diseases (fungal, bacterial or viral); male or female sterility; dwarfness; early maturity; improved yield, vigor, heterosis, nutritional qualities, flavor or processing properties; and others. Biochemical traits may be related to, for example, promoter activities, transcription activities, translational activities, regulatory activities, movement activities (local and systemic), cellular signaling activities, extraction/purification properties, etc. The screening of sequence libraries is typically followed by rescue of the viruses from populations conferring desired traits. In some embodiments, re-screening of sub-libraries may become necessary. In other embodiments, sequences of the viral nucleic acids conferring desired traits may be determined and compared with the template sequences.
The present invention relates to a method for using viral vectors to bear populations of sequence variants and using plant hosts to select the sequences that exhibit the desired traits.
I. Construction of Suitable Viral Vectors to House Libraries of Sequence Variants
Viral vectors into which libraries of sequence variants are inserted may be constructed using a variety of methods known in the art. In preferred embodiments of the instant invention, the viral vectors used to bear libraries of sequence variants are derived from RNA plant viruses. A variety of plant virus families may be used, such as Bromoviridae, Bunyaviridae, Comoviridae, Geminiviridae, Potyviridae, and Tombusviridae, among others. Within the plant virus families, various genera of viruses may be suitable for the instant invention, such as alfamovirus, ilarvirus, bromovirus, cucumovirus, tospovirus, carlavirus, caulimovirus, closterovirus, comovirus, nepovirus, dianthovirus, furovirus, hordeivirus, luteovirus, necrovirus, potexvirus, potyvirus, rymovirus, bymovirus, oryzavirus, sobemovirus, tobamovirus, tobravirus, carmovirus, tombusvirus, tymovirus, and umbravirus, among others.
Within the genera of plant viruses, many species are particularly preferred. They include alfalfa mosaic virus, tobacco streak virus, brome mosaic virus, broad bean mottle virus, cowpea chlorotic mottle virus, cucumber mosaic virus, tomato spotted wilt virus, carnation latent virus, cauliflower mosaic virus, beet yellows virus, cowpea mosaic virus, tobacco ringspot virus, carnation ringspot virus, soil-borne wheat mosaic virus, tomato golden mosaic virus, cassava latent virus, barley stripe mosaic virus, barley yellow dwarf virus, tobacco necrosis virus, tobacco etch virus, potato virus X, potato virus Y, rice necrosis virus, ryegrass mosaic virus, barley yellow mosaic virus, rice ragged stunt virus, Southern bean mosaic virus, tobacco mosaic virus, ribgrass mosaic virus, cucumber green mottle mosaic virus watermelon strain, oat mosaic virus, tobacco rattle virus, carnation mottle virus, tomato bushy stunt virus, turnip yellow mosaic virus, and carrot mottle virus, among others. In addition, RNA satellite viruses, such as tobacco necrosis satellite, may also be employed.
A given plant virus may contain either DNA or RNA, which may be either single- or double-stranded. Examples of plant viruses containing double-stranded DNA include, but are not limited to, caulimoviruses such as cauliflower mosaic virus (CaMV). Representative plant viruses which contain single-stranded DNA are cassava latent virus, bean golden mosaic virus (BGMV), and chloris striate mosaic virus. Rice dwarf virus and wound tumor virus are examples of double-stranded RNA plant viruses. Single-stranded RNA plant viruses include tobacco mosaic virus (TMV), turnip yellow mosaic virus (TYMV), rice necrosis virus (RNV) and brome mosaic virus (BMV). The single-stranded RNA viruses can be further divided into plus sense (or positive-stranded), minus sense (or negative-stranded), or ambisense viruses. The genomic RNA of a plus sense RNA virus is messenger sense, which makes the naked RNA infectious. Many plant viruses belong to the family of plus sense RNA viruses. They include, for example, TMV, BMV, and others. RNA plant viruses typically encode several common proteins, such as replicase/polymerase proteins essential for viral replication and mRNA synthesis, coat proteins providing protective shells for the extracellular passage, and other proteins required for the cell-to-cell movement, systemic infection and self-assembly of viruses. For general information concerning plant viruses, see Matthews, Plant Virology, 3rd Ed., Academic Press, San Diego (1991).
Selected groups of suitable plant viruses are characterized below. However, the invention should not be construed as limited to using these particular viruses, but rather the method of the present invention is contemplated to include all plant viruses at a minimum.
Tobacco Mosaic virus (TMV) is a member of the tobamoviruses. The TMV virion is a tubular filament, and comprises coat protein sub-units arranged in a single right-handed helix with the single-stranded RNA intercalated between the turns of the helix. TMV infects tobacco as well as other plants. TMV is transmitted mechanically and may remain infective for a year or more in soil or dried leaf tissue.
The TMV virions may be inactivated by subjection to an environment with a pH of less than 3 or greater than 8, or by formaldehyde or iodine. Preparations of TMV may be obtained from plant tissues by (NH4)2SO4 precipitation, followed by differential centrifugation.
Tobacco mosaic virus (TMV) is a positive-stranded ssRNA virus whose genome is 6395 nucleotides long and is capped at the 5′-end but not polyadenylated. The genomic RNA can serve as mRNA for a protein with a molecular weight of about 130,000 (130 K) and for another, produced by read-through, with a molecular weight of about 180,000 (180 K). However, it cannot function as a messenger for the synthesis of coat protein. Other genes are expressed during infection by the formation of monocistronic, 3′-coterminal subgenomic mRNAs, including one (LMC) encoding the 17.5 K coat protein and another (I2) encoding a 30 K protein. The 30 K protein has been detected in infected protoplasts as described in Miller, J., Virology 132:71 (1984), and it is involved in the cell-to-cell transport of the virus in an infected plant as described by Deom et al., Science 237:389 (1987). The functions of the two large proteins have not been established; however, they are thought to function in RNA replication and transcription.
Several double-stranded RNA molecules, including double-stranded RNAs corresponding to the genomic, I2 and LMC RNAs, have been detected in plant tissues infected with TMV. These RNA molecules are presumably intermediates in genome replication and/or mRNA synthesis processes which appear to occur by different mechanisms.
TMV assembly apparently occurs in plant cell cytoplasm, although it has been suggested that some TMV assembly may occur in chloroplasts since transcripts of ctDNA have been detected in purified TMV virions. Initiation of TMV assembly occurs by interaction between ring-shaped aggregates (“discs”) of coat protein (each disc consisting of two layers of 17 subunits) and a unique internal nucleation site in the RNA: a hairpin region about 900 nucleotides from the 3′-end in the common strain of TMV. Any RNA, including subgenomic RNAs containing this site, may be packaged into virions. The discs apparently assume a helical form on interaction with the RNA, and assembly (elongation) then proceeds in both directions (but much more rapidly in the 3′- to 5′-direction from the nucleation site).
Another member of the Tobamoviruses, the Cucumber Green Mottle Mosaic virus watermelon strain (CGMMV-W) is related to the cucumber virus (Nozu et al., Virology 45:577 (1971)). The coat protein of CGMMV-W interacts with RNA of both TMV and CGMMV to assemble viral particles in vitro (Kurisu et al., Virology 70:214 (1976)).
Several strains of the tobamovirus group are divided into two subgroups, on the basis of the location of the origin of assembly. Subgroup I, which includes the vulgare, OM, and tomato strains, has an origin of assembly about 800-1000 nucleotides from the 3′-end of the RNA genome, and outside the coat protein cistron (Lebeurier et al., Proc. Natl. Acad. Sci. USA 74:149 (1977); and Fukuda et al., Virology 101:493 (1980)). Subgroup II, which includes CGMMV-W and cowpea strain (Cc), has an origin of assembly about 300-500 nucleotides from the 3′-end of the RNA genome and within the coat protein cistron. The coat protein cistron of CGMMV-W is located at nucleotides 176-661 from the 3′-end. The 3′ noncoding region is 175 nucleotides long. The origin of assembly is positioned within the coat protein cistron (Meshi et al., Virology 127:54 (1983)).
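The CGMMV-W coordinates quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text, counted from the 3′-end; the helper `within` and the variable names are hypothetical and introduced purely for illustration.

```python
def within(inner, outer):
    """Return True if the closed interval `inner` lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# All positions are counted from the 3'-end, as in the text.
noncoding_len = 175          # length of the 3' noncoding region
cp_cistron = (176, 661)      # coat protein cistron of CGMMV-W
origin_window = (300, 500)   # approximate subgroup II origin of assembly

cp_len = cp_cistron[1] - cp_cistron[0] + 1

print(cp_len)                              # 486 nucleotides
print(cp_len % 3, cp_len // 3)             # 0 162 (whole number of triplets)
print(cp_cistron[0] == noncoding_len + 1)  # True: cistron abuts the noncoding region
print(within(origin_window, cp_cistron))   # True: origin lies inside the cistron
```

The figures are mutually consistent: the cistron begins immediately after the 175-nucleotide noncoding region, spans a whole number of triplets, and contains the approximate 300-500 nucleotide window given for the subgroup II origin of assembly.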
Brome Mosaic virus (BMV) is a member of a group of tripartite, single-stranded, RNA-containing plant viruses commonly referred to as the bromoviruses. Each member of the bromoviruses infects a narrow range of plants. Mechanical transmission of bromoviruses occurs readily, and some members are transmitted by beetles. In addition to BMV, other bromoviruses include broad bean mottle virus and cowpea chlorotic mottle virus.
Typically, a bromovirus virion is icosahedral, with a diameter of about 26 nm, containing a single species of coat protein. The bromovirus genome has three molecules of linear, positive-sense, single-stranded RNA, and the coat protein mRNA is also encapsidated. The RNAs each have a capped 5′-end, and a tRNA-like structure (which accepts tyrosine) at the 3′-end. Virus assembly occurs in the cytoplasm. The complete nucleotide sequence of BMV has been identified and characterized as described by Ahlquist et al., J. Mol. Biol. 153:23 (1981).
Rice Necrosis virus is a member of the Potato Virus Y Group or Potyviruses. The Rice Necrosis virion is a flexuous filament comprising one type of coat protein (molecular weight about 32,000 to about 36,000) and one molecule of linear positive-sense single-stranded RNA. The Rice Necrosis virus is transmitted by Polymyxa graminis (a eukaryotic intracellular parasite found in plants, algae and fungi).
Geminiviruses are a group of small, single-stranded DNA-containing plant viruses with virions of unique morphology. Each virion consists of a pair of isometric particles (incomplete icosahedra), composed of a single type of protein (with a molecular weight of about 2.7-3.4×10^4). Each geminivirus virion contains one molecule of circular, positive-sense, single-stranded DNA. In some geminiviruses (e.g., cassava latent virus and bean golden mosaic virus) the genome appears to be bipartite, containing two single-stranded DNA molecules.
Potyviruses are a group of plant viruses which produce a polyprotein. A particularly preferred potyvirus is tobacco etch virus (TEV). TEV is a well characterized potyvirus and contains a positive-strand RNA genome of 9.5 kilobases encoding a single, large polyprotein that is processed by three virus-specific proteinases. The nuclear inclusion protein “a” proteinase is involved in the maturation of several replication-associated proteins and capsid protein. The helper component-proteinase (HC-Pro) and 35-kDa proteinase both catalyze cleavage only at their respective C-termini. The proteolytic domain in each of these proteins is located near the C-terminus. The 35-kDa proteinase and HC-Pro derive from the N-terminal region of the TEV polyprotein.
The selection of the genetic backbone for the viral vectors of the instant invention may depend on the plant host used. The plant host may be a monocotyledonous or dicotyledonous plant, plant tissue, or plant cell. Typically, plants of commercial interest, such as food crops, seed crops, oil crops, ornamental crops and forestry crops are preferred. For example, wheat, rice, corn, potato, barley, tobacco, soybean, canola, maize, oilseed rape, lilies, grasses, orchids, irises, onions, palms, tomato, the legumes, or Arabidopsis can be used as a plant host. Host plants may also include those readily infected by an infectious virus, such as Nicotiana, preferably Nicotiana benthamiana or Nicotiana clevelandii.
One feature of the present invention is the use of plant viral nucleic acids which comprise one or more non-native nucleic acid sequences capable of being transcribed in a plant host. These nucleic acid sequences may be native nucleic acid sequences that occur in a host plant. Preferably, these nucleic acid sequences are non-native nucleic acid sequences that do not normally occur in a host plant. For example, the plant viral vectors may contain sequences from more than one virus, including viruses from more than one taxonomic group. The plant viral nucleic acids may also contain sequences from non-viral sources, such as foreign genes, regulatory sequences, fragments thereof from bacteria, fungi, plants, animals or other sources. These foreign sequences may encode commercially useful proteins, polypeptides, or fusion products thereof, such as enzymes, antibodies, hormones, pharmaceuticals, vaccines, pigments, antimicrobial polypeptides, and the like. Or they may be sequences that regulate the transcription or translation of viral nucleic acids, package viral nucleic acid, and facilitate systemic infection in the host, among others.
In some embodiments of the instant invention, the plant viral vectors may comprise one or more additional native or non-native subgenomic promoters which are capable of transcribing or expressing adjacent nucleic acid sequences in the plant host. These non-native subgenomic promoters are inserted into the plant viral nucleic acids without destroying the biological function of the plant viral nucleic acids using known methods in the art. For example, the CaMV promoter can be used when plant cells are to be transfected. The subgenomic promoters are capable of functioning in the specific host plant. For example, if the host is tobacco, TMV, tomato mosaic virus, or other viruses containing subgenomic promoter may be utilized. The inserted subgenomic promoters should be compatible with the TMV nucleic acid and capable of directing transcription or expression of adjacent nucleic acid sequences in tobacco. It is specifically contemplated that two or more heterologous non-native subgenomic promoters may be used. The non-native nucleic acid sequences may be transcribed or expressed in the host plant under the control of the subgenomic promoter to produce the products of the nucleic acids of interest.
In some embodiments of the instant invention, the recombinant plant viral nucleic acids may be further modified by conventional techniques to delete all or part of the native coat protein coding sequence or put the native coat protein coding sequence under the control of a non-native plant viral subgenomic promoter. If it is deleted or otherwise inactivated, a non-native coat protein coding sequence is inserted under control of one of the non-native subgenomic promoters, or optionally under control of the native coat protein gene subgenomic promoter. Thus, the recombinant plant viral nucleic acid contains a coat protein coding sequence, which may be native or a nonnative coat protein coding sequence, under control of one of the native or non-native subgenomic promoters. The native or non-native coat protein gene may be utilized in the recombinant plant viral nucleic acid. The non-native coat protein, as is the case for the native coat protein, may be capable of encapsidating the recombinant plant viral nucleic acid and providing for systemic spread of the recombinant plant viral nucleic acid in the host plant.
In some embodiments of the instant invention, recombinant plant viral vectors are constructed to express a fusion between a plant viral coat protein and the foreign genes or polypeptides of interest. Such a recombinant plant virus provides for high level expression of a nucleic acid of interest. The location(s) where the viral coat protein is joined to the amino acid product of the nucleic acid of interest may be referred to as the fusion joint. A given product of such a construct may have one or more fusion joints. The fusion joint may be located at the carboxyl terminus of the viral coat protein or the fusion joint may be located at the amino terminus of the coat protein portion of the construct. In instances where the nucleic acid of interest is located internal with respect to the 5′ and 3′ residues of the nucleic acid sequence encoding for the viral coat protein, there are two fusion joints. That is, the nucleic acid of interest may be located 5′, 3′, upstream, downstream or within the coat protein. In some embodiments of such recombinant plant viruses, a “leaky” start or stop codon may occur at a fusion joint which sometimes does not result in translational termination.
In some embodiments of the instant invention, nucleic sequences encoding reporter protein(s) or antibiotic/herbicide resistance gene(s) may be constructed as carrier protein(s) for the polypeptides of interest, which may facilitate the detection of polypeptides of interest. For example, green fluorescent protein (GFP) may be simultaneously expressed with polypeptides of interest. In another example, a reporter gene, β-glucuronidase (GUS), may be utilized. In another example, a drug resistance marker, such as a gene whose expression results in kanamycin resistance, may be used.
Since the RNA genome is typically the infective agent, the cDNA is positioned adjacent a suitable promoter so that the RNA is produced in the production cell. The RNA is capped using conventional techniques, if the capped RNA is the infective agent. In addition, the capped RNA can be packaged in vitro with added coat protein from TMV to make assembled virions. These assembled virions can then be used to inoculate plants or plant tissues. Alternatively, an uncapped RNA may also be employed in the embodiments of the present invention. Contrary to the practiced art in scientific literature and in issued patent (Ahlquist et al., U.S. Pat. No. 5,466,788), uncapped transcripts for virus expression vectors are infective on both plants and in plant cells. Capping is not a prerequisite for establishing an infection of a virus expression vector in plants, although capping increases the efficiency of infection. In addition, nucleotides may be added between the transcription start site of the promoter and the start of the cDNA of a viral nucleic acid to construct an infectious viral vector. One or more nucleotides may be added. In some embodiments of the present invention, the inserted nucleotide sequence may contain a G at the 5′-end. Alternatively, the inserted nucleotide sequence may be GNN, GTN, or their multiples, (GNN)x or (GTN)x.
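The (GNN)x and (GTN)x notation can be made concrete by enumerating the sequences each motif covers. The sketch below is an illustration only: it assumes that N stands for any of the four DNA bases (the standard IUPAC reading) and that the inserted nucleotides are written in the DNA alphabet of the cDNA clone. The function name `expand_motif` is hypothetical.

```python
from itertools import product

BASES = "ACGT"  # assumed meaning of N: any one of the four DNA bases


def expand_motif(motif, repeats=1):
    """List every concrete sequence matched by `motif` repeated `repeats` times."""
    pattern = motif * repeats
    # Fixed letters contribute one choice; each N contributes four.
    choices = [BASES if ch == "N" else ch for ch in pattern]
    return ["".join(combo) for combo in product(*choices)]


gnn = expand_motif("GNN")        # 4 * 4 = 16 sequences
gtn = expand_motif("GTN")        # 4 sequences
gtn2 = expand_motif("GTN", 2)    # (GTN)2: 4 * 4 = 16 sequences

print(len(gnn), len(gtn), len(gtn2))
print(gtn)
```

Every enumerated sequence begins with G, consistent with the embodiment in which the inserted sequence carries a G at its 5′-end.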
In some embodiments of the instant invention, more than one nucleic acid is prepared for a multipartite viral vector construct. In this case, each nucleic acid may require its own origin of assembly. Each nucleic acid could be prepared to contain a subgenomic promoter and a non-native nucleic acid. Alternatively, the insertion of a non-native nucleic acid into the nucleic acid of a monopartite virus may result in the creation of two nucleic acids (i.e., the nucleic acid necessary for the creation of a bipartite viral vector). This would be advantageous when it is desirable to keep the replication and transcription or expression of the nucleic acid of interest separate from the replication and translation of some of the coding sequences of the native nucleic acid.
The recombinant plant viral nucleic acid may be prepared by cloning a viral nucleic acid. If the viral nucleic acid is DNA, it can be cloned directly into a suitable vector using conventional techniques. One technique is to attach an origin of replication to the viral DNA which is compatible with the cell to be transfected. In this manner, DNA copies of the chimeric nucleotide sequence are produced in the transfected cell. If the viral nucleic acid is RNA, a DNA copy of the viral nucleic acid is first prepared by well-known procedures. For example, the viral RNA is transcribed into DNA using reverse transcriptase to produce subgenomic DNA pieces, and a double-stranded DNA may be produced using DNA polymerases. The cDNA is then cloned into appropriate vectors and cloned into a cell to be transfected. In some instances, cDNA is first attached to a promoter which is compatible with the production cell. The recombinant plant viral nucleic acid can then be cloned into any suitable vector which is compatible with the production cell. Alternatively, the recombinant plant viral nucleic acid is inserted in a vector adjacent a promoter which is compatible with the production cell. In some embodiments, the cDNA ligated vector may be directly transcribed into infectious RNA in vitro and inoculated onto the plant host. The cDNA pieces are mapped and combined in proper sequence to produce a full-length DNA copy of the viral RNA genome, if necessary.
In some embodiments of the instant invention, increased representation of gene sequences in virus expression libraries may also be achieved by bypassing the genetic bottleneck of propagation in bacterial cells. For example, in some embodiments of the instant invention, cell-free methods may be used to assemble sequence libraries or individual arrayed sequences into virus expression vectors and reconstruct an infectious virus, such that the final ligation product can be transcribed and the resulting RNA can be used for plant, plant tissue or plant cell inoculation/infection. A more detailed discussion is presented in a co-pending and co-owned U.S. patent application No. 09/359,303 (Padgett et al., filed herewith, incorporated herein by reference).
Those skilled in the art will understand that these embodiments are representative only of many constructs suitable for housing libraries of sequence variants. All such constructs are contemplated and intended to be within the scope of the present invention. The invention is not intended to be limited to any particular viral constructs but specifically contemplates using all operable constructs. A person skilled in the art will be able to construct the plant viral nucleic acids based on molecular biology techniques well known in the art. Suitable techniques have been described in Sambrook et al. (2nd ed.), Cold Spring Harbor Laboratory, Cold Spring Harbor (1989); Methods in Enzymol. (Vols. 68, 100, 101, 118, and 152-155) (1979, 1983, 1986 and 1987); DNA Cloning, D. M. Glover, Ed., IRL Press, Oxford (1985); Walkey, Applied Plant Virology, Chapman and Hall (1991); Matthews, Plant Virology, 3rd Ed., Academic Press, San Diego (1991); Turpen et al., J. of Virological Methods, 42:227-240 (1993); U.S. Pat. Nos. 4,885,248, 5,173,410, 5,316,931, 5,466,788, 5,491,076, 5,500,360, 5,589,367, 5,602,242, 5,627,060, 5,811,653, 5,866,785, and 5,889,190; and U.S. patent application No. 08/324,003, now U.S. Pat. No. 5,977,348. Nucleic acid manipulations and enzyme treatments are carried out in accordance with manufacturers' recommended procedures in making such constructs.
II. Generating Libraries of Sequence Variants
One or more template sequences may be used to generate libraries of nucleic acid sequence variants via in vitro mutagenesis, recombination or a combination thereof. In some embodiments of the invention, the template sequences may be derived from viral elements. For example, if a plant virus is used as the genetic backbone for the recombinant viral nucleic acids, a nucleic acid sequence encoding elements (or parts thereof) such as the coat protein, movement protein, promoter sequences, internal initiation sites, packaging signals, 5′ and 3′ NTRs, or ribosomal sequences, among others, may be selected as a template for generating a library containing variants of such nucleic acid sequence. Entire plant virus genomes may also be subjected to in vitro mutagenesis or recombination so as to improve plant virus vector performance. In preferred embodiments, elements of the open reading frame (ORF) of RNA plant viruses are the starting point for sequence variation. Functions within the ORF include the movement protein (MP), the virus origin of virion assembly, and the subgenomic promoter used for coat protein synthesis, among others.
In some embodiments of the invention, genes, regulatory sequences, or fragments thereof from prokaryotic and eukaryotic sources, such as bacteria, fungi, plants, animals, animal viruses, among others may serve as template sequences for generating sequence variants. For example, sequences regulating the transcription and translation of commercially useful proteins, polypeptides, or fusion products thereof, such as enzymes, antibodies, hormones, pharmaceuticals, vaccines, pigments, antimicrobial polypeptides, and the like may be used as templates to generate libraries of sequence variants.
The template nucleic acid sequence may be of various lengths. Preferably, the size of the template nucleic acid sequence is from about 1 to 100,000 base pairs (bp), e.g., from about 2 to 50,000, from about 2 to 10,000, from about 2 to 5,000, from about 5 to 5,000, from about 10 to 2,000, etc. The number of sequence variants in a library may also vary depending on the method used. Typically, the number of sequence variants in a library is from about 2 to 100,000,000, e.g., from about 4 to 10,000,000, from about 16 to 1,000,000, from about 64 to 500,000, from about 64 to 100,000, from about 64 to 50,000, etc. In some instances, the number of sequence variants in a library may be unknown.
One skilled in the art will appreciate that there are many ways to generate sequence variants. A population of nucleic acid sequence variants may be found in nature. For example, a genomic library, a cDNA library, or a pool of RNAs derived from bacteria, fungi, plants, or animals including humans, may be constructed. A more detailed discussion of generating such a library is presented in a co-pending and co-owned U.S. patent application No. 09/359,300 (Kumagai et al., filed herewith, incorporated herein by reference). In some instances, natural sequence variations may consist of different alleles of the same gene or the same gene from different related species. Alternatively, they may be related nucleic acid sequences found within one species, for example, the immunoglobulin genes. In addition, the natural variations in plant and animal viral populations may also be the templates for generating sequence libraries.
In preferred embodiments, the sequence variants may be generated using in vitro mutagenesis methods, including, but not limited to, chemical treatment, oligonucleotide-mediated mutagenesis, PCR mutagenesis, and the like. The sequence variants may also be generated using in vitro recombination methods, including, but not limited to, DNA shuffling, random priming recombination (RPR), restriction enzyme fragment induced template switching (REFITS), and the staggered extension process (StEP), and other in vitro recombination methods. The sequence variants may also be generated using a combination of the methods mentioned above. The sequence populations may be random or selectively varied. Any source of nucleic acid can be utilized as the starting nucleic acid. Thus the process may employ DNA or RNA including messenger RNA, which DNA or RNA may be single or double stranded. In addition, a DNA-RNA hybrid which contains one strand of each may be utilized.
A. In vitro mutagenesis.
The nucleic acid sequence can be altered by chemical mutagenesis. Chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, hydrazine or formic acid. Other agents which are analogues of nucleotide precursors include nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine. In some embodiments, these agents may be added to the PCR reaction in place of the nucleotide precursor thereby mutating the template sequence. Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be used. Random mutagenesis of the nucleic acid sequence can also be achieved by irradiation with X-rays or ultraviolet light.
In oligonucleotide-directed mutagenesis, a short synthetically mutagenized oligonucleotide incorporating the desired base changes is hybridized to the sequences to be altered (Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989 and Cleland et al., Protein Engineering: Principles and Practice, Wiley-Liss (1996)). The mismatched primer is then extended by polymerase, thereby generating the varied sequence. Individually varied sequences may be mixed and expressed together to select the desired function from such sequence mix. This approach is particularly useful in generating sequence variations that are close to each other.
Error-prone PCR may be employed to create libraries of point mutations (Eckert et al., PCR Methods App. 1:17-24 (1991); Caldwell et al., PCR Methods App. 2:28-33 (1992), Gramm et al., Proc. Natl. Acad. Sci. USA 89:3576-3580 (1992); and Cadwell et al., PCR Methods App. 3:S136-40 (1994); You et al. Protein Eng. 9:77-83 (1994)). This method uses low-fidelity replication to introduce random point mutations at each round of amplification. Repeated cycles of error-prone PCR may lead to accumulation of point mutations. Error-prone PCR can be used to mutagenize a mixture of template sequences without knowing their nucleotide composition. Error-prone PCR is particularly suited to cases in which the regions of mutagenesis are small, typically less than 1,000 base pairs.
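By way of illustration only, the accumulation of point mutations over repeated low-fidelity copying rounds may be modeled in silico. The following Python sketch is a simplified model and not a laboratory protocol: it follows a single lineage rather than an exponentially amplified pool, and the function names, the per-base error rate and the seed are assumptions introduced for this example.

```python
import random

BASES = "ACGT"


def mutate_once(seq, error_rate, rng):
    """Copy seq once; each base is replaced by a different base
    with probability error_rate (a hypothetical per-base rate)."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def error_prone_pcr(template, error_rate, cycles, seed=0):
    """Apply `cycles` successive low-fidelity copies to one lineage."""
    rng = random.Random(seed)
    seq = template
    for _ in range(cycles):
        seq = mutate_once(seq, error_rate, rng)
    return seq


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    template = "ACGT" * 25
    variant = error_prone_pcr(template, error_rate=0.01, cycles=10)
    print(hamming(template, variant), "point mutations after 10 cycles")
```

Because substitutions are drawn independently in each round, the expected number of differences from the template grows with the number of cycles, which mirrors the accumulation described above.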
Combinatorial cassette mutagenesis (Black et al., Proc. Natl. Acad. Sci. USA 93:3525-3529 (1996)) and recursive ensemble mutagenesis (Delagrave et al., Biotechnology 11:1548-1552 (1993) and Arkin et al., Proc. Natl. Acad. Sci. USA 89:7811-7815 (1992)) may also be used to produce sequence variants. In cassette mutagenesis, a sequence block of a single template is typically replaced by a randomized or partially randomized sequence. Therefore, the number of sequence variants is typically determined by the size of the sequence block and the number of random sequences. The randomized sequences may be derived from synthetically mutagenized oligonucleotides. Typically, the nucleotide compositions of the template sequences are known. In addition, cassette mutagenesis may employ E. coli strain XL1-Red (Stratagene, Inc.). This E. coli strain has a high mutation rate from which a population of sequence variants may be derived.
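The dependence of library size on block size can be made concrete. For a fully randomized block of n nucleotides, the number of possible variants is 4 raised to the power n. The following Python sketch, offered as an illustration only, computes that number and enumerates the variants for a small block; the function names and the example template are hypothetical.

```python
from itertools import product


def cassette_library_size(block_len, choices_per_position=4):
    """Number of distinct variants for a fully randomized block."""
    return choices_per_position ** block_len


def cassette_variants(template, start, block_len, alphabet="ACGT"):
    """Yield every sequence obtained by replacing the block
    template[start:start + block_len] with each possible block."""
    prefix = template[:start]
    suffix = template[start + block_len:]
    for block in product(alphabet, repeat=block_len):
        yield prefix + "".join(block) + suffix


if __name__ == "__main__":
    for n in (1, 2, 3, 6):
        print(n, "randomized positions:", cassette_library_size(n), "variants")
```

Enumeration is practical only for short blocks; for longer blocks the size calculation alone indicates how many clones a screen would need to cover.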
B. In vitro recombination
1. DNA Shuffling
Nucleic acid shuffling is a method for in vitro homologous recombination of pools of nucleic acid sequence variants (U.S. Pat. Nos. 5,811,238, 5,605,793, 5,830,721, 5,834,252, and 5,837,458). This procedure involves random fragmentation of mixtures of related nucleic acid sequences followed by reassembly to yield a population of nucleic acid sequence variants.
The template polynucleotide may be DNA or RNA. It may be of various lengths depending on the size of the gene or nucleic acid fragment to be recombined or reassembled. Preferably the template polynucleotide is from 50 to 10,000 bp. The template polynucleotide should be double-stranded. A double-stranded nucleic acid molecule is typically required to ensure that regions of the resulting single-stranded nucleic acid fragments are complementary to each other and thus can hybridize to form a double-stranded molecule. The template polynucleotide may be obtained by amplification using the PCR reaction. Free primers from the PCR products are typically removed before fragmentation to increase the frequency of crossover. Single-stranded or double-stranded nucleic acid fragments having regions of identity to the template polynucleotide and regions of heterology to the template polynucleotide may be added to the template polynucleotide. Different but related polynucleotide templates may be mixed. Double-stranded nucleic acids having multiple nicks may also be used.
The double-stranded polynucleotide template and any added double- or single-stranded fragments are randomly digested into fragments of from about 5 to 5,000 bp. Preferably the size of the random fragments is from about 10 to 1,000 bp, more preferably the size of the DNA fragments is from about 20 to 500 bp.
The nucleic acid fragment may be digested by a number of different methods. The nucleic acid fragment may be digested with a nuclease, such as DNase I or RNase. The nucleic acid may be randomly sheared by the method of sonication or by passage through a tube having a small orifice. The number of different specific nucleic acid fragments in the mixture may be at least about 100, preferably at least about 500, and more preferably at least about 1000. At this step single-stranded or double-stranded nucleic acid fragments, either synthetic or natural, may be added to the random double-stranded nucleic acid fragments in order to increase the heterogeneity of the mixture of nucleic acid fragments. Populations of double-stranded randomly broken nucleic acid fragments may be mixed or combined at this step.
Where insertion of mutations into the template polynucleotide is desired, single-stranded or double-stranded nucleic acid fragments having a region of identity to the template polynucleotide and a region of heterology to the template polynucleotide may be added in a 20 fold excess by weight as compared to the total nucleic acid, more preferably the single-stranded nucleic acid fragments may be added in a 10 fold excess by weight as compared to the total nucleic acid. Where a mixture of different but related template polynucleotides is desired, populations of nucleic acid fragments from each of the templates may be combined at a ratio of less than about 1:100, more preferably the ratio is less than about 1:40. For example, a backcross of the wild-type polynucleotide with a population of mutated polynucleotide may be desired to eliminate silent mutations. In such an example, the ratio of randomly digested wild-type polynucleotide fragments which may be added to the randomly digested mutant polynucleotide fragments is approximately 1:1 to about 100:1, and more preferably from 1:1 to 40:1.
The mixed population of random nucleic acid fragments are denatured to form single-stranded nucleic acid fragments and then reannealed. Only those single-stranded nucleic acid fragments having regions of homology with other single-stranded nucleic acid fragments will reanneal.
The random nucleic acid fragments may be denatured by heating. One skilled in the art could determine the conditions necessary to completely denature the double-stranded nucleic acid. Preferably the temperature is from 80° C. to 100° C., more preferably the temperature is from 90° C. to 96° C. Other methods which may be used to denature the nucleic acid fragments include pressure and pH.
The nucleic acid fragments may be reannealed by cooling. Preferably the temperature is from 20° C. to 75° C., more preferably the temperature is from 40° C. to 65° C. If a high frequency of crossovers is needed based on an average of only 4 consecutive bases of homology, recombination can be forced by using a low annealing temperature. The degree of renaturation will depend on the degree of homology between the population of single-stranded nucleic acid fragments.
Renaturation can be accelerated by the addition of polyethylene glycol (PEG) or salt. The salt concentration is preferably from 0 mM to 200 mM, more preferably the salt concentration is from 10 mM to 100 mM. The salt may be KCl or NaCl. The concentration of PEG is preferably from 0% to 20%, more preferably from 5% to 10%.
The annealed nucleic acid fragments are next incubated in the presence of a nucleic acid polymerase and dNTPs (i.e. dATP, dCTP, dGTP and dTTP). The nucleic acid polymerase may be the Klenow fragment, the Taq polymerase, the Vent polymerase, or any other DNA polymerase known in the art.
The approach to be used for the assembly depends on the minimum degree of homology that should still yield crossovers. If the areas of identity are large, Taq polymerase can be used with an annealing temperature of between 45° C. and 65° C. If the areas of identity are small, Klenow polymerase can be used with an annealing temperature of between 20° C. and 30° C. One skilled in the art could vary the temperature of annealing to increase the number of crossovers achieved. The polymerase may be added to the random nucleic acid fragments prior to annealing, simultaneously with annealing or after annealing.
The cycle of denaturation, renaturation and incubation in the presence of polymerase is commonly referred to as shuffling or reassembly of the nucleic acid. This cycle is repeated for a desired number of times. Preferably the cycle is repeated from 2 to 50 times, more preferably the sequence is repeated from 10 to 40 times.
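The preferred and more preferred ranges recited above for the shuffling cycle can be collected in one place for checking a planned protocol. The following Python sketch is merely a bookkeeping aid; the numerical ranges are copied from the preceding paragraphs, while the parameter names and the classification labels are assumptions introduced for this example.

```python
# (preferred range, more preferred range), as recited in the text above.
RANGES = {
    "denature_c": ((80, 100), (90, 96)),   # denaturation temperature, deg C
    "anneal_c": ((20, 75), (40, 65)),      # reannealing temperature, deg C
    "salt_mm": ((0, 200), (10, 100)),      # KCl or NaCl concentration, mM
    "peg_percent": ((0, 20), (5, 10)),     # PEG concentration, percent
    "cycles": ((2, 50), (10, 40)),         # number of reassembly cycles
}


def classify(name, value):
    """Report where a planned value falls relative to the recited ranges."""
    preferred, more_preferred = RANGES[name]
    if more_preferred[0] <= value <= more_preferred[1]:
        return "more preferred"
    if preferred[0] <= value <= preferred[1]:
        return "preferred"
    return "outside stated range"


def check_protocol(**params):
    """Classify every supplied parameter, e.g. check_protocol(cycles=30)."""
    return {name: classify(name, value) for name, value in params.items()}


if __name__ == "__main__":
    print(check_protocol(denature_c=94, anneal_c=25, salt_mm=50, cycles=30))
```

A value outside the stated range is not necessarily inoperable; the check only reports how a planned condition relates to the ranges given here.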
The resulting nucleic acid is a larger double-stranded polynucleotide of from about 50 to about 100,000 bp, preferably the larger polynucleotide is from 500 to 50,000 bp. This larger polynucleotide fragment may contain a number of copies of a nucleic acid fragment having the same size as the template polynucleotide in tandem. This concatemeric fragment is then digested into single copies of the template polynucleotide. The result will be a population of nucleic acid fragments of approximately the same size as the template polynucleotide. The population will be a mixed population where single or double-stranded nucleic acid fragments having an area of identity and an area of heterology have been added to the template polynucleotide prior to shuffling. The use of degenerate oligonucleotides in DNA shuffling to increase library diversity at specific sites may be useful under some circumstances.
The advantages of DNA shuffling are many: (1) sequences can be optimized without first determining the sites within the sequence that require alteration; (2) given proper selection, several generations of the improved sequences can be formed in a time frame unattainable by natural circumstances; and (3) mutations of every sort are randomly dispersed throughout the sequences, allowing a "saturation" approach to determine the genetic potential of a given sequence.
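The outcome of the fragmentation and reassembly procedure described above may be illustrated with a toy in silico model. The following Python sketch models only the result of shuffling (chimeric sequences built from segments of different parents), not the underlying chemistry. It assumes aligned parental sequences of equal length, and the function names, default fragment sizes and seed are hypothetical.

```python
import random


def shuffle_once(parents, min_len, max_len, rng):
    """Build one chimera by walking along the alignment and copying each
    successive segment from a randomly chosen parent."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this toy model requires equal-length parents")
    if min_len < 1 or max_len < min_len:
        raise ValueError("need 1 <= min_len <= max_len")
    pieces, pos = [], 0
    while pos < length:
        size = rng.randint(min_len, max_len)   # random fragment size
        donor = rng.choice(parents)            # parent supplying the fragment
        pieces.append(donor[pos:pos + size])
        pos += size
    return "".join(pieces)


def shuffled_library(parents, n_variants, min_len=20, max_len=500, seed=0):
    """Return n_variants chimeras; the default sizes follow the more
    preferred fragment range of about 20 to 500 bp recited above."""
    rng = random.Random(seed)
    return [shuffle_once(parents, min_len, max_len, rng)
            for _ in range(n_variants)]


if __name__ == "__main__":
    parents = ["A" * 60, "C" * 60, "G" * 60]
    for chimera in shuffled_library(parents, 3, min_len=5, max_len=15):
        print(chimera)
```

Smaller fragment sizes produce more crossovers per chimera, which is the in silico counterpart of digesting the templates into shorter fragments.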
2. Random-priming Recombination (RPR)
Another simple and efficient method for in vitro recombination of nucleic acid sequences is random-priming recombination (RPR) (Shao et al., Nucleic Acids Res. 26:681-683 (1998)). In this method, random sequence primers are used to generate a large number of short DNA fragments complementary to different sections of the template sequences. Due to base misincorporation and mispriming, these short DNA fragments also contain a low level of point mutations. The short DNA fragments may prime one another based on homology, and be recombined and reassembled by repeated cycles of denaturation, annealing and further enzyme-catalyzed DNA polymerization to produce a library of full-length sequences. In particular, thermocycling in the presence of thermostable DNA polymerase may be used. Polymerases with different fidelities, including Klenow fragment of E. coli DNA polymerase I, bacteriophage T4 DNA polymerase, T7 Sequenase® DNA polymerase, the Stoffel fragment of Taq polymerase and Pfu polymerase, may be used for random priming DNA fragment synthesis. The length and concentration of random primer, as well as the time, temperature and other reaction conditions can also be manipulated by those skilled in the art in order to achieve the desired mutagenic rate and recombination frequency.
The RPR method offers several advantages. First, single-stranded polynucleotide templates without an intermediate step of synthesizing the complete second strand may be used in RPR. Potential mutations and/or crossovers may be introduced at the DNA level from single- or double-stranded DNA template by using DNA polymerases, or directly from mRNA by using RNA-dependent DNA polymerases. Second, DNA shuffling requires fragmentation of the double-stranded DNA template by nucleases, e.g. DNase I. These nucleases need to be removed completely before the fragments can be reassembled into full-length sequences. Sequence reassembly is generally easier with the RPR technique, which employs random priming synthesis to obtain the short DNA fragments. Furthermore, since DNase I hydrolyzes double-stranded DNA preferentially at sites adjacent to pyrimidine nucleotides, its use in template digestion may introduce a sequence bias into the recombination. Third, the synthetic random primers are uniform in their length and lack sequence bias. The sequence heterogeneity allows them to form hybrids with the template DNA strands at many positions, so that, at least in principle, every nucleotide of the template should be copied or mutated at a similar frequency during extension. The random distribution of the short, nascent DNA fragments along the templates and the random distribution of point mutations within each nascent DNA fragment should guarantee the randomness of crossovers and mutations in the full-length progeny genes. Fourth, the random-priming DNA synthesis is independent of the length of the DNA template. Fifth, since the template polynucleotide serves solely as the template for the synthesis of nascent, single-stranded DNA, 10-20 times less template DNA is needed as compared to DNA shuffling.
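The expectation that random priming copies every region of the template at a similar frequency may be examined with a simple model. The following Python sketch is illustrative only: it places primers at uniformly random positions, extends each by a random length, and counts how often each template position is copied. For simplicity the fragments are recorded as stretches of the template strand rather than as its complement, and all names, lengths and the seed are assumptions introduced for this example.

```python
import random


def random_primed_fragments(template, n_fragments, primer_len,
                            max_extension, seed=0):
    """Return (start, fragment) pairs for primers annealed at random
    positions and extended by 0..max_extension bases."""
    last_start = len(template) - primer_len
    if last_start < 0:
        raise ValueError("primer longer than template")
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_fragments):
        start = rng.randint(0, last_start)
        end = min(len(template),
                  start + primer_len + rng.randint(0, max_extension))
        fragments.append((start, template[start:end]))
    return fragments


def coverage(template_len, fragments):
    """Count how many fragments cover each template position."""
    counts = [0] * template_len
    for start, frag in fragments:
        for i in range(start, start + len(frag)):
            counts[i] += 1
    return counts


if __name__ == "__main__":
    template = "ACGT" * 50
    frags = random_primed_fragments(template, 500, primer_len=6,
                                    max_extension=40)
    cov = coverage(len(template), frags)
    print("min coverage:", min(cov), "max coverage:", max(cov))
```

In this model the positions near the ends of the template tend to be covered less often than interior positions, a boundary effect of the model that the uniform-frequency expectation stated above does not address.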
3. Restriction Enzyme Fragment Induced Template Switching (REFITS)
REFITS is a technically simple means of in vitro recombination between homologous DNA sequences. One of the technical challenges in DNA shuffling is reproducible generation of fragments of the appropriate size by DNase I. The DNase I reaction is very sensitive to variations in template and enzyme concentrations. REFITS provides a different approach to generating fragments that is much easier to reproduce. It is a method to increase the rate of molecular evolution via in vitro homologous recombination of pools of mutant genes by fragmentation of the DNA with restriction enzymes and reassembly of fragments by PCR. The technique may be used to recombine homologous genes from related organisms, or to reassort random mutations, such as those generated by error-prone PCR.
The target DNA may be split into aliquots, and each aliquot is digested with a different restriction enzyme, or groups of restriction enzymes that cut the target DNA several times. Preferably, the restriction enzymes used in REFITS have four-base recognition sites. In preferred embodiments, restriction enzymes are chosen to avoid large uncut fragments, to improve the resolution of the recombination and to help make sure that no large region remains unshuffled. The resolution of the recombination is determined by how close two mutations can be and still be separated and recombined at a detectable level. The resolution is also increased by using more enzymes to generate more pools of fragments. Since each separate digestion is done to completion, no careful timing of digestion is required, unlike DNase I partial digestion. Some partial digestion products may also be tolerated by the REFITS procedure.
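The choice of enzymes that leave no large uncut fragment may be assisted by a complete-digestion calculation on the known target sequence. The following Python sketch is one possible illustration; it considers one strand only. The four enzymes listed are common four-base cutters chosen as examples and are not named in this specification; the function names and the example sequence are likewise hypothetical.

```python
# Example four-base cutters: name -> (recognition site, cut offset in site).
# Illustrative choices only; any enzymes with known sites could be listed.
ENZYMES = {
    "AluI": ("AGCT", 2),
    "HaeIII": ("GGCC", 2),
    "RsaI": ("GTAC", 2),
    "Sau3AI": ("GATC", 0),
}


def digest(seq, site, cut_offset):
    """Return the fragments from complete digestion of a linear sequence."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return [f for f in fragments if f]   # drop empty end pieces


def largest_fragment(seq, enzymes=ENZYMES):
    """Length of the largest fragment left by each enzyme; large values
    flag enzymes that would leave a region unshuffled."""
    return {name: max(len(f) for f in digest(seq, site, offset))
            for name, (site, offset) in enzymes.items()}


if __name__ == "__main__":
    target = "TTAGCTAAGGCCTTGATCAAGTACTTAGCTGG"
    print(largest_fragment(target))
```

An enzyme whose largest fragment approaches the full length of the target contributes little to the recombination and might be replaced by another enzyme in that aliquot.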
4. Staggered Extension Process (StEP)
Staggered extension process (StEP) is another simple and efficient method for in vitro recombination of polynucleotide sequences to generate libraries of sequence variants (Zhao et al., Nat. Biotechnol. 16:258-261 (1998)). Rather than reassembling recombined sequences from a pool of fragmented template sequences, StEP prepares full-length recombined genes in the presence of the templates. Essentially, StEP consists of priming the template sequences followed by repeated cycles of denaturation and extremely abbreviated annealing/polymerase-catalyzed extension. This limited polymerase extension time is used to generate less-than-full-length fragments. In each cycle the growing fragments anneal to different templates based on sequence complementarity and extend further to create "recombination cassettes." This is repeated until full-length sequences form. Due to template switching, most of the polynucleotides contain sequence information from different template sequences. The speed of the thermal cycle may be adjusted to avoid the polymerase adding too many bases at each cycle. Adding too many bases at each cycle may limit the number of possible template switches, thereby limiting the amount of recombination and the resolution between template switches. StEP may be performed using flanking universal primers to avoid bias introduced from the starting primers.
The StEP recombination reaction may be carried out in a single tube, and separation of the templates from the recombined products may not be necessary. StEP may be followed by a gene amplification step, if desired.
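The effect of the extension length per cycle on the number of template switches may be illustrated with a toy model. The following Python sketch assumes aligned templates of equal length and a growing strand that, in each cycle, anneals to a randomly chosen template and extends by a short random length; it tracks positions on one strand only. The function name, the extension bounds and the seed are assumptions introduced for this example.

```python
import random


def step_recombine(templates, min_ext, max_ext, seed=0):
    """Grow one full-length product by staggered extension.

    Returns (product, cycles), where cycles is the number of
    anneal/extend rounds needed to reach full length."""
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("this toy model requires equal-length templates")
    if min_ext < 1 or max_ext < min_ext:
        raise ValueError("need 1 <= min_ext <= max_ext")
    rng = random.Random(seed)
    product, cycles = "", 0
    while len(product) < length:
        template = rng.choice(templates)        # template switch
        ext = rng.randint(min_ext, max_ext)     # abbreviated extension
        product += template[len(product):len(product) + ext]
        cycles += 1
    return product, cycles


if __name__ == "__main__":
    templates = ["A" * 80, "C" * 80]
    for bounds in ((5, 10), (30, 40)):
        product, cycles = step_recombine(templates, *bounds)
        print(bounds, cycles, "cycles:", product)
```

With short extensions many cycles are needed and the product can switch templates many times; with long extensions few cycles suffice and few switches are possible, consistent with the adjustment of cycle speed described above.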
It is further contemplated that various methods of in vitro mutagenesis and recombination may be combined to generate sequence libraries. It is also contemplated that a number of cycles of in vitro mutagenesis and recombination may be conducted with nucleic acid fragments from a subpopulation of the first population, which subpopulation contains desired nucleic acids. It is also contemplated that a number of cycles of in vitro mutagenesis and recombination may be conducted with a mixture of template nucleic acid fragments and a subpopulation of nucleic acid from the first or subsequent rounds.
III. Expressing Libraries of Nucleic Acid Sequence Variants in Plant Hosts
Once the population of the nucleic acid sequence variants is generated, the polynucleotides can be used directly, amplified using PCR-based technologies, or inserted into an appropriate cloning vector, using techniques well known in the art. Libraries may often be first constructed in plasmid or phage shuttle vectors before being excised and introduced into virus vectors. Likewise, sequences can be screened in hosts using virus vectors, but must be subcloned into appropriate eukaryotic expression vectors before the trait identified in the vector-transfected host will become a stable trait in the host by gene integration. The choice of vector depends on the size of the polynucleotide sequence and the plant host to be employed in the methods of this invention. Suitable restriction sites to facilitate the insertion of sequence variants into viral vectors may be added.
Viral nucleic acids containing a population of sequence variants can be transfected as populations or individual clones into host: 1) protoplasts; 2) whole plants; or 3) plant tissues, such as leaves of plants (Dijkstra et al., Practical Plant Virology: Protocols and Exercises, Springer Verlag (1998); Plant Virology Protocol: From Virus Isolation to Transgenic Resistance in Methods in Molecular Biology, Vol. 81, Foster and Taylor, Ed., Humana Press (1998)). The plant host may be a monocotyledonous or dicotyledonous plant, plant tissue, or plant cell. Typically, plants of commercial interest, such as food crops, seed crops, oil crops, ornamental crops and forestry crops are preferred. For example, wheat, rice, corn, potato, barley, tobacco, soybean, canola, maize, oilseed rape, lilies, grasses, orchids, irises, onions, palms, tomato, the legumes, or Arabidopsis, can be used as a plant host. Host plants may also include those readily infected by an infectious virus, such as Nicotiana, preferably, Nicotiana benthamiana, or Nicotiana clevelandii.
In some embodiments of the instant invention, the delivery of the plant virus expression vectors into the plant may be effected by the inoculation of in vitro transcribed RNA, inoculation of virions, or internal inoculation of plant cells from nuclear cDNA, or the systemic infection resulting from any of these procedures. In all cases, the co-infection may lead to a rapid and pervasive systemic expression of the desired nucleic acid sequences in plant cells. The systemic infection of the plant by the foreign sequences may be followed by the growth of the infected host to produce the desired product, and the isolation and purification of the desired product, if necessary. The growth of the infected host is in accordance with conventional techniques, as is the isolation and the purification of the resultant products.
The host can be infected with a recombinant viral nucleic acid or a recombinant plant virus by conventional techniques. Suitable techniques include, but are not limited to, leaf abrasion, abrasion in solution, high velocity water spray, and other injury of a host as well as imbibing host seeds with water containing the recombinant viral RNA or recombinant plant virus. More specifically, suitable techniques include:
(a) Hand Inoculations. Hand inoculations are performed using a neutral pH, low molarity phosphate buffer, with the addition of celite or carborundum (usually about 1%). One to four drops of the preparation is put onto the upper surface of a leaf and gently rubbed.
(b) Mechanized Inoculations of Plant Beds. Plant bed inoculations are performed by spraying (gas-propelled) the vector solution into a tractor-driven mower while cutting the leaves. Alternatively, the plant bed is mowed and the vector solution sprayed immediately onto the cut leaves.
(c) High Pressure Spray of Single Leaves. Single plant inoculations can also be performed by spraying the leaves with a narrow, directed spray (50 psi, 6-12 inches from the leaf) containing approximately 1% carborundum in the buffered vector solution.
(d) Vacuum Infiltration. Inoculations may be accomplished by subjecting a host organism to a substantially vacuum pressure environment in order to facilitate infection.
(e) High Speed Robotics Inoculation. Especially applicable when the organism is a plant, individual organisms may be grown in mass array such as in microtiter plates. Machinery such as robotics may then be used to transfer the nucleic acid of interest.
(f) Ballistics (High Pressure Gun) Inoculation. Single plant inoculations can also be performed by particle bombardment. A ballistics particle delivery system (BioRad Laboratories, Hercules, CA) can be used to transfect plants such as N. benthamiana as described previously (Nagar et al., Plant Cell, 7:705-719 (1995)).
An alternative method for introducing viral nucleic acids into a plant host is a technique known as agroinfection or Agrobacterium-mediated transformation (also known as Agro-infection) as described by Grimsley et al., Nature 325:177 (1987). This technique makes use of a common feature of Agrobacterium, which colonizes plants by transferring a portion of its DNA (the T-DNA) into a host cell, where it becomes integrated into nuclear DNA. The T-DNA is defined by border sequences which are 25 base pairs long, and any DNA between these border sequences is transferred to the plant cells as well. The insertion of a recombinant plant viral nucleic acid between the T-DNA border sequences results in transfer of the recombinant plant viral nucleic acid to the plant cells, where the recombinant plant viral nucleic acid is replicated, and then spreads systemically through the plant. Agro-infection has been accomplished with potato spindle tuber viroid (PSTV) (Gardner et al., Plant Mol. Biol. 6:221 (1986)); CaV (Grimsley et al., Proc. Natl. Acad. Sci. USA 83:3282 (1986)); MSV (Grimsley et al., Nature 325:177 (1987) and Lazarowitz, S., Nucl. Acids Res. 16:229 (1988)); digitaria streak virus (Donson et al., Virology 162:248 (1988)); wheat dwarf virus (Hayes et al., J. Gen. Virol. 69:891 (1988)); and tomato golden mosaic virus (TGMV) (Elmer et al., Plant Mol. Biol. 10:225 (1988) and Gardiner et al., EMBO J. 7:899 (1988)). Therefore, agro-infection of a susceptible plant could be accomplished with a virion containing a recombinant plant viral nucleic acid based on the nucleotide sequence of any of the above viruses. Particle bombardment, electroporation, or any other method known in the art may also be used.
In some embodiments of the instant invention, infection may also be attained by placing a selected nucleic acid sequence into an organism such as E. coli, or yeast, either integrated into the genome of such organism or not, and then applying the organism to the surface of the host organism. Such a mechanism may thereby produce secondary transfer of the selected nucleic acid sequence into a host organism. This is a particularly practical embodiment when the host organism is a plant. Likewise, infection may be attained by first packaging a selected nucleic acid sequence in a pseudovirus. Such a method is described in WO 94/10329. Though the teachings of this reference may be specific for bacteria, those of skill in the art will readily appreciate that the same procedures could easily be adapted to other organisms.
IV. Screening and Selecting Desired Traits
After a plant host is infected with a population of sequence variants or an individual clone of the population, one or more desired traits are screened and selected. The desired traits may include biochemical or phenotypic traits. Phenotypic traits may include, but are not limited to, host range, viral infectivity, tolerance to herbicides, tolerance to extremes of heat or cold, drought, salinity or osmotic stress; resistance to pests (insects, nematodes or arachnids) or diseases (fungal, bacterial or viral), male or female sterility, dwarfness, early maturity; improved yield, vigor, heterosis, nutritional qualities, flavor or processing properties, and the like. Biochemical traits may be related to, for example, promoter activities, replication activities, translational activities, regulatory activities, movement activities (local and systemic), signaling activities, extraction/purification properties, etc. It should be noted that many biochemical traits also exhibit phenotypic traits and vice versa. The screening of sequence libraries is typically followed by rescue of the viruses from populations conferring desired traits. If necessary, re-screening of sub-libraries may be performed. In some embodiments, sequences of the viral nucleic acids conferring desired traits may be determined and compared with the template sequences.
To screen the desired traits, biochemical or phenotypic changes in a host plant are determined. The biochemical or phenotypic changes in the infected host plant may be correlated to the biochemistry or phenotype of a host plant that is uninfected. Optionally, the biochemical or phenotypic changes in the infected host plant are further correlated to a host plant that is infected with a viral vector that contains a control nucleic acid of a known sequence. The phenotypic changes in a plant host may be determined by any methods known in the art. Typically, these methods include visual, morphological, macroscopic or microscopic analysis. For example, growth changes, such as stunting, and color changes (e.g. leaf yellowing, mottling, bleaching, chlorosis), among others, are easily visualized. Examples of morphological changes include developmental defects, wilting, and necrosis, among others. Biochemical changes can be determined by any analytical methods known in the art for detecting, quantitating, or isolating DNA, RNA, proteins, antibodies, carbohydrates, lipids, and small molecules. Selected methods may include Northern blotting, Western blotting, MALDI-TOF, LC/MS, GC/MS, two-dimensional IEF/SDS-PAGE, ELISA, etc. In particular, suitable methods may be performed in a high-throughput, fully automated fashion using robotics. Examples of biochemical changes may include the accumulation of substrates or products from enzymatic reactions, changes in biochemical pathways, inhibition or augmentation of endogenous gene expression in the cytoplasm of cells, and changes in the RNA or protein profile.
Those of skill in the art will readily understand that there are many methods to determine phenotypic or biochemical changes in a plant host and to select one or more viral nucleic acids that confer the selected traits. In some embodiments, infected plant hosts capable of growing or maintaining viability in the presence of noxious or toxic substances, such as herbicides and pharmaceutical ingredients, may be selected.
Host plants vary in their ability to support expression of recombinant viral nucleic acids. Some species support expression to a high specific activity (such as Nicotiana benthamiana) but have relatively low biomass. Other species (such as N. tabacum) have high biomass and/or other desirable properties for growth in the field, but have a relatively low specific activity of the expressed sequence. In some embodiments, the plant hosts capable of producing an optimal level of non-native products, such as enzymes, antibodies, hormones, pharmaceuticals, vaccines, pigments, antimicrobial polypeptides, and the like may be selected.
In some embodiments, inhibition of proteolytic activity in plants which is responsible for the degradation of recombinant proteins is often desired. Selection of inhibition of proteolytic activity in plants may use a library of viral sequence variants. The size and yield of protein of interest are determined and the desired results are correlated with members of the viral sequence variants.
The functions of transcription factors contributing to the signal transduction pathway of host cells can be monitored by assaying specific proteomic, mRNA or metanomic traits following transfection with a virus expression library. The contribution of a particular protein or product to a valuable trait may be known from the literature, but a new mode of enhanced or reduced expression could be identified by finding the factors that alter its particular expression. For example, transcription factors regulating the expression of defense proteins, such as systemin peptides or protease inhibitors, could be identified by transfecting hosts with virus libraries and directly assaying the expression of systemin or protease inhibitors or their RNAs. Alternatively, the promoters responsible for expressing these genes could be genetically fused to the green fluorescent protein (GFP) gene and introduced into hosts as transient expression constructs or into stably transformed host cells/tissues. The resulting cells would be transfected with viral vector libraries. Those members of the library which alter the activity of the promoter of interest fused to the GFP reporter gene could be readily identified by monitoring changes in GFP expression (either an increase or a decrease in expression) in infected plant tissues, such as leaves.
In some embodiments, a DNA fragment which encodes a protein with increased binding efficiency to a ligand is desired. The proteins expressed by each nucleic acid in the library may be tested for their ability to bind to the ligand by methods known in the art (e.g., panning, affinity chromatography). If a nucleic acid which encodes a protein with increased drug resistance is desired, the proteins expressed by each of the nucleic acids in the population or library may be tested for their ability to confer drug resistance to the host organism.
For treatment of hosts with agents that result in cell death or down-regulation of general metabolic function, a virus vector which simultaneously expresses the green fluorescent protein (GFP) or other selectable marker gene and the variant sequence is used to screen quantitatively for levels of resistance or sensitivity to the agent in question conferred upon the host by the variant sequence expressed from the viral vector. By quantitatively screening pools or individual infection events, those viruses containing unique variant sequences that allow sustained metabolic life of the host are identified by fluorescence under long wave UV light. Those that do not confer this phenotype will fail to fluoresce or will fluoresce poorly. In this manner, high-throughput screening in multi-well dishes in plate readers is possible, where the average fluorescence of the well would be expressed as a ratio to the absorbance (measuring the cell mass), thereby giving a comparable quantitative value.
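By way of illustration only, the per-well normalization described above (average fluorescence divided by the reading that measures cell mass) might be computed as in the following Python sketch. The `Well` record, the function names, and the `min_absorbance` cutoff are hypothetical and are not part of the disclosure; the cutoff is an assumed guard against dividing by readings from nearly empty wells.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Well:
    """One plate-reader well (hypothetical record layout)."""
    well_id: str
    fluorescence: float  # average GFP fluorescence, arbitrary units
    absorbance: float    # absorbance reading used as a proxy for cell mass


def normalized_signal(well: Well, min_absorbance: float = 0.05) -> Optional[float]:
    """Return fluorescence per unit absorbance, or None if the well holds
    too little cell mass for the ratio to be meaningful (assumed cutoff)."""
    if well.absorbance < min_absorbance:
        return None
    return well.fluorescence / well.absorbance


def rank_wells(wells: List[Well], min_absorbance: float = 0.05) -> List[Tuple[str, float]]:
    """Rank wells from highest to lowest normalized fluorescence,
    skipping wells below the absorbance cutoff."""
    scored = []
    for well in wells:
        ratio = normalized_signal(well, min_absorbance)
        if ratio is not None:
            scored.append((well.well_id, ratio))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Wells at the top of the ranking would correspond to variant sequences that sustain host metabolism in the presence of the agent. Any threshold used to call a well positive would have to be set empirically against control infections.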
Recombinant proteins expressed in cells often do not have the desired solubility and processing/purification characteristics, making their purification tedious or problematic. One may alter the amino acid sequence of a protein or protein fusion of interest so that the resulting modified protein has the desired characteristics of solubility, stability, etc. In some embodiments of the invention, a carrier protein-protein fusion of interest may be constructed to identify specific variants with altered purification properties. Reporter proteins, such as, but not limited to, GFP or the products of antibiotic/herbicide resistance genes, can be used as potential carrier proteins for polypeptides of interest. The variants can be screened by detecting either the polypeptide of interest (using an appropriate assay) or by assaying for the carrier protein. For example, a GFP-peptide fusion expressed in plant hosts may be detected by observing GFP activity under long wave UV light. In this way, carrier protein-protein fusions with altered solubility and/or stability may be screened for. This approach can also be used to examine the purification properties of carrier protein-protein variants. For example, protein samples can be prepared from cells expressing different variants of a carrier protein-protein fusion (these extracts can be of extracellular, secreted protein or of intracellular protein). These extracts can be run through a series of different partial purification steps such as (but not limited to) treatments at various pHs and temperatures, centrifugation steps, etc. Finally, the treated samples can be assayed for the presence of either the carrier protein domain of the fusion or the protein of interest, for example, in order to estimate the amount of protein that survives the purification regimes. Variants with characteristics of interest can be isolated for further experimentation and/or modification if desired.
Other selectable traits may include: 1) selection of viruses or viral vectors with altered host range, tissue tropisms, and cell-to-cell or systemic movement properties; 2) selection of virus cis-acting or trans-acting sequences or encoded products which alter regulation (transcription or message stability) of native or foreign gene expression in viruses or viral vectors; 3) selection of sequences that flank foreign sequence additions and that alter translation of foreign sequences; 4) selection of sequences that flank foreign sequence additions and that increase the genetic stability of specific or general foreign sequences ligated into virus vectors; 5) incorporation of flanking sequences or sequences internal to a sequence motif to be expressed that mediate appropriate conformational presentation of epitopes that are fused to native or modified virus proteins or other carrier proteins; 6) selection of sequences that can replace structural elements in native or modified virus proteins; 7) selection of sequences that can be added to, and structurally tolerated by, native or modified virus proteins and which add functionality to the encoded protein; 8) incorporation of down-regulating mutations into virus-encoded cis- or trans-acting virus sequences and the selection of variants, induced or naturally occurring, that are second-site revertants, thereby restoring part or the whole of the impaired functionality; 9) selection of sequences added into foreign or native/modified virus proteins which promote the proper folding, sub-cellular localization, expression levels, extraction/purification properties, and activity of the protein of interest; and 10) incorporation of sequences that remove non-viral RNA sequences from in vitro derived transcripts of virus or viral vectors, which include, but are not limited to, ribozymes.
In order to provide a clear and consistent understanding of the specification and the claims, including the scope given herein to such terms, the following definitions are given:
5′ or 3′ NTR: nontranslated region of a viral genome at the 5′ or 3′ end.
Cis-acting (cis-dependent): interaction of a molecule or complex with itself or between a gene product and the nucleic acid from which it was expressed.
Coat protein (capsid protein): an outer structural protein of a virus.
Gene: a discrete nucleic acid sequence responsible for a discrete cellular product.
Host: a cell, tissue or organism capable of replicating a vector or viral nucleic acid and which is capable of being infected by a virus containing the viral vector or viral nucleic acid. This term is intended to include prokaryotic and eukaryotic cells, organs, tissues, organisms, or in vitro extracts thereof, where appropriate.
Infection: the ability of a virus to transfer its nucleic acid to a host or introduce viral nucleic acid into a host, wherein the viral nucleic acid is replicated, viral proteins are synthesized, and new viral particles assembled.
Internal initiation site: any of the internal regions that direct ribosome-mediated translation of mRNA into polypeptides.
Movement protein: a noncapsid protein required for cell-to-cell movement of RNA replicons or viruses in plants.
Non-native (foreign): any sequence that does not normally occur in the virus or its host.
Open Reading Frame: a nucleotide sequence of suitable length in which there are no stop codons.
Packaging signal: the RNA sequence(s) responsible for enclosing the RNA within the capsid or coat protein(s) to form a mature virion.
PCR: a broad range of polynucleotide amplification techniques for increasing the number of copies of specific polynucleotide sequences. Examples of polynucleotide amplification reactions include, but are not limited to, polymerase chain reaction (PCR, U.S. Pat. Nos. 4,683,202 and 4,683,195), reverse transcriptase PCR (RT-PCR), nucleic acid sequence based amplification (NASBA), self-sustained sequence replication (3SR), strand displacement amplification (SDA), ligase chain reaction (LCR), rolling-circle amplification (RCA), Qβ replicase system, and the like (Isaksson and Landegren, Curr. Opin. Biotechnol. 10:11-15 (1999); Landegren, Curr. Opin. Biotechnol. 7:95-97 (1996); and Abramson et al., Curr. Opin. Biotechnol. 4:41-47 (1993)).
Plant Cell: the structural and physiological unit of plants, consisting of a protoplast and the cell wall.
Plant Tissue: any tissue of a plant in planta or in culture. This term is intended to include a whole plant, plant cell, plant organ, protoplast, cell culture, or any group of plant cells organized into a structural and functional unit.
Promoter: the 5′-flanking, non-coding sequence adjacent to a coding sequence which is involved in the initiation of transcription of the coding sequence.
Protoplast: an isolated cell without a cell wall, having the potency for regeneration into cell culture or a whole host.
Subgenomic mRNA promoter: a promoter that directs the synthesis of an mRNA smaller than the full-length genome in size.
Trans-acting: interaction of a molecule or complex on other molecule(s) independent from itself or independent from the nucleic acid from which it was expressed.
Vector: a self-replicating nucleic acid molecule that contains non-native sequences and which transfers nucleic acid segments between cells.
Virion: a particle composed of viral nucleic acid and viral coat protein (or capsid protein).
Virus: an infectious agent composed of a nucleic acid encapsulated in a protein.
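By way of illustration only, the definition of an Open Reading Frame given above (a nucleotide sequence of suitable length in which there are no stop codons) can be expressed as a short Python sketch that finds the longest stop-free run of codons in one reading frame. The function name is hypothetical. The stop codon set assumes the standard genetic code, and what counts as a "suitable length" is left to the caller.

```python
# Stop codons of the standard genetic code (an assumption; some organelle
# and organism codes differ).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_stop_free_stretch(seq: str, frame: int = 0) -> tuple:
    """Return (start, end) indices of the longest run of complete in-frame
    codons that contains no stop codon. `end` is exclusive, so seq[start:end]
    is the stretch. RNA input is accepted by treating U as T."""
    seq = seq.upper().replace("U", "T")
    best = (frame, frame)   # empty stretch until a longer one is found
    run_start = frame
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            # A stop codon closes the current run; keep it if it is the longest.
            if pos - run_start > best[1] - best[0]:
                best = (run_start, pos)
            run_start = pos + 3
        pos += 3
    # The final run may extend to the end of the sequence without a stop.
    if pos - run_start > best[1] - best[0]:
        best = (run_start, pos)
    return best
```

For example, in frame 0 of `ATGAAATAGGGGCCCAAATTTTAA` the longest stop-free stretch is `GGGCCCAAATTT`, spanning indices 9 to 21. The sketch does not require a start codon, consistent with the definition as worded.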