This invention relates to improved methods of grain processing to enhance protein and starch recovery, particularly in corn wet milling and soybean processing, as well as novel transgenic plants useful in such processes.
Thioredoxin (TRX) and thioredoxin reductase (TR) are enzymes that use NADPH to reduce disulphide bonds in proteins. Protein disulphide bonds play an important role in grain processing efficiencies and in the quality of the products recovered from grain processing. Developing effective ways to eliminate or decrease the extent of protein disulphide bonding in grain would therefore increase processing efficiency. The performance of grain and grain-derived products in livestock feed is also affected by inter- and intramolecular disulphide bonding: reducing the extent of disulphide bonding would increase grain digestibility and nutrient availability and would promote the neutralization of anti-nutritive factors (e.g., protease and amylase inhibitors). See PCT/EP99/09986, filed Dec. 15, 1999, and U.S. Provisional Application No. 60/183,051, filed Dec. 17, 1998, both of which are incorporated herein by reference.
Expression of transgenic thioredoxin and/or thioredoxin reductase in corn and soybeans and the use of thioredoxin in grain processing, e.g., wet milling, is novel and provides an alternative method for reducing the disulfide bonds in seed proteins during or prior to industrial processing. The invention therefore provides grains with altered storage protein quality as well as grains that perform qualitatively differently from normal grain during industrial processing or animal digestion (both referred to subsequently as “processing”).
This method of delivery of thioredoxin and/or thioredoxin reductase eliminates the need to develop exogenous sources of thioredoxin and/or thioredoxin reductase for addition during processing. A second advantage to supplying thioredoxin and/or thioredoxin reductase via the grains is that physical disruption of seed integrity is not necessary to bring the enzyme in contact with the storage or matrix proteins of the seed prior to processing or as an extra processing step.
Three modes of thioredoxin utilization in grain processing are provided:
1. Expression and action during seed development to alter the composition and quality of harvested grain;
2. Expression (but no activity) during seed development to alter the quality of the products upon processing;
3. Production of thioredoxin and/or thioredoxin reductase in grain that is used to alter the quality of other grain products by addition during processing.
The invention described herein is applicable to all grain crops, in particular corn, soybean, wheat, and barley, most particularly corn and soybean, especially corn. Expression of transgenic thioredoxin and/or thioredoxin reductase in grain is a means of altering the quality of the material (seeds) going into grain processing, altering the quality of the material derived from grain processing, maximizing yields of specific seed components during processing (increasing efficiency), changing processing methods, and creating new uses for seed-derived fractions or components from milling streams.
The invention thus provides a plant which expresses a thioredoxin and/or thioredoxin reductase, e.g., a thioredoxin and/or thioredoxin reductase not naturally expressed in plants, for example a plant comprising a heterologous DNA sequence coding for a thioredoxin stably integrated into its nuclear or plastid DNA, preferably under control of an inducible promoter, e.g., a chemically-inducible promoter, for example either operatively linked to the inducible promoter or under control of a transactivator-regulated promoter wherein the corresponding transactivator is under control of the inducible promoter or is expressed in a second plant such that the promoter is activated by hybridization with the second plant; wherein the thioredoxin or thioredoxin reductase is preferably thermostable; such plant also including seed therefor, which seed is optionally treated (e.g., primed or coated) and/or packaged, e.g., placed in a bag with instructions for use, and seed harvested therefrom, e.g., for use in a milling process as described above.
The transgenic plant of the invention may optionally further comprise genes for enhanced production of thioredoxin reductase and/or NADPH.
The invention further provides a method for producing a thioredoxin comprising cultivating a thioredoxin-expressing plant as described above; a method for producing starch and/or protein comprising extracting starch or protein from seed harvested from a plant as described above; and a method for wet milling comprising steeping seed from a thioredoxin-expressing plant as described above and extracting starch and/or protein therefrom.
The invention further provides a plant expressible expression cassette comprising a coding region for a thioredoxin or thioredoxin reductase, preferably a thioredoxin derived from a thermophilic organism, e.g., from an archaeon, for example from Methanococcus jannaschii or Archaeoglobus fulgidus, e.g., as described below, wherein the coding region is preferably optimized to contain plant preferred codons, said coding region being operatively linked to promoter and terminator sequences which function in a plant, wherein the promoter is preferably a seed specific promoter or an inducible promoter, e.g., a chemically inducible or transactivator-regulated promoter; for example a plastid or nuclear expressible expression cassette comprising a promoter, e.g., a transactivator-mediated promoter regulated by a nuclear transactivator (e.g., the T7 promoter when the transactivator is T7 RNA polymerase, the expression of which is optionally under control of an inducible promoter).
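Optimizing a coding region "to contain plant preferred codons" can be illustrated, in its very simplest form, by back-translating a protein with one preferred codon per amino acid. The sketch below assumes a hypothetical codon table: every entry is a valid standard-code codon, but the choices are illustrative and are not measured maize codon-usage data, nor the table used to design any sequence listed here. A practical design might also screen for features such as cryptic polyadenylation signals, which this sketch does not attempt. The `translate` helper lets the result be checked against the input protein.

```python
# Hypothetical "preferred" codon for each amino acid (one-letter code; "*" = stop).
# Every entry is a valid standard-code codon, but the choices are illustrative
# only and are NOT derived from maize codon-usage data.
PREFERRED_CODON = {
    "A": "GCC", "R": "AGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TAG",
}

# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def back_translate(protein: str) -> str:
    """Build a coding sequence using the single preferred codon for each residue."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]!r}") from None


def translate(cds: str) -> str:
    """Translate the complete codons of a DNA coding sequence with the standard code."""
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


cds = back_translate("MKVIGLS*")
print(cds)  # ATGAAGGTGATCGGCCTGAGCTAG
assert translate(cds) == "MKVIGLS*"
```

Using a single codon per residue is the crudest strategy; weighting codon choice by measured usage frequencies would be a natural refinement.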
The invention further provides a vector comprising such a plant expressible expression cassette.
The invention further provides a plant transformed with such a vector, or a transgenic plant which comprises in its genome, e.g., its nuclear or plastid genome, such a plant expressible expression cassette.
The invention also comprises a method of producing grain comprising high levels of thioredoxin or thioredoxin reductase, comprising pollinating a first plant comprising a heterologous expression cassette comprising a transactivator-mediated promoter operatively linked to a DNA sequence coding for a thioredoxin or thioredoxin reductase, the first plant preferably being emasculated or male sterile, with pollen from a second plant comprising a heterologous expression cassette comprising a promoter operatively linked to a DNA sequence coding for a transactivator capable of regulating said transactivator-mediated promoter, and recovering grain from the plant thus pollinated.
The invention also provides a nucleic acid molecule comprising a nucleotide sequence encoding an Arabidopsis NADPH-dependent thioredoxin reductase (NTR), wherein the nucleotide sequence is optimized for expression in a monocotyledonous plant, preferably in maize. The nucleotide sequence is preferably the nucleotide sequence of SEQ ID NO:24 and preferably encodes the amino acid sequence of SEQ ID NO:25.
The invention also provides an isolated nucleic acid molecule comprising a nucleotide sequence encoding a rice NADPH-dependent thioredoxin reductase (NTR). The nucleotide sequence preferably encodes the amino acid sequence of SEQ ID NO:27. The nucleotide sequence is preferably the nucleotide sequence of SEQ ID NO:26.
SEQ ID NO:1 - Protein sequence of thioredoxin from Methanococcus jannaschii (gi|1591029).
SEQ ID NO:2 - Protein sequence of thioredoxin from Archaeoglobus fulgidus (gi|2649903) (trx-1).
SEQ ID NO:3 - Protein sequence of thioredoxin from Archaeoglobus fulgidus (gi|2649838) (trx-2).
SEQ ID NO:4 - Protein sequence of thioredoxin from Archaeoglobus fulgidus (gi|2649295) (trx-3).
SEQ ID NO:5 - Protein sequence of thioredoxin from Archaeoglobus fulgidus (gi|2648389) (trx-4).
SEQ ID NO:6 - Protein sequence of thioredoxin reductase (trxB) from Methanococcus jannaschii (gi|592167).
SEQ ID NO:7 - Protein sequence of thioredoxin reductase from Archaeoglobus fulgidus (gi|2649006) (trxB).
SEQ ID NO:8 - Primer NMD109.
SEQ ID NO:9 - Primer NMD110.
SEQ ID NO:10 - Primer NMD102.
SEQ ID NO:11 - Primer NMD103.
SEQ ID NO:12 - Primer NMD124A.
SEQ ID NO:13 - Primer NMD125A.
SEQ ID NO:14 - Primer NMD126.
SEQ ID NO:15 - Primer NMD127.
SEQ ID NO:16 - Primer NMD128.
SEQ ID NO:17 - Primer NMD129.
SEQ ID NO:18 - Primer STRF1A.
SEQ ID NO:19 - Primer STRF1B.
SEQ ID NO:20 - Primer STRF2A.
SEQ ID NO:21 - Primer STRF2B.
SEQ ID NO:22 - Primer STR3A.
SEQ ID NO:23 - Primer STR3B.
SEQ ID NO:24 - Maize optimized Arabidopsis NADPH dependent thioredoxin reductase coding sequence.
SEQ ID NO:25 - Amino acid sequence encoded by SEQ ID NO:24.
SEQ ID NO:26 - Rice NADPH dependent thioredoxin reductase (NTR) coding sequence.
SEQ ID NO:27 - Amino acid sequence encoded by SEQ ID NO:26.
SEQ ID NO:28 - Primer P9.
SEQ ID NO:29 - Primer P10.
SEQ ID NO:30 - Primer P4.
SEQ ID NO:31 - Primer P1.
SEQ ID NO:32 - Primer P2.
SEQ ID NO:33 - Primer P5.
SEQ ID NO:34 - Primer P12.
SEQ ID NO:35 - Primer P11.
SEQ ID NO:36 - Primer P27.
SEQ ID NO:37 - Primer P28.
SEQ ID NO:38 - Primer P29.
SEQ ID NO:39 - Primer P26.
SEQ ID NO:40 - Primer P31.
SEQ ID NO:41 - Primer Thiorodoxubi 1603.
SEQ ID NO:42 - Primer Thiorodox 2364.
“Associated with/operatively linked” refer to two nucleic acid sequences that are related physically or functionally. For example, a promoter or regulatory DNA sequence is said to be “associated with” a DNA sequence that codes for an RNA or a protein if the two sequences are operatively linked, or situated such that the regulatory DNA sequence will affect the expression level of the coding or structural DNA sequence.
A “chimeric gene” is a recombinant nucleic acid sequence in which a promoter or regulatory nucleic acid sequence is operatively linked to, or associated with, a nucleic acid sequence that codes for an mRNA or which is expressed as a protein, such that the regulatory nucleic acid sequence is able to regulate transcription or expression of the associated nucleic acid sequence. The regulatory nucleic acid sequence of the chimeric gene is not normally operatively linked to the associated nucleic acid sequence as found in nature.
A “coding sequence” is a nucleic acid sequence that is transcribed into RNA such as mRNA, rRNA, tRNA, snRNA, sense RNA or antisense RNA. Preferably the RNA is then translated in an organism to produce a protein.
Complementary: “complementary” refers to two nucleotide sequences that comprise antiparallel nucleotide sequences capable of pairing with one another upon formation of hydrogen bonds between the complementary base residues in the antiparallel nucleotide sequences.
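The antiparallel pairing in this definition can be made concrete with a short sketch: the strand that pairs perfectly with a given sequence is its reverse complement, read 5' to 3'. The helper names below are illustrative, and the check covers only perfect Watson-Crick pairing over the full length, whereas hybridization in practice can tolerate some mismatches.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antiparallel complementary strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def are_complementary(a: str, b: str) -> bool:
    """True if b can pair with a over its full length in antiparallel orientation."""
    return len(a) == len(b) and reverse_complement(a).upper() == b.upper()


print(reverse_complement("ATGCCA"))           # TGGCAT
print(are_complementary("ATGCCA", "TGGCAT"))  # True
```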
DNA Shuffling: DNA shuffling is a method to rapidly, easily and efficiently introduce mutations or rearrangements, preferably randomly, in a DNA molecule or to generate exchanges of DNA sequences between two or more DNA molecules, preferably randomly. The DNA molecule resulting from DNA shuffling is a shuffled DNA molecule that is a non-naturally occurring DNA molecule derived from at least one template DNA molecule. The shuffled DNA encodes an enzyme modified with respect to the enzyme encoded by the template DNA, and preferably has an altered biological activity with respect to the enzyme encoded by the template DNA.
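In the laboratory, DNA shuffling is typically carried out by fragmenting parental genes (e.g., with DNase I) and reassembling the fragments by PCR, so that sequence blocks from different templates are exchanged. The toy model below illustrates only the outcome of such exchanges between two aligned, equal-length parents; the random choice of crossover positions is a simplifying assumption, and `shuffle_pair` is a hypothetical helper, not part of any shuffling protocol.

```python
import random


def shuffle_pair(parent_a: str, parent_b: str, crossovers: int, rng: random.Random) -> str:
    """Toy in silico recombination: build a chimera of two aligned, equal-length
    parents by switching template at `crossovers` distinct random positions."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    if not 0 <= crossovers < len(parent_a):
        raise ValueError("too many crossover points for this sequence length")
    points = sorted(rng.sample(range(1, len(parent_a)), crossovers))
    templates = (parent_a, parent_b)
    pieces, start = [], 0
    # Alternate templates segment by segment, starting with parent_a.
    for idx, end in enumerate(points + [len(parent_a)]):
        pieces.append(templates[idx % 2][start:end])
        start = end
    return "".join(pieces)


rng = random.Random(1)
print(shuffle_pair("AAAAAAAAAA", "CCCCCCCCCC", crossovers=2, rng=rng))
```

With two crossovers the chimera always begins and ends with parent_a sequence and carries one internal block from parent_b; the block boundaries depend on the random seed.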
Enzyme/Protein Activity: means herein the ability of an enzyme (or protein) to catalyze the conversion of a substrate into a product. A substrate for the enzyme comprises the natural substrate of the enzyme but also comprises analogues of the natural substrate, which can also be converted by the enzyme into a product or into an analogue of a product. The activity of the enzyme is measured for example by determining the amount of product in the reaction after a certain period of time, or by determining the amount of substrate remaining in the reaction mixture after a certain period of time. The activity of the enzyme is also measured by determining the amount of an unused co-factor of the reaction remaining in the reaction mixture after a certain period of time or by determining the amount of used co-factor in the reaction mixture after a certain period of time. The activity of the enzyme is also measured by determining the amount of a donor of free energy or energy-rich molecule (e.g., ATP, phosphoenolpyruvate, acetyl phosphate or phosphocreatine) remaining in the reaction mixture after a certain period of time or by determining the amount of a used donor of free energy or energy-rich molecule (e.g., ADP, pyruvate, acetate or creatine) in the reaction mixture after a certain period of time.
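For an NADPH-dependent thioredoxin reductase, the co-factor route can be followed spectrophotometrically, since NADPH absorbs at 340 nm (molar absorptivity about 6.22 mM^-1 cm^-1) while NADP+ does not. The sketch below converts a series of A340 readings into a rate of NADPH consumption with the Beer-Lambert law. The readings are made-up illustrative numbers, and the sketch assumes a 1 cm path length and a linear initial rate over the sampled interval.

```python
NADPH_EPSILON_340 = 6.22  # mM^-1 cm^-1, molar absorptivity of NADPH at 340 nm


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def nadph_consumption_rate(times_min, a340, path_cm=1.0):
    """NADPH oxidized per minute (mM/min), from the fall in A340 (Beer-Lambert law)."""
    return -slope(times_min, a340) / (NADPH_EPSILON_340 * path_cm)


# Hypothetical readings taken once per minute.
rate = nadph_consumption_rate([0, 1, 2, 3], [1.0, 0.9378, 0.8756, 0.8134])
print(f"{rate:.4f} mM/min")  # 0.0100 mM/min
```

Fitting a slope over several readings, rather than using only the first and last points, makes the estimate less sensitive to a single noisy reading.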
Expression Cassette: “Expression cassette” as used herein means a DNA sequence capable of directing expression of a particular nucleotide sequence in an appropriate host cell, comprising a promoter operatively linked to the nucleotide sequence of interest which is operatively linked to termination signals. It also typically comprises sequences required for proper translation of the nucleotide sequence. The coding region usually codes for a protein of interest but may also code for a functional RNA of interest, for example antisense RNA or a nontranslated RNA, in the sense or antisense direction. The expression cassette comprising the nucleotide sequence of interest may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components. The expression cassette may also be one which is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. Typically, however, the expression cassette is heterologous with respect to the host, i.e., the particular DNA sequence of the expression cassette does not occur naturally in the host cell and must have been introduced into the host cell or an ancestor of the host cell by a transformation event. The expression of the nucleotide sequence in the expression cassette may be under the control of a constitutive promoter or of an inducible promoter which initiates transcription only when the host cell is exposed to some particular external stimulus. In the case of a multicellular organism, such as an insect, the promoter can also be specific to a particular tissue or organ or stage of development.
Gene: the term “gene” is used broadly to refer to any segment of DNA associated with a biological function. Thus, genes include coding sequences and/or the regulatory sequences required for their expression. Genes also include nonexpressed DNA segments that, for example, form recognition sequences for other proteins. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.
Heterologous DNA Sequence: The terms “heterologous DNA sequence”, “exogenous DNA segment” or “heterologous nucleic acid,” as used herein, each refer to a sequence that originates from a source foreign to the particular host cell or, if from the same source, is modified from its original form. Thus, a heterologous gene in a host cell includes a gene that is endogenous to the particular host cell but has been modified through, for example, the use of DNA shuffling. The terms also include non-naturally occurring multiple copies of a naturally occurring DNA sequence. Thus, the terms refer to a DNA segment that is foreign or heterologous to the cell, or homologous to the cell but in a position within the host cell nucleic acid in which the element is not ordinarily found. Exogenous DNA segments are expressed to yield exogenous polypeptides.
Homologous DNA Sequence: a DNA sequence naturally associated with a host cell.
“Homoplastidic” refers to a plant, plant tissue or plant cell wherein all of the plastids are genetically identical. This is the normal state in a plant when the plastids have not been transformed, mutated, or otherwise genetically altered. In different tissues or stages of development, the plastids may take different forms, e.g., chloroplasts, proplastids, etioplasts, amyloplasts, chromoplasts, and so forth.
Isolated: in the context of the present invention, an isolated DNA molecule or an isolated enzyme is a DNA molecule or protein which, by the hand of man, exists apart from its native environment and is therefore not a product of nature. An isolated DNA molecule or protein may exist in a purified form or may exist in a non-native environment such as, for example, in a transgenic host cell.
Mature Protein: protein that is normally targeted to a cellular organelle and from which the transit peptide has been removed.
Minimal Promoter: promoter elements, particularly a TATA element, that are inactive or that have greatly reduced promoter activity in the absence of upstream activation. In the presence of a suitable transcription factor, the minimal promoter functions to permit transcription.
Modified Enzyme Activity: enzyme activity different from that which naturally occurs in an insect (i.e. enzyme activity that occurs naturally in the absence of direct or indirect manipulation of such activity by man), which is tolerant to inhibitors that inhibit the naturally occurring enzyme activity.
Native: refers to a gene that is present in the genome of an untransformed insect cell.
Naturally occurring: the term “naturally occurring” is used to describe an object that can be found in nature as distinct from being artificially produced by man. For example, a protein or nucleotide sequence present in an organism (including a virus), which can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory, is naturally occurring.
Nucleic acid: the term “nucleic acid” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing known analogues of natural nucleotides which have similar binding properties to the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence also implicitly encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, as well as the sequence explicitly indicated. Specifically, degenerate codon substitutions may be achieved by generating sequences in which the third position of one or more selected (or all) codons is substituted with mixed-base and/or deoxyinosine residues (Batzer et al., Nucleic Acids Res. 19: 5081 (1991); Ohtsuka et al., J. Biol. Chem. 260: 2605-2608 (1985); Rossolini et al., Mol. Cell. Probes 8: 91-98 (1994)). The terms “nucleic acid” or “nucleic acid sequence” may also be used interchangeably with gene, cDNA, and mRNA encoded by a gene.
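The third-position substitution described above can be sketched as a string operation on a coding sequence. A third-position mixed base keeps the encoded amino acid only at fourfold-degenerate codons (e.g., ATN would admit Ile as well as Met), so the sketch also includes a helper that finds those codons. Here "N" is the IUPAC any-base code, while "I" is used only as a placeholder symbol for deoxyinosine; both helper names are hypothetical.

```python
# First two bases of the eight fourfold-degenerate codon families in the standard
# code (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_sites(cds: str) -> list[int]:
    """Indices of codons whose third base can vary without changing the amino acid."""
    cds = cds.upper()
    return [i // 3 for i in range(0, len(cds) - 2, 3) if cds[i:i + 2] in FOURFOLD_PREFIXES]


def degenerate_third_positions(cds: str, codon_indices=None, symbol: str = "N") -> str:
    """Replace the third base of the selected codons (all codons if None) with a
    mixed-base code such as "N" or a deoxyinosine placeholder such as "I"."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    bases = list(cds)
    targets = range(len(cds) // 3) if codon_indices is None else codon_indices
    for i in targets:
        bases[3 * i + 2] = symbol
    return "".join(bases)


seq = "ATGAAAGGTCTG"  # Met-Lys-Gly-Leu
print(degenerate_third_positions(seq))                            # ATNAANGGNCTN
print(degenerate_third_positions(seq, fourfold_sites(seq), "I"))  # ATGAAAGGICTI
```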
A “plant” is any plant at any stage of development, particularly a seed plant.
A “plant cell” is a structural and physiological unit of a plant, comprising a protoplast and a cell wall. The plant cell may be in the form of an isolated single cell or a cultured cell, or part of a more highly organized unit such as, for example, plant tissue, a plant organ, or a whole plant.
“Plant cell culture” means cultures of plant units such as, for example, protoplasts, cell culture cells, cells in plant tissues, pollen, pollen tubes, ovules, embryo sacs, zygotes and embryos at various stages of development.
“Plant material” refers to leaves, stems, roots, flowers or flower parts, fruits, pollen, egg cells, zygotes, seeds, cuttings, cell or tissue cultures, or any other part or product of a plant.
A “plant organ” is a distinct and visibly structured and differentiated part of a plant such as a root, stem, leaf, flower bud, or embryo.
“Plant tissue” as used herein means a group of plant cells organized into a structural and functional unit. Any tissue of a plant in planta or in culture is included. This term includes, but is not limited to, whole plants, plant organs, plant seeds, tissue culture and any groups of plant cells organized into structural and/or functional units. The use of this term in conjunction with, or in the absence of, any specific type of plant tissue as listed above or otherwise embraced by this definition is not intended to be exclusive of any other type of plant tissue.
A “promoter” is an untranslated DNA sequence upstream of the coding region that contains the binding site for RNA polymerase II and initiates transcription of the DNA. The promoter region may also include other elements that act as regulators of gene expression.
A “protoplast” is an isolated plant cell without a cell wall or with only parts of the cell wall.
Purified: the term “purified,” when applied to a nucleic acid or protein, denotes that the nucleic acid or protein is essentially free of other cellular components with which it is associated in the natural state. It is preferably in a homogeneous state although it can be in either a dry or aqueous solution. Purity and homogeneity are typically determined using analytical chemistry techniques such as polyacrylamide gel electrophoresis or high performance liquid chromatography. A protein which is the predominant species present in a preparation is substantially purified. The term “purified” denotes that a nucleic acid or protein gives rise to essentially one band in an electrophoretic gel. Particularly, it means that the nucleic acid or protein is at least about 50% pure, more preferably at least about 85% pure, and most preferably at least about 99% pure.
“Regulatory elements” refer to sequences involved in controlling the expression of a nucleotide sequence. Regulatory elements comprise a promoter operatively linked to the nucleotide sequence of interest and termination signals. They also typically encompass sequences required for proper translation of the nucleotide sequence.
Significant Increase: an increase in enzymatic activity that is larger than the margin of error inherent in the measurement technique, preferably an increase by about 2-fold or greater of the activity of the wild-type enzyme in the presence of the inhibitor, more preferably an increase by about 5-fold or greater, and most preferably an increase by about 10-fold or greater.
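The tiers in this definition can be expressed as a small classification function. This is a sketch under stated assumptions: the word "about" is approximated by hard cutoffs at 2-, 5- and 10-fold, both activities are assumed to be measured in the presence of the inhibitor, and the margin of error is supplied by the user rather than derived from any particular assay. The tier labels are hypothetical.

```python
def significance_tier(modified: float, wild_type: float, margin_of_error: float):
    """Classify an activity increase against the definition's fold thresholds.

    Returns None when the increase does not exceed the measurement error;
    otherwise the highest tier whose fold threshold is met.
    """
    if wild_type <= 0:
        raise ValueError("wild-type activity must be positive")
    if modified - wild_type <= margin_of_error:
        return None
    fold = modified / wild_type
    for threshold, tier in ((10, "most preferred"), (5, "more preferred"), (2, "preferred")):
        if fold >= threshold:
            return tier
    return "significant"


print(significance_tier(12.0, 2.0, margin_of_error=0.5))  # more preferred
```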
The terms “identical” or percent “identity” in the context of two or more nucleic acid or protein sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection.
Substantially identical: the phrase “substantially identical,” in the context of two nucleic acid or protein sequences, refers to two or more sequences or subsequences that have at least 60%, preferably 80%, more preferably 90-95%, and most preferably at least 99% nucleotide or amino acid residue identity, when compared and aligned for maximum correspondence, as measured using one of the following sequence comparison algorithms or by visual inspection. Preferably, the substantial identity exists over a region of the sequences that is at least about 50 residues in length, more preferably over a region of at least about 100 residues, and most preferably the sequences are substantially identical over at least about 150 residues. In a most preferred embodiment, the sequences are substantially identical over the entire length of the coding regions. Furthermore, substantially identical nucleic acid or protein sequences perform substantially the same function.
For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
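Once a test sequence has been aligned to a reference, percent identity is a count over alignment columns. The sketch below uses one common convention (identical residues divided by all columns that are not gaps in both rows); other programs divide by the length of the shorter sequence or of the reference, so the denominator here is an assumption, not the convention of any particular program.

```python
def percent_identity(aligned_ref: str, aligned_test: str, gap: str = "-") -> float:
    """Percent identical residues over the alignment columns that are not gaps in
    both rows (one of several conventions; others divide by the shorter sequence)."""
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_ref.upper(), aligned_test.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


pid = percent_identity("ACGT-A", "ACCTTA")
print(f"{pid:.1f}%")  # 66.7%
```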
Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by visual inspection (see generally, Ausubel et al., infra).
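The global alignment approach of Needleman and Wunsch can be sketched as a dynamic-programming fill followed by a traceback. The scoring values below (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not the defaults of GAP or any other program, and affine gap penalties are omitted for brevity.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 1
print(top)     # ACGT
print(bottom)  # A-GT
```

When several alignments share the optimal score, the traceback order (diagonal first, then gap in b) decides which one is reported.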
One example of an algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., 1990). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, when the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or when the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. 
For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1992)).
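The seed-and-extend step described above can be sketched for nucleotides: find exact word hits of length W, then extend each hit without gaps in both directions, stopping on the X-drop rule, when the cumulative score reaches zero or below, or at a sequence end. This is a simplified illustration, not the NCBI implementation: it uses exact-match words only, omits gapped extension and the program's other refinements, and the value x=20 and the helper names are assumptions. M=5 and N=-4 follow the text; the example uses W=4 only because the sequences are short.

```python
def word_hits(query: str, subject: str, w: int):
    """Seed pairs (query_pos, subject_pos) where both share an identical length-w word."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [
        (i, j)
        for i in range(len(query) - w + 1)
        for j in index.get(query[i:i + w], [])
    ]


def _extend(q, s, qi, si, step, m, n, x, base):
    """Walk one direction from (qi, si); return (best score gain, residues used)."""
    best = run = best_len = length = 0
    while 0 <= qi < len(q) and 0 <= si < len(s):
        run += m if q[qi] == s[si] else n
        length += 1
        if run > best:
            best, best_len = run, length
        elif best - run > x or base + run <= 0:
            break  # X-drop reached, or cumulative score fell to zero or below
        qi += step
        si += step
    return best, best_len


def extend_seed(q, s, i, j, w, m=5, n=-4, x=20):
    """Ungapped extension of a word hit; returns (score, query_start, query_end)."""
    seed = sum(m if q[i + k] == s[j + k] else n for k in range(w))
    right, r_len = _extend(q, s, i + w, j + w, +1, m, n, x, seed)
    left, l_len = _extend(q, s, i - 1, j - 1, -1, m, n, x, seed + right)
    return seed + left + right, i - l_len, i + w + r_len


q = "TTTTACGTACGTAAAA"
s = "GGGGACGTACGTCCCC"
hits = word_hits(q, s, w=4)
print(max(extend_seed(q, s, i, j, w=4) for i, j in hits))  # (40, 4, 12)
```

The best HSP here covers query positions 4 to 11 ("ACGTACGT"), eight matches at M=5 each.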
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90: 5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability with which a match between two nucleotide or amino acid sequences would occur by chance. For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
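The statistics behind such probabilities can be sketched for the simplest case of a single ungapped HSP. The scale parameter lambda is the positive root of p·e^(λM) + (1-p)·e^(λN) = 1, here solved by bisection for M=5, N=-4 and an assumed match probability of 0.25 (equal base frequencies). The expected number of chance HSPs is then E = K·m·n·e^(-λS), and the chance of at least one is P = 1 - e^(-E). The value K=0.1 is a hypothetical placeholder, edge-effect corrections are ignored, and the smallest sum probability P(N), which combines several HSPs, is not computed here.

```python
import math


def karlin_lambda(match=5, mismatch=-4, p_match=0.25, iters=200):
    """Solve p*e^(lam*match) + (1-p)*e^(lam*mismatch) = 1 for lam > 0 by bisection.

    p_match is the chance that two random residues are identical (0.25 for DNA
    with equal base frequencies, an assumption). Requires a negative expected score.
    """
    def f(lam):
        return p_match * math.exp(lam * match) + (1 - p_match) * math.exp(lam * mismatch) - 1

    if p_match * match + (1 - p_match) * mismatch >= 0:
        raise ValueError("expected score per position must be negative")
    lo, hi = 1e-9, 1.0
    while f(hi) <= 0:  # widen the bracket until the root is enclosed
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def expect_value(score, query_len, db_len, lam, k):
    """Expected number of chance HSPs scoring at least `score`: E = K m n e^(-lam S)."""
    return k * query_len * db_len * math.exp(-lam * score)


def p_value(e):
    """Chance of at least one such HSP, treating hits as Poisson: P = 1 - e^(-E)."""
    return 1.0 - math.exp(-e)


lam = karlin_lambda()
e = expect_value(score=150, query_len=1_000, db_len=10**7, lam=lam, k=0.1)  # K assumed
print(round(lam, 3), e, p_value(e))
```

For small E the two measures nearly coincide, which is why E-values and P-values are often used interchangeably near the thresholds quoted above.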
Another indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions. The phrase “hybridizing specifically to” refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture of (e.g., total cellular) DNA or RNA. “Bind(s) substantially” refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target nucleic acid sequence.
“Stringent hybridization conditions” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as Southern and Northern hybridizations are sequence dependent, and are different under different environmental parameters. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 2 “Overview of principles of hybridization and the strategy of nucleic acid probe assays”, Elsevier, New York. Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Typically, under “stringent conditions” a probe will hybridize to its target subsequence, but to no other sequences.
The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe. An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42° C., with the hybridization being carried out overnight. An example of highly stringent wash conditions is 0.15 M NaCl at 72° C. for about 15 minutes. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see Sambrook, infra, for a description of SSC buffer). Often, a high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., more than 100 nucleotides, is 1×SSC at 45° C. for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6×SSC at 40° C. for 15 minutes. For short probes (e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt concentrations of less than about 1.0 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is typically at least about 30° C. Stringent conditions can also be achieved with the addition of destabilizing agents such as formamide. In general, a signal-to-noise ratio at least 2× that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization. Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical. 
This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
The following are examples of sets of hybridization/wash conditions that may be used to clone homologous nucleotide sequences that are substantially identical to reference nucleotide sequences of the present invention: a nucleotide sequence substantially identical to a reference nucleotide sequence preferably hybridizes to the reference nucleotide sequence in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 2×SSC, 0.1% SDS at 50° C.; more desirably in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 1×SSC, 0.1% SDS at 50° C.; more desirably still in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.5×SSC, 0.1% SDS at 50° C.; preferably in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 50° C.; more preferably in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50° C. with washing in 0.1×SSC, 0.1% SDS at 65° C.
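The dependence of stringency on salt and temperature described above can be illustrated with a commonly cited empirical estimate of Tm for long DNA duplexes (the approximation attributed to Meinkoth and Wahl, 1984). This formula is not part of the conditions stated here; it is an assumption-laden sketch. It further assumes that 1×SSC supplies about 0.195 M Na+ (0.15 M NaCl plus 0.015 M trisodium citrate), a hypothetical 500 bp probe of 50% G+C, and no formamide in the wash. The formula is generally quoted as reasonable for roughly 0.01 to 0.4 M Na+.

```python
import math

# Assumption: 1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate,
# so each 1x of SSC contributes 0.15 + 3 * 0.015 = 0.195 M Na+.
NA_PER_1X_SSC = 0.15 + 3 * 0.015


def ssc_sodium_molar(ssc_multiple: float) -> float:
    """Approximate Na+ molarity of an n-x SSC solution."""
    return ssc_multiple * NA_PER_1X_SSC


def tm_long_dna_duplex(na_molar: float, gc_percent: float, length_bp: int,
                       formamide_percent: float = 0.0) -> float:
    """Empirical Tm (deg C) of a perfectly matched DNA duplex longer than ~100 bp.

    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.61*(%formamide) - 500/L
    """
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


if __name__ == "__main__":
    # Hypothetical probe: 500 bp, 50% G+C. SSC multiples follow the wash series above.
    for ssc in (2.0, 1.0, 0.5, 0.1):
        tm = tm_long_dna_duplex(ssc_sodium_molar(ssc), 50.0, 500)
        for wash_c in (50.0, 65.0):
            print(f"{ssc}xSSC: Tm ~{tm:.1f} C; a {wash_c:.0f} C wash is "
                  f"{tm - wash_c:.1f} C below Tm")
```

Lowering the SSC concentration lowers Tm toward the wash temperature, which is why the successive washes listed above are progressively more stringent. Subtracting 5° C. from the estimate gives the "about 5° C. lower than Tm" target used for highly stringent conditions.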
A further indication that two nucleic acid sequences or proteins are substantially identical is that the protein encoded by the first nucleic acid is immunologically cross reactive with, or specifically binds to, the protein encoded by the second nucleic acid. Thus, a protein is typically substantially identical to a second protein, for example, where the two proteins differ only by conservative substitutions.
The phrase “specifically (or selectively) binds to an antibody,” or “specifically (or selectively) immunoreactive with,” when referring to a protein or peptide, refers to a binding reaction which is determinative of the presence of the protein in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein and do not bind in a significant amount to other proteins present in the sample. Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, antibodies raised to the protein with the amino acid sequence encoded by any of the nucleic acid sequences of the invention can be selected to obtain antibodies specifically immunoreactive with that protein and not with other proteins except for polymorphic variants. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays, Western blots, or immunohistochemistry are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York (“Harlow and Lane”), for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity. Typically a specific or selective reaction will be at least twice background signal or noise and more typically more than 10 to 100 times background.
“Conservatively modified variations” of a particular nucleic acid sequence refer to those nucleic acid sequences that encode identical or essentially identical amino acid sequences or, where the nucleic acid sequence does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given polypeptide. For instance, the codons CGT, CGC, CGA, CGG, AGA, and AGG all encode the amino acid arginine. Thus, at every position where an arginine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded protein. Such nucleic acid variations are “silent variations,” which are one species of “conservatively modified variations.” Every nucleic acid sequence described herein which encodes a protein also describes every possible silent variation, except where otherwise noted. One of skill will recognize that each codon in a nucleic acid (except ATG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule by standard techniques. Accordingly, each “silent variation” of a nucleic acid which encodes a protein is implicit in each described sequence.
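The codon-degeneracy argument can be checked mechanically. The following sketch builds the standard genetic code, groups synonymous codons by amino acid, and counts how many distinct coding sequences (silent variations) encode a given peptide. It ignores the choice of stop codon and any organism-specific codon preference; the function names are illustrative only.

```python
from itertools import product
from math import prod

# Standard genetic code with bases ordered T, C, A, G
# (first position slowest, third position fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons by the amino acid they encode.
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_silent_variants(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (stop codon excluded)."""
    return prod(len(SYNONYMS[aa]) for aa in peptide)


if __name__ == "__main__":
    print(sorted(SYNONYMS["R"]))          # the six arginine codons
    print(count_silent_variants("MSKV"))  # 1 * 6 * 2 * 4 = 48
    print(translate("ATGCGT") == translate("ATGAGA"))  # True: CGT -> AGA is silent
```

Methionine and tryptophan each have a single codon, so they contribute a factor of 1 to the count.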
Furthermore, one of skill will recognize that individual substitutions, deletions, or additions that alter, add, or delete a single amino acid or a small percentage of amino acids (typically less than 5%, more typically less than 1%) in an encoded sequence are “conservatively modified variations,” where the alterations result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following five groups each contain amino acids that are conservative substitutions for one another: Aliphatic: Glycine (G), Alanine (A), Valine (V), Leucine (L), Isoleucine (I); Aromatic: Phenylalanine (F), Tyrosine (Y), Tryptophan (W); Sulfur-containing: Methionine (M), Cysteine (C); Basic: Arginine (R), Lysine (K), Histidine (H); Acidic: Aspartic acid (D), Glutamic acid (E), Asparagine (N), Glutamine (Q). See also, Creighton (1984) Proteins, W. H. Freeman and Company. In addition, individual substitutions, deletions, or additions which alter, add, or delete a single amino acid or a small percentage of amino acids in an encoded sequence are also “conservatively modified variations.”
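As a sketch of how the five groups above might be applied, the following compares a reference sequence and a variant of equal length, reports the percentage of positions changed, and checks whether every change stays within one of the listed groups. It uses the groups exactly as listed in the text, handles substitutions only (not deletions or additions), and takes its 5% threshold from the "typically less than 5%" guidance; the function names are hypothetical.

```python
# Conservative-substitution groups exactly as listed in the text
# (note that the text places N and Q in the "acidic" group).
CONSERVATIVE_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aromatic": set("FYW"),
    "sulfur-containing": set("MC"),
    "basic": set("RKH"),
    "acidic": set("DENQ"),
}


def is_conservative(a: str, b: str) -> bool:
    """True if b is identical to a or falls in the same group as a."""
    return a == b or any(a in g and b in g for g in CONSERVATIVE_GROUPS.values())


def substitution_summary(ref: str, variant: str) -> tuple[float, bool]:
    """Percent of positions changed and whether all changes are conservative.

    Substitutions only: both sequences must have the same length.
    """
    if len(ref) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    changes = [(r, v) for r, v in zip(ref, variant) if r != v]
    percent = 100.0 * len(changes) / len(ref)
    return percent, all(is_conservative(r, v) for r, v in changes)


def is_conservative_variant(ref: str, variant: str, max_percent: float = 5.0) -> bool:
    """Combine the group check with the 'typically less than 5%' guidance."""
    percent, all_conservative = substitution_summary(ref, variant)
    return all_conservative and percent < max_percent


if __name__ == "__main__":
    ref = "A" * 39 + "L"
    print(substitution_summary(ref, "A" * 39 + "I"))     # one aliphatic swap
    print(is_conservative_variant(ref, "A" * 39 + "D"))  # L -> D crosses groups
```

Residues that the text does not assign to any group (P, S, T) are treated as conservative only when left unchanged.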
A “subsequence” refers to a sequence of nucleic acids or amino acids that comprises part of a longer sequence of nucleic acids or amino acids (e.g., a protein), respectively.
Nucleic acids are “elongated” when additional nucleotides (or other analogous molecules) are incorporated into the nucleic acid. Most commonly, this is performed with a polymerase (e.g., a DNA polymerase), e.g., a polymerase which adds sequences at the 3′ terminus of the nucleic acid.
Two nucleic acids are “recombined” when sequences from each of the two nucleic acids are combined in a progeny nucleic acid. Two sequences are “directly” recombined when both of the nucleic acids are substrates for recombination. Two sequences are “indirectly recombined” when the sequences are recombined using an intermediate such as a cross-over oligonucleotide. For indirect recombination, no more than one of the sequences is an actual substrate for recombination, and in some cases, neither sequence is a substrate for recombination.
“Synthetic” refers to a nucleotide sequence comprising structural characters that are not present in the natural sequence. For example, an artificial sequence that more closely resembles the G+C content and the normal codon distribution of dicot and/or monocot genes is said to be synthetic.
A “transactivator” is a protein which, by itself or in combination with one or more additional proteins, is capable of causing transcription of a coding region under control of a corresponding transactivator-mediated promoter. Examples of transactivator systems include the phage T7 gene 10 promoter, the transcriptional activation of which is dependent upon a specific RNA polymerase such as the phage T7 RNA polymerase. The transactivator is typically an RNA polymerase or DNA binding protein capable of interacting with a particular promoter to initiate transcription, either by activating the promoter directly or by inactivating a repressor gene, e.g., by suppressing expression or accumulation of a repressor protein. The DNA binding protein may be a chimeric protein comprising a binding region (e.g., the GAL4 binding region) linked to an appropriate transcriptional activator domain. Some transactivator systems may have multiple transactivators, for example promoters which require not only a polymerase but also a specific subunit (sigma factor) for promoter recognition, DNA binding, or transcriptional activation. The transactivator is preferably heterologous with respect to the plant.
Transformation: a process for introducing heterologous DNA into a cell, tissue, or insect. Transformed cells, tissues, or insects are understood to encompass not only the end product of a transformation process, but also transgenic progeny thereof.
“Transformed,” “transgenic,” and “recombinant” refer to a host organism such as a bacterium or a plant into which a heterologous nucleic acid molecule has been introduced. The nucleic acid molecule can be stably integrated into the genome of the host or the nucleic acid molecule can also be present as an extrachromosomal molecule. Such an extrachromosomal molecule can be auto-replicating. Transformed cells, tissues, or plants are understood to encompass not only the end product of a transformation process, but also transgenic progeny thereof. A “non-transformed,” “non-transgenic,” or “non-recombinant” host refers to a wild-type organism, e.g., a bacterium or plant, which does not contain the heterologous nucleic acid molecule.
Nucleotides are indicated by their bases by the following standard abbreviations: adenine (A), cytosine (C), thymine (T), and guanine (G). Amino acids are likewise indicated by the following standard abbreviations: alanine (Ala; A), arginine (Arg; R), asparagine (Asn; N), aspartic acid (Asp; D), cysteine (Cys; C), glutamine (Gln; Q), glutamic acid (Glu; E), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). Furthermore, (Xaa; X) represents any amino acid.
Wet milling is a process of separating the starch, protein and oil components of grain, most often cereals, for example corn. It is distinguished herein from dry milling, which is simply pulverizing grain. Corn wet milling consists of the steps of steeping, grinding the corn kernel and separating the components of the kernel. The first step in wet milling is usually steeping, wherein the grain is soaked in water under carefully controlled conditions to soften the kernels and facilitate separation of the components. The kernels are typically steeped in a steep tank with a countercurrent flow of water at about 120° F. containing sulfur dioxide at a concentration of about 0.2% by weight. The kernels remain in the steep tank for about 24 to 48 hours. The kernels are then dewatered and subjected to sets of attrition-type mills. The first set of attrition-type mills ruptures the kernels, releasing the germ and corn oil from the rest of the kernel. Centrifugation is used to separate the germ from the rest of the kernel. The oil-bearing embryos float to the surface of the aqueous solution and are removed.
Next, by processes of watering and dewatering, milling, screening, centrifuging and washing, the starch is separated from the protein and purified. Following embryo removal, the remaining kernel components, including the starch, hull, fiber, and gluten, are subjected to another set of attrition mills and passed through a set of wash screens to separate the fiber components from the starch and gluten. The starch and gluten pass through the screens while the fiber does not. Centrifugation, or a third grind followed by centrifugation, is used to separate the starch from the protein. Centrifugation produces a slurry containing the starch granules, which is dewatered, washed with fresh water and dried to about 12% moisture. In this manner, a substantially pure starch fraction is recovered from the corn kernels.
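The steep conditions quoted above reduce to simple arithmetic. This sketch converts the typical 120° F. steep temperature to Celsius and computes the sulfur dioxide charge for a batch of steepwater at 0.2% by weight; the 1,000 kg batch size is a hypothetical example. It also checks the steep temperature against the 45-60° C. range that this document gives for thermostable enzymes used in wet milling.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def so2_charge_kg(steepwater_kg: float, so2_weight_percent: float = 0.2) -> float:
    """Mass of SO2 for a steepwater batch at the given percent by weight."""
    return steepwater_kg * so2_weight_percent / 100.0


def in_window(temp_c: float, low_c: float = 45.0, high_c: float = 60.0) -> bool:
    """True if temp_c lies within [low_c, high_c], inclusive."""
    return low_c <= temp_c <= high_c


if __name__ == "__main__":
    steep_c = fahrenheit_to_celsius(120.0)
    print(f"120 F steep = {steep_c:.1f} C; within 45-60 C: {in_window(steep_c)}")
    # Hypothetical 1000 kg steepwater batch at 0.2% SO2 by weight.
    print(f"SO2 charge: {so2_charge_kg(1000.0):.1f} kg")
```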
The key difficulty is to loosen starch granules from the complicated matrix of proteins and cell wall material that makes up the endosperm of the grain. One reason for this difficulty is believed to be the presence of inter- or intramolecular disulfide bonds, which render the protein matrix less soluble and less susceptible to proteolytic enzymes and inhibit release of the starch granules from the protein matrix in the grain. At present, the primary means of reducing these bonds is to steep the grain in the presence of sulfur dioxide, but this is costly, environmentally unfriendly, and not optimally effective. Because the steep water contains sulfur dioxide, it is considered toxic waste, and minimizing the volume generated would therefore be advantageous. Alternatively, the requirement for sulfur dioxide could be eliminated altogether. Reducing the steep times required for grain conditioning prior to milling is an additional advantage of reducing the extent of disulfide bonds in the endosperm matrix.
Certain corn mutations (e.g., floury and opaque) exert beneficial effects on the protein matrix of the kernel endosperm but impair kernel integrity. Transgenic thioredoxin expression provides some of these advantages without creating some of the kernel integrity problems associated with these mutations.
Post-harvest or processing-dependent activities of thioredoxin have equally beneficial effects. For example, in one embodiment, thioredoxin and/or thioredoxin reductase enzymes are targeted to and accumulated in cell compartments, and protein reduction occurs following physical disruption of the seed. In another embodiment, quiescent endosperm thioredoxin and/or thioredoxin reductase is activated upon steeping. In a preferred embodiment, the invention provides a plant expressing a transgenic thermostable thioredoxin and thioredoxin reductase, e.g., a thioredoxin and thioredoxin reductase derived from a hyperthermophilic organism, such that the thioredoxin and thioredoxin reductase are not significantly active except at high temperatures (e.g., greater than 50° C.). In one embodiment, the thermostable thioredoxin and thioredoxin reductase act synergistically with saccharification via expression of other thermostable enzymes in the endosperm.
Expression of transgenic thioredoxin and/or thioredoxin reductase in grain is also useful to improve grain characteristics associated with digestibility, particularly in animal feeds. Susceptibility of feed proteins to proteases is a function of time and of protein conformation. Kernel cracking is often used in feed formulation, as is steam flaking. Both of these processes are designed to aid kernel digestibility. Softer kernels whose integrity can be disrupted more easily in animal stomachs are desirable. Conformational constraints and crosslinks between proteins are major determinants of protease susceptibility. Modifying these bonds by increased thioredoxin and/or thioredoxin reductase expression therefore aids digestion.
Protein content and quality are important determinants in flaking grit production and in masa production. Reduction of disulphide bonds alters the nature of corn flour such that it is suitable for use as a wheat substitute, especially flours made from high-protein white corn varieties.
Over half of the US soybean crop is crushed or milled, and the protein quality in the resulting low-fat soy flour or de-fatted soy flour (or soybean meal) is important for subsequent processing. Protein yield and quality from soybean processing streams are economically important, and are largely dependent upon protein conformation. Increasing thioredoxin activity through expression of transgenic thioredoxin and/or thioredoxin reductase increases protein solubility, and thus increases yield, in the water-soluble protein fractions. Recovery is facilitated by aqueous extraction of de-fatted soybean meal under basic conditions. Enhancing thioredoxin activity through expression of transgenic thioredoxin and/or thioredoxin reductase also reduces the required pH for efficient extraction and thereby reduces calcium or sodium hydroxide inputs, as well as lowering the acid input for subsequent acid precipitation, allowing efficient recovery of proteins without alkali damage, and reducing water consumption and processing plant waste effluents (that contain substantial biological oxygen demand loads).
Protein redox status affects important functional properties supplied by soy proteins, such as solubility, water absorption, viscosity, cohesion/adhesion, gelation and elasticity. Fiber removal during soy protein concentrate production and soy protein isolate hydrolysis by proteases is enhanced by increasing thioredoxin activity as described herein. Similarly, as described for corn above, increasing thioredoxin activity through expression of transgenic thioredoxin and/or thioredoxin reductase enhances the functionality of enzyme-active soy flours and the digestibility of the soybean meal fraction and steam flaking fraction in animal feeds.
Modification of protein quality during seed development and during processing are both provided, although it is preferred that the transgenic thioredoxin and/or thioredoxin reductase be targeted to a cell compartment and be thermostable, as described above, to avoid significant adverse effects on storage protein accumulation possibly encountered as a result of thioredoxin activity during seed development. Alternatively, the thioredoxin may be added as a processing enzyme, because (in contrast to corn wet milling) breaking the disulphide bonds is not necessary until after grain integrity is destroyed (crushing and oil extraction).
Thioredoxin, thioredoxin reductase and protein disulfide isomerase (PDI) genes are found in eukaryotes and eubacteria as well as archaea, including hyperthermophilic organisms such as Methanococcus jannaschii and Archaeoglobus fulgidus. Selection of a particular gene depends in part on the desired application. For the methods of the present invention, preferred thioredoxins have the following characteristics:
1. Heat stability: Thioredoxin and related proteins from hyperthermophiles are found to have increased stability at high temperatures (greater than 50° C.) and relatively low activity at ambient temperatures. Expression of thioredoxin and/or thioredoxin reductase from hyperthermophiles, for example from archaea such as Methanococcus jannaschii and Archaeoglobus fulgidus or other hyperthermophiles, is preferred for expression during seed development, so that the thioredoxin activity is not markedly increased until the grain is steeped or processed at elevated temperature. Most grain processing methods involve, or are compatible with, a high temperature step. Thermostable thioredoxin and thioredoxin reductase are therefore preferred. By thermostable is meant that the enzyme is preferentially active at high temperatures, e.g., temperatures greater than 40° C., most preferably greater than 50° C., e.g., 45-60° C. for wet milling, or even higher, e.g., 45-95° C.
2. Substrate specificity: It is also possible to reduce undesirable effects on seed development by selecting a thioredoxin that acts preferentially on certain proteins, such as the structural proteins of the matrix, and has low activity with essential metabolic enzymes. Various thioredoxins have been shown to differ in reactivity with enzymes that are under redox control. Thus it is possible to select a thioredoxin that will primarily act on the desired targets, minimizing undesirable side effects of overexpression.
Suitable thermostable thioredoxins and thioredoxin reductases include the following:
amino acid sequence of thioredoxin from Methanococcus jannaschii (gi|1591029) MSKVKIELFTSPMCPHCPAAKRVVEEVANEMPDAVEVEYINVMENPQKAMEYGIMA VPTIVINGDVEFIGAPTKEALVEAIKKRL (SEQ ID NO:1);
amino acid sequence of thioredoxin from Archaeoglobus fulgidus (gi|2649903)(trx-1) MPMVRKAAFYAIAVISGVLAAVVGNALYHNFNSDLGAQAKIYFFYSDSCPHCREVKP YVEEFAKTHNLTWCNVAEMDANCSKIAQEFGIKYVPTLVIMDEEAHVFVGSDEVRTA IEGMK (SEQ ID NO:2);
amino acid sequence of thioredoxin from Archaeoglobus fulgidus (gi|2649838)(trx-2) MVFTSKYCPYCRAFEKVVERLMGELNGTVEFEVVDVDEKRELAEKYEVLML PTLVLADGDEVLGGFMGFADYKTAREAILEQISAFLKPDYKN (SEQ ID NO:3);
amino acid sequence of thioredoxin from Archaeoglobus fulgidus (gi|2649295)(trx-3) MDELELIRQKKLKEMMQKMSGEEKARKVLDSPVKLNSSNFDETLKNNENVVVDFW AEWCMPCKMIAPVEELAKEYAGKVVFGKLNTDENPTIAARYGISAIPTLIFFKKGKPV DQLVGAMPKSELKRWVQRNL (SEQ ID NO:4);
amino acid sequence of thioredoxin from Archaeoglobus fulgidus (gi|2648389)(trx-4) MERLNSERFREVIQSDKLVVVDFYADWCMPCRYISPELEKLSKEYNGEVEFYKLNVDE NQDVAFEYGIAS IPTVLFFRNGKVVGGFIGAMPESAVRAEIEKALGA (SEQ ID NO:5);
amino acid sequence of thioredoxin reductase (trxB) from Methanococcus jannaschii (gi|1592167)
MIHDTIIIGAGPGGLTAGIYAMRGKLNALCIEKENAGGRIAEAGIVENYPGFEEI RGYELAEKFKNHAEKFKLPIIYDEVIKIETKERPFKVITKNSEYLTKTIVIATGTKPKKL GLNEDKFIGRGISYCTMCDAFFYLNKEVIVIGRDTPAIMSAINLKDIAKKVIVITDKSEL KAAESIMLDKLKEANNVEIIYNAKPLEIVGEERAEGVKISVNGKEEIIKADGIRSLGHV PNTEFLKDSGELDKKGFIKTDENCRTNIDGIYAVGDVRGGVMQVAKAVGDGCVAM ANIIKYLQKL (SEQ ID NO:6); and
amino acid sequence of thioredoxin reductase from Archaeoglobus fulgidus (gi|2649006)(trxB) MYDVAIIGGGPAGLTAALYSARYGLKTVFFETVDPVSQLSLAAKIENYPGFEGSGMEL LEKMKEQAVKAGAEWKLEKVERVERNGETFTVIAEGGEYEAKAIIVATGGKHKEAGI EGESAFIGRGVSYCATCDGNFFRGKKVIVYGSGKEAIEDAIYLHDIGCEVTIVSRTPSFR AEKALVEEVEKRGIPVEIYSTTIRKIIGSGKVEKVVAYNREKKEEFEIEADGIFVAIGMR PATDVVAELGVERDSMGYIKVDKEQRTNVEGVFAAGDCCDNPLKQVVTACGDGAV AAYSAYKYLTS (SEQ ID NO:7).
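Thioredoxins, and many thioredoxin reductases, carry a redox-active Cys-X-X-Cys (CXXC) motif whose two cysteines cycle between the dithiol and disulfide forms. As an illustrative check on sequences such as those listed above, the following sketch removes the line-wrapping whitespace from SEQ ID NO:1 and SEQ ID NO:3 and reports every CXXC match with its 1-based position. A motif hit is only a sequence feature and does not by itself demonstrate activity.

```python
import re

# Sequences copied from SEQ ID NO:1 and SEQ ID NO:3; the spaces are
# line-wrapping artifacts and are removed by clean().
SEQ_ID_NO_1 = ("MSKVKIELFTSPMCPHCPAAKRVVEEVANEMPDAVEVEYINVMENPQKAMEYGIMA "
               "VPTIVINGDVEFIGAPTKEALVEAIKKRL")
SEQ_ID_NO_3 = ("MVFTSKYCPYCRAFEKVVERLMGELNGTVEFEVVDVDEKRELAEKYEVLML "
               "PTLVLADGDEVLGGFMGFADYKTAREAILEQISAFLKPDYKN")

# Zero-width lookahead so that overlapping motifs are all reported.
CXXC = re.compile(r"(?=(C..C))")


def clean(seq: str) -> str:
    """Remove whitespace and normalize to upper case."""
    return "".join(seq.split()).upper()


def find_cxxc(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for every CXXC match."""
    s = clean(seq)
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(s)]


if __name__ == "__main__":
    for name, seq in (("SEQ ID NO:1", SEQ_ID_NO_1), ("SEQ ID NO:3", SEQ_ID_NO_3)):
        print(name, len(clean(seq)), "residues, CXXC:", find_cxxc(seq))
```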
The genes that encode these proteins for use in the present invention are preferably designed by back-translation using plant-preferred codons, to enhance G+C content and remove detrimental sequences, as more fully described below. The activity of the proteins may be enhanced by DNA shuffling or other means, as described below. The invention therefore comprises proteins derived from these proteins, especially substantially similar proteins which retain thioredoxin or thioredoxin reductase activity.
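Back-translation with a codon preference table can be sketched as follows. The one-codon-per-amino-acid table here is a hypothetical, G+C-rich choice made only for illustration. Every codon is valid in the standard genetic code, but the table is not a measured maize or soybean codon usage table. The two motifs flagged, ATTTA (an mRNA instability element) and AATAAA (a polyadenylation-signal-like sequence), are examples of the kind of sequence commonly removed from synthetic plant genes; they are not a list taken from this document.

```python
# Hypothetical G+C-rich codon choices, one valid codon per amino acid.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}
STOP_CODON = "TGA"
AVOID_MOTIFS = ("ATTTA", "AATAAA")  # example motifs to screen for


def back_translate(protein: str, add_stop: bool = True) -> str:
    """Back-translate a protein using one preferred codon per residue."""
    residues = "".join(protein.split()).upper()
    dna = "".join(PREFERRED_CODON[aa] for aa in residues)
    return dna + STOP_CODON if add_stop else dna


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA string."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def find_motifs(dna: str, motifs=AVOID_MOTIFS) -> dict[str, list[int]]:
    """0-based start positions of each unwanted motif (overlaps included)."""
    hits = {m: [i for i in range(len(dna) - len(m) + 1) if dna.startswith(m, i)]
            for m in motifs}
    return {m: pos for m, pos in hits.items() if pos}


if __name__ == "__main__":
    gene = back_translate("MSKVKIELFTSPMCPHCPAAKRVV")  # start of SEQ ID NO:1
    print(gene)
    print(f"G+C content: {gc_content(gene):.2f}")
    print("motifs to fix:", find_motifs(gene))
```

A practical design would typically weight codons by measured usage in the target crop rather than always choosing one codon, and would re-scan after each edit; the single-codon table simply keeps the sketch short.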
For engineering thioredoxin expression in seeds for activity during grain development, promoters which direct seed-specific expression of thioredoxin and thioredoxin reductase are preferred, as is targeting to the storage so that the enzyme will have the desired effects on storage proteins, which may be desirable in some applications. In the present invention, however, it is more generally desirable to engineer thioredoxin and/or thioredoxin reductase expression in seeds for accumulation and inactivity during grain development. Several strategies are employed to create seeds that express transgenic thioredoxin and/or thioredoxin reductase without having a significant impact on normal seed development, e.g.:
(i) To compartmentalize active thioredoxin or thioredoxin reductase such that it does not significantly interact with the target proteins, for example by targeting to or expression in amyloplasts. Plastid targeting sequences are used to direct accumulation in the amyloplast. Alternatively, the thioredoxin and/or thioredoxin reductase is targeted to an extracellular location in cell walls using secretion signals. Finally, in the case of monocots, expression in cell types such as the aleurone during seed development is used to keep the thioredoxin and/or thioredoxin reductase away from the storage components of the rest of the endosperm.
(ii) To engineer the expression of thioredoxin and/or thioredoxin reductase from thermophilic organisms. Enzymes which have little or no activity at ambient temperatures (as high as 38-39° C. in the field) are less likely to cause problems during development. Preferably, therefore, the enzymes are active primarily at high temperatures, e.g., temperatures greater than 40° C., most preferably 45-60° C. for wet milling, or even higher, e.g., 45-95° C.
(iii) To place the thioredoxin and/or thioredoxin reductase under control of an inducible promoter, for example a chemically-inducible promoter, a wound-inducible promoter, or a transactivator-mediated promoter which is activated upon pollination by a plant expressing the transactivator.
(iv) To utilize a thioredoxin having specific requirements for a particular thioredoxin reductase, such that activity of the thioredoxin or thioredoxin reductase is suitably regulated via availability of the appropriate thioredoxin reductase or thioredoxin, respectively. For example, the thioredoxin and thioredoxin reductase are expressed in different plants, so that the active combination is only available in the seed upon pollination by the plant expressing the complementary enzyme. Alternatively, the thioredoxin or thioredoxin reductase is sequestered in the cell, for example in a plastid, vacuole, or apoplast, as described above, so that it does not become available until the grain is processed.
The invention thus provides a novel method of enhancing separation of the starch from the protein matrix, using thioredoxin and/or thioredoxin reductase. In a first embodiment, thioredoxin activity is found to be useful in a variety of seed processing applications, including wet milling, dry milling, oilseed processing, soybean processing, wheat processing and flour/dough quality, most especially the wet milling of grains, in particular corn.
Accordingly, the invention provides a method to improve milling efficiency or increase milling yield, to increase efficiency of separation of starch and protein, to enhance yields of starch and soluble proteins from grain, or to increase protein solubility in water or other solvents, comprising steeping grain in the presence of supplemental thioredoxin and/or thioredoxin reductase and separating the starch and protein components of the grain. Typically, steeping occurs before milling, but it may occur afterwards, and there may be more than one milling or steeping step in the process. The method can enhance extraction and increase protein yield from seeds during the steep or at points after steeping. Preferably, the supplemental thioredoxin and/or thioredoxin reductase is provided by expression of a transgene in the plant from which the grain is harvested.
The invention further provides: the use of thioredoxin or thioredoxin reductase in a method to improve milling efficiency or increase milling yield of starches or proteins, for example in any of the methods described above; steepwater comprising an amount of thioredoxin and/or thioredoxin reductase effective to facilitate separation of starch from protein in grain; grain which has been exposed to thioredoxin in an amount effective to facilitate separation of starch from protein; and starch or protein which has been produced by the method described above.
The activity of the thioredoxin in the above method may be enhanced by supplementing the steepwater with thioredoxin reductase and/or NADPH. Other components normally present in steepwater for wet milling may also be present, such as bacteria which produce lactic acid. Preferably, the steeping is carried out at a temperature of about 52xc2x0 C. for a period of 22-50 hours, so it is desirable that the thioredoxin is stable under these conditions.
The grain may be a dicotyledonous seed, for example, an oil seed, e.g., soybean, sunflower or canola, preferably soybean; or may be a monocotyledonous seed, for example a cereal seed, e.g., corn, wheat, oats, barley, rye or rice, most preferably corn.
The thioredoxin may be any protein bearing thiol groups which can be reversibly oxidized to form disulfide bonds and reduced by NADPH in the presence of a thioredoxin reductase. Preferably the thioredoxin is derived from a thermophilic organism, as described above.
Thioredoxin and/or thioredoxin reductase for use in the instant invention is suitably produced in an engineered microbe, e.g., a yeast or Aspergillus, or in an engineered plant capable of very high expression, e.g., in barley, e.g., under control of a promoter active during malting, such as a high pI alpha-amylase promoter or other gibberellin-dependent promoter. The thioredoxin (in excreted or extracted form or in combination with the producer organism or parts thereof) is then added to the steepwater.
As an alternative or supplement to adding the thioredoxin to the steepwater, the enzyme can be expressed directly in the seed that is to be milled. Preferably, the enzyme is expressed during grain maturation or during a conditioning process.
Accordingly, in a further embodiment, the invention provides a method of making thioredoxin on an industrial scale in a transgenic organism, e.g., a plant, e.g., a cereal, such as barley or corn, or a microorganism, e.g., a yeast or Aspergillus, for example a method comprising the steps of cultivating a transgenic organism having a chimeric gene which expresses thioredoxin, and optionally isolating or extracting the thioredoxin;
A method of using transgenic plants that produce elevated quantities of thioredoxin during seed maturation or germination such that the quality of the proteins in that seed is affected by the endogenously synthesized thioredoxin during seed development, or during the steeping process, thereby eliminating or reducing the need for conditioning with exogenous chemicals or enzymes prior to milling;
A method of making transgenic plants that produce elevated quantities of thioredoxin during seed maturation or germination such that the quality of the proteins in that seed is affected by the thioredoxin during seed development or during the steeping process, thereby eliminating or reducing the need for conditioning with exogenous chemicals or enzymes prior to milling;
A method for milling grain that uses transgenic seed containing thioredoxin, that results in higher starch and soluble protein yields.
The invention further comprises a transgenic organism having in its genome a chimeric expression cassette comprising a coding region encoding a thermostable thioredoxin or thioredoxin reductase under operative control of a promoter.
Preferably, the transgenic organism is a plant which expresses a thioredoxin and/or thioredoxin reductase in a form not naturally occurring in plants of that species or which expresses thioredoxin at higher levels than naturally occur in a plant of that species. Preferably, the thioredoxin is expressed in the seed during seed development, and is therefore preferably under control of a seed specific promoter. Optionally, expression of the thioredoxin is placed under control of an inducible or transactivator-regulated promoter, so that expression is activated by chemical induction or hybridization with a transactivator when desired. The thioredoxin is suitably targeted to the vacuoles of the plant by fusion with a vacuole targeting sequence.
In the present invention, thioredoxin coding sequences are fused to promoters active in plants and transformed into the nuclear genome or the plastid genome. The promoter is preferably a seed specific promoter such as the gamma-zein promoter. The promoter may alternatively be a chemically-inducible promoter such as the tobacco PR-1a promoter; or may be a chemically induced transactivator regulated promoter wherein the transactivator is under control of a chemically induced promoter; however, in certain situations, constitutive promoters such as the CaMV 35S or Gelvin promoter may be used. With a chemically inducible promoter, expression of the thioredoxin genes transformed into plants may be activated at an appropriate time by foliar application of a chemical inducer.
Alternatively, the thioredoxin coding sequence is under control of a transactivator regulated promoter, and expression is achieved by crossing the plant transformed with this sequence with a second plant expressing the transactivator. In a preferred form of this method, the first plant containing the thioredoxin coding sequence is the seed parent and is male sterile, while the second plant expressing the transactivator is the pollinator. Expression of thioredoxin in seeds is achieved by interplanting the first and second plants, e.g., such that the first plant is pollinated by the second and thioredoxin is expressed in the seeds of the first plant by activation of the transactivator regulated promoter with the transactivator expressed by the transactivator gene from the second parent.
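The pollination scheme above relies on the fact that a seed can express thioredoxin only when it inherits the transactivator-regulated cassette from the seed parent and the transactivator gene from the pollinator. A minimal sketch under simplifying assumptions estimates the fraction of seeds expected to express. The assumptions are: a single locus for each transgene, unlinked loci, Mendelian segregation, and no self-pollination because the seed parent is male sterile. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def expressing_fraction(seed_parent: tuple[bool, bool],
                        pollinator: tuple[bool, bool]) -> Fraction:
    """Expected fraction of seeds carrying both transgenes.

    seed_parent: the two homologs at the cassette locus; True means that
        homolog carries the transactivator-regulated thioredoxin cassette.
    pollinator: the two homologs at the transactivator locus; True means
        that homolog carries the transactivator gene.
    Each parent transmits one homolog, each with probability 1/2.
    """
    outcomes = list(product(seed_parent, pollinator))
    hits = sum(1 for cassette, transactivator in outcomes
               if cassette and transactivator)
    return Fraction(hits, len(outcomes))


if __name__ == "__main__":
    homozygous, hemizygous = (True, True), (True, False)
    print(expressing_fraction(homozygous, homozygous))  # 1
    print(expressing_fraction(homozygous, hemizygous))  # 1/2
```

With both parents homozygous for their respective transgenes, every seed on the male-sterile parent is expected to express. A hemizygous pollinator halves that fraction.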
The nucleic acid sequences described in this application can be incorporated into plant cells using conventional recombinant DNA technology. Generally, this involves inserting a coding sequence of the invention into an expression system to which the coding sequence is heterologous (i.e., not normally present) using standard cloning procedures known in the art. The vector contains the necessary elements for the transcription and translation of the inserted protein-coding sequences. A large number of vector systems known in the art can be used, such as plasmids, bacteriophage viruses and other modified viruses. Suitable vectors include, but are not limited to, viral vectors such as lambda vector systems λgt11, λgt10 and Charon 4; plasmid vectors such as pBI121, pBR322, pACYC177, pACYC184, pAR series, pKK223-3, pUC8, pUC9, pUC18, pUC19, pLG339, pRK290, pKC37, pKC101, pCDNAII; and other similar systems. The components of the expression system may also be modified to increase expression. For example, truncated sequences, nucleotide substitutions or other modifications may be employed. The expression systems described herein can be used to transform virtually any crop plant cell under suitable conditions. Transformed cells can be regenerated into whole plants such that the nucleotide sequence of the invention is expressed in the transgenic plants.
The transgenic expression in plants of genes derived from microbial sources may require the modification of those genes to achieve and optimize their expression in plants. In particular, bacterial ORFs which encode separate enzymes but which are encoded by the same transcript in the native microbe are best expressed in plants on separate transcripts. To achieve this, each microbial ORF is isolated individually and cloned within a cassette which provides a plant promoter sequence at the 5′ end of the ORF and a plant transcriptional terminator at the 3′ end of the ORF. The isolated ORF sequence preferably includes the initiating ATG codon and the terminating STOP codon but may include additional sequence beyond the initiating ATG and the STOP codon. In addition, the ORF may be truncated but still retain the required activity; for particularly long ORFs, truncated versions which retain activity may be preferable for expression in transgenic organisms. The terms "plant promoter" and "plant transcriptional terminator" mean promoters and transcriptional terminators which operate within plant cells. This includes promoters and transcription terminators which may be derived from non-plant sources such as viruses (for example, Cauliflower Mosaic Virus).
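Splitting a polycistronic microbial transcript into one plant cassette per ORF can be pictured as the data-structure sketch below. It assumes the ORF coordinates are already known, checks that each fragment runs from ATG to a STOP codon, and wraps it with promoter and terminator labels. The promoter/terminator names and the toy operon sequence are hypothetical placeholders, not sequences from the source.

```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}

@dataclass
class Cassette:
    promoter: str    # label of a plant-active promoter (hypothetical)
    orf: str         # coding sequence from ATG through STOP
    terminator: str  # label of a plant transcriptional terminator (hypothetical)

def split_operon(operon: str, orf_coords: list[tuple[int, int]],
                 promoter: str, terminator: str) -> list[Cassette]:
    """Give each ORF of a polycistronic transcript its own cassette.

    orf_coords are 0-based, end-exclusive (start, end) pairs spanning ATG..STOP.
    """
    cassettes = []
    for start, end in orf_coords:
        orf = operon[start:end].upper()
        # Each fragment must be a complete reading frame: ATG start, stop at the end.
        if not orf.startswith("ATG") or orf[-3:] not in STOP_CODONS or len(orf) % 3:
            raise ValueError(f"{start}-{end} is not a complete ATG..STOP ORF")
        cassettes.append(Cassette(promoter, orf, terminator))
    return cassettes

# Toy two-ORF "operon" separated by a short intergenic spacer.
operon = "ATGAAATAA" + "GG" + "ATGCCCGGGTGA"
for c in split_operon(operon, [(0, 9), (11, 23)], "PROMOTER_A", "TERMINATOR_B"):
    print(c.promoter, c.orf, c.terminator)
```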
In some cases, modification to the ORF coding sequences and adjacent sequence is not required. It is sufficient to isolate a fragment containing the ORF of interest and to insert it downstream of a plant promoter. For example, Gaffney et al. (Science 261: 754-756 (1993)) have expressed the Pseudomonas nahG gene in transgenic plants under the control of the CaMV 35S promoter and the CaMV tml terminator successfully without modification of the coding sequence and with x bp of the Pseudomonas gene upstream of the ATG still attached, and y bp downstream of the STOP codon still attached to the nahG ORF. Preferably, as little adjacent microbial sequence as possible should be left attached upstream of the ATG and downstream of the STOP codon. In practice, such construction may depend on the availability of restriction sites.
In other cases, genes derived from microbial sources may present problems in expression. These problems have been well characterized in the art and are particularly common with genes derived from certain sources such as Bacillus. These problems may apply to the nucleotide sequence of this invention, and the modification of these genes can be undertaken using techniques now well known in the art. The following problems may be encountered:
1. Codon Usage.
The preferred codon usage in plants differs from the preferred codon usage in certain microorganisms. Comparing the codon usage of a cloned microbial ORF with that of plant genes (in particular, genes from the target plant) identifies the codons within the ORF which should preferably be changed. Typically, plant evolution has tended towards a strong preference for the nucleotides C and G in the third codon position in monocotyledons, whereas dicotyledons often use the nucleotides A or T at this position. By modifying a gene to incorporate preferred codon usage for a particular target transgenic species, many of the problems described below for GC/AT content and illegitimate splicing will be overcome.
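The codon comparison described above can be sketched as two small functions: one measures how often the third codon position is G or C (the monocot/dicot bias mentioned in the text), the other flags codons that are rare in the target host. The `host_usage` table is an assumption: a caller-supplied map of codon to relative frequency, which in practice would be derived from the target plant's genes. The toy values in the example are illustrative only and are not measured frequencies.

```python
def codons(orf: str) -> list[str]:
    """Split an ORF into codons, ignoring any trailing partial codon."""
    orf = orf.upper()
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]

def gc3(orf: str) -> float:
    """Fraction of codons with G or C at the third position."""
    cs = codons(orf)
    return sum(c[2] in "GC" for c in cs) / len(cs)

def rare_codons(orf: str, host_usage: dict[str, float],
                threshold: float = 0.1) -> list[tuple[int, str]]:
    """Return (codon index, codon) pairs whose relative frequency in the
    target host falls below `threshold`; codons missing from the table
    are treated as frequency 0 and flagged."""
    return [(i, c) for i, c in enumerate(codons(orf))
            if host_usage.get(c, 0.0) < threshold]

# Toy usage table (made-up values, NOT real maize codon frequencies).
toy_usage = {"ATG": 1.0, "CTG": 0.3, "TTA": 0.05, "TAA": 0.3}
orf = "ATGCTGTTATAA"
print(gc3(orf))                     # 0.5
print(rare_codons(orf, toy_usage))  # [(2, 'TTA')]
```

The flagged positions are the candidates for synonymous substitution toward the host's preferred codons.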
2. GC/AT Content.
Plant genes typically have a GC content of more than 35%. ORF sequences which are rich in A and T nucleotides can cause several problems in plants. Firstly, motifs of ATTTA are believed to cause destabilization of messages and are found at the 3′ end of many short-lived mRNAs. Secondly, the occurrence of polyadenylation signals such as AATAAA at inappropriate positions within the message is believed to cause premature truncation of transcription. In addition, monocotyledons may recognize AT-rich sequences as splice sites (see below).
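A quick screen for the AT-rich problems listed above is to compute overall GC content and report every ATTTA and AATAAA occurrence in the ORF. The sketch below is a minimal version of such a screen; the motif list covers only the two motifs named in the text, and the example sequence is made up.

```python
import re

def gc_content(seq: str) -> float:
    """Fraction of G + C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

# The two AT-rich motifs named in the text.
PROBLEM_MOTIFS = {
    "ATTTA": "possible mRNA destabilization motif",
    "AATAAA": "possible polyadenylation signal",
}

def find_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for every occurrence, sorted by position.

    A zero-width lookahead is used so that overlapping occurrences are reported.
    """
    seq = seq.upper()
    hits = []
    for motif in PROBLEM_MOTIFS:
        hits += [(m.start(), motif) for m in re.finditer(f"(?={motif})", seq)]
    return sorted(hits)

seq = "GCATTTAAATAAAGC"
print(round(gc_content(seq), 3))  # 0.267
print(find_motifs(seq))           # [(2, 'ATTTA'), (7, 'AATAAA')]
```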
3. Sequences Adjacent to the Initiating Methionine.
Plants differ from microorganisms in that their messages do not possess a defined ribosome binding site. Rather, it is believed that ribosomes attach to the 5′ end of the message and scan for the first available ATG at which to start translation. Nevertheless, it is believed that there is a preference for certain nucleotides adjacent to the ATG and that expression of microbial genes can be enhanced by the inclusion of a eukaryotic consensus translation initiator at the ATG. Clontech (1993/1994 catalog, page 210, incorporated herein by reference) have suggested one sequence as a consensus translation initiator for the expression of the E. coli uidA gene in plants. Further, Joshi (NAR 15: 6643-6653 (1987), incorporated herein by reference) has compared many plant sequences adjacent to the ATG and suggests another consensus sequence. In situations where difficulties are encountered in the expression of microbial ORFs in plants, inclusion of one of these sequences at the initiating ATG may improve translation. In such cases the last three nucleotides of the consensus may not be appropriate for inclusion in the modified sequence, because they would alter the second amino acid residue. Preferred sequences adjacent to the initiating methionine may differ between different plant species. A survey of 14 maize genes located in the GenBank database provided the following results:
This analysis can be done for the desired plant species into which the nucleotide sequence is being incorporated, and the sequence adjacent to the ATG modified to incorporate the preferred nucleotides.
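The per-species analysis mentioned above amounts to tabulating which nucleotide occurs at each position flanking the initiating ATG across a set of genes. A minimal sketch follows, assuming the caller already knows the 0-based index of each gene's start codon. Positions use the common convention where the A of ATG is +1, so -1 is the base immediately 5′ of it and +4 the base immediately after the G. The two example sequences are invented and do not reproduce the maize survey.

```python
from collections import Counter

def start_context_profile(seqs: list[str], atg_positions: list[int],
                          upstream: int = 3, downstream: int = 3) -> dict[int, Counter]:
    """Count nucleotides at each position flanking the initiating ATG.

    Labels: -upstream..-1 before the A of ATG, +4..+(3+downstream) after the G.
    Positions falling outside a sequence are simply skipped for that sequence.
    """
    profile: dict[int, Counter] = {}
    offsets = list(range(-upstream, 0)) + list(range(3, 3 + downstream))
    for seq, atg in zip(seqs, atg_positions):
        seq = seq.upper()
        for offset in offsets:
            i = atg + offset
            if 0 <= i < len(seq):
                label = offset if offset < 0 else offset + 1  # A of ATG is +1
                profile.setdefault(label, Counter())[seq[i]] += 1
    return profile

# Invented example sequences; the ATG starts at index 6 in both.
genes = ["GCCACCATGGCT", "AAAACAATGGCA"]
profile = start_context_profile(genes, [6, 6])
consensus = {pos: counts.most_common(1)[0][0] for pos, counts in sorted(profile.items())}
print(profile[-3], profile[4])
```

The most common base at each position (ties broken arbitrarily by `most_common`) suggests which nucleotides to introduce next to the ATG of the modified gene.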
4. Removal of Illegitimate Splice Sites.
Genes cloned from non-plant sources and not optimized for expression in plants may also contain motifs which may be recognized in plants as 5′ or 3′ splice sites, and be cleaved, thus generating truncated or deleted messages. These sites can be removed using the techniques well known in the art.
Techniques for the modification of coding sequences and adjacent sequences are well known in the art. In cases where the initial expression of a microbial ORF is low and it is deemed appropriate to make alterations to the sequence as described above, then the construction of synthetic genes can be accomplished according to methods well known in the art. These are, for example, described in the published patent disclosures EP 0 385 962 (to Monsanto), EP 0 359 472 (to Lubrizol) and WO 93/07278 (to Ciba-Geigy), all of which are incorporated herein by reference. In most cases it is preferable to assay the expression of gene constructions using transient assay protocols (which are well known in the art) prior to their transfer to transgenic plants.
Coding sequences intended for expression in transgenic plants are first assembled in expression cassettes behind a suitable promoter expressible in plants. The expression cassettes may also comprise any further sequences required or selected for the expression of the transgene. Such sequences include, but are not restricted to, transcription terminators, extraneous sequences to enhance expression such as introns, viral sequences, and sequences intended for the targeting of the gene product to specific organelles and cell compartments. These expression cassettes can then be easily transferred to the plant transformation vectors described below. The following is a description of various components of typical expression cassettes.
1. Promoters
The selection of the promoter used in expression cassettes will determine the spatial and temporal expression pattern of the transgene in the transgenic plant. Selected promoters will express transgenes in specific cell types (such as leaf epidermal cells, mesophyll cells, root cortex cells) or in specific tissues or organs (roots, leaves or flowers, for example) and the selection will reflect the desired location of accumulation of the gene product. Alternatively, the selected promoter may drive expression of the gene under various inducing conditions. Promoters vary in their strength, i.e., ability to promote transcription. Depending upon the host cell system utilized, any one of a number of suitable promoters can be used, including the gene's native promoter. The following are non-limiting examples of promoters that may be used in expression cassettes.
a. Constitutive Expression, the Ubiquitin Promoter:
Ubiquitin is a gene product known to accumulate in many cell types and its promoter has been cloned from several species for use in transgenic plants (e.g. sunflower: Binet et al. Plant Science 79: 87-94 (1991); maize: Christensen et al. Plant Molec. Biol. 12: 619-632 (1989); and Arabidopsis: Norris et al., Plant Mol. Biol. 21:895-906 (1993)). The maize ubiquitin promoter has been developed in transgenic monocot systems and its sequence and vectors constructed for monocot transformation are disclosed in the patent publication EP 0 342 926 (to Lubrizol) which is herein incorporated by reference. Taylor et al. (Plant Cell Rep. 12: 491-495 (1993)) describe a vector (pAHC25) that comprises the maize ubiquitin promoter and first intron and its high activity in cell suspensions of numerous monocotyledons when introduced via microprojectile bombardment. The Arabidopsis ubiquitin promoter is ideal for use with the nucleotide sequences of the present invention. The ubiquitin promoter is suitable for gene expression in transgenic plants, both monocotyledons and dicotyledons. Suitable vectors are derivatives of pAHC25 or any of the transformation vectors described in this application, modified by the introduction of the appropriate ubiquitin promoter and/or intron sequences.
b. Constitutive Expression, the CaMV 35S Promoter:
Construction of the plasmid pCGN1761 is described in the published patent application EP 0 392 225 (Example 23), which is hereby incorporated by reference. pCGN1761 contains the "double" CaMV 35S promoter and the tml transcriptional terminator with a unique EcoRI site between the promoter and the terminator and has a pUC-type backbone. A derivative of pCGN1761 is constructed which has a modified polylinker which includes NotI and XhoI sites in addition to the existing EcoRI site. This derivative is designated pCGN1761ENX. pCGN1761ENX is useful for the cloning of cDNA sequences or coding sequences (including microbial ORF sequences) within its polylinker for the purpose of their expression under the control of the 35S promoter in transgenic plants. The entire 35S promoter-coding sequence-tml terminator cassette of such a construction can be excised using the HindIII, SphI, SalI, and XbaI sites 5′ to the promoter and the XbaI, BamHI and BglI sites 3′ to the terminator for transfer to transformation vectors such as those described below. Furthermore, the double 35S promoter fragment can be removed by 5′ excision with HindIII, SphI, SalI, XbaI, or PstI, and 3′ excision with any of the polylinker restriction sites (EcoRI, NotI or XhoI) for replacement with another promoter. If desired, modifications around the cloning sites can be made by the introduction of sequences that may enhance translation. This is particularly useful when overexpression is desired. For example, pCGN1761ENX may be modified by optimization of the translational initiation site as described in Example 37 of U.S. Pat. No. 5,639,949, incorporated herein by reference.
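Before cloning an insert into a polylinker such as that of pCGN1761ENX, it is usual to confirm that the chosen enzyme cuts the construct only once. The sketch below searches a sequence for the EcoRI (GAATTC), NotI (GCGGCCGC) and XhoI (CTCGAG) recognition sequences and reports which enzymes have a unique site. All three sites are palindromic, so a one-strand search suffices. The polylinker string in the example is a made-up stand-in, not the actual pCGN1761ENX sequence.

```python
import re

# Recognition sequences, 5'->3'. Each is palindromic, so a single-strand
# search also finds sites on the complementary strand.
SITES = {"EcoRI": "GAATTC", "NotI": "GCGGCCGC", "XhoI": "CTCGAG"}

def site_positions(seq: str, enzyme: str) -> list[int]:
    """0-based start positions of every recognition site (overlaps included)."""
    site = SITES[enzyme]
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]

def unique_sites(seq: str) -> list[str]:
    """Enzymes that recognize the sequence exactly once (usable as unique cloning sites)."""
    return [name for name in SITES if len(site_positions(seq, name)) == 1]

toy_polylinker = "GAATTCGCGGCCGCCTCGAG"  # hypothetical EcoRI-NotI-XhoI linker
print(unique_sites(toy_polylinker))      # ['EcoRI', 'NotI', 'XhoI']
```

An insert that itself contains one of these sites would make the enzyme non-unique in the final construct, which is one reason the choice of cloning site can depend on the gene being inserted.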
c. Constitutive Expression, the Actin Promoter:
Several isoforms of actin are known to be expressed in most cell types and consequently the actin promoter is a good choice for a constitutive promoter. In particular, the promoter from the rice Act1 gene has been cloned and characterized (McElroy et al. Plant Cell 2: 163-171 (1990)). A 1.3 kb fragment of the promoter was found to contain all the regulatory elements required for expression in rice protoplasts. Furthermore, numerous expression vectors based on the Act1 promoter have been constructed specifically for use in monocotyledons (McElroy et al. Mol. Gen. Genet. 231: 150-160 (1991)). These incorporate the Act1 intron 1, Adh1 5′ flanking sequence and Adh1 intron 1 (from the maize alcohol dehydrogenase gene) and sequence from the CaMV 35S promoter. Vectors showing highest expression were fusions of 35S and the Act1 intron or the Act1 5′ flanking sequence and the Act1 intron. Optimization of sequences around the initiating ATG (of the GUS reporter gene) also enhanced expression. The promoter expression cassettes described by McElroy et al. (Mol. Gen. Genet. 231: 150-160 (1991)) can be easily modified for gene expression and are particularly suitable for use in monocotyledonous hosts. For example, promoter-containing fragments are removed from the McElroy constructions and used to replace the double 35S promoter in pCGN1761ENX, which is then available for the insertion of specific gene sequences. The fusion genes thus constructed can then be transferred to appropriate transformation vectors. In a separate report, the rice Act1 promoter with its first intron has also been found to direct high expression in cultured barley cells (Chibbar et al. Plant Cell Rep. 12: 506-509 (1993)).
d. Inducible Expression, the PR-1 Promoter:
The double 35S promoter in pCGN1761ENX may be replaced with any other promoter of choice that will result in suitably high expression levels. By way of example, one of the chemically regulatable promoters described in U.S. Pat. No. 5,614,395 may replace the double 35S promoter. The promoter of choice is preferably excised from its source by restriction enzymes, but can alternatively be PCR-amplified using primers that carry appropriate terminal restriction sites. Should PCR-amplification be undertaken, then the promoter should be re-sequenced to check for amplification errors after the cloning of the amplified promoter in the target vector. The chemically/pathogen regulatable tobacco PR-1a promoter is cleaved from plasmid pCIB1004 (for construction, see example 21 of EP 0 332 104, which is hereby incorporated by reference) and transferred to plasmid pCGN1761ENX (Uknes et al., 1992). pCIB1004 is cleaved with NcoI and the resultant 3′ overhang of the linearized fragment is rendered blunt by treatment with T4 DNA polymerase. The fragment is then cleaved with HindIII and the resultant PR-1a promoter-containing fragment is gel purified and cloned into pCGN1761ENX from which the double 35S promoter has been removed. This is done by cleavage with XhoI and blunting with T4 polymerase, followed by cleavage with HindIII and isolation of the larger vector-terminator containing fragment into which the pCIB1004 promoter fragment is cloned. This generates a pCGN1761ENX derivative with the PR-1a promoter and the tml terminator and an intervening polylinker with unique EcoRI and NotI sites. The selected coding sequence can be inserted into this vector, and the fusion products (i.e. promoter-gene-terminator) can subsequently be transferred to any selected transformation vector, including those described infra.
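The re-sequencing check recommended above for PCR-amplified promoters amounts to comparing the cloned sequence with the source sequence and listing any differences. A minimal sketch follows. It assumes the two sequences are already aligned and of equal length, so it detects substitutions only; insertions or deletions would first need a proper pairwise alignment (for example, Needleman-Wunsch). The sequences in the example are invented.

```python
def amplification_errors(reference: str, clone: str) -> list[tuple[int, str, str]]:
    """Report (1-based position, expected base, observed base) for each mismatch
    between the source promoter sequence and the re-sequenced clone.

    Simplifying assumption: the sequences are pre-aligned and equal in length,
    so only base substitutions are detected.
    """
    if len(reference) != len(clone):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [(i + 1, r, q)
            for i, (r, q) in enumerate(zip(reference.upper(), clone.upper()))
            if r != q]

print(amplification_errors("ACGTACGT", "ACGTTCGT"))  # [(5, 'A', 'T')]
```

An empty list means that no substitutions were found in the sequenced region.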
Various chemical regulators may be employed to induce expression of the selected coding sequence in the plants transformed according to the present invention, including the benzothiadiazole, isonicotinic acid, and salicylic acid compounds disclosed in U.S. Pat. Nos. 5,523,311 and 5,614,395.
e. Inducible Expression, an Ethanol-Inducible Promoter:
A promoter inducible by certain alcohols or ketones, such as ethanol, may also be used to confer inducible expression of a coding sequence of the present invention. Such a promoter is, for example, the alcA gene promoter from Aspergillus nidulans (Caddick et al. (1998) Nat. Biotechnol 16:177-180). In A. nidulans, the alcA gene encodes alcohol dehydrogenase I, the expression of which is regulated by the AlcR transcription factor in the presence of the chemical inducer. For the purposes of the present invention, the CAT coding sequences in plasmid palcA:CAT, comprising an alcA gene promoter sequence fused to a minimal 35S promoter (Caddick et al. (1998) Nat. Biotechnol 16:177-180), are replaced by a coding sequence of the present invention to form an expression cassette having the coding sequence under the control of the alcA gene promoter. This is carried out using methods well known in the art.
f. Inducible Expression, a Glucocorticoid-Inducible Promoter:
Induction of expression of a nucleic acid sequence of the present invention using systems based on steroid hormones is also contemplated. For example, a glucocorticoid-mediated induction system is used (Aoyama and Chua (1997) The Plant Journal 11: 605-612) and gene expression is induced by application of a glucocorticoid, for example a synthetic glucocorticoid, preferably dexamethasone, preferably at a concentration ranging from 0.1 mM to 1 mM, more preferably from 10 mM to 100 mM. For the purposes of the present invention, the luciferase gene sequences are replaced by a nucleic acid sequence of the invention to form an expression cassette having a nucleic acid sequence of the invention under the control of six copies of the GAL4 upstream activating sequences fused to the 35S minimal promoter. This is carried out using methods well known in the art. The trans-acting factor comprises the GAL4 DNA-binding domain (Keegan et al. (1986) Science 231: 699-704) fused to the transactivating domain of the herpes viral protein VP16 (Triezenberg et al. (1988) Genes Devel. 2: 718-729) fused to the hormone-binding domain of the rat glucocorticoid receptor (Picard et al. (1988) Cell 54: 1073-1080). The expression of the fusion protein is controlled by any promoter suitable for expression in plants known in the art or described here. This expression cassette is also present in the plant that contains a nucleic acid sequence of the invention fused to the 6×GAL4/minimal promoter. Thus, tissue- or organ-specificity of the fusion protein is achieved, leading to inducible tissue- or organ-specific expression of the protein encoded by the nucleic acid sequence of the invention.
g. Root Specific Expression:
Another pattern of gene expression is root expression. A suitable root promoter is described by de Framond (FEBS 290: 103-106 (1991)) and also in the published patent application EP 0 452 269, which is herein incorporated by reference. This promoter is transferred to a suitable vector such as pCGN1761ENX for the insertion of a selected gene and subsequent transfer of the entire promoter-gene-terminator cassette to a transformation vector of interest.
h. Wound-Inducible Promoters:
Wound-inducible promoters may also be suitable for gene expression. Numerous such promoters have been described (e.g. Xu et al. Plant Molec. Biol. 22: 573-588 (1993), Logemann et al. Plant Cell 1: 151-158 (1989), Rohrmeier and Lehle, Plant Molec. Biol. 22: 783-792 (1993), Firek et al. Plant Molec. Biol. 22: 129-142 (1993), Warner et al. Plant J. 3: 191-201 (1993)) and all are suitable for use with the instant invention. Logemann et al. describe the 5′ upstream sequences of the dicotyledonous potato wun1 gene. Xu et al. show that a wound-inducible promoter from the dicotyledon potato (pin2) is active in the monocotyledon rice. Further, Rohrmeier and Lehle describe the cloning of the maize Wip1 cDNA, which is wound induced and which can be used to isolate the cognate promoter using standard techniques. Similarly, Firek et al. and Warner et al. have described a wound-induced gene from the monocotyledon Asparagus officinalis, which is expressed at local wound and pathogen invasion sites. Using cloning techniques well known in the art, these promoters can be transferred to suitable vectors, fused to the genes pertaining to this invention, and used to express these genes at the sites of plant wounding.
i. Pith-Preferred Expression:
Patent Application WO 93/07278, which is herein incorporated by reference, describes the isolation of the maize trpA gene, which is preferentially expressed in pith cells. The gene sequence and promoter extending up to -1726 bp from the start of transcription are presented. Using standard molecular biological techniques, this promoter, or parts thereof, can be transferred to a vector such as pCGN1761 where it can replace the 35S promoter and be used to drive the expression of a foreign gene in a pith-preferred manner. In fact, fragments containing the pith-preferred promoter or parts thereof can be transferred to any vector and modified for utility in transgenic plants.
j. Leaf-Specific Expression:
A maize gene encoding phosphoenolpyruvate carboxylase (PEPC) has been described by Hudspeth and Grula (Plant Molec Biol 12: 579-589 (1989)). Using standard molecular biological techniques, the promoter for this gene can be used to drive the expression of any gene in a leaf-specific manner in transgenic plants.
k. Pollen-Specific Expression:
WO 93/07278 describes the isolation of the maize calcium-dependent protein kinase (CDPK) gene which is expressed in pollen cells. The gene sequence and promoter extend up to 1400 bp from the start of transcription. Using standard molecular biological techniques, this promoter or parts thereof, can be transferred to a vector such as pCGN1761 where it can replace the 35S promoter and be used to drive the expression of a nucleic acid sequence of the invention in a pollen-specific manner.
2. Transcriptional Terminators
A variety of transcriptional terminators are available for use in expression cassettes. These are responsible for the termination of transcription beyond the transgene and its correct polyadenylation. Appropriate transcriptional terminators are those that are known to function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline synthase terminator and the pea rbcS E9 terminator. These can be used in both monocotyledons and dicotyledons. In addition, a gene's native transcription terminator may be used.
3. Sequences for the Enhancement or Regulation of Expression
Numerous sequences have been found to enhance gene expression from within the transcriptional unit and these sequences can be used in conjunction with the genes of this invention to increase their expression in transgenic plants.
Various intron sequences have been shown to enhance expression, particularly in monocotyledonous cells. For example, the introns of the maize Adh1 gene have been found to significantly enhance the expression of the wild-type gene under its cognate promoter when introduced into maize cells. Intron 1 was found to be particularly effective and enhanced expression in fusion constructs with the chloramphenicol acetyltransferase gene (Callis et al., Genes Develop. 1: 1183-1200 (1987)). In the same experimental system, the intron from the maize bronze1 gene had a similar effect in enhancing expression. Intron sequences have been routinely incorporated into plant transformation vectors, typically within the non-translated leader.
A number of non-translated leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells. Specifically, leader sequences from Tobacco Mosaic Virus (TMV, the "Ω-sequence"), Maize Chlorotic Mottle Virus (MCMV), and Alfalfa Mosaic Virus (AMV) have been shown to be effective in enhancing expression (e.g. Gallie et al. Nucl. Acids Res. 15: 8693-8711 (1987); Skuzeski et al. Plant Molec. Biol. 15: 65-79 (1990)).
4. Targeting of the Gene Product Within the Cell
Various mechanisms for targeting gene products are known to exist in plants and the sequences controlling the functioning of these mechanisms have been characterized in some detail. For example, the targeting of gene products to the chloroplast is controlled by a signal sequence found at the amino terminal end of various proteins which is cleaved during chloroplast import to yield the mature protein (e.g. Comai et al. J. Biol. Chem. 263: 15104-15109 (1988)). These signal sequences can be fused to heterologous gene products to effect the import of heterologous products into the chloroplast (van den Broeck, et al. Nature 313: 358-363 (1985)). DNA encoding appropriate signal sequences can be isolated from the 5′ end of the cDNAs encoding the RUBISCO protein, the CAB protein, the EPSP synthase enzyme, the GS2 protein and many other proteins which are known to be chloroplast localized. See also the section entitled "Expression With Chloroplast Targeting" in Example 37 of U.S. Pat. No. 5,639,949.
Other gene products are localized to other organelles such as the mitochondrion and the peroxisome (e.g. Unger et al. Plant Molec. Biol. 13: 411-418 (1989)). The cDNAs encoding these products can also be manipulated to effect the targeting of heterologous gene products to these organelles. Examples of such sequences are the nuclear-encoded ATPases and specific aspartate amino transferase isoforms for mitochondria. Targeting to cellular protein bodies has been described by Rogers et al. (Proc. Natl. Acad. Sci. USA 82: 6512-6516 (1985)).
In addition, sequences have been characterized which cause the targeting of gene products to other cell compartments. Amino terminal sequences are responsible for targeting to the ER, the apoplast, and extracellular secretion from aleurone cells (Koehler and Ho, Plant Cell 2: 769-783 (1990)). Additionally, amino terminal sequences in conjunction with carboxy terminal sequences are responsible for vacuolar targeting of gene products (Shinshi et al. Plant Molec. Biol. 14: 357-368 (1990)).
By the fusion of the appropriate targeting sequences described above to transgene sequences of interest it is possible to direct the transgene product to any organelle or cell compartment. For chloroplast targeting, for example, the chloroplast signal sequence from the RUBISCO gene, the CAB gene, the EPSP synthase gene, or the GS2 gene is fused in frame to the amino terminal ATG of the transgene. The signal sequence selected should include the known cleavage site, and the fusion constructed should take into account any amino acids after the cleavage site which are required for cleavage. In some cases this requirement may be fulfilled by the addition of a small number of amino acids between the cleavage site and the transgene ATG or, alternatively, replacement of some amino acids within the transgene sequence. Fusions constructed for chloroplast import can be tested for efficacy of chloroplast uptake by in vitro translation of in vitro transcribed constructions followed by in vitro chloroplast uptake using techniques described by Bartlett et al. In: Edelmann et al. (Eds.). Methods in Chloroplast Molecular Biology, Elsevier pp 1081-1091 (1982) and Wasmann et al. Mol. Gen. Genet. 205: 446-453 (1986). These construction techniques are well known in the art and are equally applicable to mitochondria and peroxisomes.
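Fusing a transit-peptide sequence "in frame" to the transgene means that the combined sequence must be read as one continuous set of codons, with no stop codon before the end of the transgene. The sketch below joins two coding sequences and checks exactly that. It is a simplified illustration that assumes the transit sequence starts with ATG and carries no stop codon. It does not model the cleavage-site requirements discussed in the text (extra linker amino acids would simply be appended to the transit sequence), and the example sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def in_frame_fusion(transit: str, transgene: str) -> str:
    """Join a transit-peptide coding sequence to a transgene ORF and verify
    that the result is one continuous reading frame ending in a stop codon.

    Assumes `transit` begins with ATG, has no stop codon, and is a whole
    number of codons; any linker codons should already be included in it.
    """
    transit, transgene = transit.upper(), transgene.upper()
    if not transit.startswith("ATG") or len(transit) % 3:
        raise ValueError("transit sequence must start with ATG and be whole codons")
    fused = transit + transgene
    if len(fused) % 3:
        raise ValueError("fusion length is not a multiple of three")
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("premature in-frame stop codon in fusion")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("fusion does not end in a stop codon")
    return fused

print(in_frame_fusion("ATGGCTTCC", "ATGAAATAA"))  # ATGGCTTCCATGAAATAA
```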
The above-described mechanisms for cellular targeting can be utilized not only in conjunction with their cognate promoters, but also in conjunction with heterologous promoters so as to effect a specific cell-targeting goal under the transcriptional regulation of a promoter that has an expression pattern different to that of the promoter from which the targeting signal derives.
Numerous transformation vectors available for plant transformation are known to those of ordinary skill in the plant transformation arts, and the genes pertinent to this invention can be used in conjunction with any such vectors. The selection of vector will depend upon the preferred transformation technique and the target species for transformation. For certain target species, different antibiotic or herbicide selection markers may be preferred. Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing and Vieira, Gene 19: 259-268 (1982); Bevan et al., Nature 304:184-187 (1983)), the bar gene, which confers resistance to the herbicide phosphinothricin (White et al., Nucl. Acids Res 18: 1062 (1990), Spencer et al. Theor. Appl. Genet 79: 625-631 (1990)), the hph gene, which confers resistance to the antibiotic hygromycin (Blochinger and Diggelmann, Mol Cell Biol 4: 2929-2931), the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO J. 2(7): 1099-1104 (1983)), the EPSPS gene, which confers resistance to glyphosate (U.S. Pat. Nos. 4,940,935 and 5,188,642), and the mannose-6-phosphate isomerase gene, which provides the ability to metabolize mannose (U.S. Pat. Nos. 5,767,378 and 5,994,629).
1. Vectors Suitable for Agrobacterium Transformation
Many vectors are available for transformation using Agrobacterium tumefaciens. These typically carry at least one T-DNA border sequence and include vectors such as pBIN19 (Bevan, Nucl. Acids Res. (1984)) and pXYZ. Below, the construction of two typical vectors suitable for Agrobacterium transformation is described.
a. pCIB200 and pCIB2001:
The binary vectors pCIB200 and pCIB2001 are used for the construction of recombinant vectors for use with Agrobacterium and are constructed in the following manner. pTJS75kan is created by NarI digestion of pTJS75 (Schmidhauser and Helinski, J. Bacteriol. 164: 446-455 (1985)) allowing excision of the tetracycline-resistance gene, followed by insertion of an AccI fragment from pUC4K carrying an NPTII gene (Messing and Vieira, Gene 19: 259-268 (1982); Bevan et al., Nature 304: 184-187 (1983); McBride et al., Plant Molecular Biology 14: 266-276 (1990)). XhoI linkers are ligated to the EcoRV fragment of pCIB7, which contains the left and right T-DNA borders, a plant selectable nos/nptII chimeric gene and the pUC polylinker (Rothstein et al., Gene 53: 153-161 (1987)), and the XhoI-digested fragment is cloned into SalI-digested pTJS75kan to create pCIB200 (see also EP 0 332 104, example 19). pCIB200 contains the following unique polylinker restriction sites: EcoRI, SstI, KpnI, BglII, XbaI, and SalI. pCIB2001 is a derivative of pCIB200 created by the insertion into the polylinker of additional restriction sites. Unique restriction sites in the polylinker of pCIB2001 are EcoRI, SstI, KpnI, BglII, XbaI, SalI, MluI, BclI, AvrII, ApaI, HpaI, and StuI. pCIB2001, in addition to containing these unique restriction sites, also has plant and bacterial kanamycin selection, left and right T-DNA borders for Agrobacterium-mediated transformation, the RK2-derived trfA function for mobilization between E. coli and other hosts, and the OriT and OriV functions also from RK2. The pCIB2001 polylinker is suitable for the cloning of plant expression cassettes containing their own regulatory signals.
b. pCIB10 and Hygromycin Selection Derivatives Thereof:
The binary vector pCIB10 contains a gene encoding kanamycin resistance for selection in plants and T-DNA right and left border sequences and incorporates sequences from the wide host-range plasmid pRK252, allowing it to replicate in both E. coli and Agrobacterium. Its construction is described by Rothstein et al. (Gene 53: 153-161 (1987)). Various derivatives of pCIB10 are constructed which incorporate the gene for hygromycin B phosphotransferase described by Gritz et al. (Gene 25: 179-188 (1983)). These derivatives enable selection of transgenic plant cells on hygromycin only (pCIB743), or on hygromycin and kanamycin (pCIB715, pCIB717).
2. Vectors Suitable for non-Agrobacterium Transformation
Transformation without the use of Agrobacterium tumefaciens circumvents the requirement for T-DNA sequences in the chosen transformation vector and consequently vectors lacking these sequences can be utilized in addition to vectors such as the ones described above which contain T-DNA sequences. Transformation techniques that do not rely on Agrobacterium include transformation via particle bombardment, protoplast uptake (e.g. PEG and electroporation) and microinjection. The choice of vector depends largely on the preferred selection for the species being transformed. Below, the construction of typical vectors suitable for non-Agrobacterium transformation is described.
a. pCIB3064:
pCIB3064 is a pUC-derived vector suitable for direct gene transfer techniques in combination with selection by the herbicide Basta (or phosphinothricin). The plasmid pCIB246 comprises the CaMV 35S promoter in operational fusion to the E. coli GUS gene and the CaMV 35S transcriptional terminator and is described in PCT published application WO 93/07278. The 35S promoter of this vector contains two ATG sequences 5′ of the start site. These sites are mutated using standard PCR techniques in such a way as to remove the ATGs and generate the restriction sites SspI and PvuII. The new restriction sites are 96 and 37 bp away from the unique SalI site and 101 and 42 bp away from the actual start site. The resultant derivative of pCIB246 is designated pCIB3025. The GUS gene is then excised from pCIB3025 by digestion with SalI and SacI, and the termini are rendered blunt and religated to generate plasmid pCIB3060. The plasmid pJIT82 is obtained from the John Innes Centre, Norwich, and a 400-bp SmaI fragment containing the bar gene from Streptomyces viridochromogenes is excised and inserted into the HpaI site of pCIB3060 (Thompson et al., EMBO J 6: 2519-2523 (1987)). This generates pCIB3064, which comprises the bar gene under the control of the CaMV 35S promoter and terminator for herbicide selection, a gene for ampicillin resistance (for selection in E. coli) and a polylinker with the unique sites SphI, PstI, HindIII, and BamHI. This vector is suitable for the cloning of plant expression cassettes containing their own regulatory signals.
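Locating ATG triplets upstream of a start site, and checking the distances quoted for the new SspI and PvuII sites, are both simple sequence operations. The sketch below is illustrative only: it assumes a top-strand sequence string and a 0-based start coordinate, the function name and the test leader are hypothetical, and it does not reproduce the pCIB246 sequence. A common reason for removing such ATGs is that upstream ATGs can compete with the intended start codon for translation initiation. The distance check reads the quoted values as placing the SalI site 5 bp nearer to the new sites than the start site, which assumes both lie on the same side of the new sites.

```python
def upstream_atg_distances(seq: str, start: int) -> list[int]:
    """Distance (bp) from each ATG lying wholly 5' of `start` to `start`.

    `seq` is the top-strand sequence and `start` is the 0-based index of the
    start site; results run from the most distal ATG to the most proximal.
    """
    seq = seq.upper()
    # An ATG is wholly upstream when its last base precedes `start` (i + 3 <= start).
    return [start - i for i in range(start - 2) if seq[i : i + 3] == "ATG"]


# Hypothetical leader with two upstream ATGs (at indices 2 and 9).
leader = "CCATGCCCCATGGGAAAA"
print(upstream_atg_distances(leader, len(leader)))  # [16, 9]

# Distances quoted in the text for the new SspI and PvuII sites.
DIST_TO_SALI = (96, 37)
DIST_TO_START = (101, 42)
offsets = {s - a for a, s in zip(DIST_TO_SALI, DIST_TO_START)}
print(offsets)  # {5}: both sites give the same SalI-to-start offset
```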
b. pSOG19 and pSOG35:
pSOG35 is a transformation vector that utilizes the E. coli dihydrofolate reductase (DHFR) gene as a selectable marker conferring resistance to methotrexate. PCR is used to amplify the 35S promoter (~800 bp), intron 6 from the maize Adh1 gene (~550 bp) and 18 bp of the GUS untranslated leader sequence from pSOG10. A 250-bp fragment encoding the E. coli dihydrofolate reductase type II gene is also amplified by PCR, and these two PCR fragments are assembled with a SacI-PstI fragment from pBI221 (Clontech), which comprises the pUC19 vector backbone and the nopaline synthase terminator. Assembly of these fragments generates pSOG19, which contains the 35S promoter in fusion with the intron 6 sequence, the GUS leader, the DHFR gene and the nopaline synthase terminator. Replacement of the GUS leader in pSOG19 with the leader sequence from Maize Chlorotic Mottle Virus (MCMV) generates the vector pSOG35. pSOG19 and pSOG35 carry the pUC gene for ampicillin resistance and have HindIII, SphI, PstI and EcoRI sites available for the cloning of foreign sequences.
3. Vector Suitable for Chloroplast Transformation
For expression of a nucleotide sequence of the present invention in plant plastids, plastid transformation vector pPH143 (WO 97/32011, example 36) is used. The nucleotide sequence is inserted into pPH143, thereby replacing the PROTOX coding sequence. This vector is then used for plastid transformation and selection of transformants for spectinomycin resistance. Alternatively, the nucleotide sequence is inserted in pPH143 so that it replaces the aadA gene. In this case, transformants are selected for resistance to PROTOX inhibitors.
Once a nucleic acid sequence of the invention has been cloned into an expression system, it is transformed into a plant cell. Methods for transformation and regeneration of plants are well known in the art. For example, foreign DNA has been delivered using Ti plasmid vectors, direct DNA uptake, liposomes, electroporation, microinjection, and microprojectiles. In addition, bacteria from the genus Agrobacterium can be utilized to transform plant cells. Below are descriptions of representative techniques for transforming both dicotyledonous and monocotyledonous plants, as well as a representative plastid transformation technique.
1. Transformation of Dicotyledons
Transformation techniques for dicotyledons are well known in the art and include Agrobacterium-based techniques and techniques that do not require Agrobacterium. Non-Agrobacterium techniques involve the uptake of exogenous genetic material directly by protoplasts or cells. This can be accomplished by PEG- or electroporation-mediated uptake, particle bombardment-mediated delivery, or microinjection. Examples of these techniques are described by Paszkowski et al., EMBO J 3: 2717-2722 (1984), Potrykus et al., Mol. Gen. Genet. 199: 169-177 (1985), Reich et al., Biotechnology 4: 1001-1004 (1986), and Klein et al., Nature 327: 70-73 (1987). In each case the transformed cells are regenerated to whole plants using standard techniques known in the art.
Agrobacterium-mediated transformation is a preferred technique for transformation of dicotyledons because of its high efficiency of transformation and its broad utility with many different species. Agrobacterium transformation typically involves the transfer of the binary vector carrying the foreign DNA of interest (e.g., pCIB200 or pCIB2001) to an appropriate Agrobacterium strain, the choice of which may depend on the complement of vir genes carried by the host Agrobacterium strain either on a co-resident Ti plasmid or chromosomally (e.g., strain CIB542 for pCIB200 and pCIB2001 (Uknes et al., Plant Cell 5: 159-169 (1993))). The transfer of the recombinant binary vector to Agrobacterium is accomplished by a triparental mating procedure using E. coli carrying the recombinant binary vector and a helper E. coli strain that carries a plasmid such as pRK2013, which is able to mobilize the recombinant binary vector to the target Agrobacterium strain. Alternatively, the recombinant binary vector can be transferred to Agrobacterium by DNA transformation (Höfgen and Willmitzer, Nucl. Acids Res. 16: 9877 (1988)).
Transformation of the target plant species by recombinant Agrobacterium usually involves co-cultivation of the Agrobacterium with explants from the plant and follows protocols well known in the art. Transformed tissue is regenerated on selective medium containing the antibiotic or herbicide that corresponds to the resistance marker present between the binary plasmid T-DNA borders.
Another approach to transforming plant cells with a gene involves propelling inert or biologically active particles at plant tissues and cells. This technique is disclosed in U.S. Pat. Nos. 4,945,050, 5,036,006, and 5,100,792, all to Sanford et al. Generally, this procedure involves propelling inert or biologically active particles at the cells under conditions effective to penetrate the outer surface of the cell and afford incorporation within the interior thereof. When inert particles are utilized, the vector can be introduced into the cell by coating the particles with the vector containing the desired gene. Alternatively, the target cell can be surrounded by the vector so that the vector is carried into the cell by the wake of the particle. Biologically active particles (e.g., dried yeast cells, dried bacteria or a bacteriophage, each containing DNA sought to be introduced) can also be propelled into plant cell tissue.
2. Transformation of Monocotyledons
Transformation of most monocotyledon species has now also become routine. Preferred techniques include direct gene transfer into protoplasts using PEG or electroporation techniques, and particle bombardment into callus tissue. Transformations can be undertaken with a single DNA species or multiple DNA species (i.e., co-transformation), and both these techniques are suitable for use with this invention. Co-transformation may have the advantage of avoiding complete vector construction and of generating transgenic plants with unlinked loci for the gene of interest and the selectable marker, enabling the removal of the selectable marker in subsequent generations, should this be regarded as desirable. However, a disadvantage of the use of co-transformation is the less than 100% frequency with which separate DNA species are integrated into the genome (Schocher et al., Biotechnology 4: 1093-1096 (1986)).
Patent Applications EP 0 292 435, EP 0 392 225, and WO 93/07278 describe techniques for the preparation of callus and protoplasts from an elite inbred line of maize, transformation of protoplasts using PEG or electroporation, and the regeneration of maize plants from transformed protoplasts. Gordon-Kamm et al. (Plant Cell 2: 603-618 (1990)) and Fromm et al. (Biotechnology 8: 833-839 (1990)) have published techniques for transformation of an A188-derived maize line using particle bombardment. Furthermore, WO 93/07278 and Koziel et al. (Biotechnology 11: 194-200 (1993)) describe techniques for the transformation of elite inbred lines of maize by particle bombardment. This technique utilizes immature maize embryos of 1.5-2.5 mm length excised from a maize ear 14-15 days after pollination and a PDS-1000He Biolistics device for bombardment.
Transformation of rice can also be undertaken by direct gene transfer techniques utilizing protoplasts or particle bombardment. Protoplast-mediated transformation has been described for Japonica-types and Indica-types (Zhang et al., Plant Cell Rep 7: 379-384 (1988); Shimamoto et al., Nature 338: 274-277 (1989); Datta et al., Biotechnology 8: 736-740 (1990)). Both types are also routinely transformable using particle bombardment (Christou et al., Biotechnology 9: 957-962 (1991)). Furthermore, WO 93/21335 describes techniques for the transformation of rice via electroporation.
Patent Application EP 0 332 581 describes techniques for the generation, transformation and regeneration of Pooideae protoplasts. These techniques allow the transformation of Dactylis and wheat. Furthermore, wheat transformation has been described by Vasil et al. (Biotechnology 10: 667-674 (1992)) using particle bombardment into cells of type C long-term regenerable callus, and also by Vasil et al. (Biotechnology 11: 1553-1558 (1993)) and Weeks et al. (Plant Physiol. 102: 1077-1084 (1993)) using particle bombardment of immature embryos and immature embryo-derived callus. A preferred technique for wheat transformation, however, involves the transformation of wheat by particle bombardment of immature embryos and includes either a high-sucrose or a high-maltose step prior to gene delivery. Prior to bombardment, any number of embryos (0.75-1 mm in length) are plated onto MS medium with 3% sucrose (Murashige and Skoog, Physiologia Plantarum 15: 473-497 (1962)) and 3 mg/l 2,4-D for induction of somatic embryos, which is allowed to proceed in the dark. On the chosen day of bombardment, embryos are removed from the induction medium and placed onto the osmoticum (i.e., induction medium with sucrose or maltose added at the desired concentration, typically 15%). The embryos are allowed to plasmolyze for 2-3 h and are then bombarded. Twenty embryos per target plate is typical, although not critical. An appropriate gene-carrying plasmid (such as pCIB3064 or pSOG35) is precipitated onto micrometer-size gold particles using standard procedures. Each plate of embryos is shot with the DuPont Biolistics® helium device using a burst pressure of ~1000 psi and a standard 80-mesh screen. After bombardment, the embryos are placed back into the dark to recover for about 24 h (still on osmoticum). After 24 h, the embryos are removed from the osmoticum and placed back onto induction medium, where they stay for about a month before regeneration.
Approximately one month later, the embryo explants with developing embryogenic callus are transferred to regeneration medium (MS + 1 mg/l NAA, 5 mg/l GA) further containing the appropriate selection agent (10 mg/l Basta in the case of pCIB3064 and 2 mg/l methotrexate in the case of pSOG35). After approximately one month, developed shoots are transferred to larger sterile containers known as "GA7s", which contain half-strength MS, 2% sucrose, and the same concentration of selection agent.
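The post-bombardment part of the wheat protocol can be summarized as a small data structure that pairs each stage with its medium and adds the selection agent where the text calls for it. This is a hypothetical bookkeeping sketch, not a laboratory procedure: the stage names, the `Stage` class and the helper functions are invented for illustration, and "about a month" is modelled as 30 days, which is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Selection agent and concentration (mg/l) per plasmid, as stated in the text.
SELECTION_MG_PER_L = {
    "pCIB3064": ("Basta", 10.0),
    "pSOG35": ("methotrexate", 2.0),
}


@dataclass(frozen=True)
class Stage:
    name: str
    medium: str
    approx_days: Optional[float]  # None where the text gives no duration
    selective: bool = False       # True if the selection agent is added


# Post-bombardment stages; "about a month" is modelled as 30 days (assumption).
STAGES = [
    Stage("recovery in the dark", "osmoticum", 1.0),
    Stage("callus induction", "MS, 3% sucrose, 3 mg/l 2,4-D", 30.0),
    Stage("regeneration", "MS, 1 mg/l NAA, 5 mg/l GA", 30.0, selective=True),
    Stage("shoot growth in GA7s", "half-strength MS, 2% sucrose", None, selective=True),
]


def media_plan(plasmid: str) -> list[str]:
    """List each stage's medium, adding the selection agent where it applies."""
    agent, mg_per_l = SELECTION_MG_PER_L[plasmid]
    plan = []
    for stage in STAGES:
        medium = stage.medium
        if stage.selective:
            medium += f", {mg_per_l:g} mg/l {agent}"
        plan.append(f"{stage.name}: {medium}")
    return plan


def days_until(stage_name: str) -> float:
    """Nominal days after bombardment until the named stage begins."""
    total = 0.0
    for stage in STAGES:
        if stage.name == stage_name:
            return total
        if stage.approx_days is None:
            raise ValueError(f"duration of {stage.name!r} is not specified")
        total += stage.approx_days
    raise KeyError(stage_name)


for line in media_plan("pSOG35"):
    print(line)
print(days_until("shoot growth in GA7s"))  # 61.0 under the 30-day assumption
```

Keeping the selection agent in a single lookup table mirrors the text, where the regeneration and GA7s media share the same agent and concentration for a given plasmid.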
Transformation of monocotyledons using Agrobacterium has also been described. See WO 94/00977 and U.S. Pat. No. 5,591,616, both of which are incorporated herein by reference.
3. Transformation of Plastids
In another preferred embodiment, a nucleotide sequence of the present invention is directly transformed into the plastid genome. A major advantage of plastid transformation is that plastids are generally capable of expressing bacterial genes without substantial modification, and plastids are capable of expressing multiple open reading frames under control of a single promoter. Plastid transformation technology is extensively described in U.S. Pat. Nos. 5,451,513, 5,545,817, and 5,545,818, in PCT application no. WO 95/16783, and in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91, 7301-7305. The basic technique for chloroplast transformation involves introducing regions of cloned plastid DNA flanking a selectable marker together with the gene of interest into a suitable target tissue, e.g., using biolistics or protoplast transformation (e.g., calcium chloride or PEG-mediated transformation). The 1 to 1.5 kb flanking regions, termed targeting sequences, facilitate homologous recombination with the plastid genome and thus allow the replacement or modification of specific regions of the plastome. Initially, point mutations in the chloroplast 16S rRNA and rps12 genes conferring resistance to spectinomycin and/or streptomycin were utilized as selectable markers for transformation (Svab, Z., Hajdukiewicz, P., and Maliga, P. (1990) Proc. Natl. Acad. Sci. USA 87, 8526-8530; Staub, J. M., and Maliga, P. (1992) Plant Cell 4, 39-45). This resulted in stable homoplasmic transformants at a frequency of approximately one per 100 bombardments of target leaves. The presence of cloning sites between these markers allowed creation of a plastid targeting vector for introduction of foreign genes (Staub, J. M., and Maliga, P. (1993) EMBO J. 12, 601-606).
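The reported frequency of roughly one homoplasmic transformant per 100 bombardments can be turned into a rough estimate of experiment size. The sketch below assumes, for illustration only, that each bombardment succeeds independently with probability p = 0.01, so the chance of at least one transformant in n bombardments is 1 - (1 - p)^n; real bombardments need not be independent or equally efficient, and the function names are hypothetical.

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Probability of >= 1 transformant in n independent bombardments."""
    return 1.0 - (1.0 - p) ** n


def bombardments_needed(p: float, target: float) -> int:
    """Smallest n for which p_at_least_one(p, n) >= target."""
    # Solve (1 - p)^n <= 1 - target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


p = 0.01  # ~1 homoplasmic transformant per 100 bombardments (from the text)
print(round(p_at_least_one(p, 100), 3))  # 0.634
print(bombardments_needed(p, 0.95))      # 299
```

Under these assumptions, 100 bombardments give only about a 63% chance of recovering at least one transformant, and about 299 bombardments are needed to reach 95%.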
Substantial increases in transformation frequency are obtained by replacement of the recessive rRNA or r-protein antibiotic resistance genes with a dominant selectable marker, the bacterial aadA gene encoding the spectinomycin-detoxifying enzyme aminoglycoside-3″-adenyltransferase (Svab, Z., and Maliga, P. (1993) Proc. Natl. Acad. Sci. USA 90, 913-917). Previously, this marker had been used successfully for high-frequency transformation of the plastid genome of the green alga Chlamydomonas reinhardtii (Goldschmidt-Clermont, M. (1991) Nucl. Acids Res. 19: 4083-4089). Other selectable markers useful for plastid transformation are known in the art and encompassed within the scope of the invention. Typically, approximately 15-20 cell division cycles following transformation are required to reach a homoplastidic state. Plastid expression, in which genes are inserted by homologous recombination into all of the several thousand copies of the circular plastid genome present in each plant cell, takes advantage of the enormous copy number advantage over nuclear-expressed genes to permit expression levels that can readily exceed 10% of the total soluble plant protein. In a preferred embodiment, a nucleotide sequence of the present invention is inserted into a plastid targeting vector and transformed into the plastid genome of a desired plant host. Plants homoplastidic for plastid genomes containing a nucleotide sequence of the present invention are obtained, and are preferentially capable of high expression of the nucleotide sequence.