The present invention relates generally to plant molecular biology. More specifically, it relates to nucleic acids and methods for modulating their expression in plants.
Polysaccharides constitute the bulk of the plant cell walls and have been traditionally classified into three categories: cellulose, hemicellulose, and pectin. Fry, S. C. (1988), The growing plant cell wall: Chemical and metabolic analysis. New York: Longman Scientific and Technical. Whereas cellulose is made at the plasma membrane and directly laid down into the cell wall, hemicellulosic and pectic polymers are first made in the Golgi apparatus and then exported to the cell wall by exocytosis. Ray, P. M., et al., (1976), Ber. Deutsch. Bot. Ges. Bd. 89, 121-146. The variety of chemical linkages in the pectic and hemicellulosic polysaccharides indicates that there must be tens of polysaccharide synthases in the Golgi apparatus. Darvill et al., (1980). The primary cell walls of flowering plants. In The Plant Cell (N. E. Tolbert, ed.), Vol. 1 in Series: The biochemistry of plants: A comprehensive treatise, eds. P. K. Stumpf and E. E. Conn (New York: Academic Press), pp. 91-162.
Cellulose, by virtue of its ability to form semicrystalline microfibrils, has a very high tensile strength which approaches that of some metals. Niklas, K. J. (1992). Plant Biomechanics: An engineering approach to plant form and function, The University of Chicago Press, pp. 607. Bending strength of the culm of normal and brittle-culm mutants of barley has been found to be directly correlated with the concentration of cellulose in the cell wall. Kokubo, et al., (1989), Plant Physiology 91, 876-882; Kokubo, et al., (1991) Plant Physiology 97, 509-514.
Even though the sugar and polysaccharide compositions of plant cell walls have been well characterized, very limited progress has been made toward identification of the enzymes involved in polysaccharide formation, the reason being their labile nature and recalcitrance to solubilization by available detergents. Sporadic claims for the identification of cellulose synthase from plant sources have been made over the years. Callaghan, T., and Benziman, M. (1984), Nature 311, 165-167; Okuda, et al., (1993), Plant Physiol. 101, 1131-1142. However, these claims have been met with skepticism. Callaghan, T., and Benziman, M. (1985), Nature 314, 383-384; Delmer, et al., (1993), Plant Physiol. 103, 307-308. It was only recently that a putative gene for plant cellulose synthase (CelA) was cloned from developing cotton fibers based on homology to the bacterial gene. Pear, et al., Proc. Natl. Acad. Sci. (USA) 93, 12637-12642; Saxena, et al., (1990), Plant Molecular Biology 15, 673-684; see also, WO 9818949.
As brittle snap is a major problem in corn breeding, what is needed in the art are compositions and methods for manipulating cellulose concentration in the cell wall and thereby altering plant stalk quality for improved standability or silage. The present invention provides these and other advantages.
Generally, it is the object of the present invention to provide nucleic acids and proteins relating to cellulose synthases. It is an object of the present invention to provide: 1) nucleic acids and proteins relating to maize cellulose synthases; 2) transgenic plants comprising the nucleic acids of the present invention; 3) methods for modulating, in a transgenic plant, the expression of the nucleic acids of the present invention.
Therefore, in one aspect, the present invention relates to an isolated nucleic acid comprising a member selected from the group consisting of (a) a polynucleotide having a specified sequence identity to a polynucleotide encoding a polypeptide of the present invention; (b) a polynucleotide which is complementary to the polynucleotide of (a); and (c) a polynucleotide comprising a specified number of contiguous nucleotides from a polynucleotide of (a) or (b). The isolated nucleic acid can be DNA or RNA.
In another aspect, the present invention relates to recombinant expression cassettes, comprising a nucleic acid of the present invention operably linked to a promoter.
In some embodiments, the nucleic acid is operably linked in antisense orientation to the promoter.
In another aspect, the present invention is directed to a host cell transfected with the recombinant expression cassette.
In a further aspect, the present invention relates to an isolated protein comprising a polypeptide having a specified number of contiguous amino acids encoded by an isolated nucleic acid of the present invention.
In another aspect, the present invention relates to an isolated nucleic acid comprising a polynucleotide of specified length which selectively hybridizes under stringent conditions to a polynucleotide of the present invention, or a complement thereof. In some embodiments, the isolated nucleic acid is operably linked to a promoter.
In yet another aspect, the present invention relates to an isolated nucleic acid comprising a polynucleotide, the polynucleotide having a specified sequence identity to an identical length of a nucleic acid of the present invention or a complement thereof.
In another aspect, the present invention relates to an isolated nucleic acid comprising a polynucleotide having a sequence of a nucleic acid amplified from a Zea mays nucleic acid library using at least two primers or their complements, one of which selectively hybridizes under stringent conditions to a locus of the nucleic acid comprising the 5′ terminal coding region and the other of which selectively hybridizes, under stringent conditions, to a locus of the nucleic acid comprising the 3′ terminal coding region, and wherein both primers selectively hybridize within the coding region. In some embodiments, the nucleic acid library is a cDNA library.
In another aspect, the present invention relates to a recombinant expression cassette comprising a nucleic acid, wherein the nucleic acid is operably linked to a promoter. In some embodiments, the present invention relates to a host cell transfected with this recombinant expression cassette. In some embodiments, the present invention relates to a protein of the present invention which is produced from this host cell.
In a further aspect, the present invention relates to a heterologous promoter operably linked to a non-isolated polynucleotide of the present invention, wherein the polypeptide is encoded by a nucleic acid amplified from a nucleic acid library.
In yet another aspect, the present invention relates to a transgenic plant comprising a recombinant expression cassette comprising a plant promoter operably linked to any of the isolated nucleic acids of the present invention. In some embodiments, the transgenic plant is Zea mays. The present invention also provides transgenic seed from the transgenic plant.
In a further aspect, the present invention relates to a method of modulating expression of the genes encoding the proteins of the present invention in a plant cell capable of plant regeneration, comprising the steps of (a) transforming a plant cell with a recombinant expression cassette comprising a polynucleotide of the present invention operably linked to a promoter; (b) growing the plant cell under plant growing conditions; and (c) inducing expression of the polynucleotide for a time sufficient to modulate expression of the genes in the plant. In some embodiments, the plant is maize. Expression of the genes encoding the proteins of the present invention can be increased or decreased relative to a non-transformed control plant.
Definitions
Units, prefixes, and symbols may be denoted in their SI accepted form. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation. Numeric ranges are inclusive of the numbers defining the range and include each integer within the defined range. Amino acids may be referred to herein by either their commonly known three-letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes. Unless otherwise provided for, software, electrical, and electronics terms as used herein are as defined in The New IEEE Standard Dictionary of Electrical and Electronics Terms (5th edition, 1993). The terms defined below are more fully defined by reference to the specification as a whole.
By “amplified” is meant the construction of multiple copies of a nucleic acid sequence or multiple copies complementary to the nucleic acid sequence using at least one of the nucleic acid sequences as a template. Amplification systems include the polymerase chain reaction (PCR) system, ligase chain reaction (LCR) system, nucleic acid sequence based amplification (NASBA, Cangene, Mississauga, Ontario), Q-Beta Replicase systems, transcription-based amplification system (TAS), and strand displacement amplification (SDA). See, e.g., Diagnostic Molecular Microbiology: Principles and Applications, D. H. Persing et al., Ed., American Society for Microbiology, Washington, D.C. (1993). The product of amplification is termed an amplicon.
The term “antibody” includes reference to antigen binding forms of antibodies (e.g., Fab, F(ab)2). The term “antibody” frequently refers to a polypeptide substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof which specifically bind and recognize an analyte (antigen). However, while various antibody fragments can be defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by utilizing recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments such as single chain Fv, chimeric antibodies (i.e., comprising constant and variable regions from different species), humanized antibodies (i.e., comprising a complementarity determining region (CDR) from a non-human source) and heteroconjugate antibodies (e.g., bispecific antibodies).
The term “antigen” includes reference to a substance to which an antibody can be generated and/or to which the antibody is specifically immunoreactive. The specific immunoreactive sites within the antigen are known as epitopes or antigenic determinants. These epitopes can be a linear array of monomers in a polymeric composition, such as amino acids in a protein, or consist of or comprise a more complex secondary or tertiary structure. Those of skill will recognize that all immunogens (i.e., substances capable of eliciting an immune response) are antigens; however, some antigens, such as haptens, are not immunogens but may be made immunogenic by coupling to a carrier molecule. An antibody immunologically reactive with a particular antigen can be generated in vivo or by recombinant methods such as selection of libraries of recombinant antibodies in phage or similar vectors. See, e.g., Huse et al., Science 246: 1275-1281 (1989); Ward, et al., Nature 341: 544-546 (1989); and Vaughan et al., Nature Biotech. 14: 309-314 (1996).
As used herein, “antisense orientation” includes reference to a duplex polynucleotide sequence which is operably linked to a promoter in an orientation where the antisense strand is transcribed. The antisense strand is sufficiently complementary to an endogenous transcription product such that translation of the endogenous transcription product is often inhibited.
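The relationship between a sense sequence and its antisense transcript can be illustrated with a minimal Python sketch. The function names and the example fragment are hypothetical and are not part of the invention; the sketch assumes unambiguous DNA bases written 5′ to 3′.

```python
# Complement map for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_transcript(sense_dna: str) -> str:
    """Return the RNA expected from a cassette in antisense orientation.

    The transcript is complementary to the endogenous (sense) mRNA, so it
    is the reverse complement of the sense strand with T replaced by U.
    """
    return reverse_complement(sense_dna).upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCCTGA"  # hypothetical sense-strand fragment, 5' to 3'
    print(reverse_complement(sense))    # TCAGGCCAT
    print(antisense_transcript(sense))  # UCAGGCCAU
```

Because the antisense transcript is written 5′ to 3′ like every other sequence herein, it pairs with the sense mRNA when one of the two is read in the opposite direction.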
As used herein, “chromosomal region” includes reference to a length of a chromosome which may be measured by reference to the linear segment of DNA which it comprises. The chromosomal region can be defined by reference to two unique DNA sequences, i.e., markers.
The term “conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or conservatively modified variants of the amino acid sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations” and represent one species of conservatively modified variation. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of ordinary skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine; and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide of the present invention is implicit in each described polypeptide sequence and incorporated herein by reference.
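The scale of the degeneracy described above can be illustrated with a short Python sketch that groups the codons of the standard genetic code by amino acid and counts the silent variations of a short peptide. The names `SYNONYMS` and `count_silent_variants` are hypothetical, and the peptide is an arbitrary example rather than a sequence of the invention.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ('*' marks stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Map each amino acid to the list of codons that encode it.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def count_silent_variants(peptide: str) -> int:
    """Number of distinct coding sequences that encode the same peptide."""
    total = 1
    for aa in peptide:
        total *= len(SYNONYMS[aa])
    return total


if __name__ == "__main__":
    print(sorted(SYNONYMS["A"]))          # ['GCA', 'GCC', 'GCG', 'GCU']
    print(SYNONYMS["M"], SYNONYMS["W"])   # ['AUG'] ['UGG']
    print(count_silent_variants("MAW"))   # 4
```

Methionine and tryptophan contribute a factor of one each, which is why only the alanine position of the example peptide can vary.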
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence are “conservatively modified variants” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Thus, any number of amino acid residues selected from the group of integers consisting of from 1 to 15 can be so altered. Thus, for example, 1, 2, 3, 4, 5, 7, or 10 alterations can be made. Conservatively modified variants typically provide biological activity similar to that of the unmodified polypeptide sequence from which they are derived. For example, substrate specificity, enzyme activity, or ligand/receptor binding is generally at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% of that of the native protein for its native substrate. Conservative substitution tables providing functionally similar amino acids are well known in the art.
The following six groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Serine (S), Threonine (T);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
See also, Creighton (1984) Proteins W. H. Freeman and Company.
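The six groups listed above can be encoded directly as a lookup, as in the following Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the two sequences are assumed to be aligned and of equal length, and residues outside the six groups (e.g., glycine) are treated as having no conservative substitute.

```python
# The six conservative substitution groups, by one-letter code.
GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]
GROUP_OF = {aa: index for index, group in enumerate(GROUPS) for aa in group}


def is_conservative(a: str, b: str) -> bool:
    """True if a and b are different residues from the same group."""
    a, b = a.upper(), b.upper()
    return a != b and a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)


def classify_substitutions(reference: str, variant: str):
    """List (position, ref, var, conservative?) for each differing residue.

    Positions are 1-based. Sequences must already be aligned, without gaps.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    changes = []
    for pos, (r, v) in enumerate(zip(reference, variant), start=1):
        if r != v:
            changes.append((pos, r, v, is_conservative(r, v)))
    return changes


if __name__ == "__main__":
    # Hypothetical peptides: K->R is within group 4; E->Q crosses groups.
    print(classify_substitutions("MKLDE", "MRLDQ"))
    # [(2, 'K', 'R', True), (5, 'E', 'Q', False)]
```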
By “encoding” or “encoded”, with respect to a specified nucleic acid, is meant comprising the information for translation into the specified protein. A nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid, or may lack such intervening non-translated sequences (e.g., as in cDNA). The information by which a protein is encoded is specified by the use of codons. Typically, the amino acid sequence is encoded by the nucleic acid using the “universal” genetic code. However, variants of the universal code, such as are present in some plant, animal, and fungal mitochondria, the bacterium Mycoplasma capricolum (Proc. Natl. Acad. Sci. (USA), 82: 2306-2309 (1985)), or the ciliate Macronucleus, may be used when the nucleic acid is expressed using these organisms.
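The effect of a variant code on the encoded protein can be sketched in Python as follows. The sketch assumes, as general background rather than anything stated in this specification, that the Mycoplasma capricolum variant reads UGA as tryptophan instead of stop; the function name, table names, and open reading frame are hypothetical.

```python
from itertools import product

# "Universal" (standard) genetic code, codons in UCAG order ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
UNIVERSAL = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Assumed variant: UGA is read as tryptophan rather than as a stop codon.
MYCOPLASMA = {**UNIVERSAL, "UGA": "W"}


def translate(rna: str, table=UNIVERSAL) -> str:
    """Translate from the first base, in frame, up to the first stop codon."""
    rna = rna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    orf = "AUGGCUUGAAAAUAA"  # hypothetical reading frame
    print(translate(orf))              # MA
    print(translate(orf, MYCOPLASMA))  # MAWK
```

The same nucleic acid therefore specifies different polypeptides depending on the code used by the expression host, which is why the choice of host matters for the definition above.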
When the nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended host where the nucleic acid is to be expressed. For example, although nucleic acid sequences of the present invention may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. 17: 477-498 (1989)). Thus, the maize preferred codon for a particular amino acid may be derived from known gene sequences from maize. Maize codon usage for 28 genes from maize plants is listed in Table 4 of Murray et al., above.
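How a codon preference table changes a synthetic coding sequence and its GC content can be sketched in Python. The two preference tables below are hypothetical placeholders chosen only to contrast a GC-rich choice with an AT-rich choice; they are not the values of Table 4 of Murray et al. and cover only the residues of the example peptide.

```python
# Hypothetical preferred-codon tables (illustrative values only, DNA codons).
PREFERRED_HIGH_GC = {"M": "ATG", "A": "GCC", "K": "AAG", "L": "CTG", "*": "TGA"}
PREFERRED_LOW_GC = {"M": "ATG", "A": "GCT", "K": "AAA", "L": "TTA", "*": "TAA"}


def back_translate(peptide: str, preferred: dict) -> str:
    """Build a coding sequence using one preferred codon per residue."""
    return "".join(preferred[aa] for aa in peptide)


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    peptide = "MAKL*"  # arbitrary example; '*' marks the stop codon
    high = back_translate(peptide, PREFERRED_HIGH_GC)
    low = back_translate(peptide, PREFERRED_LOW_GC)
    print(high, round(gc_content(high), 2))
    print(low, round(gc_content(low), 2))
```

Both sequences encode the same peptide; only the codon choice, and therefore the GC content, differs. In practice the table would be derived from known gene sequences of the intended host.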
As used herein, “full-length sequence” in reference to a specified polynucleotide or its encoded protein means having the entire amino acid sequence of a native (non-synthetic), endogenous, catalytically active form of the specified protein. Methods to determine whether a sequence is full-length are well known in the art, including such exemplary techniques as northern or western blots, primer extension, S1 protection, and ribonuclease protection. See, e.g., Plant Molecular Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997). Comparison to known full-length homologous (orthologous and/or paralogous) sequences can also be used to identify full-length sequences of the present invention. Additionally, consensus sequences typically present at the 5′ and 3′ untranslated regions of mRNA aid in the identification of a polynucleotide as full-length. For example, the consensus sequence ANNNNAUGG, where the AUG codon represents the N-terminal methionine, aids in determining whether the polynucleotide has a complete 5′ end. Consensus sequences at the 3′ end, such as polyadenylation sequences, aid in determining whether the polynucleotide has a complete 3′ end.
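A search for the ANNNNAUGG consensus can be sketched with a regular expression in Python. This is only an aid of the kind described above, not a test of full-length status; the function name and example sequences are hypothetical, and N is assumed to mean any of A, C, G or U.

```python
import re

# ANNNNAUGG with N = any base; the lookahead also reports overlapping hits.
START_CONTEXT = re.compile(r"(?=(A[ACGU]{4}AUGG))")


def find_start_contexts(mrna: str) -> list:
    """Return 0-based positions of the A of each AUG in an ANNNNAUGG context."""
    mrna = mrna.upper().replace("T", "U")
    # The AUG begins five bases after the start of the nine-base consensus.
    return [m.start(1) + 5 for m in START_CONTEXT.finditer(mrna)]


if __name__ == "__main__":
    print(find_start_contexts("GCAACCGAUGGCU"))  # [7]
    print(find_start_contexts("GCUUCCGAUGGCU"))  # []
```

The second example contains an AUG but lacks the A five bases upstream, so it is not reported.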
The term “gene activity” refers to one or more steps involved in gene expression, including transcription, translation, and the functioning of the protein encoded by the gene.
As used herein, “heterologous” in reference to a nucleic acid is a nucleic acid that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous structural gene is from a species different from that from which the structural gene was derived, or, if from the same species, one or both are substantially modified from their original form. A heterologous protein may originate from a foreign species or, if from the same species, is substantially modified from its original form by deliberate human intervention.
By “host cell” is meant a cell which contains a vector and supports the replication and/or expression of the expression vector. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or mammalian cells. Preferably, host cells are monocotyledonous or dicotyledonous plant cells. A particularly preferred monocotyledonous host cell is a maize host cell.
The term “hybridization complex” includes reference to a duplex nucleic acid structure formed by two single-stranded nucleic acid sequences selectively hybridized with each other.
By “immunologically reactive conditions” or “immunoreactive conditions” is meant conditions which allow an antibody, generated to a particular epitope, to bind to that epitope to a detectably greater degree (e.g., at least 2-fold over background) than the antibody binds to substantially all other epitopes in a reaction mixture comprising the particular epitope. Immunologically reactive conditions are dependent upon the format of the antibody binding reaction and typically are those utilized in immunoassay protocols. See Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York (1988), for a description of immunoassay formats and conditions.
The term “introduced”, in the context of inserting a nucleic acid into a cell, means “transfection” or “transformation” or “transduction” and includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell (e.g., chromosome, plasmid, plastid or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (e.g., transfected mRNA).
The term “isolated” refers to material, such as a nucleic acid or a protein, which is: (1) substantially or essentially free from components which normally accompany or interact with it as found in its naturally occurring environment. The isolated material optionally comprises material not found with the material in its natural environment; or (2) if the material is in its natural environment, the material has been synthetically (non-naturally) altered by deliberate human intervention to a composition and/or placed at a locus in the cell (e.g., genome or subcellular organelle) not native to a material found in that environment. The alteration to yield the synthetic material can be performed on the material within or removed from its natural state. For example, a naturally occurring nucleic acid becomes an isolated nucleic acid if it is altered, or if it is transcribed from DNA which has been altered, by non-natural, synthetic (i.e., “man-made”) methods performed within the cell from which it originates. See, e.g., Compounds and Methods for Site Directed Mutagenesis in Eukaryotic Cells, Kmiec, U.S. Pat. No. 5,565,350; In Vivo Homologous Sequence Targeting in Eukaryotic Cells; Zarling et al., PCT/US93/03868. Likewise, a naturally occurring nucleic acid (e.g., a promoter) becomes isolated if it is introduced by non-naturally occurring means to a locus of the genome not native to that nucleic acid. Nucleic acids which are “isolated” as defined herein are also referred to as “heterologous” nucleic acids.
Unless otherwise stated, the term “cellulose synthase nucleic acid” is a nucleic acid of the present invention and means a nucleic acid comprising a polynucleotide of the present invention (a “cellulose synthase polynucleotide”) encoding a cellulose synthase polypeptide. A “cellulose synthase gene” is a gene of the present invention and refers to a non-heterologous genomic form of a full-length cellulose synthase polynucleotide.
As used herein, “localized within the chromosomal region defined by and including” with respect to particular markers includes reference to a contiguous length of a chromosome delimited by and including the stated markers.
As used herein, “marker” includes reference to a locus on a chromosome that serves to identify a unique position on the chromosome. A “polymorphic marker” includes reference to a marker which appears in multiple forms (alleles) such that different forms of the marker, when they are present in a homologous pair, allow transmission of each of the chromosomes in that pair to be followed. A genotype may be defined by use of one or a plurality of markers.
As used herein, “nucleic acid” includes reference to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues having the essential nature of natural nucleotides in that they hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides (e.g., peptide nucleic acids).
By “nucleic acid library” is meant a collection of isolated DNA or RNA molecules which comprise and substantially represent the entire transcribed fraction of a genome of a specified organism. Construction of exemplary nucleic acid libraries, such as genomic and cDNA libraries, is taught in standard molecular biology references such as Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press, Inc., San Diego, Calif. (Berger); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Vol. 1-3 (1989); and Current Protocols in Molecular Biology, F. M. Ausubel et al., Eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley and Sons, Inc. (1994 Supplement).
As used herein, “operably linked” includes reference to a functional linkage between a promoter and a second sequence, wherein the promoter sequence initiates and mediates transcription of the DNA sequence corresponding to the second sequence. Generally, operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, contiguous and in the same reading frame.
As used herein, the term “plant” includes reference to whole plants, plant parts or organs (e.g., leaves, stems, roots, etc.), plant cells, seeds and progeny of same. Plant cell, as used herein, includes, without limitation, cells obtained from or found in: seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores. Plant cells can also be understood to include modified cells, such as protoplasts, obtained from the aforementioned tissues. The class of plants which can be used in the methods of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including both monocotyledonous and dicotyledonous plants. Particularly preferred plants include maize, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley and millet.
As used herein, “polynucleotide” includes reference to a deoxyribopolynucleotide, ribopolynucleotide, or analogs thereof that have the essential nature of a natural ribonucleotide in that they hybridize, under stringent hybridization conditions, to substantially the same nucleotide sequence as naturally occurring nucleotides and/or allow translation into the same amino acid(s) as the naturally occurring nucleotide(s). A polynucleotide can be full-length or a subsequence of a native or heterologous structural or regulatory gene. Unless otherwise indicated, the term includes reference to the specified sequence as well as the complementary sequence thereof. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art. The term polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including among other things, simple and complex cells.
The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues are artificial chemical analogues of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The essential nature of such analogues of naturally occurring amino acids is that, when incorporated into a protein, that protein is specifically reactive to antibodies elicited to the same protein but consisting entirely of naturally occurring amino acids. The terms “polypeptide”, “peptide” and “protein” are also inclusive of modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation. Exemplary modifications are described in most basic texts, such as Proteins: Structure and Molecular Properties, 2nd ed., T. E. Creighton, W. H. Freeman and Company, New York (1993). Many detailed reviews are available on this subject, such as, for example, those provided by Wold, F., Post-translational Protein Modifications: Perspectives and Prospects, pp. 1-12 in Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York (1983); Seifter et al., Meth. Enzymol. 182: 626-646 (1990) and Rattan et al., Protein Synthesis: Posttranslational Modifications and Aging, Ann. N. Y. Acad. Sci. 663: 48-62 (1992). It will be appreciated, as is well known and as noted above, that polypeptides are not always entirely linear. For instance, polypeptides may be branched as a result of ubiquitination, and they may be circular, with or without branching, generally as a result of posttranslation events, including natural processing events and events brought about by human manipulation which do not occur naturally.
Circular, branched and branched circular polypeptides may be synthesized by non-translation natural processes and by entirely synthetic methods, as well. Modifications can occur anywhere in a polypeptide, including the peptide backbone, the amino acid side-chains and the amino or carboxyl termini. In fact, blockage of the amino or carboxyl group in a polypeptide, or both, by a covalent modification, is common in naturally occurring and synthetic polypeptides and such modifications may be present in polypeptides of the present invention, as well. For instance, the amino terminal residue of polypeptides made in E. coli or other cells, prior to proteolytic processing, almost invariably will be N-formylmethionine. During post-translational modification of the peptide, a methionine residue at the NH2-terminus may be deleted. Accordingly, this invention contemplates the use of both the methionine-containing and the methionine-less amino terminal variants of the protein of the invention. In general, as used herein, the term polypeptide encompasses all such modifications, particularly those that are present in polypeptides synthesized by expressing a polynucleotide in a host cell.
As used herein xe2x80x9cpromoterxe2x80x9d includes reference to a region of DNA upstream from the start of transcription and involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A xe2x80x9cplant promoterxe2x80x9d is a promoter capable of initiating transcription in plant cells. Exemplary plant promoters include, but are not limited to, those that are obtained from plants, plant viruses, and bacteria which comprise genes expressed in plant cells such Agrobacterium or Rhizobium. Examples of promoters under developmental control include promoters that preferentially initiate transcription in certain tissues, such as leaves, roots, or seeds. Such promoters are referred to as xe2x80x9ctissue preferredxe2x80x9d. Promoters which initiate transcription only in certain tissue are referred to as xe2x80x9ctissue specificxe2x80x9d. A xe2x80x9ccell typexe2x80x9d specific promoter primarily drives expression in certain cell types in one or more organs, for example, vascular cells in roots or leaves. An xe2x80x9cinduciblexe2x80x9d promoter is a promoter which is under environmental control. Examples of environmental conditions that may effect transcription by inducible promoters include anaerobic conditions or the presence of light. Tissue specific, tissue preferred, cell type specific, and inducible promoters constitute the class of xe2x80x9cnon-constitutivexe2x80x9d promoters. A xe2x80x9cconstitutivexe2x80x9d promoter is a promoter which is active under most environmental conditions.
The term “cellulose synthase polypeptide” is a polypeptide of the present invention and refers to one or more amino acid sequences, in glycosylated or non-glycosylated form. The term is also inclusive of fragments, variants, homologs, alleles or precursors (e.g., preproproteins or proproteins) thereof. A “cellulose synthase protein” is a protein of the present invention and comprises a cellulose synthase polypeptide.
As used herein “recombinant” includes reference to a cell or vector that has been modified by the introduction of a heterologous nucleic acid, or to a cell derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found in identical form within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all as a result of deliberate human intervention. The term “recombinant” as used herein does not encompass the alteration of the cell or vector by naturally occurring events (e.g., spontaneous mutation, natural transformation/transduction/transposition) such as those occurring without deliberate human intervention.
As used herein, a “recombinant expression cassette” is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements which permit transcription of a particular nucleic acid in a host cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid to be transcribed, and a promoter.
The terms “residue”, “amino acid residue” and “amino acid” are used interchangeably herein to refer to an amino acid that is incorporated into a protein, polypeptide, or peptide (collectively “protein”). The amino acid may be a naturally occurring amino acid and, unless otherwise limited, may encompass known analogs of natural amino acids that can function in a similar manner as naturally occurring amino acids.
The term “selectively hybridizes” includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, preferably 90% sequence identity, and most preferably 100% sequence identity (i.e., complementary) with each other.
The term “specifically reactive” includes reference to a binding reaction between an antibody and a protein having an epitope recognized by the antigen binding site of the antibody. This binding reaction is determinative of the presence of a protein having the recognized epitope amongst a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to an analyte having the recognized epitope to a substantially greater degree (e.g., at least 2-fold over background) than to substantially all other analytes lacking the epitope which are present in the sample.
The terms “stringent conditions” or “stringent hybridization conditions” include reference to conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, preferably less than 500 nucleotides in length.
Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C.
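The SSC wash strengths quoted above can be related to the Na ion ranges with a small calculation. The following is a minimal sketch only, assuming complete dissociation of both salts and three sodium ions per trisodium citrate; the function name `sodium_molarity` is hypothetical and not part of the original disclosure.

```python
# Composition of 20x SSC as stated in the text: 3.0 M NaCl / 0.3 M trisodium citrate.
NACL_M_AT_20X = 3.0
CITRATE_M_AT_20X = 0.3


def sodium_molarity(ssc_fold):
    """Approximate Na+ molarity of an SSC solution of the given fold strength.

    Assumption: both salts dissociate completely, and each trisodium
    citrate contributes three Na+ ions.
    """
    nacl = NACL_M_AT_20X * ssc_fold / 20.0
    citrate = CITRATE_M_AT_20X * ssc_fold / 20.0
    return nacl + 3.0 * citrate


if __name__ == "__main__":
    for fold in (2.0, 1.0, 0.5, 0.1):
        print(f"{fold}x SSC: about {sodium_molarity(fold):.4f} M Na+")
```

Under these assumptions, 1×SSC corresponds to roughly 0.195 M Na ion, which falls within the 0.01 to 1.0 M range given above.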
Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:267-284 (1984): Tm = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1° C. for each 1% of mismatching; thus, Tm, hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with ≥90% identity are sought, the Tm can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the thermal melting point (Tm). Using the equation, hybridization and wash compositions, and desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45° C. (aqueous solution) or 32° C. (formamide solution) it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2 “Overview of principles of hybridization and the strategy of nucleic acid probe assays”, Elsevier, New York (1993); and Current Protocols in Molecular Biology, Chapter 2, Ausubel, et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
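The Tm relationship and the mismatch adjustment described above can be worked through numerically. The sketch below is illustrative only: it assumes that "log" in the equation is the base-10 logarithm, and the function names (`melting_temperature`, `adjusted_tm`, `wash_temperature`) and the example input values are hypothetical.

```python
import math


def melting_temperature(na_molar, percent_gc, percent_formamide, length_bp):
    """Approximate Tm in degrees C for a DNA-DNA hybrid.

    Implements Tm = 81.5 + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L,
    taking log as base 10 (an assumption of this sketch).
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


def adjusted_tm(tm, percent_mismatch):
    """Reduce Tm by about 1 degree C for each 1% of mismatching."""
    return tm - percent_mismatch


def wash_temperature(tm, degrees_below=5.0):
    """Temperature a chosen number of degrees below Tm (about 5 in general)."""
    return tm - degrees_below


if __name__ == "__main__":
    # Hypothetical inputs: 1 M monovalent cation, 50% GC, 50% formamide, 500 bp hybrid.
    tm = melting_temperature(1.0, 50.0, 50.0, 500)
    tm_90 = adjusted_tm(tm, 10.0)  # sequences with >= 90% identity sought
    print(f"Tm for a perfect match: {tm:.1f} C")
    print(f"Tm allowing 10% mismatch: {tm_90:.1f} C")
    print(f"Wash about 5 C below that: {wash_temperature(tm_90):.1f} C")
```

For these example inputs the perfect-match Tm works out to 70.5° C., and allowing 10% mismatch lowers it to 60.5° C.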
As used herein, “transgenic plant” includes reference to a plant which comprises within its genome a heterologous polynucleotide. Generally, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette. “Transgenic” is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic. The term “transgenic” as used herein does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition, or spontaneous mutation.
As used herein, “vector” includes reference to a nucleic acid used in transfection of a host cell and into which can be inserted a polynucleotide. Vectors are often replicons. Expression vectors permit transcription of a nucleic acid inserted therein.
The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, (d) “percentage of sequence identity”, and (e) “substantial identity”.
(a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.
(b) As used herein, “comparison window” includes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence may be compared to a reference sequence and wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.
Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981); by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85: 2444 (1988); by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif., GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA; the CLUSTAL program is well described by Higgins and Sharp, Gene 73: 237-244 (1988); Higgins and Sharp, CABIOS 5: 151-153 (1989); Corpet, et al., Nucleic Acids Research 16: 10881-90 (1988); Huang, et al., Computer Applications in the Biosciences 8: 155-65 (1992), and Pearson, et al., Methods in Molecular Biology 24: 307-331 (1994). The BLAST family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences. See, Current Protocols in Molecular Biology, Chapter 19, Ausubel, et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using the BLAST 2.0 suite of programs using default parameters. Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997). Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always greater than 0) and N (penalty score for mismatching residues; always less than 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands.
For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915).
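The three halting rules for word-hit extension can be illustrated with a simplified, ungapped extension in one direction. This is a sketch under stated assumptions and not the BLAST implementation: the function `extend_right`, its arguments, the short seed length, and the value chosen for X are hypothetical; only the match reward M=5 and mismatch penalty N=−4 are taken from the defaults quoted above.

```python
def extend_right(query, subject, q_pos, s_pos, seed_score,
                 match=5, mismatch=-4, x_drop=20):
    """Extend a word hit to the right without gaps.

    q_pos and s_pos are the first positions after the seed in each
    sequence. Returns (best_score, best_length), where best_length is
    the number of extended positions that gave the maximum score.
    """
    score = seed_score
    best_score = seed_score
    best_length = 0
    offset = 0
    # Rule 3: stop when the end of either sequence is reached.
    while q_pos + offset < len(query) and s_pos + offset < len(subject):
        same = query[q_pos + offset] == subject[s_pos + offset]
        score += match if same else mismatch
        offset += 1
        if score > best_score:
            best_score = score
            best_length = offset
        # Rule 1: score has fallen off by X from its maximum achieved value.
        if best_score - score >= x_drop:
            break
        # Rule 2: cumulative score has gone to zero or below.
        if score <= 0:
            break
    return best_score, best_length


if __name__ == "__main__":
    query = "ACGTACGTACGTTTTT"
    subject = "ACGTACGTACGTGGGG"
    # Hypothetical 4-base seed "ACGT" at position 0, scored as 4 matches.
    print(extend_right(query, subject, 4, 4, seed_score=20, x_drop=10))
```

In this example the extension gains eight further matches and then halts once three consecutive mismatches pull the running score 12 below its maximum, which exceeds the chosen X of 10.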
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
BLAST searches assume that proteins can be modeled as random sequences. However, many real proteins comprise regions of nonrandom sequences which may be homopolymeric tracts, short-period repeats, or regions enriched in one or more amino acids. Such low-complexity regions may be aligned between unrelated proteins even though other regions of the protein are entirely dissimilar. A number of low-complexity filter programs can be employed to reduce such low-complexity alignments. For example, the SEG (Wootton and Federhen, Comput. Chem., 17:149-163 (1993)) and XNU (Claverie and States, Comput. Chem., 17:191-201 (1993)) low-complexity filters can be employed alone or in combination.
(c) As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences includes reference to the residues in the two sequences which are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences which differ by such conservative substitutions are said to have “sequence similarity” or “similarity”. Means for making this adjustment are well-known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Myers and Miller, Computer Applic. Biol. Sci., 4: 11-17 (1988), e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).
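The partial-credit scoring of conservative substitutions can be sketched as follows. This is an illustration of the scoring idea only, not the algorithm of the cited reference or of PC/GENE: the residue groupings in `CONSERVATIVE_GROUPS`, the partial score of 0.5, and the function names are all assumptions chosen for the example, and the two sequences are assumed to be pre-aligned, ungapped and of equal length.

```python
# Hypothetical groupings of residues with similar chemical properties.
CONSERVATIVE_GROUPS = [
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("ST"),    # small hydroxyl
    set("NQ"),    # amide
]


def position_score(a, b, conservative_score=0.5):
    """Score 1 for identity, a partial score for a conservative
    substitution, and 0 for a non-conservative substitution."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return conservative_score
    return 0.0


def percent_similarity(seq_a, seq_b, conservative_score=0.5):
    """Percentage score over two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(position_score(a, b, conservative_score)
                for a, b in zip(seq_a, seq_b))
    return 100.0 * total / len(seq_a)


if __name__ == "__main__":
    # K/R and L/I are treated as conservative under the groupings above.
    print(percent_similarity("MKVLA", "MRVIA"))        # adjusted value
    print(percent_similarity("MKVLA", "MRVIA", 0.0))   # strict identity
```

With a partial score of 0.5 the two example peptides score 80%, compared with 60% when conservative substitutions are counted as full mismatches.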
(d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
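The calculation in (d) can be written out directly for two sequences that have already been optimally aligned. The sketch below assumes that gaps are represented by "-" and that the total number of positions in the comparison window is the full alignment length, gap columns included; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over an aligned comparison window.

    Matched positions are columns where both sequences carry the same
    base or residue; gap columns never count as matches but do count
    toward the total number of positions (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matched = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != gap)
    return 100.0 * matched / len(aligned_a)


if __name__ == "__main__":
    # Ten alignment columns: one gap column and one mismatch.
    print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Here eight of ten columns match, giving 80% sequence identity over the window.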
(e) (i) The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 70% sequence identity, preferably at least 80%, more preferably at least 90% and most preferably at least 95%, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 60%, more preferably at least 70%, 80%, 90%, and most preferably at least 95%.
Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions. However, nucleic acids which do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is that the polypeptide which the first nucleic acid encodes is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.
(e) (ii) The term “substantial identity” in the context of a peptide indicates that a peptide comprises a sequence with at least 70% sequence identity to a reference sequence, preferably 80%, more preferably 85%, most preferably at least 90% or 95% sequence identity to the reference sequence over a specified comparison window. Preferably, optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Peptides which are “substantially similar” share sequences as noted above except that residue positions which are not identical may differ by conservative amino acid changes.