This invention relates to compositions isolated from plants and their use in the modification of gene transcription and/or expression. More specifically, this invention relates to plant polynucleotide sequences encoding transcription factors that are components of the cellular transcription apparatus and the use of such polynucleotide sequences in the modification of gene expression.
Eucaryotic gene expression is regulated, in part, by the cellular processes involved in transcription. During transcription, a single-stranded RNA complementary to the DNA sequence to be transcribed is formed by the action of RNA polymerases. Initiation of transcription in eucaryotic cells is regulated by complex interactions between cis-acting DNA motifs, located upstream of the gene to be transcribed, and trans-acting protein factors. Among the cis-acting regulatory regions are sequences of DNA, termed promoters, which are located close to the transcription initiation site and to which RNA polymerase is first bound, either directly or indirectly. Promoters usually consist of proximal (e.g., TATA box) and more distant elements (e.g., CCAAT box). Enhancers are cis-acting DNA motifs which may be situated further up- and/or down-stream from the initiation site.
Both promoters and enhancers are generally composed of several discrete, often redundant, elements each of which may be recognized by one or more trans-acting regulatory proteins, known as transcription factors. Regulation of the complex patterns of gene expression observed both spatially and temporally, in all developing organisms, is thought to arise from the interaction of enhancer- and promoter-bound, general and tissue-specific transcription factors with DNA (Izawa T, Foster R and Chua N H, J. Mol. Biol. 230:1131-1144, 1993; Menkens A E, Schindler U and Cashmore A R, Trends in Biochem. Sci. 13:506-510, 1995). Developmental decisions in organisms as diverse as Drosophila melanogaster, Saccharomyces cerevisiae, Arabidopsis thaliana and Pinus radiata are regulated by transcription factors. These DNA-binding regulatory molecules have been shown to control the expression of genes responsible for the differentiation of different cell types, for example, the differentiation of leaf trichomes and xylem tissue in Arabidopsis thaliana, formation of endoderm from embryonic cells in Xenopus laevis and the initiation of gene expression in response to environmental and phytohormonal stress in plants (Yanagisawa S and Sheen J, The Plant Cell 10:75-89, 1998).
Transcription factors generally bind DNA in a sequence-specific manner and either activate or repress transcription initiation. The specific mechanisms of these interactions remain to be fully elucidated. At least three separate domains have been identified within transcription factors. One is essential for sequence-specific DNA recognition, one for the activation/repression of transcriptional initiation, and one for the formation of protein-protein interactions (such as dimerization). Four motifs, or domains, involved in DNA sequence recognition and/or transcription factor dimerization have been identified to date: zinc fingers; helix-turn-helix; leucine zipper; and helix-loop-helix. Both helix-loop-helix and leucine zipper protein motifs have been implicated in the binding of transcription factors to DNA via their ability to readily form homo- or hetero-dimers in vivo. “Activating” domains are rich in either proline, glutamine or acidic amino acids. It has been proposed that this net negative region of the transcription factor interacts with the TATA box-binding transcription factor TFIID, RNA polymerase, and/or another protein associated with the transcription apparatus.
Studies indicate that many plant transcription factors can be grouped into distinct classes based on their conserved DNA binding domains (Katagiri F and Chua N H, Trends Genet. 8:22-27, 1992; Menkens A E, Schindler U and Cashmore A R, Trends in Biochem. Sci. 13:506-510, 1995; Martin C and Paz-Ares J, Trends Genet. 13:67-73, 1997). Each member of these families interacts and binds with distinct DNA sequence motifs that are often found in multiple gene promoters controlled by different regulatory signals. Several classes of transcription factors that have been identified to date are described below.
The basic/leucine zipper (bZIP) proteins are a conserved family of transcription factors defined by a basic/leucine zipper motif (Landschultz et al., Science 240:1759-1764, 1988; McKnight, Sci Am. 264:54-64, 1991; Foster et al., FASEB J. 8[2]:192-200, 1994). Transcriptional regulation of gene expression is mediated by both the bZIPs and other families of transcription factors, through the concerted action of sequence-specific transcription factors that interact with regulatory elements residing in the promoter regions of the corresponding gene. The bZIP bipartite DNA binding structure consists of a region enriched in basic amino acids (basic region) adjacent to a leucine zipper that is characterized by several leucine residues regularly spaced at seven amino acid intervals (Vinson et al., Science 246:911-916, 1989). Whereas the basic region directly contacts the DNA, the leucine zipper mediates homodimerisation and heterodimerisation of protein monomers through a parallel interaction of the hydrophobic dimerization interfaces of two α-helices, resulting in a coiled-coil structure (O'Shea et al., Science 243:538-542, 1989; Science 254:539-544, 1991; Hu et al., Science 250:1400-1403, 1990; Rasmussen et al., Proc. Natl. Acad. Sci. USA 88:561-564, 1991).
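By way of illustration only, the regular spacing of leucine residues at seven amino acid intervals described above can be screened for computationally. The following Python sketch is not part of the disclosed methods; the function name `find_leucine_heptads`, the `min_repeats` threshold and the toy sequence are hypothetical choices made for the example, and a positive result indicates only a candidate zipper, not a demonstrated coiled coil.

```python
def find_leucine_heptads(protein, min_repeats=4):
    """Return 0-based start positions at which leucine (L) recurs at every
    seventh residue for at least `min_repeats` consecutive positions.

    Illustrative screen only: the threshold of four repeats is an assumption.
    """
    protein = protein.upper()
    hits = []
    for start in range(len(protein)):
        # Positions start, start+7, start+14, ... for the requested repeats
        positions = range(start, start + 7 * min_repeats, 7)
        if positions[-1] < len(protein) and all(protein[p] == "L" for p in positions):
            hits.append(start)
    return hits


# Hypothetical toy sequence with leucine at positions 0, 7, 14 and 21
toy = "LAAAAAALAAAAAALAAAAAALAAA"
print(find_leucine_heptads(toy))
```

A sequence lacking the heptad spacing returns an empty list.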
Dof proteins are a relatively new class of transcription factor and are thought to mediate the regulation of some patterns of plant gene expression in part by combinatorial interactions between bZIP proteins and other types of transcription factors binding to closely linked sites. An example of this combinatorial interaction has been observed between bZIP and Dof transcription factors (Singh, Plant Physiol. 118:1111-1120, 1998). These Dof proteins possess a single zinc-finger DNA binding domain that is highly conserved in plants (Yanagisawa, Trends Plant Sci. 1:213, 1996). Specific binding of the Dof protein to bZIP transcription factors has been demonstrated and it has been proposed that this specific interaction results in the stimulation of bZIP binding to DNA target sequences in plant promoters (Chen et al., Plant J. 10:955-966, 1996). Examples of such Dof/bZIP interactions have been reported in the literature, including, for example, the Arabidopsis thaliana glutathione S-transferase-6 gene (GST6) promoter, which has been shown to contain several Dof-binding sites closely linked to the ocs element, a recognized bZIP binding site (Singh, Plant Physiol. 118:1111-1120, 1998).
The bZIP family of G-box binding factors from Arabidopsis (including GBF1, GBF2 and GBF3, for example) interact with the palindromic G-box motif (CCACGTGG). However, it has been demonstrated that the DNA binding specificity of such transcription factors, for example GBF1, may be influenced by the nature of the nucleotides flanking the ACGT core (Schindler et al., EMBO J. 11:1274-1289, 1992a). In vivo transient and transgenic plant expression studies have shown that these ACGT elements are necessary for maximal transcriptional activation, and such elements have been identified in a multitude of plant genes regulated by diverse environmental and physiological cues. Classification of these transcription factors based upon their ability to bind to the ACGT core motif yielded a relatively diverse group of proteins, including, for example, the CaMV 35S promoter as-1-binding protein, which exhibits DNA binding site requirements distinct from those of proteins interacting with the G-box (Tabata et al., EMBO J. 10:1459-1467, 1991). Thus, in addition to defining the individual classes of bZIP proteins on the basis of their DNA binding specificity, such proteins can also be classified according to their heterodimerisation characteristics (Cao et al., Genes Dev. 5:1538-1552, 1991; Schindler et al., EMBO J. 11:1261-1273, 1992b).
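For illustration, the palindromic character of the G-box and the influence of the nucleotides flanking the ACGT core can be examined with a short Python sketch. The helper names (`reverse_complement`, `is_palindromic`, `acgt_core_sites`), the default flank width of two nucleotides and the toy promoter fragment are assumptions introduced for this example only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(dna):
    """A motif is palindromic when it equals its own reverse complement."""
    return dna.upper() == reverse_complement(dna)


def acgt_core_sites(dna, flank=2):
    """Return (position, site) pairs for each ACGT core together with
    `flank` nucleotides on either side; cores too close to an end are skipped."""
    dna = dna.upper()
    sites = []
    start = dna.find("ACGT")
    while start != -1:
        left, right = start - flank, start + 4 + flank
        if left >= 0 and right <= len(dna):
            sites.append((left, dna[left:right]))
        start = dna.find("ACGT", start + 1)
    return sites


print(is_palindromic("CCACGTGG"))        # G-box motif from the text
print(acgt_core_sites("TTCCACGTGGAA"))   # hypothetical promoter fragment
```

Comparing the flanking nucleotides returned for different sites gives a simple way to group ACGT elements before any binding study is attempted.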
Environmentally inducible promoters require the presence of two cis-acting elements, critical for promoter activity, one of which is the moderately conserved G-box (CCACGTGG) (deVetten et al., Plant Cell 4[10]:1295-1307, 1992). A mutation in one of the two elements abolishes or severely reduces the ability of the promoter to respond to environmental changes. The sequence of the second cis-acting element, positioned near the G-box, is not conserved among different environmentally-inducible promoters, but may be similar among promoters induced by the same signal. The spacing between the G-box and the second cis-acting element appears to be critical, suggesting a direct interaction between the respective binding factors (deVetten and Ferl, Int. J. Biochem. 26[9]:1055-1068, 1994; Ramachandran et al., Curr. Opin. Genet. Dev. 4[5]:642-646, 1994).
Basic helix-loop-helix zipper proteins represent an additional class of bZIP transcription factors described in the literature and include, for example, the Myc proteins. These proteins contain two regions characteristic of transcription factors: an N-terminal transactivation domain consisting of several phosphorylation sites, and a C-terminal basic helix-loop-helix (bHLH) leucine zipper motif known to mediate dimerization and sequence-specific DNA binding via three distinct domains: the leucine zipper, helix-loop-helix, and basic regions.
The Myb family of transcription factors is a group of functionally diverse transcriptional activators found in both plants and animals that is characterized by a conserved amino-terminal DNA-binding domain containing either two (in plant species) or three (in animal species) imperfect tandem repeats of approximately 50 amino acids (Rosinski and Atchley, J. Mol. Evol. 46[1]:74-83, 1998; Stober-Grasser et al., Oncogene 7[3]:589-596, 1992). Comparisons between the amino acid sequences of representative plant and mammalian MYB proteins indicate that there is greater conservation between the same repeat from different proteins than between the R2 and R3 repeats from the same protein (Martin and Paz-Ares, Trends Genet. 13[2]:67-73, 1997). More than 100 MYB genes have been reported from Arabidopsis thaliana (Romero et al., Plant J. 14[3]:273-284, 1998), representing the largest regulatory gene family currently known in plants. DNA-binding studies have demonstrated that there are differences, but also frequent overlaps, in binding specificity among plant MYB proteins, in line with the distinct but often related functions that are beginning to be recognized for these proteins. Studies involving the eight putative base-contacting residues in MYB DNA binding domains have revealed that at least six are fully conserved in all plant MYB proteins identified to date and the remaining two are conserved in at least 80% of these proteins (Martin and Paz-Ares, Trends Genet. 13[2]:67-73, 1997). Mutational analysis involving residues that do not contact bases has indicated that the sequence-specific binding capacity of MYBs is affected, and this may account for some of the differences in DNA-binding specificity between plant MYB proteins (Solano et al., J. Biol. Chem. 272[5]:2889-2895, 1997). This large gene family may contribute to the regulatory flexibility underlying the developmental and metabolic plasticity displayed by plants.
Homeotic transcription factors have, in animals, been implicated in a number of developmental processes including, for example, the control of pattern formation in insects and vertebrate embryos and the specification of cell differentiation in many tissues (Ingham, Nature 335:25-34, 1988; McGinnis and Krumlauf, Cell 68:283-302, 1992). Homeodomain secondary structures are characterized by a distinctive helix-turn-helix motif initially identified in bacterial DNA binding domains. This helix-turn-helix sequence/structure motif spans approximately 20 amino acids and is characterized by two short helices separated by a sharp 90 degree bend or turn (Harrison and Aggarwal, Ann. Rev. Biochem. 59:933-969, 1990). This helix has been shown to bind in the major groove of the DNA helix.
Plant homeobox genes have been identified in a number of plant species including Arabidopsis thaliana, maize, parsley and soybean. Expression pattern analysis of maize homeobox gene family members suggests that these transcription factors may be involved in defining specific regions in the vegetative apical meristem, potentially involved in the initiation of leaf structures (Jackson et al., Development 120:405-413, 1994). Such observations imply that the plant homeobox genes, as for the animal homeobox genes, may be involved in the determination of cell fate.
Homeodomain-zipper (HD-zip) proteins represent an additional family of homeodomain proteins. These proteins possess the characteristic homeodomain linked to an additional leucine zipper dimerization motif. This family includes, for example, Athb-1 and Athb-2 (Sessa et al., EMBO J. 12:3507-3517, 1993) and Athb-4 (Carabelli et al., Plant J. 4:469-479, 1993).
The LIM domain is a specialized double-zinc finger motif found in a variety of proteins, in association with domains of divergent functions, such as the homeodomain (see the sunflower pollen-specific SF3 transcription factor: Baltz et al., Plant J. 2:713-721, 1992; or forming proteins composed primarily of LIM domains: Dawid et al., Trends Genet. 14[4]:156-162, 1998). LIM domains interact specifically with other LIM domains and with many different protein domains. LIM domains are thought to function as protein interaction modules, mediating specific contacts between members of functional complexes and modulating the activity of some of the constituent proteins. Nucleic acid binding by LIM domains, while suggested by structural considerations, remains an unproven possibility. However, it is possible that together with the homeodomain, the LIM domain could bind to the regulatory regions of developmentally controlled genes, as has been proposed for the paired box, a conserved sequence motif first identified in the paired (PRD) and gooseberry (GSB) homeodomain proteins from Drosophila (Triesman et al., Genes Dev. 5:594-604, 1991). The PRD box is also able to bind DNA in the absence of the homeodomain. LIM-domain proteins can be nuclear, cytoplasmic, or can shuttle between compartments. In the animal systems, several important LIM proteins have been shown to be associated with the cytoskeleton, having a role in adhesion-plaque and actin-microfilament organization. Among nuclear LIM proteins, the LIM homeodomain proteins form a major subfamily with important functions in cell lineage determination and pattern formation during animal development.
The AP2 (APETALA2) and EREBPs (ethylene-responsive element binding proteins) are the prototypic members of a family of transcription factors unique to plants, whose distinguishing characteristic is that they contain the so-called AP2 DNA-binding domain. AP2/EREBP genes form a large multigene family, and they play a variety of roles throughout the plant life cycle: from being key regulators of several developmental processes, like floral organ identity determination or control of leaf epidermal cell identity, to forming part of the mechanisms used by plants to respond to various types of biotic and environmental stress. In Arabidopsis thaliana, the homeotic gene APETALA2 (AP2) has been shown to control three salient processes during development: (1) the specification of flower organ identity and the regulation of floral organogenesis (Jofuku et al., Plant Cell 6:1211-1225, 1994); (2) establishment of flower meristem identity (Irish and Sussex, Plant Cell 2[8]:741-753, 1990); and (3) the temporal and spatial regulation of flower homeotic gene activity (Drews et al., Cell 65[6]:991-1002, 1991). DNA sequence analysis suggests that AP2 encodes a theoretical polypeptide of 432 aa, with a distinct 68 aa repeated motif termed the AP2 domain. This domain has been shown to be essential for AP2 functions and contains, within the 68 aa, an eighteen amino acid core region that is predicted to form an amphipathic α-helix (Jofuku et al., Plant Cell 6:1211-1225, 1994). AP2-like domain-containing transcription factors have also been identified in both Arabidopsis thaliana (Okamuro et al., Proc. Natl. Acad. Sci. USA 94:7076-7081, 1997) and in tobacco with the identification of the ethylene responsive element binding proteins (EREBPs) (Ohme-Takagi and Shinshi, Plant Cell 7[2]:173-182, 1995). In Arabidopsis, these RAP2 (related to AP2) genes encode two distinct subfamilies of AP2 domain containing proteins designated AP2-like and EREBP-like (Okamuro et al., Proc. Natl. Acad. Sci. USA 94:7076-7081, 1997). In vitro DNA binding has not been shown to date using the RAP2 proteins; however, based upon the presence of two highly conserved motifs, YRG and RAYD, within the AP2 domain, it has been proposed that DNA binding occurs in a manner similar to that of AP2 proteins.
Zinc finger domains of the type Cys2His2 appear to represent the most abundant DNA binding motif in eukaryotic transcription factors, with several thousand being identified to date (Berg and Shi, Science 271[5252]:1081-1085, 1996). A structural role for zinc in transcription factors was initially proposed in 1983 for the transcription factor IIIA (TFIIIA) (Hanas et al., J. Biol. Chem. 258[23]:14120-14125, 1983). The Cys2His2 zinc finger domains are characterized by tandem arrays of sequences of C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H (where x represents a variable amino acid). Structurally, the zinc finger consists of two antiparallel β strands followed by an α helix (Lee et al., Science 245[4918]:635-637, 1989). This structural arrangement allows for the cysteine and histidine side chains to coordinate the zinc with the three other conserved residues forming the hydrophobic core adjacent to the metal coordination unit (Berg and Shi, Science 271[5252]:1081-1085, 1996). Many proteins possessing a Cys2His2 domain have been shown to interact with DNA in a sequence-specific manner. Crystal structure analysis of the mouse transcription factor Zif268 bound to a specific DNA target indicates that the zinc fingers in the protein/DNA complex reside in the major groove of the double helix and interact with the DNA bases through amino acid side chains referred to as the contact residues (Pavletich and Pabo, Science 252[5007]:809-817, 1991). The orientations of the zinc finger domains with respect to the DNA are usually identical, with each domain contacting a contiguous 3-base pair subsite, the majority of which are directed to one strand. There are few interdomain interactions and the DNA recognition by each zinc finger appears to be largely independent of the other domains (Berg and Shi, Science 271[5252]:1081-1085, 1996).
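As an illustrative aid, the Cys2His2 consensus C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H recited above may be translated directly into a regular expression, in which x(n) becomes `.{n}` and x(m,n) becomes `.{m,n}`. The Python sketch below assumes single-letter amino acid codes; the function name `find_c2h2_fingers` and the toy sequences are hypothetical, and the search reports non-overlapping matches only.

```python
import re

# Direct translation of the consensus pattern quoted in the text
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(protein):
    """Return (start, matched_sequence) for each non-overlapping match
    of the Cys2His2 consensus in a single-letter protein sequence."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


# Hypothetical toy finger built to satisfy the pattern with minimal spacers
toy_finger = "C" + "AA" + "C" + "AAA" + "F" + "A" * 8 + "H" + "AAA" + "H"
toy_protein = "MK" + toy_finger + "GS"
print(find_c2h2_fingers(toy_protein))
```

A pattern match identifies a candidate domain by sequence alone; it does not establish zinc coordination or DNA binding.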
The CCAAT-box element identified by Gelinas et al. (Nature 313[6000]:323-325, 1985) has been shown to occur between 80 bp and 300 bp from the transcription start site and may operate in either orientation, with possible cooperative interactions with multiple boxes (Tasanen et al., J. Biol. Chem. 267[16]:11513-11519, 1992) or other conserved motifs (Muro et al., J. Biol. Chem. 267[18]:12767-12774, 1992; Rieping and Schoffl, Mol. Gen. Genet. 231[2]:226-232, 1992). CCAAT-box related motifs have been identified in a number of promoters in a variety of organisms including yeast (Hahn et al., Science 240[4850]:317-321, 1988), rat (Maity et al., Proc. Natl. Acad. Sci. USA 87[14]:5378-5382, 1990; Vuorio et al., J. Biol. Chem. 265[36]:22480-22486, 1990), and plants (Rieping and Schoffl, Mol. Gen. Genet. 231[2]:226-232, 1992; Kehoe et al., Plant Cell 6[8]:1123-1134, 1994). In both yeast and vertebrates, a protein complex has been shown to bind to the CCAAT-motif. In yeast the complex consists of three proteins, known as HAP2, HAP3 and HAP5 (Pinkham and Guarente, Mol. Cell. Biol. 5[12]:3410-3416, 1985).
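The positional window and orientation independence described above can be illustrated with a simple scan of an upstream sequence. In the Python sketch below, the input is assumed to be the upstream region written 5′ to 3′ with its last base immediately preceding the transcription start site; distance is measured from the first base of the motif to the start site. The function name `find_ccaat_boxes` and the toy sequence are hypothetical.

```python
def find_ccaat_boxes(upstream, min_dist=80, max_dist=300):
    """Return sorted (distance, strand) pairs for CCAAT boxes lying
    `min_dist`..`max_dist` bp upstream of the transcription start site.

    `upstream` is assumed to end at the base just before the start site.
    Strand "-" marks a box in the reverse orientation (ATTGG on the given strand).
    """
    upstream = upstream.upper()
    hits = []
    for motif, strand in (("CCAAT", "+"), ("ATTGG", "-")):
        i = upstream.find(motif)
        while i != -1:
            dist = len(upstream) - i  # bp from motif start to the start site
            if min_dist <= dist <= max_dist:
                hits.append((dist, strand))
            i = upstream.find(motif, i + 1)
    return sorted(hits)


# Hypothetical 225 bp upstream region with three boxes, one too close to the start site
toy = "ATTGG" + "G" * 95 + "CCAAT" + "G" * 95 + "CCAAT" + "G" * 20
print(find_ccaat_boxes(toy))
```

The window limits are exposed as parameters so that the scan can be adapted to promoters in which the box lies outside the range cited above.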
MADS box transcription factors interact with a conserved region of DNA known as the MADS box. All MADS box transcription factors contain a conserved DNA-binding/dimerization region, known as the MADS domain, which has been identified throughout the different kingdoms (Riechmann and Meyerowitz, Biol. Chem. 378[10]:1079-1101, 1997). Many of the MADS box genes isolated from plants are expressed primarily in floral meristems or floral organs, and are believed to play a role in either specifying inflorescence and floral meristem identity or in determining floral organ identity. One class of regulatory genes responsible for floral meristem identity and the pattern of meristem development includes the genes APETALA1 (AP1), APETALA2 (AP2), CAULIFLOWER (CAL), LEAFY (LFY) and AGAMOUS (AG) from Arabidopsis thaliana. Both LFY and AP1 have been shown to encode putative transcription factors (Weigel et al., Cell 69:843-859, 1992), with AP1 and AG each encoding putative transcription factors of the MADS box domain family (Yanofsky et al., Nature 346:35-39, 1990). Mutations in the LFY gene have been shown to result in a partial conversion of flowers into inflorescence shoots.
Briefly, the present invention provides polynucleotides isolated from plants that encode transcription factors, together with polypeptides encoded by such polynucleotides. The isolated polynucleotides and polypeptides of the present invention may be usefully employed in the modification of gene expression in plants, since both tissue- and temporal-specific gene expression patterns have been shown to be governed by transcription factors during the natural development of a plant. The inventive polynucleotides and polypeptides may thus be employed in the manipulation of plant phenotypes.
In a first aspect, the present invention provides polynucleotides isolated from eucalyptus and pine which encode transcription factors, including transcription factors from the following families of regulatory proteins: bZIP; bZIP family of G-box binding factors; basic helix-loop-helix zipper (bHLH); homeotic/homeodomain/homeobox/MADS; homeodomain zipper (ZIP); LIM domain; AP2 and EREBPs; zinc finger domains of type Cys2His2; CCAAT box elements; and MYB. In specific embodiments, the isolated polynucleotides of the present invention comprise a DNA sequence selected from the group consisting of: (a) sequences recited in SEQ ID NOS: 1-591, 1183-1912 and 1931-2106; (b) complements of the sequences recited in SEQ ID NOS: 1-591, 1183-1912 and 1931-2106; (c) reverse complements of the sequences recited in SEQ ID NOS: 1-591, 1183-1912 and 1931-2106; (d) reverse sequences of the sequences recited in SEQ ID NOS: 1-591, 1183-1912 and 1931-2106; and (e) sequences having either 40%, 60%, 75%, 90% or 95% identity, as defined herein, to a sequence of (a)-(d).
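The relationship between a sequence and its complement, reverse complement and reverse sequence, as recited in (a)-(d) above, can be illustrated with a minimal Python sketch. The function names and the five-nucleotide example are hypothetical and are not taken from the sequence listing; all sequences are written 5′ to 3′ as given. The identity criterion of (e) is defined elsewhere in the specification and is deliberately not implemented here.

```python
# Watson-Crick pairing for unambiguous DNA bases, preserving letter case
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Base-by-base complement, read in the same direction as the input."""
    return seq.translate(_COMPLEMENT)


def reverse(seq):
    """The same bases read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq):
    """Complement read in the opposite direction (the opposite strand, 5' to 3')."""
    return complement(seq)[::-1]


example = "ATGCC"  # hypothetical sequence, not a SEQ ID NO
print(complement(example))          # TACGG
print(reverse(example))             # CCGTA
print(reverse_complement(example))  # GGCAT
```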
In a further aspect, isolated polypeptides encoded by the inventive polynucleotides are provided. In specific embodiments, such polypeptides comprise an amino acid sequence selected from the group consisting of: (a) sequences provided in SEQ ID NOS: 592-1182, 1913-1930 and 2107-2278; and (b) polypeptides comprising sequences having either 60%, 75%, 90% or 95% identity, as defined herein, to a sequence of (a).
In another aspect, the present invention provides polypeptides isolated from eucalyptus and pine which comprise transcription factor DNA-binding domains. In specific embodiments, such polypeptides comprise an amino acid sequence selected from the group consisting of: (a) sequences provided in SEQ ID NOS: 2279-2293 and 2296-2368; and (b) sequences having either 60%, 75%, 90% or 95% identity, as defined herein, to a sequence of (a).
In a further aspect, the invention provides DNA constructs comprising a polynucleotide of the present invention, either alone, in combination with one or more other polynucleotides disclosed herein, or in combination with one or more known DNA sequences, together with transformed cells comprising such constructs.
In specific embodiments, the inventive DNA constructs comprise, in the 5′-3′ direction, a gene promoter sequence; an open reading frame coding for at least a functional portion of a polypeptide encoded by an inventive polynucleotide, or a variant thereof; and a gene termination sequence. The open reading frame may be orientated in either a sense or antisense direction. DNA constructs comprising an untranslated, or non-coding, region of a polynucleotide coding for a transcription factor polypeptide of the present invention or a nucleotide sequence complementary to an untranslated region, together with a gene promoter sequence and a gene termination sequence, are also provided. Preferably, the gene promoter and termination sequences are functional in a host plant. Most preferably, the gene promoter and termination sequences are those of the original genes, but others generally used in the art, such as the Cauliflower Mosaic Virus (CaMV) promoter, with or without enhancers such as the Kozak sequence or Omega enhancer, and the Agrobacterium tumefaciens nopaline synthase terminator, may be usefully employed in the present invention. Tissue-specific promoters may be employed in order to target expression to one or more desired tissues. The DNA construct may further include a marker for the identification of transformed cells.
In yet a further aspect, transgenic cells comprising the DNA constructs of the present invention are provided, together with organisms, such as plants, comprising such transgenic cells. Fruits, seeds, derivatives, progeny, propagules and other products of such transgenic plants are also contemplated and encompassed by the present invention. As used herein, the term “propagule” means any part of a plant that may be used in reproduction or propagation, sexual or asexual, including cuttings.
In yet another aspect, methods for modifying gene expression in a target organism are provided, such methods including stably incorporating into the genome of the organism a DNA construct of the present invention. In a preferred embodiment, the target organism is a plant, preferably a woody plant, more preferably selected from the group consisting of eucalyptus and pine species, and most preferably from the group consisting of Eucalyptus grandis and Pinus radiata. In a related aspect, a method for producing a target organism, such as a plant, having modified gene expression is provided, the method comprising transforming a plant cell with a DNA construct of the present invention to provide a transgenic cell and cultivating the transgenic cell under conditions conducive to regeneration and mature plant growth.
The present invention further provides methods for modifying the activity of a transcription factor in a target organism, such as a plant, comprising stably incorporating into the genome of the plant a DNA construct of the present invention. In a preferred embodiment, the target plant is a woody plant, preferably selected from the group consisting of eucalyptus and pine species, and most preferably from the group consisting of Eucalyptus grandis and Pinus radiata. 
The above-mentioned and additional features of the present invention and the manner of obtaining them will become apparent, and the invention will be best understood by reference to the following more detailed description. All references disclosed herein are hereby incorporated by reference in their entirety as if each was incorporated individually.