I. Transcription Factors
Eukaryotic transcription utilizes three different RNA polymerases. RNA polymerase I is located in the nucleolus and catalyzes the synthesis of ribosomal RNA. RNA polymerases II and III are present in the nucleoplasm. RNA polymerase III transcription complexes carry out the DNA-dependent RNA synthesis that transcribes the genes encoding small nuclear RNAs and transfer RNA. RNA polymerase II transcribes the majority of the nuclear structural genes, which typically encode proteins (type II genes).
In higher eukaryotes, type II gene expression is often regulated, at least in part, at the level of transcription. A typical type II gene has one or more regulatory regions, which include a promoter, and one or more structural regions, which are transcribed into precursor and messenger RNA. Type II genes are characterized by an upstream promoter region. Such regions are typically found between the start of transcription and 2000 bases distal to that transcriptional start site. Different combinations of sequence motifs can be associated with the upstream promoter region. These sequence motifs are recognized by sequence-specific DNA binding proteins (transcription factors).
The polypeptide chains of transcription factors are usually divided into two functionally different regions, one that specifically binds to nucleic acid molecules and another that is associated with the activation of transcription. These functions are often present on different domains.
Several distinct structural elements or DNA binding domains which allow the transcription factor to bind to DNA in a sequence-specific manner have been identified (Branden and Tooze, Introduction to Protein Structure, Garland Publishing, Inc., New York (1990), the entirety of which is herein incorporated by reference). These binding domains often range in size from approximately 20 residues to more than 80 residues. Many DNA binding domains exhibit one or another of the following structural motifs: the helix-turn-helix motif, the zinc finger motif, and the leucine zipper motif. Other structural motifs include the helix-loop-helix motif, the POU motif, and the multi-cysteine zinc finger.
Two sequence motifs or cis elements, the TATA box and the CAAT box, are located within the promoter region of most type II genes. An AT-rich sequence called a TATA box is located approximately 30 nucleotides upstream from the start of transcription and is reported to play a role in positioning the start of transcription. A TATA box binding protein or TFIID factor has been identified that binds to this region (Hancock, Nucleic Acids Research 21: 2823-2830 (1993), the entirety of which is herein incorporated by reference; Gasch et al., Nature 346: 390-394 (1990), the entirety of which is herein incorporated by reference) (the TFIID factor is also referred to as the TBP/TAF factors). It has been reported that binding of TFIID to the TATA box plays a role in the assembly of other transcription factors to form a complex capable of initiating transcription (Nakajima et al., Mol. Cell. Biol. 8: 4038-4040 (1988), the entirety of which is herein incorporated by reference; Van Dyke et al., Science 241: 1335-1338 (1988), the entirety of which is herein incorporated by reference; Buratowski et al., Cell 56: 549-561 (1989), the entirety of which is herein incorporated by reference).
In addition to the TATA box sequence, a CAAT box sequence is usually located approximately 75 bases upstream of the start of transcription. A CAAT box sequence binds a number of proteins, some of which are expressed in all tissues while others are expressed in a tissue-specific manner (Branden and Tooze, Introduction to Protein Structure, Garland Publishing, Inc., New York (1990)). One example of a CAAT box binding protein is the protein referred to as the CCAAT/enhancer binding protein (C/EBP).
The G-box is a cis-acting element found within the promoters of many plant genes where it mediates expression in response to a variety of different stimuli (Schindler et al., EMBO J. 11:1275-1289 (1992), the entirety of which is herein incorporated by reference). The G-box comprises a palindromic DNA motif (CACGTG) which is composed of two identical half sites (Donald et al., EMBO J. 9:1727-1735 (1990); Izawa et al., J. Mol. Biol. 230:1131-1144 (1993); Schindler et al., Plant Cell 4:1309-1319 (1992); Schindler et al., EMBO J. 11:1275-1289 (1992); Oeda et al., EMBO J. 10:1793-1991 (1991); Weisshaar et al., EMBO J. 10:1777-1786 (1991); and Zhang et al., Plant J. 4:711-716 (1993), all of which are herein incorporated by reference in their entirety). Both half sites are involved in the binding of GBF1, a bZIP protein from Arabidopsis thaliana. bZIP proteins of this type have been characterized in at least 19 other plant species (Ehrlich et al., Gene 117:169-178 (1992); Foley et al., Plant J. 3: 669-679 (1993); Guiltinan et al., Science 250:267-271 (1990); Kawata et al., Nucl. Acids Res. 20:1141 (1992); Katagiri et al., Nature 340:727-730 (1989); Oeda et al., EMBO J. 10:1793-1991 (1991); Pysh et al., Plant Cell 5:227-236 (1993); Schindler et al., Plant Cell 4:1309-1319 (1992); Schmidt et al., Proc. Natl. Acad. Sci. (USA) 87:46-50 (1990); Singh et al., Plant Cell 2: 891-903 (1990); Tabata et al., EMBO J. 10: 1459-1467 (1991); Tabata et al., Science 245:965-967 (1989); Weisshaar et al., EMBO J. 10:1777-1786 (1991); Zhang et al., Plant J. 4:711-716 (1993), all of which are herein incorporated by reference in their entirety). Each of these proteins recognizes DNA sequences that share the central core sequence ACGT. bZIP transcription factors are characterized by the presence of a basic domain and a leucine zipper.
Plant bZIP proteins have been shown to bind regulatory elements from a wide variety of inducible plant genes including those regulated by cell cycle, light, UV light, drought and pathogen infections (Ehrlich et al., Gene 117: 169-178 (1992); Donald et al., EMBO J. 9:1727-1735 (1990); Guiltinan et al., Science 250:267-271 (1990); Katagiri et al., Nature 340:727-730 (1989); Oeda et al., EMBO J. 10: 1793-1991 (1991), the entirety of which is herein incorporated by reference; Tabata et al., EMBO J. 10:1459-1467 (1991); Weisshaar et al., EMBO J. 10:1777-1786 (1991); Holdworth et al., Plant Molecular Biology 29: 711-720 (1995), the entirety of which is herein incorporated by reference; Mikami et al., Mol. Gen. Genet. 248: 573-582 (1995), the entirety of which is herein incorporated by reference).
Specific transcription factors contribute to quantitative and qualitative gene expression within a cell. The activity of a given transcription factor can affect cell physiology, metabolism, and/or the cell's ability to differentiate and communicate or associate with other cells within an organism. The regulation of the transcription of a gene may be the result of the activity of one or more transcription factors. Transcription factors are involved in the regulation of constitutive expression, inducible expression (such as expression in response to an environmental stimulus), and developmentally regulated expression.
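The palindromic structure of the G-box described above can be illustrated with a short sketch. The helper names and the example promoter string below are hypothetical and are not taken from any cited reference; the sketch only checks that CACGTG reads identically on both strands and locates the G-box and the shared ACGT core in a sequence.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_palindromic(site: str) -> bool:
    """A site is palindromic when it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def find_sites(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    sequence = sequence.upper()
    width = len(motif)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == motif]


G_BOX = "CACGTG"      # palindromic G-box motif from the text
ACGT_CORE = "ACGT"    # central core shared by the bZIP binding sites

# Hypothetical promoter fragment, for illustration only.
promoter = "TTGACACGTGGCATTACGTAA"

g_box_hits = find_sites(promoter, G_BOX)
core_hits = find_sites(promoter, ACGT_CORE)
```

In this sketch the two half sites CAC and GTG are reverse complements of each other, which is what makes them identical when each is read 5' to 3' on its own strand.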
Transcription factor gene families have been reported in plants (Martin and Paz-Ares, Trends in Genetics 13: 43-84 (1997), the entirety of which is herein incorporated by reference; Riechmann and Meyerowitz, Biol. Chem. 378: 1079-1101 (1997), the entirety of which is herein incorporated by reference). The MADS-box transcription factor family is one example of a transcription factor gene family found in plants as well as other organisms (Riechmann and Meyerowitz, Biol. Chem. 378: 1079-1101 (1997); Noda et al., Nature 369: 661-664 (1994); Schwarz-Sommer et al., EMBO J. 11: 251-263 (1992); Yanofsky et al., Nature 346: 35-39 (1990); Drews et al., Cell 65: 991-1002 (1991); Mizukami and Ma, Cell 71: 119-131 (1992); Mandel et al., Nature 360: 273-277 (1992); Gustafson-Brown et al., Cell 76: 131-143 (1994); Jack et al., Cell 68: 703-716 (1992); Goto and Meyerowitz, Genes and Development 8: 1548-1560 (1994); Krizek and Meyerowitz, Development 122: 11-22 (1996); Kempin et al., Science 267: 522-525 (1995); Ma et al., Genes and Development 5: 484-495 (1991); Flanagan et al., Plant J. 10: 343-353 (1996); Flanagan and Ma, Plant Mol. Biol. 26: 581-595 (1994); Huang et al., Plant Cell 8: 81-94 (1995); Savidge et al., Plant Cell 7: 721-733 (1995); Mandel and Yanofsky, Plant Cell 7: 1763-1771 (1995); Rounsley et al., Plant Cell 7: 1259-1269 (1995); Heck et al., Plant Cell 7: 1271-1282 (1995); Perry et al., Plant Cell 8: 1977-1989 (1996); Bradley et al., Cell 72: 85-95 (1993); Huijser et al., EMBO J. 11: 1239-1249 (1992); Sommer et al., EMBO J. 9: 605-613 (1990); Tröbner et al., EMBO J. 11: 4693-4704 (1992); Davies et al., EMBO J. 15: 4330-4343 (1996); Zachgo et al., Development 121: 2861-2875 (1995); Tsuchimoto et al., Plant Cell 5: 843-853 (1993); Angenent et al., Plant J. 5: 33-44 (1993); Van der Krol et al., Genes and Development 7: 1214-1228 (1993); Angenent et al., Plant Cell 7: 505-516 (1995); Angenent et al., Plant Cell 4: 983-993 (1992); Angenent et al., Plant J. 5: 33-44 (1994); Angenent et al., Plant J. 4: 101-112 (1993); Angenent et al., Plant Cell 7: 1569-1582 (1995); Colombo et al., Plant Cell 7: 1859-1868 (1995), all of which are herein incorporated by reference in their entirety).
MADS-box transcription factors have been shown to bind to DNA and alter transcription by both induction and repression. Examples are known where MADS-box transcription factors exert their transcriptional regulation by binding and interacting individually, as homodimers or heterodimers, or through heterologous associations with non-MADS-box transcription factors. However, MADS-box transcription factors typically form dimers (Riechmann and Meyerowitz, Biol. Chem. 378: 1079-1101 (1997)). MADS-box transcription factors are defined by the signature MADS domain, which is the most highly conserved portion of the protein among all the family members. In plants, additional domains (the I domain, K domain, and C-terminal domain, in linear order) have been reported which are characteristic of the plant-specific branch of this family.
The MADS domain is an approximately 57 amino acid domain located at or near the N-terminal portion of the MADS-box transcription factor (with approximately 260 amino acids in the total protein). This domain is highly conserved and is the most uniquely defining element of the family. For example, two homologues, APETALA1 from Arabidopsis and ZAP1 from Zea mays, show 89% identity over the MADS domain. Conservation of this domain may be linked to its function as the portion of the protein that directly interacts with the target DNA binding site. The MADS domain is responsible for specifically binding DNA at A-T rich sequences referred to as CArG-boxes, whose consensus sequence has been reported as CC(A/T)6GG, that is, CC followed by six A or T residues followed by GG (Shore and Sharrocks, Eur. J. Biochem. 229: 1-13 (1995), the entirety of which is herein incorporated by reference).
The I domain (intervening region) is a sequence of approximately 30 amino acids that is poorly conserved compared with the MADS domain. It links the MADS domain with the K domain. Its length and sequence are variable, and it may be absent from some family members.
The K domain is an approximately 70 amino acid domain that is unique to the plant family members of the MADS-box gene superfamily. It is found in the majority of plant MADS-box genes. It has weak similarity to portions of animal keratin and is predicted to form amphipathic alpha helices which may facilitate interaction with other proteins. It has been reported that the structural conformation of this domain is a contributing constraint on conservation of this sequence. The K domain typically exhibits less overall amino acid conservation than the MADS domain, but between homologous genes such as APETALA1 from Arabidopsis and ZAP1 from Zea mays, this similarity can still be high (approximately 70%).
The C-terminal domain, along with the I domain, is among the least conserved portions of the MADS-box gene family members in plants. Although exact functions for this approximately 90-100 amino acid domain have not been determined, there are known mutations within this region that lead to distinct developmental abnormalities in plants, which indicates a role in transcriptional regulation. Conservation of this domain increases with increasing evolutionary closeness of the species and homologues under comparison.
Genetic and molecular analyses have shown that transcription factors belonging to the MADS transcription factor family, at least in part, regulate diverse functions (Riechmann and Meyerowitz, Biol. Chem. 378: 1079-1101 (1997)). MADS transcription factors often exert their effect in a homeotic manner (e.g., loss of AG activity (a MADS transcription factor) in Arabidopsis homeotically transforms the third and fourth whorl organs and eliminates floral determinacy) (Mena et al., Science 274: 1537-1540 (1996), the entirety of which is herein incorporated by reference). MADS transcription factors can regulate different processes. For example, the role of certain MADS transcription factors in floral development is reviewed in Riechmann and Meyerowitz, Biol. Chem. 378: 1079-1101 (1997). MADS transcription factors are also involved in the regulation of other plant processes such as phytochrome regulation (Wang et al., Plant Cell 9: 491-507 (1997), the entirety of which is herein incorporated by reference) and seed development (Colombo et al., Plant Cell 9: 703-715 (1997), the entirety of which is herein incorporated by reference).
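The CArG-box consensus CC(A/T)6GG can be expressed as a regular expression, as in the following minimal sketch. The function name and the example sequence are hypothetical; the pattern simply encodes CC, six A or T residues, then GG, and uses a lookahead so that overlapping candidate sites would also be reported.

```python
import re

# CC, then exactly six A/T residues, then GG. The zero-width lookahead
# lets the scan report overlapping sites as well.
CARG_BOX = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for each CArG-box-like site."""
    return [(m.start(), m.group(1)) for m in CARG_BOX.finditer(sequence.upper())]


# Hypothetical sequence, for illustration only.
example = "GGATCCATATATGGTTCCAAGG"
hits = find_carg_boxes(example)
```

Here only the ten-base site starting at position 4 satisfies the consensus; the later CCAAGG does not, because it has just two A/T residues between CC and GG.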
Another family of transcription factors found in plants is the MYB family. MYB transcription factors generally contain three repeats (R1, R2 and R3). The MYB DNA binding domain of plant proteins usually consists of two imperfect repeats of about 50 residues (Baranowskij et al., EMBO J. 13: 5383-5392 (1994), the entirety of which is herein incorporated by reference). MYB transcription factors exhibit a helix-turn-helix motif (Ogata et al., Cell 79: 639-648 (1994), the entirety of which is herein incorporated by reference). The DNA binding specificity of plant MYB proteins differs. For example, the maize P protein recognizes the motif [C/A]TCC[T/A]ACC, similar to that bound by AmMYB305 from Antirrhinum, and neither of these proteins appears to bind to the vertebrate MYB consensus motif (TAACNG) (Grotewold et al., Cell 76: 543-553 (1994), the entirety of which is herein incorporated by reference; Solano et al., EMBO J. 14: 1773-1784 (1995), the entirety of which is herein incorporated by reference). Small changes in the amino acid sequence of a MYB transcription factor can alter the DNA binding properties of that transcription factor. For example, PhMYB3 from Petunia binds to two sequences, MBSI (TAAC[C/G]GTT) and MBSII (TAACTAAG) (Solano et al., EMBO J. 14: 1773-1784 (1995)). In the case of PhMYB3, it has been shown that a substitution of a single residue in the R2 recognition helix switches the dual DNA-binding specificity to that of c-MYB, and the reciprocal substitution in c-MYB gives dual DNA-binding specificity similar to PhMYB3.
Mutations in residues that do not contact bases may also affect sequence-specific binding and have been reported to account for some of the differences in DNA-binding specificity between plant MYB proteins (Suzuki, Proc Jap. Acad. Series B 71: 27-31 (1995), the entirety of which is herein incorporated by reference). Of the eight putative base-contacting residues in MYB proteins, six are fully conserved in all plant MYB proteins, and the remaining two are conserved in at least 80% of these proteins. Nonetheless, MYB transcription factors exhibit different nucleic acid sequence specificities and different strengths of contacts (Solano et al., Plant J. 8: 673-682 (1995), the entirety of which is herein incorporated by reference). In addition, temporal patterns of accumulation of RNA of different plant MYB genes may be affected by environmental stimuli, such as light or salt stress, or by the plant hormones gibberellic acid and abscisic acid (Urao et al., Plant Cell 5: 1529-1539 (1993); Jackson et al., Plant Cell 3: 115-125 (1991), the entirety of which is herein incorporated by reference; Cone et al., Plant Cell 5: 1795-1805 (1993), the entirety of which is herein incorporated by reference; Noda et al., Nature 369: 661-664 (1994); Larkin et al., Plant Cell 5: 1739-1748 (1993), the entirety of which is herein incorporated by reference; Gubler et al., Plant Cell 7: 1879-1891 (1995), the entirety of which is herein incorporated by reference; Hattori et al., Genes Dev. 6: 609-618 (1992), the entirety of which is herein incorporated by reference).
In plants, distinct functions for different MYB transcription factors have been reported, including control of secondary metabolism, regulation of cellular morphogenesis, and regulation of signal transduction pathways. MYB proteins are reported to play a role in the control of phenylpropanoid metabolism. Phenylpropanoid metabolism is one of the three main types of secondary metabolism in plants and involves modification of compounds derived initially from phenylalanine. Through one branch (flavonoid metabolism) it is responsible for the production of a major group of plant pigments (the anthocyanins) and other minor groups (aurones and phlobaphenes), and it also produces compounds that modify pigmentation through chemical interaction with the anthocyanins (co-pigmentation), such as the flavones and flavonols. Flavones and flavonols also serve to absorb ultraviolet light to protect plants. Several flavonoids act as signalling molecules in legumes, inducing gene expression in symbiotic bacteria in a species-specific manner, and others act as factors required for pollen maturation and pollen germination in some plant species. A number of flavonoids and related phenylpropanoids (such as stilbenes) also act as defensive agents (phytoalexins) against biotic and abiotic stresses in particular plant species. Another branch of phenylpropanoid metabolism produces the precursors for production of lignin, the strengthening and waterproofing material of plant vascular tissue and one of the principal components of wood. This branch also produces other soluble phenolics, which can serve as signalling molecules, cell-wall crosslinking agents and antioxidants.
The C1 transcription factor (a MYB transcription factor) activates transcription of genes encoding enzymes involved in the biosynthesis of the anthocyanin pigments in the outer layer of cells of the maize seed endosperm (the aleurone) (Paz-Ares et al., EMBO J. 5: 829-833 (1986); Cone et al., Proc. Natl. Acad Sci. (U.S.A.) 83: 9631-9635 (1986), both of which are herein incorporated by reference in their entirety). Activation has been reported for at least five genes in the pathway to anthocyanin. Activation by C1 involves a partner transcriptional activator found in aleurone, a protein similar to a MYC transcription factor. These proteins also interact with other members of the R-protein family to regulate anthocyanin biosynthetic gene expression (Cone et al., Plant Cell 5: 1795-1805 (1993)). For example, in maize, another MYB protein, ZmMYB1, can activate one of the structural genes required for anthocyanin production (Franken et al., Plant J. 6: 21-30 (1994), the entirety of which is herein incorporated by reference), while yet another, ZmMYB38, inhibits C1-mediated activation of the same promoter.
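The bracketed motifs quoted above use degenerate positions ([C/A], [T/A], N), which map directly onto regular-expression character classes. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical labels, and string matching of this kind says nothing about binding affinity. It merely shows which of the four quoted patterns a given candidate site satisfies.

```python
import re

# Motifs as quoted in the text, with degenerate positions written as
# character classes; N (any base) becomes [ACGT].
MYB_MOTIFS = {
    "maize_P": "[CA]TCC[TA]ACC",
    "vertebrate_MYB": "TAAC[ACGT]G",
    "MBSI": "TAAC[CG]GTT",
    "MBSII": "TAACTAAG",
}


def matching_motifs(site: str) -> list[str]:
    """Return the sorted labels of every motif pattern found within the site."""
    site = site.upper()
    return sorted(name for name, pattern in MYB_MOTIFS.items()
                  if re.search(pattern, site))


# One concrete instance of each plant motif, for illustration only.
mbsi_like = matching_motifs("TAACGGTT")
mbsii_like = matching_motifs("TAACTAAG")
p_like = matching_motifs("ATCCAACC")
```

At the level of pattern matching, an MBSI instance also contains the TAACNG consensus, whereas MBSII and the P-type site do not, which is in line with the text's description of MBSI and MBSII as two distinct specificities.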
Reiteration of MYB-gene function reportedly occurs in the control of a branch of flavonoid metabolism producing the red phlobaphene pigments from intermediates in flavonoid metabolism. This pathway is under control of the P gene in maize, which encodes a MYB-related protein (Grotewold et al., Cell 76: 543-553 (1994)). The P gene product activates a subset of the genes involved in anthocyanin biosynthesis. The P-binding site is contained within the promoters of these target genes (Li and Parish, Plant J. 8: 963-972 (1995), the entirety of which is herein incorporated by reference). In maize, at least two different MYB proteins serve to direct flavonoid metabolism along different routes by selective activation of target genes.
In other plant species, MYB proteins can serve similar roles in the control of phenylpropanoid metabolism as, for example, in Petunia flowers. MYB proteins can also serve to regulate other branches of phenylpropanoid metabolism. In Antirrhinum majus and tobacco, AmMYB305 (or its homologue in tobacco) can activate the gene encoding the first enzyme of phenylpropanoid metabolism, phenylalanine ammonia lyase (PAL) (Urao et al., Plant Cell 5: 1529-1539 (1993)). Some MYB genes have been shown to be highly expressed in tissues such as differentiating xylem and may act to influence the branch of phenylpropanoid metabolism involved in lignin production (Campbell et al., Plant Physiol. 108 (Suppl.), 28 (1995), the entirety of which is herein incorporated by reference).
A second reported role for plant MYB genes is in the control of cell shape. For example, the MIXTA gene of Antirrhinum and the homologous PhMYB1 gene from Petunia have been shown to play a role in the development of the conical form of petal epidermal cells, and the GL1 gene of Arabidopsis has been shown to be essential for the differentiation of hair cells (trichomes) in some parts of the leaf and in the stem (Noda et al., Nature 369: 661-664 (1994); Oppenheimer et al., Cell 67: 483-493 (1991), the entirety of which is herein incorporated by reference; Mur, PhD Thesis, Vrije Univ. of Amsterdam (1995), the entirety of which is herein incorporated by reference). Overexpression of MIXTA in transgenic tobacco results in trichome formation on petals, suggesting that conical petal cells might be ‘trichoblasts’ arrested at an early stage in trichome formation.
GL1 of Arabidopsis is associated with the expansion in the size of the cell that develops into the trichome, and it acts upstream of a number of other genes (Hülskamp et al., Cell 76: 555-566 (1994), the entirety of which is herein incorporated by reference). GL1 mutants can exhibit cellular outgrowths that do not develop into fully branched trichomes. GL2 of Arabidopsis encodes a homeodomain protein that is associated with trichome development (Rerie et al., Genes Dev. 8: 1388-1399 (1994), the entirety of which is herein incorporated by reference). The GL2 gene promoter contains motifs very similar to the binding sites of the P and AmMYB305 transcription factors (Rerie et al., Genes Dev. 8: 1388-1399 (1994)).
The conical cells produced by the action of the MIXTA gene of Antirrhinum resemble the limited outgrowths produced in Arabidopsis gl2 mutants where trichome formation is aborted. In its regulation of trichome formation, GL1 interacts with the product of the TTG gene, which is required for trichome formation and anthocyanin production (Lloyd et al., Science 258: 1773-1775 (1992), the entirety of which is herein incorporated by reference). Expression of the maize R gene complements the ttg mutation, and it has been reported that the TTG gene product is also an R-related protein that interacts with GL1 in a manner analogous to the interaction of C1 and R in maize (Lloyd et al., Science 258: 1773-1775 (1992)).
A further reported role for plant MYB proteins is in hormonal responses during seed development and germination. A barley MYB protein (GAMYB) whose expression is induced by gibberellic acid (GA) has been shown to activate expression of a gene encoding a high pI α-amylase that is synthesized in barley aleurone upon germination for the mobilization of starch in the endosperm (Larkin et al., Plant Cell 5: 1739-1748 (1993)). Expression of GAMYB is induced by treatment of aleurone layers with GA, and expression of the α-amylase gene is induced subsequently. There is a suggestion that other GA-inducible genes can also respond to activation by MYB proteins during seed germination because MYB-like motifs from other GA-responsive gene promoters have been shown to direct reporter gene expression in response to GA (Larkin et al., Plant Cell 5: 1739-1748 (1993)). In addition, some MYB genes are expressed in response to GA treatment of Petunia petals (Mur, Ph.D. Thesis, Vrije Univ. of Amsterdam (1995)).
Treatment with another plant hormone, abscisic acid (ABA), induces expression of AtMYB2 in Arabidopsis, a MYB gene that is also induced in response to dehydration or salt stress (Shinozaki et al., Plant Mol. 19: 439-499 (1992), the entirety of which is herein incorporated by reference). In maize, expression of the C1 gene is also ABA-responsive, where it is involved in the formation of anthocyanin in the developing kernels (Larkin et al., Plant Cell 5: 1739-1748 (1993)). The rd22 gene promoter contains MYC-recognition sequences suggesting that AtMYB2 can interact with a bHLH protein to induce gene transcription in response to dehydration or salt stress (Iwasaki et al., Mol. Gen. Genet. 247: 391-398 (1995), the entirety of which is herein incorporated by reference).
Plant transcription factors that fall within the helix-loop-helix class of transcription factors have been reported. These include the transcription factors encoded by the Zea mays R and B class genes (Radicella et al., Genes and Development 6: 2152-2164 (1992), the entirety of which is herein incorporated by reference). Alleles that have been identified at the b and r loci show differences in developmental or tissue-specific expression.
Homeodomain transcription factors have been isolated from different plant species (Ma et al., Plant. Molec. Biol. 24: 465-473 (1994), the entirety of which is herein incorporated by reference; Muller et al., Nature 374: 727 (1995), the entirety of which is herein incorporated by reference; Lincoln et al., Plant Cell 6: 1859-1876 (1994), the entirety of which is herein incorporated by reference; Hareven et al., Cell 84: 735-744 (1996), the entirety of which is herein incorporated by reference; Vollbrecht et al., Nature 350: 241-243 (1991)).
The homeodomain contains three α-helices (Qian et al., Cell 59: 573-580 (1989), the entirety of which is herein incorporated by reference). Residues in helix 3 contact the major groove of a nucleic acid in a sequence-specific manner. Although structurally similar, different homeodomains are able to recognize diverse binding sites (Hanes et al., Cell 57: 1275-1283 (1989), the entirety of which is herein incorporated by reference; Treisman et al., Genes Dev. 5: 594-604 (1991), the entirety of which is herein incorporated by reference; Affolter et al., Proc. Natl. Acad. Sci. (U.S.A.) 87: 4093-4097 (1990), the entirety of which is herein incorporated by reference; Percival-Smith et al., EMBO J. 9: 3967-3974 (1990), the entirety of which is herein incorporated by reference).
One class of homeodomain transcription factors comprises those that share a conserved cysteine-rich motif, as illustrated by the Arabidopsis GLABRA2 homeodomain protein and the Zea mays KNOTTED1 (KN1)-like proteins (Vollbrecht et al., Nature 350: 241-243 (1991); Ma et al., Plant. Molec. Biol. 24: 465-473 (1994)). The morphological mutation Knotted1 in Zea mays alters the developmental fate of cells in leaf blades, with wild-type expression of the gene localized in the meristem and ground tissue but absent from leaves or leaf primordia (Hake, Trends in Genetics 8:109-114 (1992), the entirety of which is herein incorporated by reference; Freeling and Hake, Genetics 111: 617-634 (1995), the entirety of which is herein incorporated by reference). In addition to having a homeodomain, the kn1 class of genes in Zea mays encodes an ELK domain which contains repeating hydrophobic residues (Kerstetter et al., Plant Cell 6: 1877-1887 (1994), the entirety of which is herein incorporated by reference).
Kn1-like homeodomain genes have been reported in other plants, such as Arabidopsis (Lincoln et al., Plant Cell 6: 1859-1876 (1994), the entirety of which is herein incorporated by reference), tomato and soybean (Ma et al., Plant Molecular Biology 24: 465-473 (1994), the entirety of which is herein incorporated by reference).
Homeodomain transcription factors have been associated with the regulation of cell to cell communication and development in plants. Presence of the KNOTTED1 homeodomain transcription factor in a plant cell can lead to an increase in plasmodesmal size permitting the transport of larger molecules between cells (Lucas et al., Science 270: 1980-1983 (1995), the entirety of which is herein incorporated by reference).
Another class of transcription factors, the polycomb-like transcription factors, have been reported in plants (Goodrich et al., Nature 386: 44-51 (1997), the entirety of which is herein incorporated by reference). Wild type CLF, a polycomb-like transcription factor, isolated from Arabidopsis, exhibits extensive structural homology with Drosphilia Pc-G genes plants (Goodrich et al., Nature 386: 44-51 (1997)). Like Drosphilia Pc-G genes, the CLF genes encodes for a SET domain and two cysteine rich regions. CLF, while not being necessary for initial specification of stamen and carpel development, is reportedly necessary to later stages of development plants and represses a second transcription factor AGAMOUS (Goodrich et al., Nature 386: 44-51 (1997); Schumacher and Magnuson, Trends in Genetics 13(5): 167-170 (1997), the entirety of which is herein incorporated by reference).
A further class of transcription factors, those containing an AP2 domain, a conserved motif first identified in Arabidopsis (a floral mutant), has been identified in a number of plants (Jofuka et al., Plant Cell 6: 1211-1225 (1994), the entirety of which is herein incorporated by reference; Weigal et al., Plant Cell 7: 388-389 (1995), the entirety of which is herein incorporated by reference). The AP2 domain, which is a DNA-binding motif of about 60 amino acid has been reported, for example, to be present in the Arabidopsis transcription factors CBF1, APETALA2, AINTEGUMENTA, and TINY; as well as the tobacco ethylene response element binding proteins (Moose and Sisco, Genes and Development 10: 3018-3027 (1996), the entirety of which is herein incorporated by reference). Weigal et al., reports a 24 amino acid AP2 consensus domain which is predicted to form an amphipathic α-helix that may mediate protein-protein interactions (Weigal et al., Plant Cell 7: 388-389 (1995)).
Mutations of transcription factors containing an AP2 domain have been reported to affect floral and ovule development (Meyerowitz et al., Cell 88: 299-308 (1997), the entirety of which is herein incorporated by reference). Other transcription factors from this family have been reported to play a role in cold- and dehydration-regulated gene expression (Stockinger et al., Proc. Natl. Acad. Sci. (U.S.A.) 94(3): 1035-1040 (1997), the entirety of which is herein incorporated by reference).
Zinc-finger proteins have been isolated from plants (Takatsuji and Matsumoto, J. Biol. Chem. 271: 23368-23373 (1996), the entirety of which is herein incorporated by reference; Meissner and Michael, Plant Mol. Biol. 33: 615-624 (1997), the entirety of which is herein incorporated by reference; Dietrich et al., Cell 88: 685-694 (1997), the entirety of which is herein incorporated by reference; Pater et al., Nucleic Acid Research 24: 4624-4631 (1996), the entirety of which is herein incorporated by reference; Tague and Goodman, Plant Mol. Biol. 28: 267-279 (1995), the entirety of which is herein incorporated by reference; Putterill et al., Cell 80: 847-857 (1995), the entirety of which is herein incorporated by reference; Takatsuji et al., Plant Cell 6: 947-958 (1994), the entirety of which is herein incorporated by reference). Zinc-finger proteins have been associated with a number of processes in plants including cell death (Dietrich et al., Cell 88: 685-694 (1997)) and flower morphology (Pater et al., Nucleic Acid Research 24: 4624-4631 (1996)).
The term zinc-finger has been applied to a broad set of protein motifs. Zinc-finger transcription factors may be subdivided into a number of categories. One such category is the C2H2 zinc finger transcription factors (also referred to as either TFIIIA or Krüppel-like zinc fingers)(Meissner and Michael, Plant Molecular Biology 33: 615-624 (1997); Takatsuji et al., EMBO J. 11: 241-249 (1994), the entirety of which is herein incorporated by reference; Tague and Goodman, Plant Mol. Biol. 28: 267-279 (1995); Takatsuji et al., Plant Cell 6: 947-958 (1994); Sakamoto et al., Eur. J. Biochem. 217: 1049-1056 (1993), the entirety of which is herein incorporated by reference; Sakai et al., Nature 378: 199-203 (1995), the entirety of which is herein incorporated by reference). C2H2 zinc finger transcription factors containing one, two or three zinc fingers have been reported. These zinc fingers are maintained by cysteine and/or histidine residues organized around a zinc metal ion (Meissner and Michael, Plant Molecular Biology 33: 615-624 (1997)).
Examples of C2H2 zinc finger transcription factors include: the petunia Epf1 product, which binds to an inverted repeat found in the promoter of EPSP; the W2f1 product from wheat, which binds to a nonameric motif found in the histone H3 promoter; the Arabidopsis AtZFP1 product associated with shoot development; and the Arabidopsis SUPERMAN product that is associated with negative regulation of B-function floral organ identity (Meissner and Michael, Plant Molecular Biology 33: 615-624 (1997); Takatsuji et al., EMBO J. 11: 241-249 (1994); Tague and Goodman, Plant Mol. Biol. 28: 267-279 (1995); Takatsuji et al., Plant Cell 6: 947-958 (1994); Sakamoto et al., Eur. J. Biochem. 217: 1049-1056 (1993); Sakai et al., Nature 378: 199-203 (1995)).
Another category of zinc-finger transcription factors includes plant relatives of the GATA-1 transcription factor (Dietrich et al., Cell 88: 685-694 (1997); Evans and Felsenfeld, Cell 58: 877-885 (1989), the entirety of which is herein incorporated by reference; Putterill et al., Cell 80: 847-857 (1995); Yanagisawa et al., Nucleic Acid Research 23: 3403-3410 (1995), the entirety of which is herein incorporated by reference; De Paolis et al., Plant J. 10: 215-224 (1996), the entirety of which is herein incorporated by reference; Lippuner et al., J. Biol. Chem. 271: 12859-12866 (1996), the entirety of which is herein incorporated by reference). GATA-1-like transcription factors have been associated with, for example, the regulation of cell death and the regulation of expression associated with salt stress.
II. Expressed Sequence Tag Nucleic Acid Molecules
Expressed sequence tags, or ESTs, are randomly sequenced members of a cDNA (complementary DNA) library (McCombie et al., Nature Genetics 1:124-130 (1992); Kurata et al., Nature Genetics 8:365-372 (1994); Okubo et al., Nature Genetics 2:173-179 (1992), all of which references are incorporated herein in their entirety). The randomly selected clones comprise inserts that can represent a copy of up to the full length of a mRNA transcript.
Using conventional methodologies, cDNA libraries can be constructed from the mRNA (messenger RNA) of a given tissue or organism using poly dT primers and reverse transcriptase (Efstratiadis et al., Cell 7:279-288 (1976), the entirety of which is herein incorporated by reference; Higuchi et al., Proc. Natl. Acad. Sci. (U.S.A.) 73:3146-3150 (1976), the entirety of which is herein incorporated by reference; Maniatis et al., Cell 8:163-182 (1976), the entirety of which is herein incorporated by reference; Land et al., Nucleic Acids Res. 9:2251-2266 (1981), the entirety of which is herein incorporated by reference; Okayama et al., Mol. Cell. Biol. 2:161-170 (1982), the entirety of which is herein incorporated by reference; Gubler et al., Gene 25:263-269 (1983), the entirety of which is herein incorporated by reference).
Several methods may be employed to obtain full-length cDNA constructs. For example, terminal transferase can be used to add homopolymeric tails of dC residues to the free 3′ hydroxyl groups (Land et al., Nucleic Acids Res. 9:2251-2266 (1981), the entirety of which is herein incorporated by reference). This tail can then be hybridized to a poly dG oligonucleotide, which can act as a primer for the synthesis of full length second strand cDNA. Okayama and Berg, Mol. Cell. Biol. 2:161-170 (1982), the entirety of which is herein incorporated by reference, report a method for obtaining full length cDNA constructs. This method has been simplified by using synthetic primer-adapters that have both homopolymeric tails for priming the synthesis of the first and second strands and restriction sites for cloning into plasmids (Coleclough et al., Gene 34:305-314 (1985), the entirety of which is herein incorporated by reference) and bacteriophage vectors (Krawinkel et al., Nucleic Acids Res. 14:1913 (1986), the entirety of which is herein incorporated by reference; Han et al., Nucleic Acids Res. 15:6304 (1987), the entirety of which is herein incorporated by reference).
These strategies have been coupled with additional strategies for isolating rare mRNA populations. For example, a typical mammalian cell contains between 10,000 and 30,000 different mRNA sequences (Davidson, Gene Activity in Early Development, 2nd ed., Academic Press, New York (1976), the entirety of which is herein incorporated by reference). The number of clones required to achieve a given probability that a low-abundance mRNA will be present in a cDNA library is N=(ln(1−P))/(ln(1−1/n)) where N is the number of clones required, P is the probability desired and 1/n is the fractional proportion of the total mRNA that is represented by a single rare mRNA (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press (1989), the entirety of which is herein incorporated by reference).
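As an illustration of the library-size relationship N=(ln(1−P))/(ln(1−1/n)), the following is a minimal sketch. The function name and the example values (a 99% probability and an mRNA making up 1 part in 100,000 of the total) are assumptions chosen for illustration and are not taken from the cited references.

```python
import math


def clones_required(p, rare_fraction):
    """Return N, the number of clones needed so that a rare mRNA is
    represented at least once with probability p.

    rare_fraction is 1/n: the fraction of the total mRNA made up by
    the rare mRNA of interest.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < rare_fraction < 1:
        raise ValueError("rare_fraction must be strictly between 0 and 1")
    # N = ln(1 - P) / ln(1 - 1/n), rounded up to a whole clone
    return math.ceil(math.log(1 - p) / math.log(1 - rare_fraction))


# Hypothetical example: 99% probability, mRNA present at 1 part in 100,000
n_clones = clones_required(0.99, 1 / 100_000)
print(n_clones)
```

Under these assumed values the result is on the order of 460,000 clones, which shows why enrichment and normalization strategies are attractive for low-abundance messages.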
A method to enrich preparations of mRNA for sequences of interest is to fractionate by size. One such method is to fractionate by electrophoresis through an agarose gel (Pennica et al., Nature 301:214-221 (1983), the entirety of which is herein incorporated by reference). Another such method employs sucrose gradient centrifugation in the presence of an agent, such as methylmercuric hydroxide, that denatures secondary structure in RNA (Schweinfest et al., Proc. Natl. Acad. Sci. (U.S.A.) 79:4997-5000 (1982), the entirety of which is herein incorporated by reference).
A frequently adopted method is to construct equalized or normalized cDNA libraries (Ko, Nucleic Acids Res. 18:5705-5711 (1990), the entirety of which is herein incorporated by reference; Patanjali et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1943-1947 (1991), the entirety of which is herein incorporated by reference). Typically, the cDNA population is normalized by subtractive hybridization (Schmid et al., J. Neurochem. 48:307-312 (1987), the entirety of which is herein incorporated by reference; Fargnoli et al., Anal. Biochem. 187:364-373 (1990), the entirety of which is herein incorporated by reference; Travis et al., Proc. Natl. Acad. Sci (U.S.A.) 85:1696-1700 (1988), the entirety of which is herein incorporated by reference; Kato, Eur. J. Neurosci. 2:704-711 (1990); and Schweinfest et al., Genet. Anal. Tech. Appl. 7:64-70 (1990), the entirety of which is herein incorporated by reference). Subtraction represents another method for reducing the population of certain sequences in the cDNA library (Swaroop et al., Nucleic Acids Res. 19:1954 (1991), the entirety of which is herein incorporated by reference).
ESTs can be sequenced by a number of methods. Two basic methods may be used for DNA sequencing: the chain termination method of Sanger et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:5463-5467 (1977), the entirety of which is herein incorporated by reference, and the chemical degradation method of Maxam and Gilbert, Proc. Natl. Acad. Sci. (U.S.A.) 74:560-564 (1977), the entirety of which is herein incorporated by reference. Automation and advances in technology such as the replacement of radioisotopes with fluorescence-based sequencing have reduced the effort required to sequence DNA (Craxton, Methods 2:20-26 (1991), the entirety of which is herein incorporated by reference; Ju et al., Proc. Natl. Acad. Sci. (U.S.A.) 92:4347-4351 (1995), the entirety of which is herein incorporated by reference; Tabor and Richardson, Proc. Natl. Acad. Sci. (U.S.A.) 92:6339-6343 (1995), the entirety of which is herein incorporated by reference). Automated sequencers are available from, for example, Pharmacia Biotech, Inc., Piscataway, N.J. (Pharmacia ALF), LI-COR, Inc., Lincoln, Nebr. (LI-COR 4,000) and Millipore, Bedford, Mass. (Millipore BaseStation).
In addition, advances in capillary gel electrophoresis have also reduced the effort required to sequence DNA and such advances provide a rapid high resolution approach for sequencing DNA samples (Swerdlow and Gesteland, Nucleic Acids Res. 18:1415-1419 (1990); Smith, Nature 349:812-813 (1991); Luckey et al., Methods Enzymol. 218:154-172 (1993); Lu et al., J. Chromatog. A. 680:497-501 (1994); Carson et al., Anal. Chem. 65:3219-3226 (1993); Huang et al., Anal. Chem. 64:2149-2154 (1992); Kheterpal et al., Electrophoresis 17:1852-1859 (1996); Quesada and Zhang, Electrophoresis 17:1841-1851 (1996); Baba, Yakugaku Zasshi 117:265-281 (1997), all of which are herein incorporated by reference in their entirety).
ESTs longer than 150 nucleotides have been found to be useful for similarity searches and mapping (Adams et al., Science 252:1651-1656 (1991), herein incorporated by reference). ESTs, which can represent copies of up to the full length transcript, may be partially or completely sequenced. Between 150 and 450 nucleotides of sequence information is usually generated, as this is the length of sequence information that is routinely and reliably produced using single run sequence data. Typically, only single run sequence data is obtained from the cDNA library (Adams et al., Science 252:1651-1656 (1991)). Automated single run sequencing typically results in an approximately 2-3% error or base ambiguity rate (Boguski et al., Nature Genetics 4:332-333 (1993), the entirety of which is herein incorporated by reference).
EST databases have been constructed or partially constructed from, for example, C. elegans (McCombie et al., Nature Genetics 1:124-130 (1992)), human liver cell line HepG2 (Okubo et al., Nature Genetics 2:173-179 (1992)), human brain RNA (Adams et al., Science 252:1651-1656 (1991); Adams et al., Nature 355:632-635 (1992)), Arabidopsis (Newman et al., Plant Physiol. 106:1241-1255 (1994)), and rice (Kurata et al., Nature Genetics 8:365-372 (1994)).
III. Sequence Comparisons
A characteristic feature of a DNA sequence is that it can be compared with other DNA sequences. Sequence comparisons can be undertaken by determining the similarity of the test or query sequence with sequences in publicly available or proprietary databases (“similarity analysis”) or by searching for certain motifs, e.g. cis elements (“intrinsic sequence analysis”)(Coulson, Trends in Biotechnology 12:76-80 (1994), the entirety of which is herein incorporated by reference; Birren et al., Genome Analysis 1, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 543-559 (1997), the entirety of which is herein incorporated by reference).
Similarity analysis includes database search and alignment. Examples of public databases include the DNA Database of Japan (DDBJ)(on the world wide web at ddbj.nig.ac.jp/); GenBank (on the world wide web at ncbi.nlm.nih.gov/Web/Search/Index.htlm); and the European Molecular Biology Laboratory Nucleic Acid Sequence Database (EMBL) (on the world wide web at ebi.ac.uk/ebi_docs/embl_db/embl-db.html). Other appropriate databases include dbEST (on the world wide web at ncbi.nlm.nih.gov/dbEST/index.html), SwissProt (on the world wide web at ebi.ac.uk/ebi_docs/swisprot_db/swisshome.html), PIR (on the world wide web at nbrt.georgetown.edu/pir/) and The Institute for Genomic Research (on the world wide web at tigr.org/tdb/tdb.html).
A number of different search algorithms have been developed, one example of which is the suite of programs referred to as BLAST programs. There are five implementations of BLAST, three designed for nucleotide sequence queries (BLASTN, BLASTX and TBLASTX) and two designed for protein sequence queries (BLASTP and TBLASTN) (Coulson, Trends in Biotechnology 12:76-80 (1994); Birren et al., Genome Analysis 1, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 543-559 (1997)).
BLASTN takes a nucleotide sequence (the query sequence) and its reverse complement and searches them against a nucleotide sequence database. BLASTN was designed for speed, not maximum sensitivity, and may not find distantly related coding sequences. BLASTX takes a nucleotide sequence, translates it in three forward reading frames and three reverse complement reading frames and then compares the six translations against a protein sequence database. BLASTX is useful for sensitive analysis of preliminary (single-pass) sequence data and is tolerant of sequencing errors (Gish and States, Nature Genetics 3:266-272 (1993), the entirety of which is herein incorporated by reference). BLASTN and BLASTX may be used in concert for analyzing EST data (Coulson, Trends in Biotechnology 12:76-80 (1994); Birren et al., Genome Analysis 1:543-559 (1997)).
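The six-frame conceptual translation that precedes a BLASTX-style comparison can be sketched as follows. This is an illustrative sketch only, not the BLAST implementation; it assumes the standard genetic code, and the function names and the short example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unrecognized codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


# Hypothetical query sequence
frames = six_frame_translations("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Each of the six translations would then be compared against a protein database; a frameshift caused by a sequencing error moves the remainder of the coding region into another frame, where it can still be detected.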
Given a coding nucleotide sequence and the protein it encodes, it is often preferable to use the protein as the query sequence to search a database because of the greatly increased sensitivity to detect more subtle relationships. This is due to the larger alphabet of proteins (20 amino acids) compared with the alphabet of nucleic acid sequences (4 bases), where it is far easier to obtain a match by chance. In addition, with nucleotide alignments, only a match (positive score) or a mismatch (negative score) is obtained, but with proteins, the presence of conservative amino acid substitutions can be taken into account. Here, a mismatch may yield a positive score if the non-identical residue has physical/chemical properties similar to the one it replaced. Various scoring matrices are used to supply the substitution scores of all possible amino acid pairs. A general purpose scoring system is the BLOSUM62 matrix (Henikoff and Henikoff, Proteins 17:49-61 (1993), the entirety of which is herein incorporated by reference), which is currently the default choice for BLAST programs. BLOSUM62 is tailored for alignments of moderately diverged sequences and thus may not yield the best results under all conditions. Altschul, J. Mol. Evol. 36:290-300 (1993), the entirety of which is herein incorporated by reference, describes a combination of three matrices to cover all contingencies. This may improve sensitivity, but at the expense of slower searches. In practice, a single BLOSUM62 matrix is often used but others (PAM40 and PAM250) may be attempted when additional analysis is necessary. Low PAM matrices are directed at detecting very strong but localized sequence similarities, whereas high PAM matrices are directed at detecting long but weak alignments between very distantly related sequences.
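The difference between simple match/mismatch scoring and substitution-matrix scoring can be illustrated with a short sketch. The small table below is only an excerpt; its values are believed to correspond to the BLOSUM62 entries for the listed residue pairs but should be treated as illustrative assumptions and checked against a published matrix before use. The peptide fragments and function names are hypothetical.

```python
# Excerpt of a symmetric substitution table (assumed BLOSUM62-like values)
SUBSTITUTION = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("L", "D"): -4,
}


def substitution_score(a, b):
    """Look up a residue pair in either order; KeyError if not in the excerpt."""
    if (a, b) in SUBSTITUTION:
        return SUBSTITUTION[(a, b)]
    return SUBSTITUTION[(b, a)]


def identity_score(a, b, match=1, mismatch=-1):
    """Nucleotide-style scoring: only identical residues score positively."""
    return match if a == b else mismatch


def ungapped_score(x, y, scorer):
    """Sum the per-position scores of two equal-length, ungapped sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(scorer(a, b) for a, b in zip(x, y))


# Hypothetical fragments differing only by conservative substitutions
query, subject = "LKD", "IRE"
by_identity = ungapped_score(query, subject, identity_score)
by_matrix = ungapped_score(query, subject, substitution_score)
print(by_identity, by_matrix)
```

With identity scoring the two fragments look unrelated (every position is a mismatch), whereas the substitution table gives each conservative replacement a positive score, so the pair scores positively overall.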
Homologues in other organisms are available that can be used for comparative sequence analysis. Multiple alignments are performed to study similarities and differences in a group of related sequences. CLUSTAL W is a multiple sequence alignment package that performs progressive multiple sequence alignments based on the method of Feng and Doolittle, J. Mol. Evol. 25:351-360 (1987), the entirety of which is herein incorporated by reference. Each pair of sequences is aligned and the distance between each pair is calculated; from this distance matrix, a guide tree is calculated and all of the sequences are progressively aligned based on this tree. A feature of the program is its sensitivity to the effect of gaps on the alignment; gap penalties are varied to encourage the insertion of gaps in probable loop regions instead of in the middle of structured regions. Users can specify gap penalties, choose between a number of scoring matrices, or supply their own scoring matrix for both pairwise alignments and multiple alignments. CLUSTAL W for UNIX and VMS systems is available at: ftp.ebi.ac.uk. Another program is MACAW (Schuler et al., Proteins Struct. Func. Genet. 9:180-190 (1991), the entirety of which is herein incorporated by reference), for which both Macintosh and Microsoft Windows versions are available. MACAW uses a graphical interface, provides a choice of several alignment algorithms and is available by anonymous ftp at: ncbi.nlm.nih.gov (directory /pub/macaw).
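The distance-matrix and guide-tree steps of a progressive alignment can be sketched as follows. This is a simplified illustration and not the CLUSTAL W procedure: it assumes the sequences are already the same length (so no pairwise alignment is performed), uses the fraction of differing positions as the distance, and builds the tree by average-linkage clustering. All names and example sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guide_tree(seqs):
    """Build a guide tree from a dict of name -> sequence.

    Returns nested tuples; the innermost tuples are the pairs that a
    progressive aligner would align first.
    """
    # Pairwise distance matrix, keyed by unordered pairs of names
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    # Each cluster maps its (sub)tree to the list of member names
    clusters = {name: [name] for name in seqs}

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters (average linkage)
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters[(c1, c2)] = clusters.pop(c1) + clusters.pop(c2)
    return next(iter(clusters))


# Hypothetical sequences
example = {
    "a": "ACGTACGT",
    "b": "ACGTACGA",
    "c": "TTGTACCA",
    "d": "TTGTTCCA",
}
tree = guide_tree(example)
print(tree)
```

In this example the closely related pairs are joined first and the two resulting groups are joined last, which is the order in which a progressive method would build up the multiple alignment.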
Sequence motifs are derived from multiple alignments and can be used to examine individual sequences or an entire database for subtle patterns. With motifs, it is sometimes possible to detect distant relationships that may not be demonstrable based on comparisons of primary sequences alone. Currently, the largest collection of sequence motifs in the world is PROSITE (Bairoch and Bucher, Nucleic Acid Research 22:3583-3589 (1994), the entirety of which is herein incorporated by reference). PROSITE may be accessed via either the ExPASy server on the World Wide Web or anonymous ftp site. Many commercial sequence analysis packages also provide search programs that use PROSITE data.
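As an illustration of scanning a sequence for a motif, the sketch below converts a pattern written in PROSITE-style syntax into a regular expression. The converter handles only the basic syntax elements (single residues, x for any residue, bracketed alternatives, braced exclusions, repeat counts and terminal anchors). The example pattern is an illustrative C2H2-type zinc finger pattern and should be checked against the current PROSITE entry before use; the test sequence and function name are hypothetical.

```python
import re

# One pattern element: optional '<', a residue spec, optional (n) or (n,m),
# optional '>'
ELEMENT = re.compile(
    r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, residue, low, high, end = m.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            core = residue                      # literal residue or [set]
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


# Illustrative C2H2-type pattern (assumption; verify against PROSITE)
c2h2 = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")

# Hypothetical sequence constructed to contain one match
sequence = "MK" + "CAACAAALAAAAAAAAHAAAH" + "GS"
match = re.search(c2h2, sequence)
print(c2h2, match.start() if match else None)
```

The same converted expression can be applied to every entry in a sequence database to list candidate motif-containing sequences.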
A resource for searching protein motifs is the BLOCKS E-mail server developed by Henikoff, Trends Biochem Sci. 18:267-268 (1993), the entirety of which is herein incorporated by reference; Henikoff and Henikoff, Nucleic Acid Research 19:6565-6572 (1991), the entirety of which is herein incorporated by reference; Henikoff and Henikoff, Proteins 17:49-61 (1993). BLOCKS searches a protein or nucleotide sequence against a database of protein motifs or “blocks.” Blocks are defined as short, ungapped multiple alignments that represent highly conserved protein patterns. The blocks themselves are derived from entries in PROSITE as well as other sources. Either a protein query or a nucleotide query can be submitted to the BLOCKS server; if a nucleotide sequence is submitted, the sequence is translated in all six reading frames and motifs are sought for these conceptual translations. Once the search is completed, the server will return a ranked list of significant matches, along with an alignment of the query sequence to the matched BLOCKS entries.
Conserved protein domains can be represented by two-dimensional matrices, which measure either the frequency or probability of the occurrences of each amino acid residue and deletions or insertions in each position of the domain. This type of model, when used to search against protein databases, is sensitive and usually yields more accurate results than simple motif searches. Two popular implementations of this approach are profile searches, such as the GCG program ProfileSearch, and Hidden Markov Models (HMMs)(Krogh et al., J. Mol. Biol. 235:1501-1531 (1994); Eddy, Current Opinion in Structural Biology 6:361-365 (1996), both of which are herein incorporated by reference in their entirety). In both cases, a large number of common protein domains have been converted into profiles, as present in the PROSITE library, or HMM models, as in the Pfam protein domain library (Sonnhammer et al., Proteins 28:405-420 (1997), the entirety of which is herein incorporated by reference). Pfam contains more than 500 HMM models for enzymes, transcription factors, signal transduction molecules and structural proteins. Protein databases can be queried with these profiles or HMM models, which will identify proteins containing the domain of interest. For example, HMMSW or HMMFS, two programs in a public domain package called HMMER (Sonnhammer et al., Proteins 28:405-420 (1997)), can be used.
PROSITE and BLOCKS represent collected families of protein motifs. Thus, searching these databases entails submitting a single sequence to determine whether or not that sequence is similar to the members of an established family. Programs working in the opposite direction compare a collection of sequences with individual entries in the protein databases. An example of such a program is the Motif Search Tool, or MoST (Tatusov et al., Proc. Natl. Acad. Sci. (U.S.A.) 91:12091-12095 (1994), the entirety of which is herein incorporated by reference). On the basis of an aligned set of input sequences, a weight matrix is calculated by using one of four methods (selected by the user). A weight matrix is simply a representation, position by position, of how likely a particular amino acid is to appear. The calculated weight matrix is then used to search the databases. To increase sensitivity, newly found sequences are added to the original data set, the weight matrix is recalculated and the search is performed again. This procedure continues until no new sequences are found.
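The iterative weight-matrix search described above can be sketched as follows. This is not the MoST implementation: the weighting scheme (log-odds against a uniform background with a pseudocount), the score threshold, the function names and the example sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def weight_matrix(aligned, pseudocount=1.0):
    """One dict of log-odds weights per alignment column (uniform background)."""
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must have the same length")
    background = 1 / len(ALPHABET)
    matrix = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return matrix


def best_window(matrix, seq):
    """Return (score, segment) for the best ungapped window, or None."""
    width = len(matrix)
    best = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Residues outside ALPHABET get the lowest weight in that column
        score = sum(col.get(aa, min(col.values()))
                    for col, aa in zip(matrix, window))
        if best is None or score > best[0]:
            best = (score, window)
    return best


def iterative_search(seed, database, threshold, max_rounds=10):
    """Search, add new hits to the set, rebuild the matrix, and repeat."""
    members = list(seed)
    for _ in range(max_rounds):
        matrix = weight_matrix(members)
        new_hits = []
        for seq in database:
            hit = best_window(matrix, seq)
            if (hit and hit[0] >= threshold
                    and hit[1] not in members and hit[1] not in new_hits):
                new_hits.append(hit[1])
        if not new_hits:
            break  # no new sequences found: stop
        members.extend(new_hits)
    return members


# Hypothetical aligned seed set and database
seed = ["CAAC", "CAGC", "CSAC"]
database = ["MMMCATCMMM", "WWWWWWWW"]
found = iterative_search(seed, database, threshold=2.0)
print(found)
```

The loop stops when a round adds no new segments, mirroring the stopping rule in the text; the `max_rounds` cap is an added safeguard for the sketch.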