This invention relates to nucleic acid molecules isolated from Populus species, and methods of using these molecules and derivatives thereof to produce plants, particularly trees such as Populus species, that have modified fertility characteristics.
The increasing demand for pulp and paper products and the diminishing availability of productive forest lands are being addressed in part by efforts to develop trees that produce increased yields in shorter growth periods. Many such efforts are focused on the production of transgenic trees having modified growth characteristics, such as reduced lignin content (see for example, U.S. Pat. No. 5,451,514, "Modification of Lignin Synthesis in Plants"), and resistance to insects, viruses and herbicides. A major concern with the production of transgenic trees is the possibility that the transgenic traits might be introduced into indigenous tree populations by cross-fertilization. Thus, for example, the introduction of genes for insect resistance into indigenous tree populations could accelerate the evolution of resistant insects, adversely affect endangered insect species and interfere with normal food chains. Because of these concerns, the U.S. and other governments have instituted regulatory review processes to assess the risks associated with proposed environmental releases of transgenic plants (both for field trials and commercial production).
Genetic engineering of sterility into trees offers the possibility of securing introduced genes in the engineered tree; trees that produce neither pollen nor seeds will not be able to transmit introduced genes by normal routes of reproduction. Additional potential benefits of engineering sterility into trees include increased wood yields and reduced production of allergens such as pollen. For a review of engineering reproductive sterility in forest trees, see Strauss et al. (1995a,b).
Two primary methods for engineering sterility have been described. In the first method, termed genetic ablation, a cytotoxic gene is expressed under the control of a reproductive tissue-specific promoter. Cytotoxic genes employed in this method to date include RNase (Mariani et al., 1990; Mariani et al., 1992; Reynarts et al., 1993; Goldman et al., 1994), ADP-ribosyl transferase (Thorsness et al., 1991; Kandasamy, 1993; Thorsness et al., 1993), the Agrobacterium rolC gene (Schmülling, 1993), and glucanase (Worrall et al., 1992; Paul et al., 1992). The expression of the cytotoxic gene results (ideally) in the death of all cells in which the reproductive tissue-specific promoter is active. It is therefore critical that the promoter be highly specific to the reproductive tissue to avoid pleiotropic effects on vegetative tissue. For this reason, genome position effects on the transgene need to be monitored (see Strauss et al., 1995a,b). The success of genetic ablation methods in trees will thus depend on the availability of a suitable reproductive tissue-specific promoter for the tree species in question.
The second method for engineering sterility involves inhibiting the expression of genes that are essential for reproduction. This can be accomplished in a number of ways, including the use of antisense RNA, sense suppression and promoter-based suppression. Details and applications of antisense (Kooter, 1993; Mol et al., 1994; Van der Meer et al., 1992; Pnueli et al., 1994), sense suppression (Flavell, 1994; Jorgensen, 1992; Taylor et al., 1992) and promoter-based suppression (Brusslan et al., 1993; Matzke et al., 1993) technologies in plants have been described in the scientific literature. The key to the use of any of these methods in the production of sterile trees is the identification of appropriate indigenous genes, i.e., genes whose disrupted expression results in the abolition of correct reproductive tissue development.
Genes specifically expressed in reproductive tissues have been isolated from a number of plant species (for a review, see Strauss et al., 1995a). Genes that have been characterized as acting early in the development of floral structures include LEAFY (LFY) from Arabidopsis (Weigel et al., 1992), APETALA1 (AP1) from Arabidopsis (Mandel et al., 1992a,b), and FLORICAULA (FLO) from Antirrhinum (Coen et al., 1990), which regulate the transition from inflorescence to floral meristems. APETALA2 (AP2) appears to regulate the AGAMOUS gene (AG), which plays a role in differentiation of male and female floral tissues (see Okamuro et al., 1993). DEFICIENS (DEF) is a floral homeotic gene from Antirrhinum that is expressed throughout flower development (Schwarz-Sommer et al., 1992).
The majority of floral homeotic genes are members of the MADS-box family of transcription factors (Yanofsky et al., 1990). The MADS-box is a conserved region of approximately 60 amino acid residues. MADS is an acronym for the first four known genes in which the MADS-box was identified: yeast minichromosomal maintenance factor (MCM1), the floral homeotic genes AG and DEF, and human serum response factor (SRF). Plant MADS-box genes contain four domains: the highly conserved MADS-box region located near or at the 5′ end of the translated region in plant genes; the L or linker region between the MADS and K domains; the K domain, a moderately conserved keratin-like region predicted to form amphipathic α-helices; and a highly variable carboxy-terminal region. The K-box is only present in plant MADS-box genes. It is thought to be involved in protein-protein interactions (Pnueli et al., 1991).
Studies have shown that the organization of the MADS domain in plants is similar to that in SRF; the basic N-terminal portion of the domain is required for DNA-binding and the C-terminal half of the box is required for dimerization. Because MADS proteins bind DNA as dimers, the MADS box as well as a C-terminal extension that is involved in dimerization are required for DNA-binding. The C-terminal extension varies throughout the gene family. C-terminal deletions indicate that the minimal DNA-binding domain of AP1 and AG includes the MADS-box and part of the L region, whereas AP3 and PI require a portion of the K box in addition to the MADS and L regions (Riechmann et al., 1996). The difference in the sizes of the minimal binding domains is thought to reflect the dimerization characteristics of the respective proteins: AP1 and AG bind DNA as homodimers whereas AP3/PI and their Antirrhinum homologs DEF/GLO bind as heterodimers.
MADS-box proteins have been found to bind to a motif in target gene promoters referred to as the CArG-box. CArG-box motifs are also found in the promoters of MADS-box genes, where they are thought to be targets for auto-regulation. Riechmann et al. (1996) used circular permutation and phasing analysis to detect conformational changes in DNA that resulted from MADS-box protein binding. They found that bound AP1, AP3/PI, and AG all induce DNA bending oriented toward the minor groove. For a review of MADS box biology, see Ma, 1994; Purugganan et al., 1995; and Yanofsky, 1995. AG and DEF have been characterized as MADS box genes; while FLO and LFY appear to encode transcription factors and have proline-rich and acidic domains, they are not MADS box genes.
Following a functional analysis of MADS box genes, Mizukami et al. (1996) created deletion mutants of AG in which various domains of the gene, including the MADS and K boxes, were deleted. Based on their results, they proposed that dominant negative mutations of MADS box genes could be created by deleting all or part of the MADS domain, by deleting all or part of the K domain, or by deleting various portions of the 3′ region of the AG open reading frame. It was proposed that the proteins encoded by these deletion mutants would be able to bind either the target DNA (i.e., the nucleotide sequence to which the transcription factor binds) or the protein co-factors required for transcription, but not both. Thus, it was proposed that such mutant proteins would interfere with the functioning of the coexisting corresponding endogenous gene. The studies of floral homeotic genes discussed in the preceding paragraphs have been primarily undertaken in model plants such as Arabidopsis and Antirrhinum; few, if any, studies have addressed the genetics of flowering in tree species at the molecular level.
Species of the genus Populus are becoming increasingly important in the forestry industry, particularly for pulp and paper production, in part because of their fast growth characteristics. This group includes aspens (species of Populus section Leuce and their hybrids), and hybrids between black cottonwood (P. trichocarpa Torr. and Gray, also classified as P. balsamifera subsp. trichocarpa; Brayshaw, 1965) and eastern cottonwood (P. deltoides L.). These species are also well suited to manipulation by genetic engineering because they are fast-growing, have relatively small genomes, are easy to regenerate in vitro, and are susceptible to transformation with Agrobacterium. To date, however, relatively few genes have been cloned from these species. Notably, the genetic basis underlying floral development in these species is almost completely uncharacterized.
Floral development in the genus Populus is significantly different from what is seen in a typical hermaphroditic annual (Nagaraj, 1952; Boes and Strauss, 1994). The apices of the branches do not become inflorescences. The flowers are borne on axillary inflorescences, or catkins, with male and female flowers found on separate trees, although occasionally mixed inflorescences or hermaphroditic flowers are seen. The inflorescences appear from dormant buds in the spring, usually occurring from about five years of age. Instead of the usual structure of four concentric whorls of organs (sepals outermost, followed by petals, then stamens surrounding one or more carpels in the center), the Populus flower apparently has only two whorls (a reduced perianth cup surrounding either stamens or carpels). Unlike several other species that produce unisexual flowers through developmental arrest or degeneration of one set of organs (Cheng et al., 1983; Grant et al., 1994), Populus does not initiate male organs in female flowers or vice versa (Boes and Strauss, 1994; Sheppard, 1997). After releasing pollen or seeds, the entire inflorescences are shed (Kaul, 1995). By late spring, the inflorescence buds for the next year's flowers have already been initiated in the axils of the current year's leaves, and will develop for several more months before going dormant.
The availability of genes that control floral development in Populus species would permit the production of genetically engineered sterile trees. In turn, the ability to control fertility of Populus trees in this way would be of great value in ensuring the environmental safety and biosafety of Populus trees engineered for improved agronomic characteristics. It is to such genes that the present invention is directed.
The present invention provides four floral homeotic genes from Populus trichocarpa. The four genes are herein termed PTLF, PTD, PTAG-1 and PTAG-2. These genes are homologs of floral homeotic genes isolated from other plant species. Specifically, PTLF is a homolog of LEAFY (LFY) and FLORICAULA (FLO), PTD is a homolog of DEFICIENS (DEF), and PTAG-1 and PTAG-2 are homologs of AGAMOUS (AG). The Populus genes are shown to be expressed in floral tissues; for example, PTLF is expressed in immature inflorescences on which floral primordia are developing, whereas PTD is expressed strongly in stamen primordia from the onset of organogenesis. PTD is also expressed at low levels in carpel primordia.
The invention provides the nucleic acid sequences of these four Populus genes, the corresponding cDNA sequences and the deduced amino acid sequences of the encoded polypeptides. Along with these sequences, the present invention also provides methods of using the gene and cDNA sequences to produce genetically engineered Populus species and other trees having modified fertility characteristics, including sterility.
Genetic constructs useful in producing genetically engineered Populus and other trees include antisense versions of PTLF, PTD, PTAG-1 and PTAG-2, dominant negative mutants of these genes, and constructs useful for sense suppression. In addition, the promoter sequences of these genes may be used to obtain floral-specific expression of genes such as cytotoxins that may be employed in genetic ablation strategies to produce trees having modified fertility characteristics, including sterility.
In one aspect, the invention provides isolated nucleic acid molecules comprising portions of the disclosed nucleic acid sequences. Such molecules comprise at least 15 consecutive nucleotides of the disclosed PTLF, PTD, PTAG-1 or PTAG-2 nucleic acid sequences, and may be longer, comprising at least 20, 25, 50, or 100 consecutive nucleotides of these sequences. Such molecules are useful, among other things, as primers and probes for amplifying all or parts of the disclosed sequences and for detecting the expression of the nucleic acid molecules in cells, such as cells of transgenic plants. Thus, in one aspect, such molecules are useful to monitor the expression of transgenes comprising some portion of the PTD, PTLF, PTAG-1 or PTAG-2 molecules.
Modification of the fertility traits of plants, such as Populus species, may also be obtained by introducing genetic constructs containing variants of all or portions of the disclosed PTD, PTLF, PTAG-1 or PTAG-2 sequences. Such variants are provided by the invention and may comprise a nucleotide sequence of at least 50 (or, for example, at least 100) nucleotides in length, which sequence hybridizes under stringent conditions to the disclosed nucleic acid sequences. Alternatively, such variants may share a specified percentage of sequence identity with the disclosed nucleic acid sequences (e.g., at least 75% or at least 90% sequence identity) as determined using a specified sequence alignment program.
The disclosed nucleic acid molecules and variant forms of these molecules may be assembled in nucleic acid vectors for introduction into cells, such as plant cells. Thus, another aspect of the invention comprises the disclosed nucleic acid molecules and variants thereof, and vectors comprising these molecules.
In another embodiment, the invention provides transgenic plants comprising the vectors. Such transgenic plants may have altered phenotypes (compared to non-transgenic plants of the same species) including modified fertility characteristics. Modified fertility characteristics include modifications in the timing of flowering, for example, advancing the timing of flowering relative to non-transgenic plants of the same species, and sterility. Sterility may be complete sterility, or may be male only or female only sterility. Examples of transgenic plants provided by the present invention include genetically engineered sterile Populus and Eucalyptus species.
In another embodiment, the invention provides transgenic plants that comprise a recombinant expression cassette, wherein the recombinant expression cassette comprises a promoter sequence operably linked to a first nucleic acid sequence, and wherein the first nucleic acid sequence comprises all or part of one of the disclosed nucleic acid molecules, or a variant of one of the disclosed nucleic acid molecules. By way of example, such transgenic plants include plants in which the first nucleic acid is arranged in reverse orientation to the promoter sequence in the recombinant expression cassette, such that an antisense RNA is produced. In another example, such transgenic plants include plants in which the first nucleic acid is a dominant negative mutant of PTD, PTLF, PTAG-1 or PTAG-2, produced by deletion of part of the coding region, such as the 3′ portion of the open reading frame, or all or part of a MADS or K-box region of the coding region. In other embodiments, the promoter sequence driving expression of the first nucleic acid may be a promoter that confers enhanced expression of the first nucleic acid molecule in floral tissues of the plant relative to non-floral tissues.
In other embodiments, the expression of at least one endogenous gene in transgenic plants containing such a recombinant expression cassette will be modified as a result of the cassette. In particular embodiments, that modified expression will affect the fertility of the plant, and will render the plant sterile.
In yet other embodiments, the invention provides transgenic plants comprising a recombinant expression cassette, wherein the recombinant expression cassette comprises a promoter sequence operably linked to a first nucleic acid sequence, and wherein the promoter sequence is a promoter sequence from PTD, PTLF, PTAG-1 or PTAG-2. In particular embodiments, the first nucleic acid sequence encodes a cytotoxic polypeptide.
These and other aspects of the invention are described in more detail below.
The nucleic and amino acid sequences listed in the accompanying Sequence Listing are shown using standard letter abbreviations for nucleotide bases, and three-letter code for amino acids. Only one strand of each nucleic acid sequence is shown, but the complementary strand is understood to be included by any reference to the displayed strand.
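Because only one strand of each sequence is displayed, the complementary strand has to be derived from it. The following is a minimal illustrative sketch of that derivation; the function name and the example fragment are hypothetical and are not taken from the Sequence Listing.

```python
# Illustrative sketch only: derive the complementary strand of a displayed
# DNA strand. The displayed strand is assumed to be written 5' to 3'.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5' to 3'."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    fragment = "ATGGCTTCA"  # hypothetical fragment, not from any Seq. I.D. No.
    print(reverse_complement(fragment))  # prints TGAAGCCAT
```

Applying the function twice returns the original strand, which is a convenient sanity check.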
Seq. I.D. No. 1 shows the nucleic acid sequence of the PTD gene. The sequence comprises the following regions:
Seq. I.D. No. 2 shows the nucleic acid sequence of the PTD cDNA.
Seq. I.D. No. 3 shows the nucleic acid sequence of the PTD ORF.
Seq. I.D. No. 4 shows the amino acid sequence of the PTD polypeptide. The sequence comprises the following regions:
Seq. I.D. No. 5 shows the nucleic acid sequence of the PTLF gene. The sequence comprises the following regions:
Seq. I.D. No. 6 shows the nucleic acid sequence of the PTLF cDNA.
Seq. I.D. No. 7 shows the nucleic acid sequence of the PTLF ORF.
Seq. I.D. No. 8 shows the amino acid sequence of the PTLF polypeptide.
Seq. I.D. No. 9 shows the nucleic acid sequence of the PTAG-1 gene. The sequence comprises the following regions:
Seq. I.D. No. 10 shows the nucleic acid sequence of the PTAG-1 cDNA.
Seq. I.D. No. 11 shows the nucleic acid sequence of the PTAG-1 ORF.
Seq. I.D. No. 12 shows the amino acid sequence of the PTAG-1 polypeptide. The sequence comprises the following regions:
Seq. I.D. No. 13 shows the nucleic acid sequence of the PTAG-2 gene. The sequence comprises the following regions:
Seq. I.D. No. 14 shows the nucleic acid sequence of the PTAG-2 cDNA.
Seq. I.D. No. 15 shows the nucleic acid sequence of the PTAG-2 ORF.
Seq. I.D. No. 16 shows the amino acid sequence of the PTAG-2 polypeptide. The sequence comprises the following regions:
Seq. I.D. Nos. 17-24 show oligonucleotide primers that may be used to amplify portions of the disclosed floral homeotic nucleic acid sequences.
I. Definitions and Abbreviations
Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).
In order to facilitate review of the various embodiments of the invention, the following definitions of terms are provided:
Isolated: An "isolated" biological component (such as a nucleic acid or protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids and proteins that have been "isolated" include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.
cDNA (complementary DNA): A piece of DNA lacking internal, non-coding segments (introns). cDNA is synthesized in the laboratory by reverse transcription from messenger RNA extracted from cells.
Oligonucleotide: A linear polynucleotide sequence of up to about 100 nucleotide bases in length.
Operably linked: A first nucleic acid sequence is operably linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked DNA sequences are contiguous and, where necessary to join two protein-coding regions, in the same reading frame.
ORF (open reading frame): A series of nucleotide triplets (codons) coding for amino acids without any termination codons. These sequences are usually translatable into a peptide.
Ortholog: Two nucleotide or amino acid sequences are orthologs of each other if they share a common ancestral sequence and diverged when a species carrying that ancestral sequence split into two species. Orthologous sequences are also homologous sequences.
Probes and primers: Molecules useful as nucleic acid probes and primers may readily be prepared based on the nucleic acids provided by this invention. Typically, but not necessarily, such molecules are oligonucleotides, i.e., linear nucleic acid molecules of up to about 100 nucleotide bases in length. However, longer nucleic acid molecules, up to and including the full length of a particular floral homeotic gene, may also be employed for such purposes.
A nucleic acid probe comprises at least one copy (and typically many copies) of an isolated nucleic acid molecule of known sequence that is used in a nucleic acid hybridization protocol. Generally (but not always) the nucleic acid molecule is attached to a detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands, chemiluminescent agents, and enzymes. Methods for labeling and guidance in the choice of labels appropriate for various purposes are discussed, e.g., in Sambrook et al. (1989) and Ausubel et al. (1987).
Primers are short nucleic acids, usually DNA oligonucleotides 8-10 nucleotides or more in length, and more typically 15-25 nucleotides in length. Primers may be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) or other nucleic-acid amplification methods known in the art.
Methods for preparing and using probes and primers are described, for example, in Sambrook et al. (1989), Ausubel et al. (1987), and Innis et al. (1990). PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, Mass.). One of skill in the art will appreciate that the specificity of a particular probe or primer increases with its length. Thus, for example, a primer comprising 20 consecutive nucleotides of the cDNA disclosed in Seq. I.D. No. 2 will anneal to a target sequence, such as a homologous sequence in Eucalyptus contained within a Eucalyptus cDNA library, with a higher specificity than a corresponding primer of only 15 nucleotides. Thus, in order to obtain greater specificity, probes and primers may be selected that comprise 20, 25, 30, 35, 40, 50, 75, 100 or more consecutive nucleotides of the disclosed nucleic acid sequences.
The invention thus includes isolated nucleic acid molecules that comprise specified lengths of the disclosed floral homeotic sequences. Such molecules may comprise at least 8-10, 15, 20, 25, 30, 35, 40, 50, 75, or 100 consecutive nucleotides of these sequences and may be obtained from any region of the disclosed sequences. By way of example, the floral homeotic genes shown in the Sequence Listing may be apportioned into halves or quarters based on sequence length, and the isolated nucleic acid molecules may be derived from the first or second halves of the molecules, or any of the four quarters. The PTD cDNA, shown in Seq. I.D. No. 2 may be used to illustrate this. This cDNA is 924 nucleotides in length and so may be hypothetically divided into halves (nucleotides 1-462 and 463-924) or quarters (nucleotides 1-231, 232-462, 463-693 and 694-924). Nucleic acid molecules may be selected that comprise at least 8-10, 15, 20, 25, 30, 35, 40, 50, 75 or 100 consecutive nucleotides of any of these portions of the floral homeotic genes. Thus, one such nucleic acid molecule might comprise at least 25 consecutive nucleotides of the region comprising nucleotides 1-924 of the disclosed floral homeotic genes.
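The apportioning of a sequence into halves or quarters, and the selection of consecutive-nucleotide fragments from one portion, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and a placeholder string stands in for the 924-nucleotide PTD cDNA because the actual sequence is given in the Sequence Listing.

```python
# Illustrative sketch only: split a sequence of known length into equal
# portions (1-based, inclusive coordinates) and list the fragments of a
# chosen length that lie wholly within one portion.


def partition(length: int, parts: int) -> list[tuple[int, int]]:
    """Split positions 1..length into `parts` contiguous inclusive ranges."""
    base, extra = divmod(length, parts)
    ranges, start = [], 1
    for i in range(parts):
        # Any remainder is spread over the first `extra` portions.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def fragments(seq: str, region: tuple[int, int], n: int) -> list[str]:
    """Return every run of n consecutive nucleotides inside the region."""
    lo, hi = region
    sub = seq[lo - 1:hi]  # convert 1-based inclusive range to a Python slice
    return [sub[i:i + n] for i in range(len(sub) - n + 1)]


if __name__ == "__main__":
    print(partition(924, 2))  # [(1, 462), (463, 924)]
    print(partition(924, 4))  # [(1, 231), (232, 462), (463, 693), (694, 924)]
    placeholder = "A" * 924   # stand-in for a real 924-nucleotide sequence
    first_quarter = partition(924, 4)[0]
    print(len(fragments(placeholder, first_quarter, 25)))  # 207
```

The computed ranges match the halves and quarters given above for the 924-nucleotide cDNA.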
Purified: The term purified does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified PTAG-1 protein preparation is one in which the PTAG-1 protein is more pure than the protein in its natural environment within a cell. Generally, a preparation of a floral homeotic protein is purified such that the floral homeotic protein represents at least 5% of the total protein content of the preparation. For particular applications, higher purity may be desired, such that preparations in which the floral homeotic protein represents at least 50% or at least 75% of the total protein content may be employed.
Recombinant: A recombinant nucleic acid is one that has a sequence that is not naturally occurring or has a sequence that is made by an artificial combination of two otherwise separated segments of sequence. This artificial combination is often accomplished by chemical synthesis or, more commonly, by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
Transformed: A transformed cell is a cell into which has been introduced a nucleic acid molecule by molecular biology techniques. As used herein, the term transformation encompasses all techniques by which a nucleic acid molecule might be introduced into such a cell, including Agrobacterium-mediated transformation, transfection with viral vectors, transformation with plasmid vectors and introduction of naked DNA by electroporation, lipofection, and particle gun acceleration.
Transgenic plant: As used herein, this term refers to a plant that contains recombinant genetic material not normally found in plants of this type and which has been introduced into the plant in question (or into progenitors of the plant) by human manipulation. Thus, a plant that is grown from a plant cell into which recombinant DNA is introduced by transformation is a transgenic plant, as are all offspring of that plant that contain the introduced transgene (whether produced sexually or asexually).
Vector: A nucleic acid molecule as introduced into a host cell, thereby producing a transformed host cell. A vector may include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication. A vector may also include one or more selectable marker genes and other genetic elements known in the art.
Sequence identity: The relatedness of two nucleic acid sequences, or two amino acid sequences, is typically expressed in terms of the identity between the sequences (in the case of amino acid sequences, similarity is an alternative assessment). Sequence identity is frequently measured in terms of percentage identity; the higher the percentage, the more similar the two sequences are. Homologs of a disclosed floral homeotic protein or nucleic acid sequence will possess a relatively high degree of sequence identity when aligned using standard methods.
Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith and Waterman (1981); Needleman and Wunsch (1970); Pearson and Lipman (1988); Higgins and Sharp (1988); Higgins and Sharp (1989); Corpet et al. (1988); Huang et al. (1992); and Pearson et al. (1994). Altschul et al. (1994) presents a detailed consideration of sequence alignment methods and homology calculations.
The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, Bethesda, Md.) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. It can be accessed at http://www.ncbi.nlm.nih.gov/BLAST/. A description of how to determine sequence identity using this program is available at http://www.ncbi.nlm.nih.gov/BLAST/blast_help.html.
Homologs of the disclosed floral homeotic proteins are typically characterized by possession of at least 50% sequence identity counted over the full length alignment with the amino acid sequence of a selected floral homeotic protein using the NCBI Blast 2.0, gapped blastp set to default parameters. Proteins with even greater similarity to the reference sequences will show increasing percentage identities when assessed by this method, such as at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 90% or at least 95% sequence identity. When less than the entire sequence is being compared for sequence identity, homologs will typically possess at least 75% sequence identity over short windows of 10-20 amino acids, and may possess sequence identities of at least 85% or at least 90% or 95% depending on their similarity to the reference sequence. Methods for determining sequence identity over such short windows are described at http://www.ncbi.nlm.nih.gov/BLAST/blast_FAQs.html. One of skill in the art will appreciate that these sequence identity ranges are provided for guidance only; it is entirely possible that strongly significant homologs could be obtained that fall outside of the ranges provided. The present invention provides not only the peptide homologs as described above, but also nucleic acid molecules that encode such homologs.
Homologs of the disclosed floral homeotic nucleic acids are typically characterized by possession of at least 50% sequence identity counted over the full length alignment with the nucleic acid sequence of a selected floral homeotic gene using the NCBI Blast 2.0, blastn set to default parameters. Homologs with even greater similarity to the reference sequences will show increasing percentage identities when assessed by this method, such as at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 90% or at least 95% sequence identity.
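The notion of percentage identity counted over an alignment can be illustrated with a short sketch. This is not the BLAST algorithm: it assumes the two sequences have already been aligned (for example by blastn or blastp), with gaps written as "-", and simply counts identical columns. The function name and the example sequences are hypothetical.

```python
# Illustrative sketch only: percentage identity over a pre-computed pairwise
# alignment. Gap columns count toward the alignment length but never match.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return identical columns as a percentage of all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Hypothetical 10-column alignment with one gap and one mismatch.
    print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```

A pair scoring 80.0 here would meet, for example, the "at least 75%" and "at least 80%" thresholds but not the "at least 90%" threshold.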
An alternative indication that two nucleic acid molecules are closely related is that the two molecules hybridize to each other under stringent conditions. Stringent conditions are sequence dependent and are different under different environmental parameters. Generally, stringent conditions are selected to be about 5° C. to 20° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Conditions for nucleic acid hybridization and calculation of stringencies can be found in Sambrook et al. (1989) and Tijssen (1993). Nucleic acid molecules that hybridize under stringent conditions to the disclosed nucleic acid sequences will typically hybridize to a probe corresponding to either the entire cDNA or selected portions of the cDNA under wash conditions of 0.2×SSC, 0.1% SDS at 65° C.
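The relationship between Tm and the stringent temperature range can be sketched numerically. The text above does not specify how Tm is calculated, so the sketch below assumes the simple Wallace rule for short oligonucleotide probes, Tm = 2(A+T) + 4(G+C) in degrees C.; this rule is a rough approximation, it does not account for ionic strength, and it is not suitable for long probes such as a full cDNA. The function names and the example probe are hypothetical.

```python
# Illustrative sketch only. ASSUMPTION: Tm is estimated with the Wallace
# rule, which is a rough guide for short oligonucleotides and is not a
# method stated in the text. The 5 to 20 degree offset is from the text.


def wallace_tm(oligo: str) -> float:
    """Estimate Tm (degrees C.) of a short oligonucleotide."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_window(tm: float) -> tuple[float, float]:
    """Return the (low, high) range lying 20 to 5 degrees below Tm."""
    return (tm - 20.0, tm - 5.0)


if __name__ == "__main__":
    probe = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% GC
    tm = wallace_tm(probe)
    print(tm)                    # 60.0
    print(stringent_window(tm))  # (40.0, 55.0)
```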
Nucleic acid sequences that do not show a high degree of identity may nevertheless encode similar amino acid sequences, due to the degeneracy of the genetic code. It is understood that changes in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid sequences that all encode substantially the same protein.
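The effect of degeneracy can be illustrated with a short sketch that translates two different nucleotide sequences using the standard genetic code and shows that they encode the same peptide. The two sequences are hypothetical examples chosen for illustration and are not taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (length a multiple of 3) to protein."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped nucleotide sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a.upper(), b.upper())) / len(a)


seq_a = "ATGGCTCTTAAA"  # hypothetical
seq_b = "ATGGCACTGAAG"  # hypothetical; differs at three third-codon positions
print(translate(seq_a), translate(seq_b), nucleotide_identity(seq_a, seq_b))
```

Here the two sequences differ at the nucleotide level yet yield the same translation, because each substitution falls at a synonymous third-codon position.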
Floral Specific Promoter: As used herein, the term “floral specific promoter” refers to a regulatory sequence which confers gene expression only in, or predominantly in, floral tissues. The complete sequences of four floral specific promoters are disclosed herein: the promoter of PTD, located within the 5′ regulatory region comprising nucleotides 1-1872 of Seq. I.D. No. 1; the promoter of PTLF, located within the 5′ regulatory region comprising nucleotides 1-2638 of Seq. I.D. No. 5; the promoter of PTAG-1, located within the 5′ regulatory region comprising nucleotides 1-2410 of Seq. I.D. No. 9; and the promoter of PTAG-2, located within the 5′ regulatory region comprising nucleotides 1-2336 of Seq. I.D. No. 13. Accordingly, these promoter sequences may be used to produce transgene constructs that are specifically or predominantly expressed in floral tissues. One of skill in the art will recognize that effective floral-specific expression may be achieved with less than the entire promoter sequences noted above. Thus, by way of example, floral-specific expression may be obtained by employing sequences comprising 500 nucleotides or fewer (e.g., 250, 200, 150, or 100 nucleotides) upstream of the start codon, AUG, of the disclosed gene sequences.
The determination of whether a particular sub-region of the disclosed sequences operates to confer floral specific expression in a particular system (taking into account the plant species into which the construct is being introduced, the level of expression required, etc.) is performed using known methods, such as operably linking the promoter sub-region to a marker gene (e.g. GUS), introducing such constructs into plants and then determining the level of expression of the marker gene in floral and other plant tissues. Sub-regions which confer only or predominantly floral expression are considered to contain the necessary elements to confer floral specific expression.
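Candidate promoter sub-regions of the lengths mentioned above (500, 250, 200, 150 or 100 nucleotides upstream of the start codon) can be generated from a genomic sequence as sketched below. The helper name, the coordinate convention (1-based position of the A of the start codon, which appears as ATG in DNA) and the toy sequence are assumptions made for illustration; the actual coordinates would be taken from the sequence listing.

```python
def upstream_fragment(genomic: str, start_codon_pos: int, length: int) -> str:
    """Return up to `length` nucleotides immediately 5' of the start codon.

    `start_codon_pos` is the 1-based position of the A of the ATG.
    If fewer than `length` nucleotides are available, all of them are returned.
    """
    if not 1 <= start_codon_pos <= len(genomic):
        raise ValueError("start codon position lies outside the sequence")
    if length < 0:
        raise ValueError("length must be non-negative")
    end = start_codon_pos - 1        # number of nucleotides upstream of the ATG
    begin = max(0, end - length)
    return genomic[begin:end]


# Toy genomic sequence (hypothetical): 12 upstream nucleotides, then ATG.
toy = "ACGTACGTTTGA" + "ATGGCT"
start = 13  # 1-based position of the A of ATG in the toy sequence

for n in (500, 250, 200, 150, 100, 5):
    fragment = upstream_fragment(toy, start, n)
    print(n, len(fragment), fragment)
```

Each fragment produced in this way would then be operably linked to a marker gene and tested in plants as described above; the sketch covers only the sequence-selection step.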
II. Methods
The four floral homeotic genes were obtained, and the present invention can be practiced, using standard molecular biology and plant transformation procedures, unless otherwise noted. Standard molecular biology procedures are described in Sambrook et al. (1989), Ausubel et al. (1987) and Innis et al. (1990).
III. Isolation and Characterization of PTLF
Genomic DNA was purified from dormant vegetative buds of a single Populus trichocarpa tree using a modified CTAB extraction technique (Wagner et al., 1987). After centrifugation to pellet nuclei, a large gummy pellet of resin was evident. This was left intact during the resuspension of nuclei, and then discarded. Normal yield of DNA was approximately 1 mg per 40 g of tissue. A genomic library was constructed from DNA partially digested with Sau3A, filled in with DNA Pol I and dATP and dGTP, and ligated into LambdaGem-12 vector (Stratagene) having partially filled-in Xho I sites. Packaging of the DNA into phage particles was performed with GigaPack Gold II (Stratagene).
RNA was extracted using the lithium dodecyl sulfate method of Baker et al. (1990), and purified by centrifugation through a 5.7 M CsCl pad. After redissolving the RNA pellet in TE, pH 8.0, NaCl was added to 400 mM and the RNA was precipitated with EtOH to remove excess CsCl. PolyA+ RNA was selected using oligo dT-cellulose columns (mRNA Separation Kit, Clontech). RNA was stored at −80° C. until use. Ten-microgram samples of total RNA were used as templates for single-stranded cDNA synthesis. Reactions included 50 mM TrisHCl (pH 8.3), 75 mM KCl, 10 mM dithiothreitol, 3 mM MgCl2, 100 μM each dNTP, 4 μg primer XT, 10 μCi [α32P]-dCTP, and 200 U M-MLV reverse transcriptase (Gibco BRL) in 50 μL. Incubations were performed at 37° C. for 1 hr, then the cDNA was purified with GeneClean (BIO101) silica matrix. Typical yields were 10-40 ng of cDNA, as determined by 32P incorporation. The size ranges of the cDNA samples were characterized by alkaline gel electrophoresis. cDNA products were between 500 and 4000 bases in length, with an average size of 1000 bases. The DNA was diluted to 0.25 ng/μL in 10 mM TrisHCl, 1 mM EDTA (pH 8.0) and stored at −20° C.
cDNA libraries were prepared using the Lambda-ZAP cDNA cloning kit (Stratagene). From 5 μg of polyA+ RNA, approximately 10^6 clones were recovered per preparation, with an average size of 1 kb and a size range of 500 bp to 3 kb. A hybridization probe for the Populus FLO/LFY homolog was obtained by touchdown PCR (Don et al., 1991) of the cDNA library with a degenerate primer specific to a highly conserved region of the FLO and LFY genes and a primer specific for the vector plus 3′-end of polyadenylated cDNAs. The PCR protocol was as follows: (94° C., 30 sec; 60° C., 30 sec; 71° C., 1 min)×2, (94° C., 30 sec; 58° C., 30 sec; 71° C., 1 min)×2, (94° C., 30 sec; 56° C., 30 sec; 71° C., 1 min)×2, (94° C., 30 sec; 54° C., 30 sec; 71° C., 1 min)×2, (94° C., 30 sec; 52° C., 30 sec; 71° C., 1 min)×2, (94° C., 30 sec; 50° C., 30 sec; 71° C., 1 min)×8, (94° C., 30 sec; 52° C., 30 sec; 71° C., 1 min)×25. The approximately 480 bp fragment obtained was gel-purified and subcloned into pBluescript SK(−) for further characterization.
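For convenience, the touchdown PCR protocol recited above can be expanded into a cycle-by-cycle schedule as sketched below. The temperatures, hold times and cycle counts are taken from the protocol as stated; the data structure and function names are illustrative assumptions, and the time estimate covers hold times only (ramp times are instrument dependent and are not included).

```python
from typing import NamedTuple


class Cycle(NamedTuple):
    denature_c: int  # denaturation temperature, held 30 sec
    anneal_c: int    # annealing temperature, held 30 sec
    extend_c: int    # extension temperature, held 1 min


# (annealing temperature in degrees C, number of cycles), in the order recited.
STAGES = [(60, 2), (58, 2), (56, 2), (54, 2), (52, 2), (50, 8), (52, 25)]


def expand_schedule(stages, denature_c=94, extend_c=71):
    """Expand (annealing temperature, repeats) stages into individual cycles."""
    return [
        Cycle(denature_c, anneal_c, extend_c)
        for anneal_c, repeats in stages
        for _ in range(repeats)
    ]


schedule = expand_schedule(STAGES)
total_cycles = len(schedule)
# Each cycle holds 30 sec + 30 sec + 60 sec = 120 sec.
hold_minutes = total_cycles * (30 + 30 + 60) / 60

print(total_cycles, hold_minutes)
print([c.anneal_c for c in schedule[:12]])
```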
The PTLF genomic clone was isolated by screening the genomic library using probes derived from the PTLF cDNA sequence. Sequencing of the cDNA was performed using the dideoxy-terminator-based Sequenase 2.0 kit (United States Biochemical Corp.), according to the methods described by the manufacturer. Most sequencing of the cDNA and subclones of the gene was done using universal primers on nested deletions created with ExoIII (Henikoff, 1984). Gaps were filled in by sequencing from specific primers synthesized at Oregon State University. Sequence analysis was performed using PCGENE (Intelligenetics).
A total of 5,656 bp of the PTLF gene locus was sequenced, including 2,638 bp upstream of the initiation codon and 457 bp downstream of the polyA addition site. This sequence is available on GenBank (http://www.ncbi.nlm.nih.gov/Entrez/nucleotide.html) under accession number U93196 and is shown in Seq. I.D. No. 5. The positions of the two introns found in both FLO and LFY are conserved in PTLF. The longest cDNA obtained (Seq. I.D. No. 6) includes an open reading frame (Seq. I.D. No. 7) that encodes a predicted polypeptide of 377 amino acid residues (Seq. I.D. No. 8). Comparison of the deduced PTLF amino acid sequence with several FLO/LFY homologs revealed conserved amino- and carboxyl-terminal domains (133 and 175 residues, respectively, in PTLF) linked by a poorly conserved, highly charged domain (69 residues). The overall sequence identity between PTLF and FLO (Coen et al., 1990) is 79%, with 88% amino acid sequence similarity.
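The figures given above are internally consistent, as the following sketch shows. It assumes, for illustration, that the three PTLF domains are contiguous and together span the whole polypeptide (the stated lengths sum to 377 residues), and it derives 1-based residue ranges on that assumption; the ranges are therefore inferred, not recited in the disclosure.

```python
# Domain lengths in residues, in amino- to carboxyl-terminal order, as stated.
DOMAINS = [
    ("amino-terminal conserved domain", 133),
    ("poorly conserved, highly charged domain", 69),
    ("carboxyl-terminal conserved domain", 175),
]
POLYPEPTIDE_LENGTH = 377


def layout(domains):
    """Assign 1-based inclusive residue ranges to contiguous domains."""
    ranges = []
    start = 1
    for name, length in domains:
        end = start + length - 1
        ranges.append((name, start, end))
        start = end + 1
    return ranges


ranges = layout(DOMAINS)
for name, start, end in ranges:
    print(f"{name}: residues {start}-{end}")

# Genomic span remaining between the upstream and downstream flanking regions.
TOTAL_BP, UPSTREAM_BP, DOWNSTREAM_BP = 5656, 2638, 457
internal_bp = TOTAL_BP - UPSTREAM_BP - DOWNSTREAM_BP
print(internal_bp)
```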
Due to the limited seasonal availability of inflorescence and flower tissue, and the difficulty of obtaining large amounts of developing meristems, the levels of PTLF expression were compared using RT-PCR. PTLF was detected most strongly in developing inflorescences, with no significant differences between samples from male and female trees.
For in situ hybridization analysis, tissue samples from various sources were fixed, embedded, sectioned, and hybridized as described by Kelly et al. (1995), with the following modifications. Sections were 10 μm in thickness. Probes were generated from a plasmid consisting of the PTLF cDNA inserted between the EcoRI and Kpn I sites of the vector pBluescriptII SK (−), and were not alkaline hydrolyzed. A PTLF antisense probe hybridized strongly to the floral meristems and developing flowers of both male and female plants. PTLF was not detected in the apical inflorescence meristem, but was seen in the flanking nascent floral meristems. Developing flowers showed expression in the immature carpels and anthers. Both male and female flowers exhibited some hybridization on the inner (adaxial) rim of the perianth cup during the middle stages of development. PTLF also showed marked hybridization to bracts. Hybridization was observed with vegetative buds from mature branches. The pattern of hybridization showed that there was RNA in the axils of the newly formed leaves, but not in the center of the vegetative meristem. There was also significant expression in the tips of the leaf primordia, and in some portions of the surrounding developing leaves.
Overexpression and antisense constructs of PTLF cDNA were produced for analysis in transgenic trees. The insert from the cDNA clone of PTLF was cut out using EcoR I and Kpn I, and the ends were polished with T4 DNA polymerase. The insert was then ligated into the Sma I site of pBI121 (Jefferson et al., 1987). Clones with each orientation were identified by PCR, and the structures of the junction sites near the promoters of both were verified by sequencing of the PCR fragments. Hybrid aspens were used for transformation, in part because of the relative ease of transformation, and in part because of concern that transgenic cottonwoods might interact with native cottonwoods in the vicinity of the experimental site. The P. tremula×alba hybrid aspen female clone 717-1B4 and the P. tremula×tremuloides hybrid aspen male clone 353-38 were transformed with pDW151 (Weigel and Nilsson, 1995) and the above binary vectors using Agrobacterium tumefaciens strain C58 (Leple et al., 1992) with modifications as described by Han et al. (1996).
Although overexpression of LFY in aspens was reported to result in short, bushy plants that flower within a year (Weigel and Nilsson, 1995), no such obvious phenotypes were seen with PTLF. During more than one year of growth in soil in a greenhouse, and an additional year at a field site in Corvallis, Oreg., few differences were noted for any of the transgenics relative to control plants.
IV. Isolation and Characterization of PTD
The PTD cDNA and gene were isolated by probing the Populus cDNA library described above at low stringency using an Eco RI fragment of pCIT2241 (Ma et al., 1991) which contains the MADS box region of AGL1. The PTD cDNA (Seq. I.D. No. 2) comprises an open reading frame (Seq. I.D. No. 3) encoding a 227 amino acid polypeptide (Seq. I.D. No. 4). The PTD gene (Seq. I.D. No. 1) consists of seven exons.
The PTD polypeptide is 81% conserved overall with respect to DEF. PTD has MADS and K domains. The MADS domain extends over amino acids 1-57, while the K-domain extends over amino acids 87-154. The MADS domain is 93% conserved with respect to DEF, whereas the K domain is 85% conserved at the amino acid level.
To determine if the promoter of PTD would confer floral-specific expression, 1.9 kb of its promoter and 5′ untranslated region were fused to a GUS-intron reporter gene, and introduced into Arabidopsis, tobacco and poplar. GUS expression was observed in floral tissues including petals and stamens. This expression pattern is characteristic of a “B function” gene like APETALA3, suggesting that PTD has retained the regulatory motifs (i.e. sequence patterns) that direct it to stamens and petals (though poplar has no true petals). No vegetative GUS expression was observed, except in poplar, where vegetative expression was confined to leaf-like structures subtending induced floral structures.
V. Isolation and Characterization of PTAG-1 and PTAG-2
Two cDNAs and their corresponding genes were isolated from Populus using the methodologies described above and a probe derived from the 3′ region of the AG cDNA. Denoted PTAG-1 and PTAG-2, these two sequences are the orthologs of AG.
The genomic, cDNA and open reading frame sequences of PTAG-1 are shown in Seq. I.D. Nos. 9, 10 and 11, respectively. The open reading frame encodes a polypeptide of 241 amino acids in length (Seq. I.D. No. 12). The PTAG-1 polypeptide contains both a MADS domain and a K-domain. The MADS domain extends from amino acids 17-72 and the K-domain from amino acids 106-172. The PTAG-1 nucleotide and amino acid sequences are available on GenBank under accession number AF052570.
The genomic, cDNA and open reading frame sequences of PTAG-2 are shown in Seq. I.D. Nos. 13, 14 and 15, respectively. The open reading frame encodes a polypeptide of 238 amino acids in length (Seq. I.D. No. 16). The PTAG-2 polypeptide contains both a MADS domain and a K-domain. The MADS domain extends from amino acids 16-72 and the K-domain from amino acids 106-172. The PTAG-2 nucleotide and amino acid sequences are available on GenBank under accession number AF052571.
Like AG (Yanofsky et al., 1990), both PTAG-1 and PTAG-2 contain 8 introns at conserved positions. All introns have canonical donor (GT) and acceptor (AG) sites.
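A check for the canonical donor and acceptor dinucleotides noted above can be sketched as follows. The function name and the example intron sequences are hypothetical and serve only to illustrate the GT...AG rule; the actual intron sequences are those of Seq. I.D. Nos. 9 and 13.

```python
def has_canonical_splice_sites(intron: str) -> bool:
    """True if the intron begins with GT (donor) and ends with AG (acceptor).

    RNA input is accepted: U is treated as T before checking.
    """
    s = intron.upper().replace("U", "T")
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


# Hypothetical intron sequences, for illustration only.
examples = {
    "canonical": "GTAAGTTTCTTGATTTGCAG",
    "non-canonical donor": "GCAAGTTTCTTGATTTGCAG",
    "non-canonical acceptor": "GTAAGTTTCTTGATTTGCAC",
}
for label, seq in examples.items():
    print(label, has_canonical_splice_sites(seq))
```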
At the amino acid level, PTAG-1 and PTAG-2 are 89% identical, and show 72-75% sequence similarity with AG.
Because AG is only expressed in floral tissues and is essential for the development of both male and female reproductive organs, it is ideally suited for use in modifying fertility through genetic engineering approaches. In situ hybridization studies show that the PTAG genes in Populus are expressed in the central zone of both male and female floral meristems, and, as with AG, expression begins before reproductive primordia emerge and continues in developing stamens and carpels. Northern analysis of PTAG gene expression in Populus revealed that transcripts are present in immature and mature flowers from both male and female trees. In addition, low levels of PTAG gene expression are present in all vegetative tissues tested. Interestingly, the transcripts from the vegetative tissues are shorter (~150-200 bp) than the floral transcripts. This size difference is not due to alternate intron/exon splicing.