The present invention is directed to plant genetic engineering. In particular, it relates to new embryo-specific genes useful in improving agronomically important plants.
Embryogenesis in higher plants is a critical stage of the plant life cycle in which the primary organs are established. Embryo development can be separated into two main phases: the early phase in which the primary body organization of the embryo is laid down and the late phase which involves maturation, desiccation and dormancy. In the early phase, the symmetry of the embryo changes from radial to bilateral, giving rise to a hypocotyl with a shoot meristem surrounded by the two cotyledonary primordia at the apical pole and a root meristem at the basal pole. In the late phase, during maturation the embryo achieves its maximum size and the seed accumulates storage proteins and lipids. Maturation is ended by the desiccation stage in which the seed water content decreases rapidly and the embryo passes into a metabolically quiescent state. Dormancy ends with seed germination, and development continues from the shoot and the root meristem regions.
The precise regulatory mechanisms which control cell and organ differentiation during the initial phase of embryogenesis are largely unknown. The plant hormone abscisic acid (ABA) is thought to play a role during late embryogenesis, mainly in the maturation stage by inhibiting germination during embryogenesis (Black, M. (1991). In Abscisic Acid: Physiology and Biochemistry, W. J. Davies and H. G. Jones, eds. (Oxford: Bios Scientific Publishers Ltd.), pp. 99-124; Koornneef, M., and Karssen, C. M. (1994). In Arabidopsis, E. M. Meyerowitz and C. R. Somerville, eds. (Cold Spring Harbor: Cold Spring Harbor Laboratory Press), pp. 313-334). Mutations which affect seed development and are ABA insensitive have been identified in Arabidopsis and maize. The ABA insensitive (abi3) mutant of Arabidopsis and the viviparous1 (vp1) mutant of maize are detected mainly during late embryogenesis (McCarty, et al., (1989) Plant Cell 1, 523-532 and Parcy et al., (1994) Plant Cell 6, 1567-1582). Both the VP1 gene and the ABI3 gene have been isolated and were found to share conserved regions (Giraudat, J. (1995) Current Opinion in Cell Biology 7:232-238 and McCarty, D. R. (1995). Annu. Rev. Plant Physiol. Plant Mol. Biol. 46:71-93). The VP1 gene has been shown to function as a transcription activator (McCarty, et al., (1991) Cell 66:895-906). It has been suggested that ABI3 has a similar function.
Another class of embryo defective mutants involves three genes: LEAFY COTYLEDON1 and 2 (LEC1, LEC2) and FUSCA3 (FUS3). These genes are thought to play a central role in late embryogenesis (Baumlein, et al. (1994) Plant J. 6:379-387; Meinke, D. W. (1992) Science 258:1647-1650; Meinke et al., Plant Cell 6:1049-1064; West et al., (1994) Plant Cell 6:1731-1745). Like the abi3 mutant, leafy cotyledon-type mutants are defective in late embryogenesis. In these mutants, seed morphology is altered, the shoot meristem is activated early, storage proteins are lacking and developing cotyledons accumulate anthocyanin. As with abi3 mutants, they are desiccation intolerant and therefore die during late embryogenesis. Nevertheless, the immature mutant embryos can be rescued to give rise to mature and fertile plants. However, unlike abi3 mutants, when the immature mutants germinate they exhibit trichomes on the adaxial surface of the cotyledon. Trichomes are normally present only on leaves, stems and sepals, not cotyledons. Therefore, it is thought that the leafy cotyledon type genes have a role in specifying cotyledon identity during embryo development.
Among the above mutants, the lec1 mutant exhibits the most extreme phenotype during embryogenesis. For example, the maturation and postgermination programs are active simultaneously in the lec1 mutant (West et al., 1994), suggesting a critical role for LEC1 in gene regulation during late embryogenesis.
In spite of the recent progress in defining the genetic control of embryo development, further progress is required in the identification and analysis of genes expressed specifically in the embryo and seed. Characterization of such genes would allow for the genetic engineering of plants with a variety of desirable traits. For instance, modulation of the expression of genes which control embryo development may be used to alter traits such as accumulation of storage proteins in leaves and cotyledons. Alternatively, promoters from embryo or seed-specific genes can be used to direct expression of desirable heterologous genes to the embryo or seed. The present invention addresses these and other needs.
The present invention is based, in part, on the isolation and characterization of LEC1 genes. The invention provides isolated nucleic acid molecules comprising a LEC1 polynucleotide sequence which is at least 68% identical to the B domain of SEQ ID NO:2.
The invention also provides expression cassettes comprising a promoter operably linked to a heterologous polynucleotide sequence or complement thereof, encoding a LEC1 polypeptide comprising a sequence which is at least 68% identical to the B domain of SEQ ID NO:2. In some embodiments, the polynucleotide sequence is heterologous to any element in the expression cassette. In a preferred embodiment, the B domain comprises a polypeptide between about amino acid residue 28 and amino acid residue 117 of SEQ ID NO:2. In a more preferred embodiment, the B domain comprises a polypeptide sequence with an amino terminus at amino acid residues 28-35 and a carboxy terminus at amino acid residues 103-117 of SEQ ID NO:2.
In particularly preferred embodiments, the LEC1 polypeptide is shown in SEQ ID NO:20 or SEQ ID NO:22. Such LEC1 polypeptides can be encoded by the polynucleotide sequences shown in SEQ ID NO:19 or SEQ ID NO:21, respectively. In another embodiment, the LEC1 polypeptide is a fusion between two or more LEC1 polypeptides or polypeptide subsequences.
The expression cassette comprises a promoter operably linked to the LEC1 polynucleotide or its complement. For example, the promoter can be a constitutive promoter. Alternatively, the promoter can be a promoter from a LEC1 gene. For instance, the LEC1 promoter can be from about nucleotide 1 to about nucleotide 1998 of SEQ ID NO:3. In one embodiment, the heterologous polynucleotide can be linked to the promoter in the antisense orientation. In another embodiment, the promoter is SEQ ID NO:23. The promoter can further comprise SEQ ID NO:24.
In another embodiment, the invention provides an expression cassette comprising a promoter operably linked to a heterologous polynucleotide sequence, or complement thereof, encoding a LEC1 polypeptide comprising a subsequence at least 90% identical to the A or C domain of a LEC1 polypeptide. The polynucleotide sequence can be heterologous to any element in the expression cassette. Such expression cassettes can encode fusions of two or more LEC1 polypeptides or polypeptide subsequences.
The invention also provides for an expression cassette for the expression of heterologous polypeptides in a plant. The expression cassette comprises a LEC1 promoter operably linked to a heterologous polynucleotide. In some embodiments, the LEC1 promoter is at least 70% identical to SEQ ID NO:23. In some embodiments, the expression cassette promoter comprises a promoter at least 70% identical to SEQ ID NO:24. Preferably, the promoter comprises the sequence displayed in SEQ ID NO:24.
The invention also provides an isolated nucleic acid, or complement thereof, encoding a LEC1 polypeptide comprising a subsequence at least 68% identical to the B domain of SEQ ID NO:2, with the proviso that the nucleic acid is not clone MNJ7. In a preferred embodiment, the B domain comprises a polypeptide sequence with an amino terminus at amino acids 28-35 and a carboxy terminus at amino acids 103-117 of SEQ ID NO:2. In another embodiment, the LEC1 polypeptide is shown in SEQ ID NO:20 or SEQ ID NO:22. Such LEC1 polypeptides can be encoded by the polynucleotide sequences shown in SEQ ID NO:19 or SEQ ID NO:21, respectively. In another embodiment, the LEC1 polypeptide is a fusion between two or more LEC1 polypeptides or polypeptide subsequences.
The isolated nucleic acid can further comprise a promoter operably linked to the LEC1-encoding nucleic acid. The promoter can be a constitutive promoter. Alternatively, the promoter can be a promoter from a LEC1 gene. For instance, the LEC1 promoter can be from about nucleotide 1 to about nucleotide 1998 of SEQ ID NO:3. In one embodiment, the heterologous polynucleotide can be linked to the promoter in the antisense orientation.
The invention provides a host cell comprising expression cassettes or nucleic acids of the invention. Thus, in one embodiment, the host cells of the invention comprise an expression cassette comprising a promoter operably linked to a heterologous polynucleotide sequence, or complement thereof, encoding a LEC1 polypeptide with a subsequence at least 68% identical to the B domain of SEQ ID NO:2. In other embodiments, the host cell of the invention comprises an expression cassette comprising a promoter operably linked to a heterologous polynucleotide sequence, or complement thereof, encoding a LEC1 polypeptide with a subsequence at least 90% identical to the A or C domain of a LEC1 polypeptide. Other embodiments include host cells comprising an expression cassette comprising a promoter at least 70% identical to SEQ ID NO:23 or an isolated nucleic acid comprising a subsequence at least 68% identical to the B domain of SEQ ID NO:2, so long as the nucleic acid is not clone MNJ7.
The invention also provides isolated polypeptides comprising amino acid sequences at least 68% identical to the B domain of SEQ ID NO:2 and capable of exhibiting at least one of the biological activities of the polypeptides encoded in SEQ ID NO:1, SEQ ID NO:19 or SEQ ID NO:21, or a fragment thereof. Antibodies capable of binding the above described polypeptide are also provided.
Also provided are methods of introducing an isolated nucleic acid into a host cell. The method comprises providing an expression cassette or nucleic acid of the invention as described herein and contacting the expression cassette or nucleic acid with the host cell under conditions that permit insertion of the nucleic acid into the host cell.
The invention also provides transgenic plant cells or plants comprising an expression cassette comprising a promoter operably linked to a heterologous polynucleotide sequence, or complement thereof, encoding a LEC1 polypeptide comprising a sequence which is at least 68% identical to the B domain of SEQ ID NO:2. In a preferred embodiment, the LEC1 polypeptide is shown in SEQ ID NO:20 or SEQ ID NO:22. Such LEC1 polypeptides can be encoded by the polynucleotide sequences shown in SEQ ID NO:19 or SEQ ID NO:21, respectively. The invention also provides plants that are regenerated from the plant cells discussed above.
The expression cassette promoter can be a constitutive promoter. Alternatively, the promoter can be a promoter from a LEC1 gene. For instance, the LEC1 promoter can be from about nucleotide 1 to about nucleotide 1998 of SEQ ID NO:3. In one embodiment, the heterologous polynucleotide can be linked to the promoter in the antisense orientation. In another embodiment, the promoter is SEQ ID NO:23. The promoter can also further comprise SEQ ID NO:24.
The invention also provides an expression cassette for the expression of a heterologous polynucleotide in a plant cell, comprising a promoter polynucleotide at least 70% identical to SEQ ID NO:23, wherein the promoter polynucleotide is operably linked to a heterologous polynucleotide. In one embodiment, the promoter polynucleotide is SEQ ID NO:23. The promoter can also further comprise a polynucleotide at least 70% identical to SEQ ID NO:24. In a preferred embodiment, the promoter comprises SEQ ID NO:24.
The invention also provides methods of modulating transcription in a plant comprising introducing into the plant an expression cassette containing a plant promoter operably linked to a heterologous LEC1 polynucleotide, the heterologous LEC1 polynucleotide encoding a LEC1 polypeptide comprising a subsequence at least 68% identical to the B domain of SEQ ID NO:2, and detecting a plant with modulated transcription. Embodiments of these methods include those where the LEC1 polypeptide is SEQ ID NO:2, SEQ ID NO:20 or SEQ ID NO:22. In other embodiments, the LEC1 polypeptides are encoded by SEQ ID NO:1, SEQ ID NO:19 or SEQ ID NO:21. Preferred embodiments of the invention include the method where transcription modulation results in induction of embryonic characteristics in a plant. In an alternative embodiment, transcription modulation results in induction of seed development.
The invention also provides a method of detecting a nucleic acid in a sample. The method comprises providing an isolated LEC1 nucleic acid molecule comprising a polynucleotide sequence, or complement thereof, encoding a LEC1 polypeptide with a subsequence at least 68% identical to the B domain of SEQ ID NO:2, contacting the isolated nucleic acid molecule with a sample under conditions which permit a comparison of the sequence of the isolated nucleic acid molecule with the sequence of DNA in the sample; and analyzing the result of the comparison. In some embodiments, the isolated nucleic acid molecule and the sample are contacted under conditions which permit the formation of a duplex between complementary nucleic acid sequences.
The phrase “nucleic acid” refers to a single or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases read from the 5′ to the 3′ end. Nucleic acids may also include modified nucleotides that permit correct read through by a polymerase and do not alter expression of a polypeptide encoded by that nucleic acid.
The phrase “polynucleotide sequence” or “nucleic acid sequence” includes both the sense and antisense strands of a nucleic acid as either individual single strands or in the duplex. It includes, but is not limited to, self-replicating plasmids, chromosomal sequences, and infectious polymers of DNA or RNA.
The phrase “nucleic acid sequence encoding” refers to a nucleic acid which directs the expression of a specific protein or peptide. The nucleic acid sequences include both the DNA strand sequence that is transcribed into RNA and the RNA sequence that is translated into protein. The nucleic acid sequences include both the full length nucleic acid sequences as well as non-full length sequences derived from the full length sequences. It should be further understood that the sequence includes the degenerate codons of the native sequence or sequences which may be introduced to provide codon preference in a specific host cell.
The term “promoter” refers to a region or sequence determinants located upstream or downstream from the start of transcription and which are involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A “plant promoter” is a promoter capable of initiating transcription in plant cells. Such promoters need not be of plant origin; for example, promoters derived from plant viruses, such as the CaMV35S promoter, can be used in the present invention.
The term “plant” includes whole plants, shoot vegetative organs/structures (e.g. leaves, stems and tubers), roots, flowers and floral organs/structures (e.g. bracts, sepals, petals, stamens, carpels, anthers and ovules), seed (including embryo, endosperm, and seed coat) and fruit (the mature ovary), plant tissue (e.g. vascular tissue, ground tissue, and the like) and cells (e.g. guard cells, egg cells, trichomes and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, and multicellular algae. It includes plants of a variety of ploidy levels, including aneuploid, polyploid, diploid, haploid and hemizygous.
A polynucleotide sequence is “heterologous to” an organism or a second polynucleotide sequence if it originates from a foreign species, or, if from the same species, is modified from its original form. For example, a promoter operably linked to a heterologous coding sequence refers to a coding sequence from a species different from that from which the promoter was derived, or, if from the same species, a coding sequence which is different from any naturally occurring allelic variants. As defined here, a modified LEC1 coding sequence which is heterologous to an operably linked LEC1 promoter does not include the T-DNA insertional mutants as described in West et al., The Plant Cell 6:1731-1745 (1994).
A polynucleotide “exogenous to” an individual plant is a polynucleotide which is introduced into the plant by any means other than by a sexual cross. Examples of means by which this can be accomplished are described below, and include Agrobacterium-mediated transformation, biolistic methods, electroporation, in planta techniques, and the like. Such a plant containing the exogenous nucleic acid is referred to here as an R1 generation transgenic plant. Transgenic plants which arise from sexual cross or by selfing are descendants of such a plant.
As used herein, an “embryo-specific gene” or “seed specific gene” is a gene that is preferentially expressed during embryo development in a plant. For purposes of this disclosure, embryo development begins with the first cell divisions in the zygote and continues through the late phase of embryo development (characterized by maturation, desiccation, dormancy), and ends with the production of a mature and desiccated seed. Embryo-specific genes can be further classified as “early phase-specific” and “late phase-specific”. Early phase-specific genes are those expressed in embryos up to the end of embryo morphogenesis. Late phase-specific genes are those expressed from maturation through to production of a mature and desiccated seed.
A “LEC1 polynucleotide” is a nucleic acid sequence comprising (or consisting of) a coding region of about 100 to about 900 nucleotides, sometimes from about 300 to about 630 nucleotides, which hybridizes to SEQ ID NO:1 under stringent conditions (as defined below), or which encodes a LEC1 polypeptide. LEC1 polynucleotides can also be identified by their ability to hybridize under low stringency conditions (e.g., Tm ~40° C.) to nucleic acid probes having a sequence from position 1 to 81 in SEQ ID NO:1 or from position 355 to 627 in SEQ ID NO:1.
A “promoter from a LEC1 gene” or “LEC1 promoter” will typically be about 500 to about 2000 nucleotides in length, usually from about 750 to 1500. Exemplary promoter sequences are shown as nucleotides 1-1998 of SEQ ID NO:3 or as SEQ ID NO:23. A LEC1 promoter can also be identified by its ability to direct expression in all, or essentially all, proglobular embryonic cells, as well as cotyledons and axes of a late embryo.
A “LEC1 polypeptide” is a sequence of about 50 to about 210, sometimes 100 to 150, amino acid residues encoded by a LEC1 polynucleotide. A full length LEC1 polypeptide and fragments containing a CCAAT binding factor (CBF) domain can act as a subunit of a protein capable of acting as a transcription factor in plant cells. LEC1 polypeptides are often distinguished by the presence of a sequence which is required for binding the nucleotide sequence CCAAT. In particular, a short region of seven residues (MPIANVI; SEQ ID NO:5) at residues 34-40 of SEQ ID NO:2 shows a high degree of similarity to a region that has been shown to be required for binding the CCAAT box. Similarly, residues 61-72 of SEQ ID NO:2 (IQECVSEYISFV; SEQ ID NO:6) are nearly identical to a region that contains a subunit interaction domain (Xing, et al., (1993) EMBO J. 12:4647-4655).
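As an illustration of how the two landmark regions named above could be located in a candidate polypeptide, the following minimal Python sketch reports 1-based residue coordinates for exact occurrences of a motif. The function name find_motif and the padded toy sequence are hypothetical; the toy sequence is not SEQ ID NO:2 and merely places the two motifs at the residue positions stated in the text. Exact matching is a simplification, since related polypeptides may show similarity rather than identity in these regions.

```python
from typing import List, Tuple


def find_motif(polypeptide: str, motif: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) coordinates of each exact match."""
    hits = []
    start = polypeptide.find(motif)
    while start != -1:
        # str.find is 0-based; residue numbering in the text is 1-based.
        hits.append((start + 1, start + len(motif)))
        start = polypeptide.find(motif, start + 1)
    return hits


# Motifs quoted in the text (SEQ ID NO:5 and SEQ ID NO:6).
CCAAT_BINDING_MOTIF = "MPIANVI"
SUBUNIT_INTERACTION_MOTIF = "IQECVSEYISFV"

# Hypothetical placeholder sequence: 'X' padding stands in for residues that
# are not reproduced here; only the motif positions mirror the text.
toy_polypeptide = (
    "X" * 33 + CCAAT_BINDING_MOTIF + "X" * 20 + SUBUNIT_INTERACTION_MOTIF + "X" * 10
)

print(find_motif(toy_polypeptide, CCAAT_BINDING_MOTIF))        # [(34, 40)]
print(find_motif(toy_polypeptide, SUBUNIT_INTERACTION_MOTIF))  # [(61, 72)]
```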
As used herein, a homolog of a particular embryo-specific gene (e.g., SEQ ID NO:1) is a second gene in the same plant type or in a different plant type, which has a polynucleotide sequence of at least 50 contiguous nucleotides which are substantially identical (determined as described below) to a sequence in the first gene. It is believed that, in general, homologs share a common evolutionary past.
“Increased or enhanced LEC1 activity or expression of the LEC1 gene” refers to an augmented change in LEC1 activity. Examples of such increased activity or expression include the following. LEC1 activity or expression of the LEC1 gene is increased above the level of that in wild-type, non-transgenic control plants (i.e. the quantity of LEC1 activity or expression of the LEC1 gene is increased). LEC1 activity or expression of the LEC1 gene is in an organ, tissue or cell where it is not normally detected in wild-type, non-transgenic control plants (i.e. spatial distribution of LEC1 activity or expression of the LEC1 gene is increased). LEC1 activity or expression is increased when LEC1 activity or expression of the LEC1 gene is present in an organ, tissue or cell for a longer period than in wild-type, non-transgenic controls (i.e. duration of LEC1 activity or expression of the LEC1 gene is increased).
A “polynucleotide sequence from” a particular embryo-specific gene is a subsequence or full length polynucleotide sequence of an embryo-specific gene which, when present in a transgenic plant, has the desired effect, for example, inhibiting expression of the endogenous gene or driving expression of a heterologous polynucleotide. A full length sequence of a particular gene disclosed here may contain about 95%, usually at least about 98% of an entire sequence shown in the Sequence Listing, below.
The term “reproductive tissues” as used herein includes fruit, ovules, seeds, pollen, pistils, flowers, or any embryonic tissue.
In the case of both expression of transgenes and inhibition of endogenous genes (e.g., by antisense, or sense suppression) one of skill will recognize that the inserted polynucleotide sequence need not be identical and may be “substantially identical” to a sequence of the gene from which it was derived. As explained below, these variants are specifically covered by this term.
In the case where the inserted polynucleotide sequence is transcribed and translated to produce a functional polypeptide, one of skill will recognize that because of codon degeneracy a number of polynucleotide sequences will encode the same polypeptide. These variants are specifically covered by the term “polynucleotide sequence from” a particular embryo-specific gene, such as LEC1. In addition, the term specifically includes sequences (e.g., full length sequences) substantially identical (determined as described below) with a LEC1 gene sequence and that encode proteins that retain the function of a LEC1 polypeptide.
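To give a sense of scale for the codon degeneracy mentioned above, the following minimal Python sketch counts how many distinct coding sequences could specify a given peptide. It assumes the standard genetic code (61 sense codons, stop codons excluded), which is not stated in this disclosure; the table and function names are hypothetical. The example peptide is the seven-residue region MPIANVI (SEQ ID NO:5).

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (an assumption of this sketch; other organellar codes differ).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encoding_sequences(peptide: str) -> int:
    """Count the polynucleotide sequences that encode the peptide exactly."""
    # Codon choices at each position are independent, so the counts multiply.
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide.upper())


# M(1) * P(4) * I(3) * A(4) * N(2) * V(4) * I(3)
print(count_encoding_sequences("MPIANVI"))  # 1152
```

Even a seven-residue region therefore corresponds to over a thousand distinct coding sequences, which is why the term is defined to cover such variants rather than a single nucleotide sequence.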
In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence will typically be at least substantially identical (as determined below) to the target endogenous sequence.
Two nucleic acid sequences or polypeptides are said to be “identical” if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below. The term “complementary to” is used herein to mean that the sequence is complementary to all or a portion of a reference polynucleotide sequence.
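The definition of "complementary to" can be illustrated with a minimal Python sketch that tests whether a query is perfectly complementary to all or a portion of a reference DNA sequence. The function names are hypothetical, only unambiguous DNA bases (A, C, G, T) are handled, and perfect Watson-Crick pairing is assumed; partial complementarity is outside the scope of this sketch.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_complementary_to_portion(query: str, reference: str) -> bool:
    """True if the query pairs perfectly with all or part of the reference."""
    # A strand that pairs with a stretch of the reference is the reverse
    # complement of that stretch, so look for it within the reference.
    return reverse_complement(query).upper() in reference.upper()


print(reverse_complement("ATGC"))                           # GCAT
print(is_complementary_to_portion("GCAT", "CCATGCAA"))      # True
print(is_complementary_to_portion("AAAA", "CCATGCAA"))      # False
```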
Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.
“Percentage of sequence identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
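The calculation described above can be sketched in Python as follows. The sketch assumes the two sequences have already been optimally aligned by one of the programs named earlier and are supplied as equal-length strings in which "-" marks a gap; the function name and the example window are hypothetical and are not taken from the Sequence Listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a comparison window of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # A position is matched only when both sequences carry the same base or
    # residue there; a gap never counts as a match.
    matched = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    # Divide by the total number of positions in the window (gaps included)
    # and multiply by 100.
    return 100.0 * matched / len(aligned_a)


# Hypothetical 10-position window: 8 matches, 1 gap, 1 mismatch.
window_reference = "MPIANVIRIM"
window_query = "MPIA-VIKIM"
print(percent_identity(window_reference, window_query))  # 80.0
```

Under this definition a gapped position lowers the percentage just as a mismatch does, because the denominator is the full length of the comparison window.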
The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 25% sequence identity. Alternatively, percent identity can be any integer from 25% to 100%. More preferred embodiments include at least: 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99%, compared to a reference sequence using the programs described herein, preferably BLAST using standard parameters, as described below. Accordingly, LEC1 sequences of the invention include nucleic acid sequences that have substantial identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:19 and SEQ ID NO:21. LEC1 sequences of the invention include polypeptide sequences having substantial identity to SEQ ID NO:2, SEQ ID NO:20 or SEQ ID NO:22. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 40%. Preferred percent identity of polypeptides can be any integer from 40% to 100%. More preferred embodiments include at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, or 99%. Most preferred embodiments include 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74% and 75%. Polypeptides which are “substantially similar” share sequences as noted above except that residue positions which are not identical may differ by conservative amino acid changes. Conservative amino acid substitutions refer to the interchangeability of residues having similar side chains.
For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and methionine. Preferred conservative amino acids substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, aspartic acid-glutamic acid, and asparagine-glutamine.
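The preferred conservative substitution groups listed above lend themselves to a simple lookup. The following minimal Python sketch encodes those six groups with one-letter amino acid codes and tests whether a given pair of residues falls within any one group; the names PREFERRED_GROUPS and is_conservative are hypothetical, and treating group membership as a strict yes/no test is a simplification made for illustration.

```python
# The six preferred groups named in the text, in one-letter codes:
# Val-Leu-Ile, Phe-Tyr, Lys-Arg, Ala-Val, Asp-Glu, Asn-Gln.
PREFERRED_GROUPS = [
    {"V", "L", "I"},
    {"F", "Y"},
    {"K", "R"},
    {"A", "V"},
    {"D", "E"},
    {"N", "Q"},
]


def is_conservative(residue_a: str, residue_b: str) -> bool:
    """True if swapping one residue for the other stays within a preferred group."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        # Identical residues are not a substitution at all.
        return False
    return any(a in group and b in group for group in PREFERRED_GROUPS)


print(is_conservative("V", "I"))  # True  (valine-leucine-isoleucine group)
print(is_conservative("A", "V"))  # True  (alanine-valine group)
print(is_conservative("A", "L"))  # False (no single group holds both)
print(is_conservative("K", "D"))  # False (basic versus acidic)
```

Note that membership is not transitive across groups: alanine pairs with valine, and valine with leucine, but alanine and leucine share no preferred group.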
Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other, or a third nucleic acid, under stringent conditions. Stringent conditions are sequence dependent and will be different in different circumstances. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Typically, stringent conditions will be those in which the salt concentration is about 0.02 molar at pH 7 and the temperature is at least about 60° C.
In the present invention, mRNA encoded by embryo-specific genes of the invention can be identified in Northern blots under stringent conditions using cDNAs of the invention or fragments of at least about 100 nucleotides. For the purposes of this disclosure, stringent conditions for such RNA-DNA hybridizations are those which include at least one wash in 0.2×SSC at 63° C. for 20 minutes, or equivalent conditions. Genomic DNA or cDNA comprising genes of the invention can be identified using the same cDNAs (or fragments of at least about 100 nucleotides) under stringent conditions, which for purposes of this disclosure, include at least one wash (usually 2) in 0.2×SSC at a temperature of at least about 50° C., usually about 55° C., for 20 minutes, or equivalent conditions.