The present invention relates generally to plant molecular biology. More specifically, it relates to nucleic acids and methods for modulating their expression in plants.
Cellular DNA undergoes double strand breakage during the course of many physiological events as well as in response to a variety of environmental insults (Friedberg, E., Walker, G. and Siede, W., DNA Repair and Mutagenesis, ASM Press, Washington, D.C., 1995; Nickoloff, J. and Hoekstra, M., DNA Damage and Repair, Humana Press, Totowa, N.J., 1998). Left unrepaired, such double strand breaks (DSBs) lead to mutations that may prove lethal to the organism. Therefore, these DSBs are repaired promptly via two independent pathways: (i) homologous recombination and (ii) non-homologous end joining (Friedberg, E., Walker, G. and Siede, W., DNA Repair and Mutagenesis, ASM Press, Washington, D.C., 1995; Nickoloff, J. and Hoekstra, M., DNA Damage and Repair, Humana Press, Totowa, N.J., 1998). The first pathway involves a series of very specific biochemical reactions catalyzed by a complex of cellular proteins (Shinohara and Ogawa, Trends in Biochem. Sci. 237:387-391, 1995). Due to the large number of proteins involved in this complex, it is referred to as a ‘recombinosome’ (Hays et al., Proc. Natl. Acad. Sci. USA 92:6925-6929, 1995). This pathway is the dominant mode of DSB repair in lower eukaryotes such as yeast (Nickoloff, J. and Hoekstra, M., DNA Damage and Repair, Humana Press, Totowa, N.J., 1998).
The non-homologous end-joining pathway is the major route of DSB repair in higher eukaryotes (Friedberg, E., Walker, G. and Siede, W., DNA Repair and Mutagenesis, ASM Press, Washington, D.C., 1995; Nickoloff, J. and Hoekstra, M., DNA Damage and Repair, Humana Press, Totowa, N.J., 1998). This pathway is also catalyzed by a group of cellular proteins. This group contains, in addition to hitherto unidentified factors, some well-characterized enzymes such as DNA ligases, Poly(ADP-Ribose) Polymerase [PADPRP], and DNA-dependent Protein Kinase [DNA-PK] (Lindahl et al., Trends in Biochem. Sci. 237:405-411, 1995; Jackson and Jeggo, Trends in Biochem. Sci. 237:412-415, 1995). These enzymes have been studied in detail using lower as well as higher vertebrate systems including mammals. Both PADPRP and DNA-PK have been shown to be activated by DNA ends. Moreover, these two enzymes also bind DNA ends (Lindahl et al., Trends in Biochem. Sci. 237:405-411, 1995; Jackson and Jeggo, Trends in Biochem. Sci. 237:412-415, 1995). While PADPRP is a single polypeptide of ~115 kDa, DNA-PK exists as a complex of two subunits (Shah et al., Anal. Biochem. 227:1-13, 1995; Dvir et al., Proc. Natl. Acad. Sci. USA 89:11920-11924, 1992; Anderson et al., Crit. Rev. Eukaryot. Gene Express. 4:283-314, 1992). The catalytic subunit [DNA-PKcs] is composed of a single polypeptide of ~450 kDa. It is a serine-threonine type of protein kinase that phosphorylates a variety of nuclear enzymes, transcription factors and oncogenes (Anderson et al., Crit. Rev. Eukaryot. Gene Express. 4:283-314, 1992). However, DNA-PKcs by itself does not bind DNA. The non-catalytic subunit of DNA-PK is a heterodimer composed of 70 kDa and 86 kDa proteins. The non-catalytic subunit acts as a regulator of DNA-PKcs by virtue of its ability to bind to DNA ends, thereby recruiting the catalytic subunit to the site of DSBs (Dvir et al., Proc. Natl. Acad. Sci. USA 89:11920-11924, 1992; Anderson et al., Crit. Rev. Eukaryot. Gene Express. 4:283-314, 1992).
Although the enzymology of DNA-PKcs has been investigated extensively, its biological function was identified only recently (Dvir et al., Proc. Natl. Acad. Sci. USA 89:11920-11924, 1992; Jeggo, Mutation Res. 384:1-14, 1997). Availability of the full-length cDNA sequence of mammalian DNA-PKcs allowed identification of this protein as a member of the phosphatidylinositol 3-kinase (PI kinase) gene family. While most members of this family are lipid kinases, a small number of proteins forming a subfamily specifically phosphorylate proteins. Members of this subfamily are known as PI-K related kinases and include the ATM protein, Tel1p, Tor1p, Tor2p, FRAP, Rad3p, Mec1p and Mei41 (Jeggo, Mutation Res. 384:1-14, 1997). In addition to their structural and biochemical similarities, members of this subfamily also appear to share a common biological function. They are all involved in repair of DNA that is damaged in response to a variety of genetic, physiological or environmental events (Jeggo, Mutation Res. 384:1-14, 1997). Although several members of this subfamily have been cloned from animals, no information on plant DNA-PKcs is available in the literature.
The non-catalytic subunit of DNA-PK consists of two proteins of ~70 kDa and 86 kDa (Dvir et al., Proc. Natl. Acad. Sci. USA 89:11920-11924, 1992; Gottlieb and Jackson, Cell 72:131-142, 1993). These two proteins appear to be identical to previously well-characterized mammalian Ku proteins (Dvir et al., Proc. Natl. Acad. Sci. USA 89:11920-11924, 1992). The Ku complex, also a heterodimer of 70 kDa and 86 kDa proteins, was shown to be a nuclear DNA-binding autoantigen (Mimori et al., J. Clin. Invest. 68:611-620, 1981; Mimori et al., J. Biol. Chem. 261:2274-2278, 1986). Patients diagnosed with a variety of autoimmune diseases have been known to develop antibodies to Ku proteins (Yaneva and Arnett, Clin. Exp. Immunol. 76:366-372, 1989). Further biochemical analysis has established that Ku binds with strong affinity to DNA ends, stem-loop structures, DNA bubbles, or transitions between double stranded DNA and two single strands (Chu, J. Biol. Chem. 272:24097-24100, 1997). Subsequent to binding to the ends, Ku molecules can translocate along the DNA, such that three or more molecules can bind to the linear DNA fragment. Both components of Ku have a DNA dependent ATPase activity and an ATP dependent helicase activity (Chu, J. Biol. Chem. 272:24097-24100, 1997). Recently, Yoo and Dynan have also demonstrated RNA binding activity of the Ku protein (Yoo and Dynan, Biochemistry 37:1336-1343, 1998).
Recent genetic studies using rodent cell lines defective in DNA strand break repair have provided the important link between Ku protein, DNA-PK and DSB repair during DNA replication, repair and recombination (Yoo and Dynan, Biochemistry 37:1336-1343, 1998). Boulton and Jackson have shown that the yeast Ku70 potentiates illegitimate DNA DSB repair and serves as a barrier to error-prone DNA repair pathways (Boulton and Jackson, EMBO J. 15:5093-5103, 1996). Studies with mutant rodent cell lines have clearly shown that Ku proteins are required for V(D)J DNA recombination and immunoglobulin isotype switching (Roth et al., Current Biol. 5: 96-498, 1995; Casellas et al., EMBO J. 17:2404-2411, 1998). Components of DNA-PK are also involved in the non-homologous end-joining pathway in telomeric length maintenance and telomere silencing as well as telomere integrity (Boulton and Jackson, EMBO J. 17:1819-1828, 1998; Polotnianka et al., Current Biol. 8:831-834, 1998). Ramsden and Gellert have recently observed that Ku protein stimulates DNA end joining by mammalian DNA ligases and proposed a direct role for Ku in DSB repair (Ramsden and Gellert, EMBO J. 17:609-614, 1998). A role for Ku protein in modulation of heat shock response and hyperthermic radiosensitization has also been advocated (Yang et al., Mol. Cell. Biol. 16:3799-3806, 1996; Burgman et al., Cancer Res. 57:2847-2850, 1997). As discussed above, recent studies have established the role of DNA-PK components in various cellular processes involving DSBs. During the course of these investigations, Ku homologues have been cloned from human, mouse, Drosophila melanogaster, Rhipicephalus appendiculatus and Caenorhabditis elegans (Reeves and Sthoeger, J. Biol. Chem. 264:5047-5052, 1989; Chan et al., J. Biol. Chem. 264:3651-3654, 1989; Porges et al., J. Immunol. 145:222-4228, 1990; Jacoby et al., J. Biol. Chem. 269:11484-11491, 1994; Paesen et al., Biochim. Biophys. Acta 1305:120-124, 1996; Boulton and Jackson, Nucleic Acid Res. 24:4639-4648, 1996). Interestingly, Ku homologues have also been reported in Saccharomyces cerevisiae (Feldmann and Winnacker, J. Biol. Chem. 268:12895-12900, 1993; Feldmann et al., J. Biol. Chem. 271:27765-27769, 1996; Boulton and Jackson, Nucleic Acid Res. 24:4639-4648, 1996; Wang et al., J. Biol. Chem. 273:842-848, 1998). However, despite wide speculation and exhaustive research by many individuals, the presence of DNA-PK components in plants has never been documented.
Control of homologous recombination or non-homologous end joining by modulating Ku provides the means to modulate the efficiency with which heterologous nucleic acids are incorporated into the genome of a target plant cell. Control of these processes has important implications in the creation of novel recombinantly engineered crops such as maize. The maize Ku80 orthologue of the present invention provides this and other advantages.
Generally, it is the object of the present invention to provide nucleic acids and proteins relating to the maize Ku80 homolog. It is an object of the present invention to provide: 1) antigenic fragments of the proteins of the present invention; 2) transgenic plants comprising the nucleic acids of the present invention; 3) methods for modulating, in a transgenic plant, the expression of the nucleic acids of the present invention.
Therefore, in one aspect, the present invention relates to an isolated nucleic acid comprising a member selected from the group consisting of (a) a polynucleotide having a specified sequence identity to a polynucleotide encoding a polypeptide of the present invention; (b) a polynucleotide which is complementary to the polynucleotide of (a); and, (c) a polynucleotide comprising a specified number of contiguous nucleotides from a polynucleotide of (a) or (b). The isolated nucleic acid can be DNA.
In another aspect, the present invention relates to recombinant expression cassettes, comprising a nucleic acid of the present invention operably linked to a promoter.
In another aspect, the present invention is directed to a host cell into which has been introduced the recombinant expression cassette.
In a further aspect, the present invention relates to an isolated protein comprising a polypeptide having a specified number of contiguous amino acids encoded by an isolated nucleic acid of the present invention.
In another aspect, the present invention relates to an isolated nucleic acid comprising a polynucleotide of specified length which selectively hybridizes under stringent conditions to a polynucleotide of the present invention, or a complement thereof. In some embodiments, the isolated nucleic acid is operably linked to a promoter.
In another aspect, the present invention relates to a recombinant expression cassette comprising a nucleic acid amplified from a library as referred to supra, wherein the nucleic acid is operably linked to a promoter. In some embodiments, the present invention relates to a host cell transfected with this recombinant expression cassette. In some embodiments, the present invention relates to a protein of the present invention that is produced from this host cell.
In yet another aspect, the present invention relates to a transgenic plant comprising a recombinant expression cassette comprising a plant promoter operably linked to any of the isolated nucleic acids of the present invention. The present invention also provides transgenic seed from the transgenic plant.
Definitions
Units, prefixes, and symbols may be denoted in their SI accepted form. Unless otherwise indicated, nucleic acids are written left to right in 5′ to 3′ orientation; amino acid sequences are written left to right in amino to carboxy orientation, respectively. Numeric ranges are inclusive of the numbers defining the range and include each integer within the defined range. Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes. Unless otherwise provided for, software, electrical, and electronics terms as used herein are as defined in The New IEEE Standard Dictionary of Electrical and Electronics Terms (5th edition, 1993). The terms defined below are more fully defined by reference to the specification as a whole.
By “amplified” is meant the construction of multiple copies of a nucleic acid sequence or multiple copies complementary to the nucleic acid sequence using at least one of the nucleic acid sequences as a template. Amplification systems include the polymerase chain reaction (PCR) system, ligase chain reaction (LCR) system, nucleic acid sequence based amplification (NASBA, Cangene, Mississauga, Ontario), Q-Beta Replicase systems, transcription-based amplification system (TAS), and strand displacement amplification (SDA). See, e.g., Diagnostic Molecular Microbiology: Principles and Applications, D. H. Persing et al., Ed., American Society for Microbiology, Washington, D.C. (1993). The product of amplification is termed an amplicon.
The term “antibody” includes reference to antigen binding forms of antibodies (e.g., Fab, F(ab)2). The term “antibody” frequently refers to a polypeptide substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof which specifically bind and recognize an analyte (antigen). However, while various antibody fragments can be defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by utilizing recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments such as single chain Fv, chimeric antibodies (i.e., comprising constant and variable regions from different species), humanized antibodies (i.e., comprising a complementarity determining region (CDR) from a non-human source) and heteroconjugate antibodies (e.g., bispecific antibodies).
The term “antigen” includes reference to a substance to which an antibody can be generated and/or to which the antibody is specifically immunoreactive. The specific immunoreactive sites within the antigen are known as epitopes or antigenic determinants. These epitopes can be a linear array of monomers in a polymeric composition, such as amino acids in a protein, or consist of or comprise a more complex secondary or tertiary structure. Those of skill will recognize that all immunogens (i.e., substances capable of eliciting an immune response) are antigens; however some antigens, such as haptens, are not immunogens but may be made immunogenic by coupling to a carrier molecule. An antibody immunologically reactive with a particular antigen can be generated in vivo or by recombinant methods such as selection of libraries of recombinant antibodies in phage or similar vectors. See, e.g., Huse et al., Science 246:1275-1281 (1989); and Ward et al., Nature 341:544-546 (1989); and Vaughan et al., Nature Biotech. 14:309-314 (1996).
As used herein, “antisense orientation” includes reference to a duplex polynucleotide sequence that is operably linked to a promoter in an orientation where the antisense strand is transcribed. The antisense strand is sufficiently complementary to an endogenous transcription product such that translation of the endogenous transcription product is often inhibited.
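By way of illustration only, the relationship between a sense strand and its complementary (antisense) strand can be sketched in a few lines of Python. The function name and the example sequence are hypothetical and are not taken from the sequences of the present invention; both strands are written 5′ to 3′.

```python
# Illustrative sketch: derive the antisense (reverse-complement) strand
# of a DNA sequence. The example sequence is hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]

sense = "ATGGCATTC"  # hypothetical sense-strand fragment
antisense = reverse_complement(sense)
print(sense, antisense)
```

Applying the function twice returns the original strand, which is a convenient sanity check.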
As used herein, “chromosomal region” includes reference to a length of a chromosome that may be measured by reference to the linear segment of DNA that it comprises. The chromosomal region can be defined by reference to two unique DNA sequences, i.e., markers.
The term “conservatively modified variants” applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or conservatively modified variants of the amino acid sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations” and represent one species of conservatively modified variation. Every nucleic acid sequence herein that encodes a polypeptide also, by reference to the genetic code, describes every possible silent variation of the nucleic acid. One of ordinary skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine; and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide of the present invention is implicit in each described polypeptide sequence and is within the scope of the present invention.
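The degeneracy described above can be illustrated with a short Python sketch that builds the standard (“universal”) genetic code and lists the synonymous codons for a given codon. The function names and the example RNA sequences are hypothetical and serve only to show that a silent variation leaves the encoded polypeptide unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
# (third position varying fastest); '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_codons(codon: str) -> list:
    """Return every codon encoding the same amino acid as `codon`."""
    aa = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)

def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3))

print(synonymous_codons("GCA"))   # the four alanine codons
print(synonymous_codons("AUG"))   # methionine has a single codon
# Hypothetical sequences differing only by a silent change (GCA -> GCU):
print(translate("AUGGCAUGG"), translate("AUGGCUUGG"))
```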
As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence yield a “conservatively modified variant” where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Thus, any number of amino acid residues selected from the group of integers consisting of from 1 to 15 can be so altered. Thus, for example, 1, 2, 3, 4, 5, 7, or 10 alterations can be made. Conservatively modified variants typically provide similar biological activity as the unmodified polypeptide sequence from which they are derived. For example, substrate specificity, enzyme activity, or ligand/receptor binding is generally at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the native protein for its native substrate. Conservative substitution tables providing functionally similar amino acids are well known in the art.
The following six groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Serine (S), Threonine (T);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
See also Creighton (1984) Proteins, W. H. Freeman and Company.
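As a minimal sketch, the six groups listed above can be encoded as a lookup table and used to flag which differences between two aligned, equal-length peptide sequences are conservative substitutions. The function names and the example peptides are hypothetical; residues that appear in none of the six groups (e.g., glycine) are treated here as having no conservative replacement, which is an assumption of this sketch rather than a statement of the specification.

```python
# The six conservative-substitution groups, in one-letter code.
GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}

def is_conservative(a: str, b: str) -> bool:
    """True if a and b are different residues belonging to the same group."""
    return a != b and a in GROUP_OF and GROUP_OF.get(a) == GROUP_OF.get(b)

def classify_changes(reference: str, variant: str) -> list:
    """List (position, ref_residue, var_residue, conservative?) for each difference.

    Positions are 1-based; sequences must already be aligned and equal in length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]

# Hypothetical peptides, for illustration only.
print(classify_changes("MADK", "MSDR"))
```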
By “encoding” or “encoded”, with respect to a specified nucleic acid, is meant comprising the information for translation into the specified protein. A nucleic acid encoding a protein may comprise non-translated sequences (e.g., introns) within translated regions of the nucleic acid, or may lack such intervening non-translated sequences (e.g., as in cDNA). The information by which a protein is encoded is specified by the use of codons. Typically, the amino acid sequence is encoded by the nucleic acid using the “universal” genetic code. However, variants of the universal code, such as are present in some plant, animal, and fungal mitochondria, the bacterium Mycoplasma capricolum, or the ciliate Macronucleus, may be used when the nucleic acid is expressed therein.
When the nucleic acid is prepared or altered synthetically, advantage can be taken of known codon preferences of the intended host where the nucleic acid is to be expressed. For example, although nucleic acid sequences of the present invention may be expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons, as these preferences have been shown to differ (Murray et al., Nucl. Acids Res. 17:477-498 (1989)). Thus, the maize preferred codon for a particular amino acid may be derived from known gene sequences from maize. Maize codon usage for 28 genes from maize plants is listed in Table 4 of Murray et al., supra.
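One way such preferences might be derived from known gene sequences is sketched below in Python: the relative frequency of each codon and the overall GC content are tallied for an in-frame coding sequence. The function names and the input sequence are hypothetical; no actual maize codon usage data are reproduced here, and a real analysis would pool many genes, as in Murray et al.

```python
from collections import Counter

def codon_usage(cds: str) -> dict:
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "GC") / len(seq)

example = "ATGGCCGCCGCG"  # hypothetical fragment, not a maize sequence
print(codon_usage(example))
print(round(gc_content(example), 3))
```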
As used herein “full-length sequence” in reference to a specified polynucleotide or its encoded protein means having the entire amino acid sequence of a native (non-synthetic), endogenous, biologically active form of the specified protein. Methods to determine whether a sequence is full-length are well known in the art including such exemplary techniques as northern or western blots, primer extension, S1 protection, and ribonuclease protection. See, e.g., Plant Molecular Biology: A Laboratory Manual, Clark, Ed., Springer-Verlag, Berlin (1997). Comparison to known full-length homologous (orthologous and/or paralogous) sequences can also be used to identify full-length sequences of the present invention. Additionally, consensus sequences typically present at the 5′ and 3′ untranslated regions of mRNA aid in the identification of a polynucleotide as full-length. For example, the consensus sequence ANNNNAUGG, where the AUG codon represents the N-terminal methionine, aids in determining whether the polynucleotide has a complete 5′ end. Consensus sequences at the 3′ end, such as polyadenylation sequences, aid in determining whether the polynucleotide has a complete 3′ end.
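A search for the ANNNNAUGG consensus can be sketched with a regular expression, as below. This is an illustrative aid under stated assumptions, not a method prescribed by the specification: N is taken to mean any of A, C, G or U, the function name and example sequences are hypothetical, and a match only suggests, rather than establishes, a complete 5′ end.

```python
import re

# A, any four bases, then AUG followed by G. The lookahead allows
# overlapping candidate sites to be reported.
START_CONTEXT = re.compile(r"(?=(A[ACGU]{4}AUGG))")

def find_start_contexts(mrna: str) -> list:
    """Return 0-based positions of the A of each AUG found in an ANNNNAUGG context."""
    mrna = mrna.upper().replace("T", "U")
    # The AUG begins five bases after the first A of the consensus.
    return [m.start(1) + 5 for m in START_CONTEXT.finditer(mrna)]

# Hypothetical sequences, for illustration only.
print(find_start_contexts("GGACCGCAUGGCU"))
print(find_start_contexts("GGGAUGCCC"))
```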
As used herein, “heterologous” in reference to a nucleic acid is a nucleic acid that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous structural gene is from a species different from that from which the structural gene was derived, or, if from the same species, one or both are substantially modified from their original form. A heterologous protein may originate from a foreign species or, if from the same species, is substantially modified from its original form by deliberate human intervention.
By “host cell” is meant a cell which contains a vector and supports the replication and/or expression of the vector. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or mammalian cells. Preferably, host cells are monocotyledonous or dicotyledonous plant cells. A particularly preferred monocotyledonous host cell is a maize host cell.
The term “hybridization complex” includes reference to a duplex nucleic acid structure formed by two single-stranded nucleic acid sequences selectively hybridized with each other.
By “immunologically reactive conditions” or “immunoreactive conditions” is meant conditions which allow an antibody, reactive to a particular epitope, to bind to that epitope to a detectably greater degree (e.g., at least 2-fold over background) than the antibody binds to substantially any other epitopes in a reaction mixture comprising the particular epitope. Immunologically reactive conditions are dependent upon the format of the antibody binding reaction and typically are those utilized in immunoassay protocols. See Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York (1988), for a description of immunoassay formats and conditions.
The term “introduced” in the context of inserting a nucleic acid into a cell, means “transfection” or “transformation” or “transduction” and includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell (e.g., chromosome, plasmid, plastid or mitochondrial DNA), converted into an autonomous replicon, or transiently expressed (e.g., transfected mRNA).
The term “isolated” refers to material, such as a nucleic acid or a protein, which is: (1) substantially or essentially free from components that normally accompany or interact with it as found in its naturally occurring environment. The isolated material optionally comprises material not found with the material in its natural environment; or (2) if the material is in its natural environment, the material has been synthetically (non-naturally) altered by deliberate human intervention to a composition and/or placed at a location in the cell (e.g., genome or subcellular organelle) not native to a material found in that environment. The alteration to yield the synthetic material can be performed on the material within or removed from its natural state. For example, a naturally occurring nucleic acid becomes an isolated nucleic acid if it is altered, or if it is transcribed from DNA which has been altered, by means of human intervention performed within the cell from which it originates. See, e.g., Compounds and Methods for Site Directed Mutagenesis in Eukaryotic Cells, Kmiec, U.S. Pat. No. 5,565,350; In Vivo Homologous Sequence Targeting in Eukaryotic Cells; Zarling et al., PCT/US93/03868. Likewise, a naturally occurring nucleic acid (e.g., a promoter) becomes isolated if it is introduced by non-naturally occurring means to a locus of the genome not native to that nucleic acid. Nucleic acids which are “isolated” as defined herein, are also referred to as “heterologous” nucleic acids.
Unless otherwise stated, the term “maize Ku80 nucleic acid” is a nucleic acid of the present invention and means a nucleic acid comprising a polynucleotide of the present invention (a “maize Ku80 polynucleotide”) encoding a maize Ku80 polypeptide. A “maize Ku80 gene” is a gene of the present invention and refers to a heterologous genomic form of a full-length maize Ku80 polynucleotide.
As used herein, “localized within the chromosomal region defined by and including” with respect to particular markers includes reference to a contiguous length of a chromosome delimited by and including the stated markers.
As used herein, “marker” includes reference to a locus on a chromosome that serves to identify a unique position on the chromosome. A “polymorphic marker” includes reference to a marker which appears in multiple forms (alleles) such that different forms of the marker, when they are present in a homologous pair, allow transmission of each of the chromosomes of that pair to be followed. A genotype may be defined by use of one or a plurality of markers.
As used herein, “nucleic acid” includes reference to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues having the essential nature of natural nucleotides in that they hybridize to single-stranded nucleic acids in a manner similar to naturally occurring nucleotides (e.g., peptide nucleic acids).
By “nucleic acid library” is meant a collection of isolated DNA or RNA molecules which comprise and substantially represent the entire transcribed fraction of a genome of a specified organism. Construction of exemplary nucleic acid libraries, such as genomic and cDNA libraries, is taught in standard molecular biology references such as Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press, Inc., San Diego, Calif. (Berger); Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Vol. 1-3 (1989); and Current Protocols in Molecular Biology, F. M. Ausubel et al., Eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley and Sons, Inc. (1994).
As used herein “operably linked” includes reference to a functional linkage between a promoter and a second sequence, wherein the promoter sequence initiates and mediates transcription of the DNA sequence corresponding to the second sequence. Generally, operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, contiguous and in the same reading frame.
As used herein, the term “plant” includes reference to whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds and plant cells and progeny of same. Plant cell, as used herein, includes, without limitation, seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores. The class of plants which can be used in the methods of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including both monocotyledonous and dicotyledonous plants. A particularly preferred plant is Zea mays.
As used herein, “polynucleotide” includes reference to a deoxyribopolynucleotide, ribopolynucleotide, or analogs thereof that have the essential nature of a natural ribonucleotide in that they hybridize, under stringent hybridization conditions, to substantially the same nucleotide sequence as naturally occurring nucleotides and/or allow translation into the same amino acid(s) as the naturally occurring nucleotide(s). A polynucleotide can be full-length or a subsequence of a native or heterologous structural or regulatory gene. Unless otherwise indicated, the term includes reference to the specified sequence as well as the complementary sequence thereof. Thus, DNAs or RNAs with backbones modified for stability or for other reasons are “polynucleotides” as that term is intended herein. Moreover, DNAs or RNAs comprising unusual bases, such as inosine, or modified bases, such as tritylated bases, to name just two examples, are polynucleotides as the term is used herein. It will be appreciated that a great variety of modifications have been made to DNA and RNA that serve many useful purposes known to those of skill in the art. The term polynucleotide as it is employed herein embraces such chemically, enzymatically or metabolically modified forms of polynucleotides, as well as the chemical forms of DNA and RNA characteristic of viruses and cells, including among other things, simple and complex cells.
The terms “polypeptide”, “peptide” and “protein” are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical analogue of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers. The essential nature of such analogues of naturally occurring amino acids is that, when incorporated into a protein, that protein is specifically reactive to antibodies elicited to the same protein but consisting entirely of naturally occurring amino acids. The terms “polypeptide”, “peptide” and “protein” are also inclusive of modifications including, but not limited to, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation. It will be appreciated, as is well known and as noted above, that polypeptides are not always entirely linear. For instance, polypeptides may be branched as a result of ubiquitination, and they may be circular, with or without branching, generally as a result of posttranslation events, including natural processing events and events brought about by human manipulation which do not occur naturally. Circular, branched and branched circular polypeptides may be synthesized by non-translation natural processes and by entirely synthetic methods, as well. Further, this invention contemplates the use of both the methionine-containing and the methionine-less amino terminal variants of the protein of the invention.
As used herein, "promoter" includes reference to a region of DNA upstream from the start of transcription and involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. A "plant promoter" is a promoter capable of initiating transcription in plant cells whether or not its origin is a plant cell. Exemplary plant promoters include, but are not limited to, those that are obtained from plants, plant viruses, and bacteria which comprise genes expressed in plant cells, such as Agrobacterium or Rhizobium. Examples of promoters under developmental control include promoters that preferentially initiate transcription in certain tissues, such as leaves, roots, or seeds. Such promoters are referred to as "tissue preferred". Promoters which initiate transcription only in certain tissue are referred to as "tissue specific". A "cell type" specific promoter primarily drives expression in certain cell types in one or more organs, for example, vascular cells in roots or leaves. An "inducible" or "repressible" promoter is a promoter which is under environmental control. Examples of environmental conditions that may affect transcription by inducible promoters include anaerobic conditions or the presence of light. Tissue specific, tissue preferred, cell type specific, and inducible promoters constitute the class of "non-constitutive" promoters. A "constitutive" promoter is a promoter which is active under most environmental conditions.
The term "maize Ku80 polypeptide" is a polypeptide of the present invention and refers to one or more amino acid sequences, in glycosylated or non-glycosylated form. The term is also inclusive of fragments, variants, homologs, alleles or precursors (e.g., preproproteins or proproteins) thereof. A "maize Ku80 protein" is a protein of the present invention and comprises a maize Ku80 polypeptide.
As used herein, "recombinant" includes reference to a cell or vector that has been modified by the introduction of a heterologous nucleic acid, or to a cell that is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found in identical form within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all as a result of deliberate human intervention. The term "recombinant" as used herein does not encompass the alteration of the cell or vector by naturally occurring events (e.g., spontaneous mutation, natural transformation/transduction/transposition) such as those occurring without deliberate human intervention.
As used herein, a "recombinant expression cassette" is a nucleic acid construct, generated recombinantly or synthetically, with a series of specified nucleic acid elements which permit transcription of a particular nucleic acid in a host cell. The recombinant expression cassette can be incorporated into a plasmid, chromosome, mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant expression cassette portion of an expression vector includes, among other sequences, a nucleic acid to be transcribed, and a promoter.
The terms "residue", "amino acid residue" and "amino acid" are used interchangeably herein to refer to an amino acid that is incorporated into a protein, polypeptide, or peptide (collectively "protein"). The amino acid may be a naturally occurring amino acid and, unless otherwise limited, may encompass non-natural analogs of natural amino acids that can function in a similar manner as naturally occurring amino acids.
The term "selectively hybridizes" includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, preferably 90% sequence identity, and most preferably 100% sequence identity (i.e., complementary) with each other.
The term "specifically reactive" includes reference to a binding reaction between an antibody and a protein having an epitope recognized by the antigen binding site of the antibody. This binding reaction is determinative of the presence of a protein having the recognized epitope amongst the presence of a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to an analyte having the recognized epitope to a substantially greater degree (e.g., at least 2-fold over background) than to substantially all analytes lacking the epitope which are present in the sample.
Specific binding to an antibody under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, antibodies raised to the polypeptides of the present invention can be selected to obtain antibodies specifically reactive with polypeptides of the present invention. The proteins used as immunogens can be in native conformation or denatured so as to provide a linear epitope.
A variety of immunoassay formats may be used to select antibodies specifically reactive with a particular protein (or other analyte). For example, solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York (1988), for a description of immunoassay formats and conditions that can be used to determine selective reactivity.
The term "stringent conditions" or "stringent hybridization conditions" includes reference to conditions under which a probe will hybridize to its target sequence, to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, optionally less than 500 nucleotides in length.
Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C.
Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:267-284 (1984): Tm = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1° C. for each 1% of mismatching; thus, Tm, hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with ≥90% identity are sought, the Tm can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the thermal melting point (Tm). Using the equation, hybridization and wash compositions, and desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45° C. (aqueous solution) or 32° C. (formamide solution) it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2 "Overview of principles of hybridization and the strategy of nucleic acid probe assays", Elsevier, New York (1993); and Current Protocols in Molecular Biology, Chapter 2, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
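By way of illustration only, the Tm approximation and the mismatch adjustment described above can be sketched in Python as follows. The function and parameter names are hypothetical and are not part of the cited reference; the sketch also assumes that "log M" denotes the base-10 logarithm and that each 1% of mismatching lowers the Tm by exactly 1° C., whereas the text states only "about 1° C."

```python
import math


def melting_temperature(molarity, percent_gc, percent_formamide, length_bp):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    Implements Tm = 81.5 + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L.
    Assumption: log is taken to base 10.
    """
    if molarity <= 0 or length_bp <= 0:
        raise ValueError("molarity and length_bp must be positive")
    return (81.5
            + 16.6 * math.log10(molarity)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)


def mismatch_adjusted_tm(tm, percent_identity):
    """Lower Tm by 1 degree C per 1% mismatch (an idealization of 'about 1')."""
    return tm - (100.0 - percent_identity)


def wash_temperature(tm, degrees_below_tm=5.0):
    """Wash temperature chosen a fixed number of degrees below Tm.

    The default of 5 reflects the 'about 5 degrees C lower' guideline.
    """
    return tm - degrees_below_tm
```

Under these assumptions, a 500 base pair hybrid with 50% GC content in 1 M monovalent cation and 50% formamide has a calculated Tm of approximately 70.5° C.; seeking sequences with at least 90% identity would lower the working Tm by 10° C.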
As used herein, "transgenic plant" includes reference to a plant which comprises within its genome a heterologous polynucleotide. Generally, the heterologous polynucleotide is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The heterologous polynucleotide may be integrated into the genome alone or as part of a recombinant expression cassette. "Transgenic" is used herein to include any cell, cell line, callus, tissue, plant part or plant, the genotype of which has been altered by the presence of heterologous nucleic acid including those transgenics initially so altered as well as those created by sexual crosses or asexual propagation from the initial transgenic. The term "transgenic" as used herein does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition, or spontaneous mutation.
As used herein, "vector" includes reference to a nucleic acid used in transfection of a host cell and into which can be inserted a polynucleotide. Vectors are often replicons. Expression vectors permit transcription of a nucleic acid inserted therein.
The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) "reference sequence", (b) "comparison window", (c) "sequence identity", (d) "percentage of sequence identity", and (e) "substantial identity".
(a) As used herein, "reference sequence" is a defined sequence used as a basis for sequence comparison. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.
(b) As used herein, "comparison window" includes reference to a contiguous and specified segment of a polynucleotide/polypeptide sequence, wherein the polynucleotide/polypeptide sequence may be compared to a reference sequence and wherein the portion of the polynucleotide/polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides/amino acid residues in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide/polypeptide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.
Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444 (1988); by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif.; GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA. The CLUSTAL program is well described by Higgins and Sharp, Gene 73:237-244 (1988); Higgins and Sharp, CABIOS 5:151-153 (1989); Corpet et al., Nucleic Acids Research 16:10881-90 (1988); Huang et al., Computer Applications in the Biosciences 8:155-65 (1992), and Pearson et al., Methods in Molecular Biology 24:307-331 (1994).
The BLAST family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences. See, Current Protocols in Molecular Biology, Chapter 19, Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).
Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information. This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always greater than 0) and N (penalty score for mismatching residues; always less than 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915).
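The extension step described above can be illustrated, in simplified form, by the following Python sketch. It is not the BLAST implementation: it extends a single word hit in one direction only, without gaps, and the function name, the argument names and the default drop-off value X of 20 are assumptions chosen for illustration. The reward M=5 and penalty N=−4 follow the BLASTN defaults recited above.

```python
def extend_hit_rightward(query, subject, q_start, s_start, word_len,
                         reward=5, penalty=-4, x_drop=20):
    """Extend an ungapped word hit to the right and report the best segment.

    Returns (best_score, best_length), where best_length counts aligned
    positions starting at the beginning of the seed word.
    Extension halts when the cumulative score falls x_drop below its
    maximum, reaches zero or below, or either sequence ends.
    """
    # Score the seed word itself.
    score = 0
    for k in range(word_len):
        score += reward if query[q_start + k] == subject[s_start + k] else penalty
    best_score, best_length = score, word_len

    i, j = q_start + word_len, s_start + word_len
    while i < len(query) and j < len(subject):
        score += reward if query[i] == subject[j] else penalty
        if score > best_score:
            # New maximum: remember how far the segment reaches.
            best_score, best_length = score, i - q_start + 1
        elif best_score - score >= x_drop or score <= 0:
            break  # drop-off or non-positive cumulative score
        i += 1
        j += 1
    return best_score, best_length
```

A complete implementation would apply the same procedure leftward from the seed and combine the two extensions into one HSP.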
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin and Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
BLAST searches assume that proteins can be modeled as random sequences. However, many real proteins comprise regions of nonrandom sequences which may be homopolymeric tracts, short-period repeats, or regions enriched in one or more amino acids. Such low-complexity regions may be aligned between unrelated proteins even though other regions of the protein are entirely dissimilar. A number of low-complexity filter programs can be employed to reduce such low-complexity alignments. For example, the SEG (Wooten and Federhen, Comput. Chem. 17:149-163 (1993)) and XNU (Claverie and States, Comput. Chem. 17:191-201 (1993)) low-complexity filters can be employed alone or in combination.
GAP can also be used to compare a polynucleotide or polypeptide of the present invention with a reference sequence. GAP uses the algorithm of Needleman and Wunsch (J. Mol. Biol. 48:443-453, 1970) to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps. It allows for the provision of a gap creation penalty and a gap extension penalty in units of matched bases. GAP must make a profit of gap creation penalty number of matches for each gap it inserts. If a gap extension penalty greater than zero is chosen, GAP must, in addition, make a profit for each gap inserted of the length of the gap times the gap extension penalty. Default gap creation penalty values and gap extension penalty values in Version 10 of the Wisconsin Genetics Software Package for protein sequences are 8 and 2, respectively. For nucleotide sequences the default gap creation penalty is 50 while the default gap extension penalty is 3. The gap creation and gap extension penalties can be expressed as an integer selected from the group of integers consisting of from 0 to 200. Thus, for example, the gap creation and gap extension penalties can each independently be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60, 65 or greater.
GAP presents one member of the family of best alignments. There may be many members of this family, but no other member has a better quality. GAP displays four figures of merit for alignments: Quality, Ratio, Identity, and Similarity. The Quality is the metric maximized in order to align the sequences. Ratio is the quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the symbols that actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for a pair of symbols is greater than or equal to 0.50, the similarity threshold. The scoring matrix used in Version 10 of the Wisconsin Genetics Software Package is BLOSUM62 (see Henikoff and Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915).
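The role of the gap creation and gap extension penalties can be illustrated by the following Python sketch of a global alignment score in the manner of Needleman and Wunsch. It is a simplified illustration and not the GAP program: it returns only the optimal score, it charges a gap of length n a total of (gap creation penalty + n × gap extension penalty), it penalizes gaps at the sequence ends, and it uses a hypothetical match reward of 5 and mismatch penalty of −4 in place of a scoring matrix. The function and parameter names are likewise assumptions.

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, match=5, mismatch=-4,
                           gap_creation=8, gap_extension=2):
    """Best global alignment score with creation/extension gap penalties."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] opposite a gap;
    # Y: b[j-1] opposite a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Cost of the first position of a new gap.
    open_cost = gap_creation + gap_extension
    for i in range(1, n + 1):
        X[i][0] = -(gap_creation + i * gap_extension)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_creation + j * gap_extension)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] - open_cost,
                          Y[i - 1][j] - open_cost,
                          X[i - 1][j] - gap_extension)
            Y[i][j] = max(M[i][j - 1] - open_cost,
                          X[i][j - 1] - open_cost,
                          Y[i][j - 1] - gap_extension)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these illustrative values, aligning ACGT against AGT requires one gap of length one: three matches contribute 15 and the gap costs 8 + 2 = 10, so the gap is inserted only because the matches it enables outweigh its cost.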
Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using the BLAST 2.0 suite of programs using default parameters (Altschul et al., Nucleic Acids Res. 25:3389-3402, 1997; Altschul et al., J. Mol. Biol. 215:403-410, 1990) or to the value obtained using the GAP program using default parameters (see the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis., USA).
(c) As used herein, "sequence identity" or "identity" in the context of two nucleic acid or polypeptide sequences includes reference to the residues in the two sequences which are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences which differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well-known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Meyers and Miller, Computer Applic. Biol. Sci., 4:11-17 (1988) e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).
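The partial scoring of conservative substitutions described above can be illustrated by the following Python sketch. It is not the algorithm of Meyers and Miller; the grouping of amino acids by chemical property, the partial score of 0.5, and all names are hypothetical choices made solely for illustration, and the sketch assumes two already aligned, ungapped sequences of equal length.

```python
# Hypothetical groups of chemically similar amino acids (illustration only).
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ST"),    # hydroxyl-containing
    set("NQ"),    # amide-containing
]


def position_score(x, y, partial=0.5):
    """1 for identical residues, `partial` for a conservative pair, else 0."""
    if x == y:
        return 1.0
    for group in CONSERVATIVE_GROUPS:
        if x in group and y in group:
            return partial
    return 0.0


def percent_similarity(seq_a, seq_b, partial=0.5):
    """Percentage score over aligned positions, crediting conservative pairs."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(position_score(x, y, partial) for x, y in zip(seq_a, seq_b))
    return 100.0 * total / len(seq_a)
```

For the peptides KLDE and RLDA, for example, strict identity counts two of four positions (50%), whereas crediting the K/R pair with 0.5 raises the adjusted value to 62.5%.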
(d) As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
(e) (i) The term "substantial identity" of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 70% sequence identity, preferably at least 80%, more preferably at least 90% and most preferably at least 95%, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 60%, more preferably at least 70%, 80%, 90%, and most preferably at least 95%.
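The calculation recited in definition (d) can be illustrated by the following Python sketch, which operates on two sequences that have already been optimally aligned over the comparison window. The function name and the use of a hyphen to denote a gap position are assumptions made for illustration.

```python
def percent_sequence_identity(aligned_a, aligned_b, gap_char="-"):
    """Matched positions divided by total window positions, times 100.

    Both inputs must be the same length (the aligned comparison window);
    a gap position never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap_char
    )
    return 100.0 * matched / len(aligned_a)
```

For example, the aligned pair ACGT-A and ACCTTA has four matched positions in a window of six, giving a percentage of sequence identity of about 66.7%.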
Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to each other under stringent conditions. However, nucleic acids which do not hybridize to each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This may occur, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code. One indication that two nucleic acid sequences are substantially identical is that the polypeptide which the first nucleic acid encodes is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.
(e) (ii) The term "substantial identity" in the context of a peptide indicates that a peptide comprises a sequence with at least 70% sequence identity to a reference sequence, preferably 80%, more preferably 85%, most preferably at least 90% or 95% sequence identity to the reference sequence over a specified comparison window. Optionally, optimal alignment is conducted using the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970). An indication that two peptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a peptide is substantially identical to a second peptide, for example, where the two peptides differ only by a conservative substitution. Peptides which are "substantially similar" share sequences as noted above except that residue positions which are not identical may differ by conservative amino acid changes.