This invention relates to polynucleotides believed to be novel, including partial, extended and full length sequences, as well as probes and primers, genetic constructs comprising the polynucleotides, biological materials incorporating the polynucleotides, polypeptides expressed by the polynucleotides, and methods for using the polynucleotides and polypeptides.
Sequencing of the genomes, or portions of the genomes, of numerous biological materials, including humans, animals, microorganisms and various plant varieties, has been and is being carried out on a large scale. Polynucleotides identified using sequencing techniques may be partial or full-length genes, and may contain open reading frames, or portions of open reading frames, that encode polypeptides. Putative polypeptides may be determined based on polynucleotide sequences. The sequencing data relating to polynucleotides thus represents valuable and useful information.
Polynucleotides may be analyzed for various degrees of novelty by comparing identified sequences to sequences published in various public domain databases, such as EMBL. Newly identified polynucleotides and putative polypeptides may also be compared to polynucleotides and polypeptides contained in public domain information to ascertain homology to known polynucleotides and polypeptides. In this way, the degree of similarity, identity or homology of polynucleotides and polypeptides of unknown function may be determined relative to polynucleotides and polypeptides having known functions.
Information relating to the sequences of isolated polynucleotides may be used in a variety of ways. Specified polynucleotides having a particular sequence may be isolated, or synthesized, for use in in vivo or in vitro experimentation as probes or primers. Alternatively, collections of sequences of isolated polynucleotides may be stored using magnetic or optical storage medium, and analyzed or manipulated using computer hardware and software, as well as other types of tools.
The present invention relates to polynucleotide sequences identified in the attached Sequence Listing as SEQ ID NOS: 1-35, variants of those sequences, extended sequences comprising the sequences set out in SEQ ID NOS: 1-35 and their variants, probes and primers corresponding to the sequences set out in SEQ ID NOS: 1-35 and their variants, polynucleotides comprising at least a specified number of contiguous residues of any of the polynucleotides identified as SEQ ID NOS: 1-35 (x-mers), and extended sequences comprising portions of the sequences set out in SEQ ID NOS: 1-35, all of which are referred to herein, collectively, as “polynucleotides of the present invention.”
The polynucleotide sequences identified as SEQ ID NOS: 1-35 were derived from mammalian sources, namely, from mouse airways induced eosinophilia, rat dermal papilla and mouse stromal cells. Some of the polynucleotides of the present invention are “partial” sequences, in that they do not represent a full-length gene encoding a full-length polypeptide. Such partial sequences may be extended by further analyzing and sequencing the EST clones from which the sequences were obtained, or by analyzing and sequencing various DNA libraries (e.g. cDNA or genomic) using primers and/or probes and well known hybridization and/or PCR techniques. The partial sequences identified as SEQ ID NOS: 1-35 may thus be extended until an open reading frame encoding a polypeptide, a full-length polynucleotide and/or gene capable of expressing a polypeptide, or another useful portion of the genome is identified. Such extended sequences, including full-length polynucleotides and genes, are described as “corresponding to” a sequence identified as one of the sequences of SEQ ID NOS: 1-35 or a variant thereof, or a portion of one of the sequences of SEQ ID NOS: 1-35 or a variant thereof, when the extended polynucleotide comprises an identified sequence or its variant, or an identified contiguous portion (x-mer) of one of the sequences of SEQ ID NOS: 1-35 or a variant thereof.
The polynucleotides identified as SEQ ID NOS: 1-35 were isolated from mouse and rat cDNA clones and represent sequences that are expressed in the tissue from which the cDNA was prepared. The sequence information may be used to isolate or synthesize expressible DNA molecules, such as open reading frames or full-length genes, that can then be used as expressible or otherwise functional DNA in transgenic mammals and other organisms. Similarly, RNA sequences, reverse sequences, complementary sequences, anti-sense sequences and the like, corresponding to the polynucleotides of the present invention, may be routinely ascertained and obtained using the cDNA sequences identified as SEQ ID NOS: 1-35.
In a first aspect, the present invention provides isolated polynucleotide sequences comprising a polynucleotide selected from the group consisting of: (a) sequences recited in SEQ ID NO: 1-35; (b) complements of the sequences recited in SEQ ID NO: 1-35; (c) reverse complements of the sequences recited in SEQ ID NO: 1-35; (d) reverse sequences of the sequences recited in SEQ ID NO: 1-35; (e) sequences having either 40%, 60%, 75% or 90% identical nucleotides, as defined herein, to a sequence of (a)-(d); probes and primers corresponding to the sequences set out in SEQ ID NO: 1-35; polynucleotides comprising at least a specified number of contiguous residues of any of the polynucleotides identified as SEQ ID NO: 1-35; and extended sequences comprising portions of the sequences set out in SEQ ID NO: 1-35; all of which are referred to herein as “polynucleotides of the present invention”. The present invention also provides isolated polypeptide sequences identified in the attached Sequence Listing as SEQ ID NO: 36-65; polypeptide variants of those sequences; and polypeptides comprising the isolated polypeptide sequences and variants of those sequences.
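By way of illustration only, the complement, reverse and reverse complement of a DNA sequence, as recited in items (b)-(d) above, may be derived computationally. The following is a minimal sketch; the function names and the example sequence are hypothetical and are not taken from the Sequence Listing.

```python
# Illustrative sketch only: helper names and the example sequence are assumptions.
# Translation table mapping each DNA base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement, read in the same direction."""
    return seq.translate(_COMPLEMENT)


def reverse(seq: str) -> str:
    """Return the sequence read in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the complement read in the opposite direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    example = "ATGGCCTAA"  # hypothetical 9-nucleotide sequence
    print(complement(example))
    print(reverse(example))
    print(reverse_complement(example))
```

The sketch handles only the four unambiguous DNA bases; ambiguity codes and RNA would require an extended translation table.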
In another aspect, the present invention provides genetic constructs comprising a polynucleotide of the present invention, either alone, or in combination with one or more additional polynucleotides of the present invention, or in combination with one or more known polynucleotides, together with cells and target organisms comprising such constructs.
The polynucleotides identified as SEQ ID NOS: 1-35 may contain open reading frames (“ORFs”) or partial open reading frames encoding polypeptides. Additionally, open reading frames encoding polypeptides may be identified in extended or full-length sequences corresponding to the sequences set out as SEQ ID NOS: 1-35. Open reading frames may be identified using techniques that are well known in the art. These techniques include, for example, analysis for the location of known start and stop codons, most likely reading frame identification based on codon frequencies, etc. Suitable tools and software for ORF analysis are available, for example, on the Internet. Open reading frames and portions of open reading frames may be identified in the polynucleotides of the present invention. Once a partial open reading frame is identified, the polynucleotide may be extended in the area of the partial open reading frame using techniques that are well known in the art until the polynucleotide for the full open reading frame is identified. Thus, polynucleotides and open reading frames encoding polypeptides may be identified using the polynucleotides of the present invention.
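As one hedged illustration of the start and stop codon analysis mentioned above, the following sketch scans the three forward reading frames of a DNA sequence for an ATG codon followed, in frame, by a stop codon. It is a simplified example and not the specific tool used for the present sequences; the function name `find_orfs`, the `min_codons` parameter and the test sequence are assumptions. It does not consider codon frequencies, and the opposite strand may be examined by passing the reverse complement.

```python
# Illustrative sketch only: a naive start/stop codon scan, not a production ORF finder.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the given strand.

    start is the 0-based index of the A of ATG; end is the index just past the
    stop codon; min_codons is the minimum number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search for the next ATG
    return orfs


if __name__ == "__main__":
    hypothetical = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(hypothetical):
        print(frame, hypothetical[start:end])
```

An ATG with no downstream in-frame stop codon is not reported by this sketch; such a case would correspond to a partial open reading frame that may be extended as described above.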
Once open reading frames are identified in the polynucleotides of the present invention, the open reading frames may be isolated and/or synthesized. Expressible DNA constructs may then be constructed that comprise the open reading frames and suitable promoters, initiators, terminators, etc., which are well known in the art. Such DNA constructs may be introduced into a host cell to express the polypeptide encoded by the open reading frame. Suitable host cells may include various prokaryotic and eukaryotic cells.
Polypeptides encoded by the polynucleotides of the present invention may be expressed and used in various assays to determine their biological activity. Such polypeptides may be used to raise antibodies, to isolate corresponding interacting proteins or other compounds, and to quantitatively determine levels of interacting proteins or other compounds.
In another aspect, the present invention provides isolated polypeptides encoded, or partially encoded, by the above polynucleotides. As used herein, the term “polypeptide” encompasses amino acid chains of any length including full-length proteins, wherein the amino acid residues are linked by covalent peptide bonds. The term “polypeptide encoded by a polynucleotide” as used herein, includes polypeptides encoded by a polynucleotide that comprises an isolated polynucleotide sequence or variant provided herein. Polypeptides of the present invention may be naturally purified products, or may be produced partially or wholly using recombinant techniques. Such polypeptides may be glycosylated with bacterial, fungal, mammalian or other eukaryotic carbohydrates or may be non-glycosylated. In specific embodiments, the inventive polypeptides comprise an amino acid sequence selected from the group consisting of SEQ ID NO: 36-65.
Polypeptides of the present invention may be produced recombinantly by inserting a polynucleotide sequence that encodes the polypeptide into a genetic construct and expressing the polypeptide in an appropriate host. Any of a variety of genetic constructs known to those of ordinary skill in the art may be employed. Expression may be achieved in any appropriate host cell that has been transformed or transfected with a genetic construct containing a polynucleotide that encodes a recombinant polypeptide. Suitable host cells include prokaryotes, yeast, and higher eukaryotic cells. Preferably, the host cells employed are Escherichia coli, insect, yeast, or a mammalian cell line such as COS or CHO. The polynucleotide sequences expressed in this manner may encode naturally occurring polypeptides, portions of naturally occurring polypeptides, or other variants thereof.
In a related aspect, polypeptides are provided that comprise at least a functional portion of a polypeptide having an amino acid sequence encoded by a polynucleotide of the present invention. As used herein, the “functional portion” of a polypeptide is that portion which contains the active site essential for affecting the function of the polypeptide, for example, the portion of the molecule that is capable of binding one or more reactants. The active site may be made up of separate portions present on one or more polypeptide chains and will generally exhibit high binding affinity.
Functional portions of a polypeptide may be identified by first preparing fragments of the polypeptide by either chemical or enzymatic digestion of the polypeptide, or by mutation analysis of the polynucleotide that encodes the polypeptide and subsequent expression of the resulting mutant polypeptides. The polypeptide fragments or mutant polypeptides are then tested to determine which portions retain biological activity, using, for example, the representative assays provided below.
Portions and other variants of the inventive polypeptides may also be generated by synthetic or recombinant means. Synthetic polypeptides having fewer than about 100 amino acids, and generally fewer than about 50 amino acids, may be generated using techniques well known to those of ordinary skill in the art. For example, such polypeptides may be synthesized using any of the commercially available solid-phase techniques, such as the Merrifield solid-phase synthesis method, where amino acids are sequentially added to a growing amino acid chain. See Merrifield, J. Am. Chem. Soc. 85:2149-2154, 1963. Equipment for automated synthesis of polypeptides is commercially available from suppliers such as Perkin Elmer/Applied BioSystems, Inc. (Foster City, Calif.), and may be operated according to the manufacturer's instructions. Variants of a native polypeptide may be prepared using standard mutagenesis techniques, such as oligonucleotide-directed, site-specific mutagenesis (Kunkel, Proc. Natl. Acad. Sci. USA 82:488-492, 1985). Sections of polynucleotide sequence may also be removed using standard techniques to permit preparation of truncated polypeptides.
In general, the polypeptides disclosed herein are prepared in an isolated, substantially pure, form. Preferably, the polypeptides are at least about 80% pure, more preferably at least about 90% pure, and most preferably at least about 99% pure. In certain embodiments, described in detail below, the isolated polypeptides are incorporated into pharmaceutical compositions or vaccines.
The present invention also contemplates methods for modulating the polynucleotide and/or polypeptide content and composition of an organism, such methods involving stably incorporating into the genome of the organism a construct containing DNA of the present invention. In one embodiment, the target organism is a mammal, preferably a human, for example for human gene therapy. In a related aspect, a method for producing an organism having an altered genotype or phenotype is provided, the method comprising transforming a cell with a DNA construct of the present invention to provide a transgenic cell, and cultivating the transgenic cell under conditions conducive to regeneration and mature organism growth.
The isolated polynucleotides of the present invention have utility in genome mapping, in physical mapping, and in positional cloning of genes. Additionally, the polynucleotide sequences identified as SEQ ID NOS: 1-35 and their variants may be used to design oligonucleotide probes and primers. Oligonucleotide probes and primers have sequences that are substantially complementary to the polynucleotide of interest over a certain portion of the polynucleotide. Oligonucleotide probes designed using the polynucleotides of the present invention may be used to detect the presence and examine the expression patterns of genes in any organism having sufficiently similar DNA and RNA sequences in their cells using techniques that are well known in the art, such as slot blot DNA hybridization techniques. Oligonucleotide primers designed using the polynucleotides of the present invention may be used for PCR amplifications. Oligonucleotide probes and primers designed using the polynucleotides of the present invention may also be used in connection with various microarray technologies, including the microarray technology of Affymetrix (Santa Clara, Calif.).
The polynucleotides of the present invention may also be used to tag or identify an organism or reproductive material therefrom. Such tagging may be accomplished, for example, by stably introducing a non-disruptive non-functional heterologous polynucleotide identifier into an organism, the polynucleotide comprising one of the polynucleotides of the present invention.
Polynucleotides were isolated by high throughput sequencing of cDNA libraries prepared from mouse airway-induced eosinophilia, rat dermal papilla and mouse stromal cells as described below, in Example 1. Isolated polynucleotides of the present invention include the polynucleotides identified as SEQ ID NOS: 1-35; isolated polynucleotides comprising a polynucleotide sequence selected from the group consisting of SEQ ID NOS: 1-35; isolated polynucleotides comprising at least a specified number of contiguous residues (x-mers) of any of the polynucleotides identified as SEQ ID NOS: 1-35; polynucleotides complementary to any of the above polynucleotides; anti-sense sequences corresponding to any of the above polynucleotides; and variants of any of the above polynucleotides, as that term is described in this specification. The present invention also provides isolated polypeptide sequences identified in the attached Sequence Listing as SEQ ID NO: 36-65; polypeptide variants of those sequences; and polypeptides comprising the isolated polypeptide sequences and variants of those sequences.
The correspondence of isolated polynucleotides encoding isolated polypeptides of the present invention, and the functionality of the polypeptides, are shown, below, in Table 1.
The word “polynucleotide(s),” as used herein, means a polymeric collection of nucleotides and includes DNA and corresponding RNA molecules and both single and double stranded molecules, including HnRNA and mRNA molecules, sense and anti-sense strands of DNA and RNA molecules, and comprehends cDNA, genomic DNA, and wholly or partially synthesized polynucleotides. An HnRNA molecule contains introns and “corresponds to” a DNA molecule in a generally one-to-one manner. An mRNA molecule “corresponds to” an HnRNA and DNA molecule from which the introns have been excised. A polynucleotide of the present invention may be an entire gene, or any portion thereof. A gene is a DNA sequence which codes for a functional protein or RNA molecule. Operable anti-sense polynucleotides may comprise a fragment of the corresponding polynucleotide, and the definition of “polynucleotide” therefore includes all operable anti-sense fragments. Anti-sense polynucleotides and techniques involving anti-sense polynucleotides are well known in the art and are described, for example, in Robinson-Benion et al., Methods in Enzymol. 254(23): 363-375, 1995 and Kawasaki et al., Artific. Organs 20 (8): 836-848, 1996.
Identification of genomic DNA and heterologous species DNA can be accomplished by standard DNA/DNA hybridization techniques, under appropriately stringent conditions, using all or part of a cDNA sequence as a probe to screen an appropriate library. Alternatively, PCR techniques using oligonucleotide primers that are designed based on known genomic DNA, cDNA and/or protein sequences can be used to amplify and identify genomic and cDNA sequences. Synthetic DNA corresponding to the identified sequences and variants may be produced by conventional synthesis methods. All of the polynucleotides described herein are isolated and purified, as those terms are commonly used in the art.
As used herein, the term “oligonucleotide” refers to a relatively short segment of a polynucleotide sequence, generally comprising between 6 and 60 nucleotides, and comprehends both probes for use in hybridization assays and primers for use in the amplification of DNA by polymerase chain reaction.
As used herein, the term “x-mer,” with reference to a specific value of “x,” refers to a polynucleotide comprising at least a specified number (“x”) of contiguous residues of any of the polynucleotides identified as SEQ ID NOS: 1-35. The value of x may be from about 20 to about 600, depending upon the specific sequence.
As used herein, the term “polypeptide” encompasses amino acid chains of any length, including full-length proteins, wherein amino acid residues are linked by covalent peptide bonds. Polypeptides of the present invention may be naturally purified products, or may be produced partially or wholly using recombinant techniques. Such polypeptides may be glycosylated with mammalian or other eukaryotic carbohydrates or may be non-glycosylated.
According to one embodiment, “variants” of the polynucleotides of the present invention, including the polynucleotides set forth as SEQ ID NOS: 1-35, as that term is used herein, comprehends polynucleotides producing an “E” value of 0.01 or less, as described below, or having at least a specified percentage identity to a polynucleotide of the present invention, as described below. Polynucleotide variants of the present invention may be naturally occurring allelic variants, or non-naturally occurring variants.
Polynucleotide and polypeptide sequences may be aligned, and percentages of identical residues in a specified region may be determined against another polynucleotide or polypeptide, using computer algorithms that are publicly available. Two exemplary algorithms for aligning and identifying the similarity of polynucleotide sequences are the BLASTN and FASTA algorithms. Polynucleotides may also be analyzed using the BLASTX algorithm, which compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. The percentage identity of polypeptide sequences may be examined using the BLASTP algorithm. The BLASTN, BLASTP and BLASTX algorithms are available on the NCBI anonymous FTP server and are available from the National Center for Biotechnology Information (NCBI), National Library of Medicine, Building 38A, Room 8N805, Bethesda, MD 20894, USA. The BLASTN algorithm Version 2.0.11 [Jan-20-2000], set to the parameters described below, is preferred for use in the determination of polynucleotide variants according to the present invention. The BLASTP algorithm, set to the parameters described below, is preferred for use in the determination of polypeptide variants according to the present invention. The use of the BLAST family of algorithms, including BLASTN, BLASTP and BLASTX, is described at NCBI's website and in the publication of Altschul, et al., Nucleic Acids Res. 25: 3389-3402, 1997.
The FASTA and FASTX algorithms are available on the Internet, and from the University of Virginia by contacting David Hudson, Vice Provost for Research, University of Va., P.O. Box 9025, Charlottesville, Va. 22906-9025, USA. The FASTA algorithm, set to the default parameters described in the documentation and distributed with the algorithm, may be used in the determination of polynucleotide variants. The readme files for FASTA and FASTX Version 1.0× that are distributed with the algorithms describe the use of the algorithms and describe the default parameters. The use of the FASTA and FASTX algorithms is described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444-2448, 1988; and Pearson, Methods in Enzymol. 183:63-98, 1990. The following running parameters are preferred for determination of alignments and similarities using BLASTN that contribute to the E values and percentage identity: Unix running command with default parameter values thus: blastall -p blastn -d embldb -e 10 -G 0 -E 0 -r 1 -v 30 -b 30 -i queryseq -o results; the parameters are: -p Program Name [String]; -d Database [String]; -e Expectation value (E) [Real]; -G Cost to open a gap (zero invokes default behavior) [Integer]; -E Cost to extend a gap (zero invokes default behavior) [Integer]; -r Reward for a nucleotide match (BLASTN only) [Integer]; -v Number of one-line descriptions (V) [Integer]; -b Number of alignments to show (B) [Integer]; -i Query File [File In]; -o BLAST report Output File [File Out] Optional.
The “hits” to one or more database sequences by a queried sequence produced by BLASTN or FASTA or a similar algorithm align and identify similar portions of sequences. The hits are arranged in order of the degree of similarity and the length of sequence overlap. Hits to a database sequence generally represent an overlap over only a fraction of the sequence length of the queried sequence.
The BLASTN and FASTA algorithms produce “Expect” values for alignments. The Expect value (E) indicates the number of hits one can “expect” to see over a certain number of contiguous sequences by chance when searching a database of a certain size. The Expect value is used as a significance threshold for determining whether the hit to a database, such as the preferred EMBL database, indicates true similarity. For example, an E value of 0.1 assigned to a hit is interpreted as meaning that in a database of the size of the EMBL database, one might expect to see 0.1 matches over the aligned portion of the sequence with a similar score simply by chance. The aligned and matched portions of the sequences, then, have a probability of 90% of being the same by this criterion. For sequences having an E value of 0.01 or less over aligned and matched portions, the probability of finding a match by chance in the EMBL database is 1% or less using the BLASTN or FASTA algorithm.
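The relationship between an E value and the chance of a spurious match may be illustrated numerically. The sketch below assumes the Poisson model commonly used in BLAST statistics, under which the probability of observing at least one chance hit with a comparable score is 1 - exp(-E); for small E this is approximately equal to E itself, which is consistent with the 1% figure given above for E = 0.01. The function name is hypothetical.

```python
# Illustrative sketch only, assuming chance hits are Poisson distributed with mean E.
import math


def chance_hit_probability(e_value: float) -> float:
    """Probability of at least one chance hit with a comparable score: 1 - exp(-E)."""
    if e_value < 0:
        raise ValueError("E value must be non-negative")
    # expm1 keeps precision for very small E values.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    for e in (0.1, 0.01):
        p = chance_hit_probability(e)
        print(f"E = {e}: chance-hit probability is about {p:.4f}")
```

Under this model an E value of 0.1 corresponds to a chance-hit probability slightly below 10%, and an E value of 0.01 to a probability slightly below 1%.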
According to one embodiment, “variant” polynucleotides, with reference to each of the polynucleotides of the present invention, preferably comprise sequences having the same number or fewer nucleic acids than each of the polynucleotides of the present invention and producing an E value of 0.01 or less when compared to the polynucleotide of the present invention. That is, a variant polynucleotide is any sequence that has at least a 99% probability of being the same as the polynucleotide of the present invention, measured as having an E value of 0.01 or less using the BLASTN or FASTA algorithms set at the default parameters. According to a preferred embodiment, a variant polynucleotide is a sequence having the same number or fewer nucleic acids than a polynucleotide of the present invention that has at least a 99% probability of being the same as the polynucleotide of the present invention, measured as having an E value of 0.01 or less using the BLASTN or FASTA algorithms set at the default parameters.
Alternatively, variant polynucleotides of the present invention may comprise a sequence exhibiting at least about 40%, more preferably at least about 60%, more preferably yet at least about 75%, and most preferably at least about 90% similarity to a polynucleotide of the present invention, determined as described below. The percentage similarity is determined by aligning sequences using one of the BLASTN or FASTA algorithms, set at default parameters, and identifying the number of identical nucleic acids over the best aligned portion; dividing the number of identical nucleic acids by the total number of nucleic acids of the polynucleotide of the present invention; and then multiplying by 100 to determine the percentage similarity. For example, a polynucleotide of the present invention having 220 nucleic acids has a hit to a polynucleotide sequence in the EMBL database having 520 nucleic acids over a stretch of 23 nucleotides in the alignment produced by the BLASTN algorithm using the default parameters. The 23 nucleotide hit includes 21 identical nucleotides, one gap and one different nucleotide. The percentage similarity of the polynucleotide of the present invention to the hit in the EMBL library is thus 21/220 times 100, or 9.5%. The polynucleotide sequence in the EMBL database is thus not a variant of a polynucleotide of the present invention.
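The percentage similarity calculation described above may be sketched as follows, using the numbers from the worked example (21 identical nucleotides against a 220-nucleotide polynucleotide of the present invention). The function names and the default threshold argument are illustrative assumptions; the count of identical nucleotides is taken as an input, as it would be read from a BLASTN or FASTA alignment.

```python
# Illustrative sketch only: function names and defaults are assumptions.
def percent_similarity(identical: int, query_length: int) -> float:
    """Identical nucleotides in the best aligned portion, divided by the total
    length of the polynucleotide of the present invention, times 100."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 0 <= identical <= query_length:
        raise ValueError("identical must lie between 0 and query_length")
    return 100.0 * identical / query_length


def meets_similarity_threshold(identical: int, query_length: int,
                               threshold: float = 40.0) -> bool:
    """True if the percentage similarity is at least the chosen threshold."""
    return percent_similarity(identical, query_length) >= threshold


if __name__ == "__main__":
    similarity = percent_similarity(21, 220)
    print(f"{similarity:.1f}%")                   # about 9.5%
    print(meets_similarity_threshold(21, 220))    # below the 40% threshold
```

Note that the denominator is the full length of the polynucleotide of the present invention, not the length of the aligned stretch, so a short but perfect local hit still yields a low percentage similarity.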
Alternatively, variant polynucleotides of the present invention hybridize to a polynucleotide of the present invention under stringent hybridization conditions. As used herein, “stringent conditions” mean prewashing in a solution of 6× SSC, 0.2% SDS; hybridizing at 65° C., 6× SSC, 0.2% SDS overnight; followed by two washes of 30 minutes each in 1× SSC, 0.1% SDS at 65° C. and two washes of 30 minutes each in 0.2× SSC, 0.1% SDS at 65° C.
The present invention also encompasses allelic variants of the disclosed sequences, together with DNA sequences that differ from the disclosed sequences but which, due to the degeneracy of the genetic code, encode a polypeptide which is the same as that encoded by a DNA sequence disclosed herein. Thus, polynucleotides comprising sequences that differ from the polynucleotide sequences recited in SEQ ID NOS: 1-35, or complements, reverse sequences, or reverse complements of those sequences as a result of conservative substitutions are contemplated by and encompassed within the present invention. Additionally, polynucleotides comprising sequences that differ from the polynucleotide sequences recited in SEQ ID NOS: 1-35, or complements, reverse complements, or reverse sequences as a result of deletions and/or insertions totaling less than 10% of the total sequence length are also contemplated by and encompassed within the present invention.
The polynucleotides of the present invention may be isolated from various DNA libraries, or may be synthesized using techniques that are well known in the art. The polynucleotides may be synthesized, for example, using automated oligonucleotide synthesizers (e.g. Beckman Oligo 1000M DNA Synthesizer) to obtain polynucleotide segments of up to 50 or more nucleic acids. A plurality of such polynucleotide segments may then be ligated using standard DNA manipulation techniques that are well known in the art of molecular biology. One conventional and exemplary polynucleotide synthesis technique involves synthesis of a single stranded polynucleotide segment having, for example, 80 nucleic acids, and hybridizing that segment to a synthesized complementary 85 nucleic acid segment to produce a 5-nucleotide overhang. The next segment may then be synthesized in a similar fashion, with a 5-nucleotide overhang on the opposite strand. The “sticky” ends ensure proper ligation when the two portions are hybridized. In this way, a complete polynucleotide of the present invention may be synthesized entirely in vitro.
SEQ ID NOS: 2, 3, 5, 7-9, 11, 12, 14, 15, 17, 19-21, 23, 26, 28 and 30-32 are full-length sequences. The remaining polynucleotides are referred to as “partial” sequences, in that they may not represent the full coding portion of a gene encoding a naturally occurring polypeptide. The partial polynucleotide sequences disclosed herein may be employed to obtain the corresponding full-length genes for various species and organisms by, for example, screening DNA expression libraries using hybridization probes based on the polynucleotides of the present invention, or using PCR amplification with primers based upon the polynucleotides of the present invention. In this way one can, using methods well known in the art, extend a polynucleotide of the present invention upstream and downstream of the corresponding mRNA, as well as identify the corresponding genomic DNA, including the promoter and enhancer regions, of the complete gene. The present invention thus comprehends isolated polynucleotides comprising a sequence identified in SEQ ID NOS: 1-35, or a variant of one of the specified sequences, that encode a functional polypeptide, including full-length genes. Such extended polynucleotides may have a length of from about 50 to about 4,000 nucleic acids or base pairs, and preferably have a length of less than about 4,000 nucleic acids or base pairs, more preferably yet a length of less than about 3,000 nucleic acids or base pairs, more preferably yet a length of less than about 2,000 nucleic acids or base pairs. Under some circumstances, extended polynucleotides of the present invention may have a length of less than about 1,800 nucleic acids or base pairs, preferably less than about 1,600 nucleic acids or base pairs, more preferably less than about 1,400 nucleic acids or base pairs, more preferably yet less than about 1,200 nucleic acids or base pairs, and most preferably less than about 1,000 nucleic acids or base pairs.
Polynucleotides of the present invention comprehend polynucleotides comprising at least a specified number of contiguous residues (x-mers) of any of the polynucleotides identified as SEQ ID NOS: 1-35 or their variants. According to preferred embodiments, the value of x is preferably at least 20, more preferably at least 40, more preferably yet at least 60, and most preferably at least 80. Thus, polynucleotides of the present invention include polynucleotides comprising a 20-mer, a 40-mer, a 60-mer, an 80-mer, a 100-mer, a 120-mer, a 150-mer, a 180-mer, a 220-mer, a 250-mer, or a 300-mer, 400-mer, 500-mer or 600-mer of a polynucleotide identified as SEQ ID NOS: 1-35 or a variant of one of the polynucleotides identified as SEQ ID NOS: 1-35.
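As a hedged illustration of the x-mer concept, the following sketch enumerates every stretch of x contiguous residues in a reference sequence and tests whether a candidate polynucleotide comprises at least one of them. The function names and the short example sequences are hypothetical; a small value of x is used in the example purely for readability, whereas the preferred values of x described above are 20 or more.

```python
# Illustrative sketch only: names and example sequences are assumptions.
def x_mers(seq: str, x: int) -> list:
    """Return every run of x contiguous residues in seq, in order of position."""
    if x <= 0:
        raise ValueError("x must be positive")
    return [seq[i:i + x] for i in range(len(seq) - x + 1)]


def comprises_x_mer(candidate: str, reference: str, x: int) -> bool:
    """True if candidate contains at least one x-mer of reference."""
    return any(mer in candidate for mer in set(x_mers(reference, x)))


if __name__ == "__main__":
    reference = "ACGTAC"  # hypothetical reference sequence
    print(x_mers(reference, 4))
    print(comprises_x_mer("TTACGTTT", reference, 4))
    print(comprises_x_mer("TTTTTTTT", reference, 4))
```

A sequence shorter than x yields no x-mers, so the comparison then returns False.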
Polynucleotide probes and primers complementary to and/or corresponding to SEQ ID NOS: 1-35, and variants of those sequences, are also comprehended by the present invention. Such oligonucleotide probes and primers are substantially complementary to the polynucleotide of interest. An oligonucleotide probe or primer is described as “corresponding to” a polynucleotide of the present invention, including one of the sequences set out as SEQ ID NOS: 1-35 or a variant, if the oligonucleotide probe or primer, or its complement, is contained within one of the sequences set out as SEQ ID NOS: 1-35 or a variant of one of the specified sequences.
Two single stranded sequences are said to be substantially complementary when the nucleotides of one strand, optimally aligned and compared using, for example, the BLAST algorithm as described above, with the appropriate nucleotide insertions and/or deletions, pair with at least 80%, preferably at least 90% to 95%, and more preferably at least 98% to 100%, of the nucleotides of the other strand. Alternatively, substantial complementarity exists when a first DNA strand will selectively hybridize to a second DNA strand under stringent hybridization conditions. Stringent hybridization conditions for determining complementarity include salt conditions of less than about 1 M, more usually less than about 500 mM and preferably less than about 200 mM. Hybridization temperatures can be as low as 5° C., but are generally greater than about 22° C., more preferably greater than about 30° C. and most preferably greater than about 37° C. Longer DNA fragments may require higher hybridization temperatures for specific hybridization. Since the stringency of hybridization may be affected by other factors such as probe composition, presence of organic solvents and extent of base mismatching, the combination of parameters is more important than the absolute measure of any one alone. The DNA from plants or samples or products containing plant material can be either genomic DNA or DNA derived by preparing cDNA from the RNA present in the sample.
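The percentage criterion for substantial complementarity may be illustrated by the following minimal Python sketch. It is a simplification and does not reproduce the BLAST algorithm: it assumes two DNA strands of equal length, paired in antiparallel orientation with no insertions or deletions. The function names and the default threshold parameter are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percentage of positions at which strand_a pairs with strand_b.

    Assumes ungapped, antiparallel pairing of equal-length strands.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    # strand_a pairs perfectly with strand_b when it equals
    # the reverse complement of strand_b.
    target = reverse_complement(strand_b)
    matches = sum(1 for a, b in zip(strand_a.upper(), target) if a == b)
    return 100.0 * matches / len(strand_a)


def is_substantially_complementary(strand_a: str, strand_b: str,
                                   threshold: float = 80.0) -> bool:
    """Apply the 'at least 80%' criterion (threshold is adjustable)."""
    return percent_complementarity(strand_a, strand_b) >= threshold
```

Raising `threshold` to 90.0 or 98.0 corresponds to the preferred and more preferred ranges recited above. Hybridization-based complementarity, which depends on salt concentration and temperature, is not modeled here.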
In addition to DNA-DNA hybridization, DNA-RNA or RNA-RNA hybridization assays are also possible. In the case of DNA-RNA hybridization, the mRNA from expressed genes would then be detected instead of genomic DNA or cDNA derived from mRNA of the sample. In the case of RNA-RNA hybridization, RNA probes could be used. In addition, artificial analogs of DNA hybridizing specifically to target sequences could also be employed.
In specific embodiments, the oligonucleotide probes and/or primers comprise at least about 6 contiguous residues, more preferably at least about 10 contiguous residues, and most preferably at least about 20 contiguous residues complementary to a polynucleotide sequence of the present invention. Probes and primers of the present invention may be from about 8 to 100 base pairs in length or, preferably from about 10 to 50 base pairs in length or, more preferably from about 15 to 40 base pairs in length. The probes can be easily selected using procedures well known in the art, taking into account DNA-DNA hybridization stringencies, annealing and melting temperatures, potential for formation of loops and other factors, which are well known in the art. Tools and software suitable for designing probes, and especially suitable for designing PCR primers, are available on the Internet. Preferred techniques for designing PCR primers are also disclosed in Dieffenbach and Dveksler, PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1995.
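As a non-limiting illustration of probe selection by length and melting temperature, the following Python sketch scans a target sequence for windows of a chosen length whose estimated melting temperature falls within a desired range. The estimate uses the Wallace rule of thumb (2° C. per A or T, 4° C. per G or C), which is a rough approximation for short oligonucleotides and is not taken from the present disclosure; loop formation and other factors are not considered. The function names and parameters are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature in degrees C by the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def candidate_probes(target: str, length: int,
                     min_tm: int, max_tm: int) -> list[tuple[int, str, int]]:
    """Return (start, probe, tm) for each window with min_tm <= tm <= max_tm."""
    target = target.upper()
    hits = []
    for start in range(len(target) - length + 1):
        probe = target[start:start + length]
        tm = wallace_tm(probe)
        if min_tm <= tm <= max_tm:
            hits.append((start, probe, tm))
    return hits
```

A `length` between 15 and 40 would correspond to the more preferred probe lengths recited above.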
A plurality of oligonucleotide probes or primers corresponding to a polynucleotide of the present invention may be provided in a kit form. Such kits generally comprise multiple DNA or oligonucleotide probes, each probe being specific for a polynucleotide sequence. Kits of the present invention may comprise one or more probes or primers corresponding to a polynucleotide of the present invention, including a polynucleotide sequence identified in SEQ ID NOS: 1-35.
In one embodiment useful for high-throughput assays, the oligonucleotide probe kits of the present invention comprise multiple probes in an array format, wherein each probe is immobilized in a predefined, spatially addressable location on the surface of a solid substrate. Array formats which may be usefully employed in the present invention are disclosed, for example, in U.S. Pat. Nos. 5,412,087, 5,545,531, and PCT Publication No. WO 95/00530, the disclosures of which are hereby incorporated by reference.
Oligonucleotide probes for use in the present invention may be constructed synthetically prior to immobilization on an array, using techniques well known in the art (see, for example, Oligonucleotide Synthesis: A Practical Approach, Gait, ed., IRL Press, Oxford, 1984). Automated equipment for the synthesis of oligonucleotides is available commercially from such companies as Perkin Elmer/Applied Biosystems Division (Foster City, Calif.) and may be operated according to the manufacturer's instructions. Alternatively, the probes may be constructed directly on the surface of the array using techniques taught, for example, in PCT Publication No. WO 95/00530.
The solid substrate and the surface thereof preferably form a rigid support and are generally formed from the same material. Examples of materials from which the solid substrate may be constructed include polymers, plastics, resins, membranes, polysaccharides, silica or silica-based materials, carbon, metals and inorganic glasses. Synthetically prepared probes may be immobilized on the surface of the solid substrate using techniques well known in the art, such as those disclosed in U.S. Pat. No. 5,412,087.
In one such technique, compounds having protected functional groups, such as thiols protected with photochemically removable protecting groups, are attached to the surface of the substrate. Selected regions of the surface are then irradiated with a light source, preferably a laser, to provide reactive thiol groups. This irradiation step is generally performed using a mask having apertures at predefined locations using photolithographic techniques well known in the art of semiconductors. The reactive thiol groups are then incubated with the oligonucleotide probe to be immobilized. The precise conditions for incubation, such as temperature, time and pH, depend on the specific probe and can be easily determined by one of skill in the art. The surface of the substrate is washed free of unbound probe and the irradiation step is repeated using a second mask having a different pattern of apertures. The surface is subsequently incubated with a second, different, probe. Each oligonucleotide probe is typically immobilized in a discrete area of less than about 1 mm². Preferably each discrete area is less than about 10,000 µm², more preferably less than about 100 µm². In this manner, a multitude of oligonucleotide probes may be immobilized at predefined locations on the array.
The resulting array may be employed to screen for differences in organisms or samples or products containing genetic material as follows. Genomic or cDNA libraries are prepared using techniques well known in the art. The resulting target DNA is then labeled with a suitable marker, such as a radiolabel, chromophore, fluorophore or chemiluminescent agent, using protocols well known for those skilled in the art. A solution of the labeled target DNA is contacted with the surface of the array and incubated for a suitable period of time.
The surface of the array is then washed free of unbound target DNA and the probes to which the target DNA hybridized are determined by identifying those regions of the array to which the markers are attached. When the marker is a radiolabel, such as ³²P, autoradiography is employed as the detection method. In one embodiment, the marker is a fluorophore, such as fluorescein, and the location of bound target DNA is determined by means of fluorescence spectroscopy. Automated equipment for use in fluorescence scanning of oligonucleotide probe arrays is available from Affymetrix, Inc. (Santa Clara, Calif.) and may be operated according to the manufacturer's instructions. Such equipment may be employed to determine the intensity of fluorescence at each predefined location on the array, thereby providing a measure of the amount of target DNA bound at each location. Such an assay would be able to indicate not only the absence and presence of the marker probe in the target, but also the quantitative amount as well.
In this manner, oligonucleotide probe kits of the present invention may be employed to examine the presence/absence (or relative amounts in case of mixtures) of polynucleotides in different samples or products containing different materials rapidly and in a cost-effective manner.
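The interpretation of a scanned array may be sketched, purely by way of example, as follows. The sketch assumes that fluorescence intensities have already been measured at each predefined (row, column) location and that a single background intensity is known; the function name `call_array`, the background-ratio criterion and the default ratio of 2.0 are hypothetical choices made for illustration and are not prescribed by the present disclosure.

```python
def call_array(intensities: dict[tuple[int, int], float],
               background: float,
               min_ratio: float = 2.0) -> dict[tuple[int, int], tuple[bool, float]]:
    """Map each array location to (present, net_signal).

    A probe location is called 'present' when its raw intensity is at
    least min_ratio times the background; net_signal is the
    background-subtracted intensity, floored at zero, and serves as a
    relative measure of bound target DNA.
    """
    if background <= 0:
        raise ValueError("background must be positive")
    calls = {}
    for location, raw in intensities.items():
        present = raw >= min_ratio * background
        net_signal = max(0.0, raw - background)
        calls[location] = (present, net_signal)
    return calls
```

Comparing the `net_signal` values at the same location across arrays hybridized with different samples gives a rough indication of relative amounts.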
Another aspect of the present invention involves collections of a plurality of polynucleotides of the present invention. A collection of a plurality of the polynucleotides of the present invention, particularly the polynucleotides identified as SEQ ID NOS: 1-35, may be recorded and/or stored on a storage medium and subsequently accessed for purposes of analysis, comparison, etc. One utility for such sets of sequences is the analysis of the set, either alone or together with other sequence sets, for single nucleotide polymorphisms (SNPs) between sequences from different tissues and/or individuals for genetic studies, mapping and fingerprinting purposes. Suitable storage media include magnetic media such as magnetic diskettes and magnetic tapes, CD-ROM and other optical storage media, and the like. Suitable storage media and methods for recording and storing information, as well as accessing information such as polynucleotide sequences recorded on such media, are well known in the art. The polynucleotide information stored on the storage medium is preferably computer-readable and may be used for analysis and comparison of the polynucleotide information.
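As one simplified illustration of SNP analysis on stored sequences, the following Python sketch reports the positions at which two sequences differ by a single residue. It assumes the two sequences have already been aligned and are of equal length, with no insertions or deletions; the function name `find_snps` is hypothetical.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, residue_a, residue_b) for each mismatch.

    Positions are zero-based. The sequences are assumed to be
    pre-aligned and of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b
    ]
```

For sequences from different individuals, each reported tuple marks a candidate SNP that would still need confirmation against sequencing error.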
Another aspect of the present invention thus involves a storage medium on which is recorded a collection of the polynucleotides of the present invention, particularly a collection of the polynucleotides identified as SEQ ID NOS: 1-35. According to one embodiment, the storage medium includes a collection of at least 20, preferably at least 50, more preferably at least 100, and most preferably at least 200 of the polynucleotides of the present invention, preferably the polynucleotides identified as SEQ ID NOS: 1-35, or variants of those polynucleotides.
Another aspect of the present invention involves a combination of polynucleotides, the combination containing at least 5, preferably at least 10, more preferably at least 20, and most preferably at least 50 different polynucleotides of the present invention, including polynucleotides selected from SEQ ID NOS: 1-35, or variants of these polynucleotides.
In another aspect, the present invention provides DNA constructs comprising, in the 5′-3′ direction, a gene promoter sequence; an open reading frame coding for at least a functional portion of a polypeptide encoded by a polynucleotide of the present invention; and a gene termination sequence. The open reading frame may be orientated in either a sense or antisense direction. DNA constructs comprising a non-coding region of a gene coding for an enzyme encoded by the above DNA sequences or a nucleotide sequence complementary to a non-coding region, together with a gene promoter sequence and a gene termination sequence, are also provided. Preferably, the gene promoter and termination sequences are functional in a host cell. More preferably, the gene promoter and termination sequences are common to those of the polynucleotide being introduced. Other promoter and termination sequences generally used in the art, such as the Cauliflower Mosaic Virus (CMV) promoter, with or without enhancers, such as the Kozak sequence or Omega enhancer, and Agrobacterium tumefaciens nopaline synthase terminator may be usefully employed in the present invention. Tissue-specific promoters may be employed in order to target expression to one or more desired tissues. The DNA construct may further include a marker for the identification of transformed cells.
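The arrangement of such a construct may be modeled, for illustration only, by the following Python sketch, which concatenates a promoter, an open reading frame in sense or antisense orientation, and a terminator in the 5′-3′ direction. The class name `Construct` and its fields are hypothetical, and any short sequences supplied to it in examples are placeholders rather than real promoter or terminator sequences. Linkers, enhancers and selectable markers are omitted.

```python
from dataclasses import dataclass

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Construct:
    promoter: str
    orf: str
    terminator: str
    antisense: bool = False

    def sequence(self) -> str:
        """Assemble promoter, ORF and terminator in the 5'-3' direction.

        In the antisense orientation the ORF is inserted as its
        reverse complement.
        """
        insert = reverse_complement(self.orf) if self.antisense else self.orf.upper()
        return self.promoter.upper() + insert + self.terminator.upper()
```

For example, `Construct("TATA", "ATGAAATAA", "GGCC", antisense=True).sequence()` places the reverse complement of the placeholder open reading frame between the placeholder promoter and terminator.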
Techniques for operatively linking the components of the DNA constructs are well known in the art and include the use of synthetic linkers containing one or more restriction endonuclease sites as described, for example, by Sambrook et al., Molecular Cloning: a laboratory manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989. The DNA constructs of the present invention may be linked to a vector having at least one replication system, for example, Escherichia coli, whereby after each manipulation, the resulting construct can be cloned and sequenced and the correctness of the manipulation determined.
Transgenic cells comprising the DNA constructs of the present invention are provided, together with organisms comprising such transgenic cells. Techniques for stably incorporating DNA constructs into the genome of target organisms, such as mammals, are well known in the art and include electroporation, protoplast fusion, injection into reproductive organs, injection into immature embryos, high velocity projectile introduction and the like. The choice of technique will depend upon the target organism to be transformed. In one embodiment, naked DNA is injected or delivered orally. Once the cells are transformed, cells having the DNA construct incorporated in their genome are selected. Transgenic cells may then be cultured in an appropriate medium using techniques well known in the art.
In yet a further aspect, the present invention provides methods for modifying the level (concentration) or activity of a polypeptide in a host organism comprising stably incorporating into the genome of the organism a DNA construct of the present invention. The DNA constructs of the present invention may be used to transform a variety of organisms, including mammals, for example to make experimental gene knock-out or transgenic animals.
Further, the polynucleotides of the present invention have particular application for use as non-disruptive tags for marking organisms, including commercially valuable animals, fish, bacteria and yeasts. DNA constructs comprising polynucleotides of the present invention may be stably introduced into an organism as heterologous, non-functional, non-disruptive tags. It is then possible to identify the origin or source of the organism at a later date by determining the presence or absence of the tag(s) in a sample of material.
Detection of the tag(s) may be accomplished using a variety of conventional techniques, and will generally involve the use of nucleic acid probes. Sensitivity in assaying the presence of probe can be usefully increased by using branched oligonucleotides, as described by Horn et al., Nucleic Acids Res. 25(23):4842-4849, 1997, enabling detection of as few as 50 DNA molecules in the sample.
In particular, the polynucleotides of the present invention encode polypeptides that have important roles in processes such as induction of growth differentiation of tissue-specific cells, cell migration, cell proliferation, and cell-cell interaction. These polypeptides are important in the maintenance of tissue integrity, and thus are important in processes such as wound healing. Some of these polypeptides act as modulators of immune responses, such as immunologically active polypeptides for the benefit of offspring. In addition, many polypeptides are immunologically active, making them important therapeutic targets in a whole range of disease states. Antibodies to the polypeptides of the present invention and small molecule inhibitors related to the polypeptides of the present invention may also be used for modulating immune responses and for treatment of diseases according to the present invention.
SEQ ID NOS: 1; 2; 4; 5; 6; 8; 9; 11; 12; 14; 17; 19-24; 26; 27; 31-34 encode secreted polypeptides. SEQ ID NOS: 10; 15; 16; 18; 25; 28; 30; and 35 encode polypeptides acting as receptors. SEQ ID NOS: 2; 4; 24; 29 and 35 encode polypeptides with cell signaling activity, which may be either intracellular or extracellular. Kinase genes, for example, encode polypeptides that phosphorylate specific substrates during cell-to-cell signaling. While some kinases are involved in normal metabolism and nucleotide production, others are significant for altering the activity of many cellular processes through the phosphorylation of specific proteins. Polypeptides encoded by these genes are important in the transmission of intracellular signals resulting from the binding of extracellular ligands such as hormones, growth factors or cytokines to membrane-bound receptors. The utility of polynucleotides encoding kinases resides in the manipulation of their signaling activities and downstream effects for the diagnosis and treatment of mammalian diseases that may be a consequence of inappropriate expression of these kinase genes.
SEQ ID NOS: 2 and 4 encode polypeptides with cytokine activity. Cytokine or growth factor polynucleotides encode polypeptides involved in intercellular signaling and represent another important class of molecules. Polynucleotides encoding such genes have utility in the diagnosis and treatment of disease.
SEQ ID NOS: 7; 11; 12; 15 and 22 encode polypeptides with transcription factor activity. These polynucleotides encode polypeptides required for the control of synthesis of proteins in tissue specific manner and have utility for the modification of protein synthesis for the control of disease.
SEQ ID NO: 8 encodes a polypeptide acting in the extracellular matrix.
SEQ ID NOS: 11; 12; 15 and 22 encode polypeptides with RNA synthesis activities.
SEQ ID NO: 12 encodes a polypeptide having CD antigen activity. Such polynucleotides have utility as modulators of the composition, expression level and class of CD antigen expressed, which influence immune responses to self-antigens, neo-antigens and infectious agents.
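The functional groupings recited above may be held in a simple computer-readable lookup, sketched below in Python. The SEQ ID lists are transcribed from the preceding paragraphs; the category labels, the dictionary name `ANNOTATIONS` and the function names are hypothetical and chosen only for this illustration.

```python
def expand_ids(spec: str) -> set[int]:
    """Expand a list such as '1; 2; 19-24' into a set of integers."""
    ids = set()
    for part in spec.replace(",", ";").split(";"):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = (int(piece) for piece in part.split("-"))
            ids.update(range(low, high + 1))
        else:
            ids.add(int(part))
    return ids


# SEQ ID NOS per category, as recited in the text above.
ANNOTATIONS = {
    "secreted": expand_ids("1; 2; 4; 5; 6; 8; 9; 11; 12; 14; 17; 19-24; 26; 27; 31-34"),
    "receptor": expand_ids("10; 15; 16; 18; 25; 28; 30; 35"),
    "cell signaling": expand_ids("2; 4; 24; 29; 35"),
    "cytokine": expand_ids("2; 4"),
    "transcription factor": expand_ids("7; 11; 12; 15; 22"),
    "extracellular matrix": expand_ids("8"),
    "RNA synthesis": expand_ids("11; 12; 15; 22"),
    "CD antigen": expand_ids("12"),
}


def categories_for(seq_id: int) -> list[str]:
    """Return every category whose list includes the given SEQ ID NO."""
    return [name for name, ids in ANNOTATIONS.items() if seq_id in ids]
```

Calling `categories_for` with a SEQ ID NO that appears in none of the lists returns an empty list.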
Further exemplary specific utilities, for exemplary polynucleotides of the present invention, are specified in the Table below.