The present invention relates to genes encoding novel human proteins which exhibit a variety of useful biological activities. More specifically, isolated nucleic acid molecules are provided which encode polypeptides comprising various forms of human proteins. Human polypeptides are also provided, as are vectors, host cells and recombinant methods for producing the same. Also provided are methods for detecting nucleic acids or polypeptides related to those of the invention, for example, to aid in identification of a biological sample or diagnosis of disorders related to expression of protein genes of this invention. The invention further relates to methods for identifying agonists and antagonists of the proteins of the invention, as well as to methods for treatment of disorders related to protein gene expression using polypeptides, antagonists and agonists of the invention.
Identification and sequencing of human genes is a major goal of modern scientific research. For example, by identifying genes and determining their sequences, scientists have been able to make large quantities of valuable human gene products. These include human insulin, interferon, Factor VIII, human growth hormone, tissue plasminogen activator, erythropoietin and numerous other proteins. Additionally, knowledge of gene sequences can provide keys to diagnosis, treatment or cure of genetic diseases such as muscular dystrophy and cystic fibrosis.
Despite the great progress that has been made in recent years, only a small number of genes which encode the presumably thousands of human proteins have been identified and sequenced. Therefore, there is a need for identification and characterization of novel human proteins and corresponding genes which can play a role in detecting, preventing, ameliorating or correcting disorders related to abnormal expression of and responses to such proteins.
The present invention provides isolated nucleic acid molecules comprising polynucleotide sequences which have been identified as sequences encoding human proteins of the invention. Each protein of the invention is identified in Table 1, below (see Example 2) by a reference number designated as a “Protein ID (Identifier)” (e.g., “PF353-01”). Each protein of the invention is related to a human complementary DNA (cDNA) clone prepared from a messenger RNA (mRNA) encoding the related protein. The cDNA clone related to each protein of the invention is identified by a “cDNA Clone ID (Identifier)” in Table 1 (e.g., “HABCE99”). DNA of each cDNA clone in Table 1 is contained in the material deposited with the American Type Culture Collection and given the ATCC Deposit Number shown for each cDNA Clone ID in Table 1, as further described below.
The invention provides a nucleotide sequence determined for an mRNA molecule encoding each protein identified in Table 1, which is designated in Table 1 as the “Total NT (Nucleotide) Sequence.” This determined nucleotide sequence has been assigned a SEQ ID NO=“X” in the Sequence Listing hereinbelow, where the value of X for the determined nucleotide sequence of each protein is an integer specified in Table 1. The determined nucleotide sequence provided for each protein of the invention was determined by applying conventional automated nucleotide sequencing methods to DNA of the corresponding deposited cDNA clone cited in Table 1.
The determined nucleotide sequence for the mRNA encoding each protein of the invention has been translated to provide a determined amino acid sequence for each protein which is identified in Table 1 by a SEQ ID NO=“Y” where the value of Y for each protein is an integer defined in Table 1. The determined amino acid sequence for each protein represents the amino acid sequence encoded by the determined nucleotide sequence, beginning at or near the translation initiation (“start”) codon of the protein and continuing until the first translation termination (“stop”) codon. Due to possible errors inherent in determining nucleotide sequences from any DNA molecule, particularly using the conventional automated sequencing technology used to sequence the cDNA clones described herein, occasional nucleotide sequence errors are expected in the determined nucleotide sequences of the invention. These errors may include insertions or deletions of one or a few nucleotides in the determined nucleotide sequence as compared to the actual nucleotide sequence of the deposited cDNA. As one of ordinary skill would appreciate, incorrect insertions or deletions of one or two nucleotides in a determined nucleotide sequence lead to a shift in the translation reading frame compared to the reading frame actually encoded by a cDNA clone. Further, such a shift in frame within an actual open reading frame frequently leads to the appearance of a translation termination (stop) codon within the sequence encoding the polypeptide.
Accordingly, due to occasional errors in the nucleotide sequences determined from the deposited cDNAs and any related DNA clones used to prepare the determined sequence for the mRNA encoding each secreted protein of the invention, the translations shown as determined amino acid sequences in SEQ ID NO:Y may represent only a portion of the complete amino acid sequence of the human secreted protein actually encoded by the mRNA represented by the corresponding cDNA clone in the ATCC deposit identified in Table 1. In any event, the determined amino acid sequence for each protein in Table 1, which is shown in SEQ ID NO:Y for each protein, comprises at least a portion of the amino acid sequence determined for that protein.
More particularly, the determined amino acid sequence is the amino acid sequence translated from the determined nucleotide sequence in the open reading frame of the first amino acid of the ORF to the last amino acid of that frame. In other words, the determined amino acid sequence is translated from the determined nucleotide sequence beginning at the codon having as its 5′ end the nucleotide in the position of SEQ ID NO:X identified in Table 1 as the 5′ nucleotide of the first amino acid (abbreviated in Table 1 as “5′ NT of First AA”). Translation of the determined nucleotide sequence is continued in the reading frame of that first amino acid codon to the first stop codon in that same open reading frame, i.e., to the position in SEQ ID NO:X which encodes the amino acid at the position in SEQ ID NO:Y identified as the “last amino acid of the open reading frame” (abbreviated as “Last AA of ORF”).
For any determined amino acid sequence in which the first amino acid is the methionine encoded by the translation initiation codon for the protein, Table 1 also identifies the position in SEQ ID NO:X of the 5′ nucleotide of the start codon (“5′ NT of Start Codon”) as the same position in SEQ ID NO:X as that of the 5′ nucleotide of the first amino acid (“First AA”).
Table 1 also identifies the positions in SEQ ID NO:Y of the last amino acid of the signal peptide (“Last AA of Sig Pep”) and the first amino acid of the secreted portion (“First AA of Secreted Portion”) of the protein, for those polypeptides having a secretory leader sequence. The “secreted portion” of a secreted protein in the present context indicates that portion of the complete polypeptide translated from an mRNA which remains after cleavage of the signal peptide by a signal peptidase. In this context the term “mature” may also be used interchangeably with “secreted portion” although it is recognized that in other contexts “mature” may designate a portion of a “proprotein” which is produced by further cleavage of the polypeptide after cleavage of the signal peptide.
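By way of illustration only, the translation rule described above, and the effect of a single-nucleotide sequencing error on the reading frame, may be sketched as follows. The function name `translate_to_first_stop`, the 1-based position argument and the short example sequence are hypothetical and are not taken from Table 1 or the Sequence Listing; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_to_first_stop(nt_seq, first_aa_pos):
    """Translate from a 1-based nucleotide position up to the first stop codon.

    first_aa_pos plays the role of the "5' NT of First AA" position;
    the stop codon itself is not included in the returned sequence.
    """
    protein = []
    for i in range(first_aa_pos - 1, len(nt_seq) - 2, 3):
        aa = CODON_TABLE[nt_seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical "actual" sequence: ATG GCT GAT AGC CTG AAA TAA
actual = "ATGGCTGATAGCCTGAAATAA"
# Simulated sequencing error: the nucleotide at position 5 (the C of GCT) is dropped.
determined = actual[:4] + actual[5:]

full = translate_to_first_stop(actual, 1)           # reading frame intact
truncated = translate_to_first_stop(determined, 1)  # frame shifted by the deletion
print(full, truncated)
```

In this sketch the single deletion shifts the frame so that a stop codon (TGA) appears early, and the translated sequence no longer corresponds to the protein actually encoded, which is the situation the determined amino acid sequences may reflect.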
Accordingly, in one aspect the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence which is identical to the nucleotide sequence of SEQ ID NO:X, where X is any integer as defined in Table 1. The invention also provides an isolated nucleic acid molecule comprising a nucleotide sequence which is identical to a portion of the nucleotide sequence of SEQ ID NO:X, for instance, a sequence of at least 50, 100 or 150 contiguous nucleotides in the nucleotide sequence of SEQ ID NO:X. Such a portion of the nucleotide sequence of SEQ ID NO:X may be described most generally as a sequence of at least C contiguous nucleotides in the nucleotide sequence of SEQ ID NO:X where: (1) the sequence of at least C contiguous nucleotides begins with the nucleotide at position N of SEQ ID NO:X and ends with the nucleotide at position M of SEQ ID NO:X; (2) C is any integer in the range beginning with a convenient primer size, for instance, about 20, to the total nucleotide sequence length (“Total NT Seq.”) as set forth for SEQ ID NO:X in Table 1; (3) N is any integer in the range of 1 to the first position of the last C nucleotides in SEQ ID NO:X, that is, to the value of Total NT Seq. minus C plus 1 (i.e., Total NT Seq. − C + 1); and (4) M is any integer in the range of C to Total NT Seq.
Preferably, the sequence of contiguous nucleotides in the nucleotide sequence of SEQ ID NO:X is included in SEQ ID NO:X in the range of positions beginning with the nucleotide at about the 5′ nucleotide of the clone sequence (“5′ NT of Clone Seq.” in Table 1) and ending with the nucleotide at about the 3′ nucleotide of the clone sequence (“3′ NT of Clone Seq.” in Table 1). More preferably, the sequence of contiguous nucleotides is in the range of positions beginning with the nucleotide at about the position of the 5′ Nucleotide of the Start Codon (“5′ NT of Start Codon” in Table 1) and ending with the nucleotide at about the position of the 3′ Nucleotide of the Clone Sequence as set forth for SEQ ID NO:X in Table 1. For instance, one preferred embodiment of this aspect of the invention is an isolated nucleic acid molecule which comprises a sequence at least 95%, 96%, 97%, 98%, or 99% identical to a sequence of about 500 contiguous nucleotides included in the nucleotide sequence of SEQ ID NO:X beginning at about the 5′ NT of Start Codon position as set forth for SEQ ID NO:X in Table 1. Another preferred embodiment of this aspect of the invention is a nucleic acid molecule comprising a nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NO:X beginning with the nucleotide at about the position of the 5′ Nucleotide of the First Amino Acid of the Signal Peptide and ending with the nucleotide at about the position of the 3′ Nucleotide of the Clone Sequence as defined for SEQ ID NO:X in Table 1.
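The relationship among C, N and M may be made concrete with a short sketch. It assumes 1-based, inclusive positions and takes the largest valid start position to be Total NT Seq. − C + 1, so that a fragment of exactly C nucleotides can end at the last nucleotide of the sequence. The function name `fragment_bounds` and the example lengths are hypothetical.

```python
def fragment_bounds(total_nt, c):
    """Return every (N, M) pair, 1-based and inclusive, for fragments of exactly c nucleotides.

    N runs from 1 to total_nt - c + 1, and M = N + c - 1, so M runs from c to total_nt.
    Longer fragments ("at least C") correspond to larger M for the same N.
    """
    if not 1 <= c <= total_nt:
        raise ValueError("c must be between 1 and total_nt")
    return [(n, n + c - 1) for n in range(1, total_nt - c + 2)]


# Hypothetical sequence of 25 nucleotides and a primer-sized fragment of 20.
bounds = fragment_bounds(total_nt=25, c=20)
print(bounds)
```

For these illustrative values there are six possible fragments, the first spanning positions 1 to 20 and the last spanning positions 6 to 25.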
Further embodiments of the invention include isolated nucleic acid molecules which comprise a nucleotide sequence at least 90% identical, and more preferably at least 95%, 96%, 97%, 98%, 99% or 99.9% identical, to any of the determined nucleotide sequences above. For instance, one such embodiment is an isolated nucleic acid molecule comprising a nucleotide sequence which is at least 95% identical to a sequence of at least 50 contiguous nucleotides in the nucleotide sequence of SEQ ID NO:X wherein X is any integer as defined in Table 1. Another embodiment of this aspect of the invention is an isolated nucleic acid molecule comprising a nucleotide sequence which is at least 95% identical to the complete nucleotide sequence of SEQ ID NO:X.
Isolated nucleic acid molecules which hybridize under stringent hybridization conditions to a nucleic acid molecule described above also are provided. Such a nucleic acid molecule which hybridizes does not hybridize under stringent hybridization conditions to a nucleic acid molecule having a nucleotide sequence consisting of only A residues or of only T residues.
The invention further provides a composition of matter comprising a nucleic acid molecule which comprises a human cDNA clone identified by a cDNA Clone ID (Identifier) in Table 1, which DNA molecule is contained in the material deposited with the American Type Culture Collection and given the ATCC Deposit Number shown in Table 1 for that cDNA clone. As described further in Example 1, this deposited material comprises a mixture of plasmid DNA molecules containing cloned cDNAs of the invention. Further, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence which is, for instance, at least 95% identical to a sequence of at least 50, 150 or 500 contiguous nucleotides in the nucleotide sequence encoded by a human cDNA clone contained in the deposit given the ATCC Deposit Number shown in Table 1. One preferred embodiment of this aspect is an isolated nucleic acid molecule comprising a nucleotide sequence which is at least 95% identical to the complete nucleotide sequence encoded by a human cDNA clone identified in Table 1 and as contained in the deposit with the ATCC Deposit Number shown in Table 1. Also provided are isolated nucleic acid molecules which hybridize under stringent hybridization conditions to a nucleic acid molecule comprising a nucleotide sequence encoded by a human cDNA clone identified in Table 1 and contained in the cited deposit.
These nucleic acid molecules of the invention may be used for a variety of identification and diagnostic purposes. For instance, the invention provides a method for detecting in a biological sample a nucleic acid molecule comprising a nucleotide sequence which is at least 95% identical to a sequence of at least 50 contiguous nucleotides in a nucleotide sequence of the invention. The sequence of the nucleic acid molecule used in this method is selected from the group consisting of: a nucleotide sequence of SEQ ID NO:X wherein X is any integer as defined in Table 1; and a nucleotide sequence encoded by a human cDNA clone identified by a cDNA Clone Identifier in Table 1 and contained in the deposit with the ATCC Deposit Number shown for said cDNA clone in Table 1. This method of the invention comprises a step of comparing a nucleotide sequence of at least one nucleic acid molecule in the biological sample with a sequence selected from the group above, and determining whether the sequence of the nucleic acid molecule in the sample is at least 95% identical to the selected sequence. The step of comparing sequences may comprise determining the extent of nucleic acid hybridization between nucleic acid molecules in the sample and a nucleic acid molecule comprising the sequence selected from the above group. Alternatively, this step may be performed by comparing the nucleotide sequence determined from a nucleic acid molecule in the sample, for instance by automated DNA sequence methods, with the sequence selected from the above group.
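Where the comparison is performed computationally, the 95% identity test may be illustrated by the following minimal sketch. It is a simplification: it compares sequences position by position without gaps, whereas sequence comparison programs known in the art generally align sequences with gaps before computing identity. The function names and the example sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_ungapped_identity(query, reference):
    """Slide the query along the reference and return the highest ungapped identity."""
    k = len(query)
    if k == 0 or k > len(reference):
        raise ValueError("query must be non-empty and no longer than the reference")
    return max(
        percent_identity(query, reference[i:i + k])
        for i in range(len(reference) - k + 1)
    )


# Hypothetical 60-nucleotide reference and a 50-nucleotide sample fragment.
reference = "ACGT" * 15
query = reference[5:55]
# Introduce a single mismatch at index 10 of the fragment.
mutated = query[:10] + ("A" if query[10] != "A" else "C") + query[11:]

best = best_ungapped_identity(mutated, reference)
meets_threshold = best >= 95.0
print(best, meets_threshold)
```

A fragment of 50 nucleotides with one mismatch is 98% identical to the corresponding reference window and so satisfies the "at least 95% identical to a sequence of at least 50 contiguous nucleotides" criterion in this simplified model.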
In another aspect, the invention provides methods for identifying the species, tissue or cell type of a biological sample based on detecting nucleic acid molecules in the sample which comprise a nucleotide sequence of a nucleic acid molecule of the invention (for instance, a nucleic acid molecule comprising a nucleotide sequence that is at least 95% identical to at least a portion of a nucleotide sequence of SEQ ID NO:X or a nucleotide sequence encoded by a human cDNA clone identified in Table 1 as contained in the deposit with the ATCC Deposit Number shown therein). This method may be conducted by detecting a nucleotide sequence of an individual cDNA of the invention or using a panel of nucleotide sequences of the invention. Thus, this method may comprise a step of detecting nucleic acid molecules comprising a nucleotide sequence in a panel of at least two nucleotide sequences, where at least one sequence in the panel is at least 95% identical to at least a portion of a nucleotide sequence of SEQ ID NO:X or a nucleotide sequence encoded by a human cDNA clone contained in the ATCC deposit. In this method for identifying the species, tissue or cell type of a biological sample, the detection of nucleic acid molecules comprising nucleotide sequences of the invention may be conducted by various techniques known in the art including, for instance, hybridization of either DNA or RNA probes to either DNA or RNA molecules obtained from the biological sample, as well as computational comparisons of nucleotide sequences determined from nucleic acids in a biological sample with nucleotide sequences of the invention.
Similarly, nucleic acid molecules of the invention may be used in a method for diagnosing in a subject a pathological condition associated with abnormal structure or expression of a gene encoding a protein identified in Table 1. This method may comprise a step of detecting in a biological sample obtained from the subject nucleic acid molecules comprising a nucleotide sequence that is at least 95% identical to at least a portion of a nucleotide sequence of SEQ ID NO:X or a nucleotide sequence encoded by a human cDNA clone identified in Table 1 as contained in the deposit with the given ATCC Deposit Number. Again, this diagnostic method may involve analysis of individual nucleotide sequences or panels of several nucleotide sequences, and the analysis of either DNA or RNA species using either DNA or RNA probes.
For use in identification or diagnostic methods such as those described above, therefore, the invention also provides a composition of matter comprising isolated nucleic acid molecules in which the nucleotide sequences of the nucleic acid molecules comprise a panel of sequences, at least one of which is at least 95% identical to a sequence, either a nucleotide sequence of SEQ ID NO:X or a nucleotide sequence encoded by a human cDNA clone contained in the ATCC deposit in Table 1. In this composition, the nucleic acid molecules may comprise DNA molecules or RNA molecules or both, as well as polynucleotide equivalents of DNA and RNA which are not naturally occurring but are known in the art as such.
Another aspect of the invention relates to polypeptides comprising amino acid sequences encoded by nucleotide sequences of the invention. For identification and diagnostic purposes, these polypeptides need not include the amino acid sequence of a complete secreted protein or even of the secreted form of such a protein, since, for instance, antibodies may bind specifically to a linear epitope of a polypeptide which comprises as few as 6 to 8 amino acids. Accordingly, the invention also provides an isolated polypeptide comprising an amino acid sequence at least 90%, preferably 95%, 96%, 97%, 98%, or 99% identical to a sequence of at least about 10, 30 or 100 contiguous amino acids in the amino acid sequence of SEQ ID NO:Y wherein Y is any integer as defined in Table 1. Preferably, the sequence of contiguous amino acids is included in the amino acid sequence of SEQ ID NO:Y beginning with the residue at about the position of the First Amino Acid of the Secreted Portion where one exists or the first amino acid of the open reading frame if the protein is not indicated as having a signal peptide and ending with the residue at about the Last Amino Acid of the Open Reading Frame as set forth for SEQ ID NO:Y in Table 1. A preferred embodiment of this aspect relates to an isolated polypeptide comprising an amino acid sequence at least 95% identical to the complete amino acid sequence of SEQ ID NO:Y.
As noted above, however, the determined amino acid sequence of SEQ ID NO:Y may not include the complete amino acid sequence of the protein encoded by each cDNA in the ATCC deposit identified in Table 1. Accordingly, the invention further provides an isolated polypeptide comprising an amino acid sequence at least 90% identical, preferably at least 95%, 96%, 97%, 98% or 99% identical to a sequence of at least about 10, 30 or 100 contiguous amino acids in the complete amino acid sequence of a secreted protein encoded by a human cDNA clone identified by a cDNA Clone Identifier in Table 1 and contained in the deposit with the ATCC Deposit Number shown for that cDNA clone in Table 1. A particularly preferred embodiment of this aspect is a polypeptide in which the sequence of contiguous amino acids is included in the amino acid sequence of a secreted (“mature”) portion of the protein encoded by a human cDNA clone contained in the deposit, particularly a polypeptide comprising the entire amino acid sequence of the secreted portion of the secreted protein encoded by a human cDNA clone of the invention.
For purposes such as tissue identification and diagnosis of pathological conditions, the invention also provides an isolated antibody which binds specifically to a polypeptide comprising an amino acid sequence of the invention (for instance, a sequence that is identical to a sequence of at least 6, preferably at least 7, 8, 9 or 10, contiguous amino acids in an amino acid sequence of SEQ ID NO:Y or in a complete amino acid sequence of a protein encoded by a human cDNA clone identified by a cDNA Clone Identifier in Table 1 and contained in the deposit cited therein). Further in the same vein, the invention provides a method for detecting in a biological sample a polypeptide comprising an amino acid sequence which is identical to a sequence of at least 6, preferably at least 7, 8, 9 or 10 contiguous amino acids in a sequence selected from the group consisting of an amino acid sequence of SEQ ID NO:Y and a complete amino acid sequence of a protein encoded by a human cDNA clone identified by a cDNA Clone Identifier in Table 1 and contained in the deposit with the ATCC Deposit Number shown for that cDNA clone in Table 1. This method comprises a step of comparing an amino acid sequence of at least one polypeptide molecule in said sample with a sequence selected from the above group and determining whether the sequence of that polypeptide molecule in the sample is identical to the selected sequence of at least 6-10 contiguous amino acids. This step of comparing an amino acid sequence of at least one polypeptide molecule in the sample with a sequence selected from the above group may comprise determining the extent of specific binding of polypeptides in the sample to an antibody which binds specifically to a polypeptide comprising an amino acid sequence of the invention.
Alternatively, this comparison step may be performed by comparing the amino acid sequence determined from a polypeptide molecule in the sample with the sequence selected from the above group, for instance, using computational methods.
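One simple computational form of this comparison is to test whether a sample polypeptide shares any run of at least 6 contiguous amino acids with a reference sequence. The sketch below illustrates that idea only; the function name `shared_peptides` and both sequences are hypothetical and do not correspond to any SEQ ID NO:Y.

```python
def shared_peptides(sample, reference, k=6):
    """Return the set of length-k contiguous peptides found in both sequences."""
    if k < 1:
        raise ValueError("k must be positive")
    reference_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    sample_kmers = {sample[i:i + k] for i in range(len(sample) - k + 1)}
    return sample_kmers & reference_kmers


# Hypothetical reference and sample amino acid sequences (one-letter code).
reference = "MKTLLVAGSPQWERTY"
sample = "AAAGSPQWEKK"

hits = shared_peptides(sample, reference, k=6)
print(sorted(hits))
```

A non-empty result indicates that the sample contains a sequence identical to at least 6 contiguous amino acids of the reference; raising `k` to 7, 8, 9 or 10 applies the stricter preferred thresholds.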
The invention further provides methods for identifying the species, tissue or cell type of a biological sample comprising a step of detecting polypeptide molecules in the sample which include an amino acid sequence that is identical to a sequence of at least 6-10 contiguous amino acids in an amino acid sequence of SEQ ID NO:Y or of a cDNA identified in Table 1 and contained in the cited deposit. This method may involve analyses of polypeptides for the presence of individual amino acid sequences of the invention or of panels of such sequences. Similarly provided are methods for diagnosing in a subject a pathological condition associated with abnormal structure or expression of a gene encoding a protein identified in Table 1. In preferred embodiments of these methods of the invention for identification or diagnosis, an antibody which binds specifically to a polypeptide comprising an amino acid sequence of the invention is used to analyze amino acid sequences of polypeptides in a biological sample.
In yet another aspect, the invention provides recombinant means for making a polypeptide comprising all or a portion of an amino acid sequence of the invention. For this purpose, the invention provides an isolated nucleic acid molecule comprising a nucleotide sequence which is, for instance, at least 95% identical to a nucleotide sequence encoding a polypeptide which comprises an amino acid sequence of the invention (for instance, one that is at least 90% identical to SEQ ID NO:Y).
It will be readily appreciated by one of ordinary skill that, due to the degeneracy of the genetic code, any nucleotide sequence encoding the amino acid sequence of a given protein needs to share only a low level of identity with the nucleotide sequence of a human cDNA clone which encodes the identical amino acid sequence of that protein. It will be further appreciated that the nucleotide sequences of the deposited cDNAs presumably all comprise codons optimized for expression by the human cells from which the cDNAs originated. Therefore, for improved expression in recombinant prokaryotic host cells, for instance, it may be desirable to alter the codon usage in a nucleic acid molecule encoding an amino acid sequence of the invention, selecting codons in accordance with the redundancy of the genetic code, which provide optimal codon usage in the selected host. Preferred nucleic acid molecules of this aspect of the invention are those which encode a polypeptide which comprises a complete amino acid sequence of SEQ ID NO:Y or a complete amino acid sequence of a protein encoded by a human cDNA clone identified in Table 1 and contained in the deposit cited therein.
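The point about degeneracy may be illustrated with a small sketch in which the same short peptide is encoded with two different sets of synonymous codons. The two codon-preference tables below are hypothetical and chosen only for illustration; actual codon-usage tables differ by organism and would be consulted for a real host. The peptide is likewise invented.

```python
# Hypothetical codon preferences (illustration only, not measured codon usage).
HUMAN_LIKE = {"M": "ATG", "L": "CTG", "S": "AGC", "R": "CGG"}
ALTERNATE = {"M": "ATG", "L": "TTA", "S": "TCT", "R": "AGA"}


def back_translate(protein, codons):
    """Encode a protein using one chosen codon per amino acid."""
    return "".join(codons[aa] for aa in protein)


def decode(nt_seq, codons):
    """Translate a sequence built from the given codon choices back to protein."""
    inverse = {codon: aa for aa, codon in codons.items()}
    return "".join(inverse[nt_seq[i:i + 3]] for i in range(0, len(nt_seq), 3))


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


protein = "MLSRLS"  # hypothetical peptide
seq_a = back_translate(protein, HUMAN_LIKE)
seq_b = back_translate(protein, ALTERNATE)
identity = percent_identity(seq_a, seq_b)
print(seq_a, seq_b, round(identity, 1))
```

Both nucleotide sequences encode the identical peptide, yet in this contrived example they agree at only about one third of their positions, which is why identity at the amino acid level does not imply high identity at the nucleotide level.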
Using such nucleic acid molecules encoding polypeptides of the invention, the invention further provides recombinant means for making the polypeptides. Thus, included is a method of making a recombinant vector comprising inserting an isolated nucleic acid molecule of the invention into a vector, as well as a recombinant vector produced by this method. Also included is a method of making a recombinant host cell comprising introducing a vector of the invention into a host cell, and a recombinant host so made. Such cells are useful, for instance, in a method of making an isolated polypeptide of the invention which comprises culturing a recombinant host cell under conditions such that the polypeptide is expressed and recovering the polypeptide.
In a preferred embodiment of this method, the recombinant host cell is a eukaryotic cell and the nucleic acid of the invention encodes the complete amino acid sequence of a protein encoded by a cDNA identified in Table 1, so that the polypeptide produced by this method is a secreted (“mature”) portion of a human secreted protein of the invention (i.e., one comprising an amino acid sequence of SEQ ID NO:Y beginning with the residue at the position identified in Table 1 as the First AA of Secreted Portion of SEQ ID NO:Y or an amino acid sequence of a secreted portion of a secreted protein encoded by a human cDNA clone identified in Table 1 and contained in the deposit with the ATCC Deposit Number shown in Table 1). The invention further provides an isolated polypeptide which is a secreted portion of a human secreted protein produced by the above method. Where the polypeptide shown in Table 1 does not have a leader sequence, one may be provided by the vector. Such vectors are known in the art and are discussed below.
In yet another aspect, the invention provides a method of treatment of an individual in need of an increased level of a secreted protein activity. As described herein, diagnostic methods of the invention enable the identification of such individuals, that is, individuals with a pathological condition involving a particular organ, tissue or cell type, exhibiting lower levels of expression product (e.g., mRNA or antigen) of a given secreted protein in that organ, tissue or cell type, or those with mutant expression products, compared with normal individuals not suffering from the pathology. The method of the invention for treatment of an individual with such a pathological condition comprises administering to such an individual a pharmaceutical composition comprising an amount of an isolated polypeptide of a secreted protein of the invention effective to increase the level of activity of that secreted protein in the individual.
Agonists and antagonists of the polypeptides of the invention and methods for using these also are provided.