The present invention relates to methods and compositions for the diagnosis, prevention, and treatment of neoplastic cell growth and proliferation, i.e., tumors and cancers (e.g., colon cancer) in mammals, for example, humans. Specifically, genes which are differentially expressed in tumor cells relative to normal cells are identified. Among these are certain novel genes.
Malignant tumors, i.e., cancers, are the second leading cause of death in the United States, after heart disease (Boring, et al., CA Cancer J. Clin., 43:7, 1993), and develop in one in three Americans. One of every four Americans dies of cancer. Cancer is characterized primarily by an increase in the number of abnormal, or neoplastic, cells derived from a normal tissue which proliferate to form a tumor mass, the invasion of adjacent tissues by these neoplastic tumor cells, and the generation of malignant cells which spread via the blood or lymphatic system to regional lymph nodes and to distant sites. The latter progression to malignancy is referred to as metastasis.
Cancer can result from a breakdown in the communication between neoplastic cells and their environment, including their normal neighboring cells. Signals, both growth-stimulatory and growth-inhibitory, are routinely exchanged between cells within a tissue. Normally, cells do not divide in the absence of stimulatory signals, and, likewise, will cease dividing in the presence of inhibitory signals. In a cancerous, or neoplastic, state, a cell acquires the ability to “override” these signals and to proliferate under conditions in which normal cells would not grow.
Tumor cells must acquire a number of distinct aberrant traits to proliferate. Reflecting this requirement is the fact that the genomes of certain well-studied tumors carry several different independently altered genes, including activated oncogenes and inactivated tumor suppressor genes. Each of these genetic changes appears to be responsible for imparting some of the traits that, in aggregate, represent the full neoplastic phenotype (Land et al., Science, 222:771, 1983; Ruley, Nature, 304:602, 1983; Hunter, Cell, 64:249, 1991).
Differential expression of the following suppressor genes has been demonstrated in human cancers: a retinoblastoma gene, RB; the Wilms' tumor gene, WT1 (11p); a gene deleted in colon carcinoma, DCC (18q); the neurofibromatosis type 1 gene, NF1 (17q); and a gene involved in familial adenomatous polyposis coli, APC (5q) (Vogelstein, B. and Kinzler, K. W., Trends Genet., 9:138-141, 1993).
The present invention relates to methods and compositions for the diagnosis, prevention, and treatment of tumors and cancers, e.g., colon or lung cancer, in mammals, e.g., humans. The invention is based on the discovery of genes that are differentially expressed in tumor cells relative to normal cells of the same tissue. The genes identified can be used diagnostically or as targets for therapy, and can be used to identify compounds useful in the diagnosis, prevention, and therapy of tumors and cancers (e.g., colon cancer). The genes also can be used in gene therapy, protein synthesis, and to develop antisense nucleic acids.
In general, the invention features an isolated nucleic acid including the nucleotide sequence of any one of SEQ ID NOS: 1, 3 to 7, 9 to 13, 16, 17, or 19 to 23, or an isolated nucleic acid that hybridizes under stringent hybridization conditions to one of these nucleic acids or their complements. The invention also features a genetically engineered host cell containing one of these nucleotide sequences, and an expression vector containing one of these nucleotide sequences operably linked to a nucleotide sequence regulatory element that controls expression of the nucleotide sequence in a host cell.
The invention further features a substantially pure gene product encoded by one of these nucleic acids, e.g., having the amino acid sequence of SEQ ID NO:18. The invention also features an antibody that immunospecifically binds to this gene product.
In another embodiment, the invention features a method of diagnosing a tumor in a mammal by obtaining a test sample of tissue cells, e.g., colon cells, from the mammal; obtaining a control sample of known normal cells from the same type of tissue; and detecting in both the test sample and the control sample the level of expression of any one or more of genes 048, 083, 090, 093, and 097, wherein a level of expression higher in the test sample than in the control sample indicates a tumor in the test sample.
The method of diagnosing a tumor can also be carried out using any one or more of genes 029, 030, 036, 038, 056, 075, 082, 092, 096, or 101, wherein a level of expression lower in the test sample than in the control sample indicates a tumor in the test sample.
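The comparison underlying this diagnostic method can be sketched in a few lines of Python. This is only an illustrative sketch: the gene labels are those recited above, while the function name, the dictionary-based input format, and the use of a plain greater-than or less-than comparison (with no statistical threshold) are assumptions made for the example.

```python
# Gene labels are taken from the text; everything else here is hypothetical.
UP_IN_TUMOR = {"048", "083", "090", "093", "097"}
DOWN_IN_TUMOR = {"029", "030", "036", "038", "056",
                 "075", "082", "092", "096", "101"}


def genes_indicating_tumor(test, control):
    """Return the sorted gene labels whose expression change indicates a tumor.

    test, control: dicts mapping gene label -> expression level, measured
    in the same (arbitrary) units in both samples.
    """
    flagged = []
    # Only genes measured in both samples can be compared.
    for gene in sorted(set(test) & set(control)):
        if gene in UP_IN_TUMOR and test[gene] > control[gene]:
            flagged.append(gene)  # higher in test sample than in control
        elif gene in DOWN_IN_TUMOR and test[gene] < control[gene]:
            flagged.append(gene)  # lower in test sample than in control
    return flagged
```

In practice, a measured difference would be interpreted against the variability of the assay used; the sketch deliberately leaves that step out.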
The invention further features a method of treating a tumor, e.g., a colon tumor, in a patient, e.g., a mammal such as a human, by administering to the mammal a compound in an amount effective to decrease the level of expression or activity of the gene transcript or gene product of any one or more of genes 048, 083, 090, 093, and 097, to a level effective to treat the tumor.
In this method, the compound can be an antisense or ribozyme molecule that blocks translation of the gene transcript, or a nucleic acid that is complementary to the 5′ region of any one or more of genes 048, 083, 090, 093, and 097 and blocks formation of a gene transcript via triple helix formation. The compound also can be an antibody that neutralizes the activity of the gene product.
In another method of treating a tumor in a mammal, a compound is administered in an amount effective to increase the level of expression or activity of the gene transcript or gene product of any one or more of genes 029, 030, 036, 038, 056, 075, 082, 092, 096, and 101, to a level effective to treat the tumor, e.g., colon tumor. In this method, the compound can be a nucleic acid whose administration results in an increase in the level of expression of any one of genes 029, 030, 036, 038, 056, 075, 082, 092, 096, and 101, thereby ameliorating symptoms of the tumor.
In another aspect, the invention features a method for inhibiting tumors in a mammal by administering to the mammal a normal allele of one or more of genes 029, 030, 036, 038, 056, 075, 082, 092, 096, and 101, so that a gene product is expressed, thereby inhibiting tumors. The invention also covers a method for treating tumors in a mammal by administering to the mammal an effective amount of a gene product of any one or more of genes 029, 030, 036, 038, 056, 075, 082, 092, 096, and 101.
The invention also features a method of monitoring the efficacy of a compound in clinical trials for inhibition of tumors, e.g., colon tumors, in a patient by obtaining a first sample of tumor tissue cells from the patient; administering the compound to the patient; after a time sufficient for the compound to inhibit the tumor, obtaining a second sample of tumor tissue cells from the patient; and detecting in the first and second samples the level of expression of any one or more of genes 048, 083, 090, 093, and 097, wherein a level of expression lower in the second sample than in the first sample indicates that the compound is effective to inhibit a tumor in the patient.
This method can also be carried out using any one or more of genes 029, 030, 036, 038, 056, 075, 082, 092, 096, or 101, wherein a level of expression higher in the second sample than in the first sample indicates that the compound is effective to inhibit a tumor in the patient.
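The before-and-after comparison used in the monitoring method can be illustrated as follows. The sketch assumes, hypothetically, that expression levels from the first and second samples are available as dictionaries keyed by gene label; the function name and the direct numeric comparison are likewise assumptions.

```python
# Gene labels are taken from the text; the data layout is hypothetical.
UP_IN_TUMOR = frozenset({"048", "083", "090", "093", "097"})
DOWN_IN_TUMOR = frozenset({"029", "030", "036", "038", "056",
                           "075", "082", "092", "096", "101"})


def genes_supporting_efficacy(first_sample, second_sample):
    """Return sorted gene labels whose change suggests the compound is effective.

    first_sample: expression levels before the compound is administered.
    second_sample: expression levels after a time sufficient for inhibition.
    """
    supporting = []
    for gene in sorted(set(first_sample) & set(second_sample)):
        before, after = first_sample[gene], second_sample[gene]
        if gene in UP_IN_TUMOR and after < before:
            supporting.append(gene)  # tumor-elevated gene has decreased
        elif gene in DOWN_IN_TUMOR and after > before:
            supporting.append(gene)  # tumor-reduced gene has increased
    return supporting
```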
A “tumor,” as used herein, refers to all neoplastic cell growth and proliferation, whether malignant or benign, and all precancerous and cancerous cells and tissues.
A “differentially expressed” gene transcript, as used herein, refers to a gene transcript that is found in different numbers of copies, or in activated versus inactivated states, in different cell or tissue types of an organism having a tumor or cancer, e.g., colon cancer, compared to the numbers of copies or state of the gene transcript found in the cells of the same tissue in a healthy organism, or in the cells of the same tissue in the same organism. Multiple copies of gene transcripts may be found in an organism having the tumor or cancer, while only one, or significantly fewer copies, of the same gene transcript are found in a healthy organism or healthy cells of the same tissue in the same organism, or vice-versa.
As used herein, a “differentially expressed gene” refers to (a) a gene containing at least one of the DNA sequences disclosed herein (as shown in FIGS. 1a to 1p and 2 to 7); (b) any DNA sequence that encodes the amino acid sequences encoded by the DNA sequences disclosed in FIGS. 1a to 1p and 2 to 7; or (c) any DNA sequence that hybridizes to the complement of the sequences disclosed in FIGS. 1a to 1p and 2 to 7 under highly stringent conditions, i.e., hybridization to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1×SSC/0.1% SDS at 68° C. (Ausubel F. M. et al., eds., 1989, Current Protocols in Molecular Biology, Vol. I, Green Publishing Associates, Inc., and John Wiley and Sons, Inc., New York, at p. 2.10.3); or under moderately stringent conditions, i.e., washing in 0.2×SSC/0.1% SDS at 42° C. (Ausubel et al., 1989, supra), yet which still encodes a gene product functionally equivalent to a gene product encoded by a gene of (a) above.
The initial cDNA sequences discovered by the paradigms described below (and shown in FIGS. 1a to 1p) are used to obtain additional cDNA sequences of various lengths up to the full-length cDNA sequences corresponding to individual genes (see FIGS. 2 to 7). The individual genes are referred to by a three digit number, e.g., 029, based on the number of the first DNA sequence found that corresponds to that particular gene. In some instances, the paradigm generated two or more DNA sequences that correspond to overlapping or completely unique portions of the full-length cDNA of a gene. In those instances, the gene is referred to by the number of the first DNA sequence found to correspond to that gene, followed by one or more numbers in parentheses that correspond to the numbers of later sequences that correspond to the same gene.
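The naming convention just described, e.g., a label such as 036 (095), can be illustrated with a short parsing sketch. The function name is hypothetical, and the handling of several later sequence numbers within one pair of parentheses (any separator between three-digit numbers) is an assumption, since the text does not specify how multiple numbers are delimited.

```python
import re


def parse_gene_label(label):
    """Split a gene label into (first sequence number, later sequence numbers).

    "036 (095)" -> ("036", ["095"]); "029" -> ("029", [])
    """
    match = re.fullmatch(r"\s*(\d{3})\s*(?:\(([^)]*)\))?\s*", label)
    if match is None:
        raise ValueError(f"unrecognized gene label: {label!r}")
    primary = match.group(1)
    # Later sequence numbers, if any, are the three-digit runs in the parentheses.
    later = re.findall(r"\d{3}", match.group(2) or "")
    return primary, later
```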
A “differentially expressed gene” can be a target, fingerprint, or pathway gene. For example, a “fingerprint gene,” as used herein, refers to a differentially expressed gene whose expression pattern can be used as a prognostic or diagnostic marker for the evaluation of tumors and cancers, or which can be used to identify compounds useful for the treatment of tumors and cancers, e.g., colon or lung cancer. For example, the effect of a compound on the fingerprint gene expression pattern normally displayed in connection with tumors and cancers can be used to evaluate the efficacy of the compound as a tumor and cancer treatment, or can be used to monitor patients undergoing clinical evaluation for the treatment of tumors and cancer.
A “fingerprint pattern,” as used herein, refers to a pattern generated when the expression pattern of a series (which can range from two up to all the fingerprint genes that exist for a given state) of fingerprint genes is determined. A fingerprint pattern can be used in the same diagnostic, prognostic, and compound identification methods as the expression of a single fingerprint gene.
A “target gene,” as used herein, refers to a differentially expressed gene in which modulation of the level of gene expression or of gene product activity prevents and/or ameliorates tumor and cancer, e.g., colon cancer, symptoms. Thus, compounds that modulate the expression of a target gene or the activity of a target gene product can be used in the treatment or prevention of tumors and cancers.
“Pathway genes,” as used herein, are genes that encode proteins or polypeptides that interact with other gene products involved in tumors and cancers. Pathway genes can also exhibit target gene and/or fingerprint gene characteristics.
By “substantially identical” is meant a polypeptide or nucleic acid having a sequence that has at least 85%, preferably 90%, and more preferably 95%, 98%, 99% or more identity to the sequence of a reference nucleic acid sequence, e.g., the nucleic acid sequence of SEQ ID NO:23.
The nucleic acid molecules of the invention can be inserted into transcription and/or translation vectors, as described below, which will facilitate expression of the insert. The nucleic acid molecules and the polypeptides they encode can be used directly as diagnostic or therapeutic agents, or (in the case of a polypeptide) can be used to generate antibodies that, in turn, are therapeutically useful. Accordingly, expression vectors containing the nucleic acid molecules of the invention, cells transfected with these vectors, the polypeptides expressed, and antibodies generated, against either the entire polypeptide or an antigenic fragment thereof, are among the preferred embodiments.
As used herein, the term “transfected cell” means any cell into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a nucleic acid encoding a polypeptide of the invention.
By “isolated nucleic acid molecule” is meant a nucleic acid molecule that is separated from the 5′ and 3′ coding sequences with which it is immediately contiguous in the naturally occurring genome of an organism. Thus, the term “isolated nucleic acid molecule” includes nucleic acid molecules which are not naturally occurring, e.g., nucleic acid molecules created by recombinant DNA techniques.
The term “nucleic acid molecule” encompasses both RNA and DNA, including cDNA, genomic DNA, and synthetic (e.g., chemically synthesized) DNA. Where single-stranded, the nucleic acid may be a sense strand or an antisense strand.
The polypeptides of the invention can also be chemically synthesized, or they can be purified from tissues in which they are naturally expressed, according to standard biochemical methods of purification.
Also included in the invention are “functional polypeptides,” which possess one or more of the biological functions or activities of a protein or polypeptide of the invention. These functions or activities include the ability to bind some or all of the proteins which normally bind to gene 036 protein.
The functional polypeptides may contain a primary amino acid sequence that has been modified from those disclosed herein. Preferably these modifications consist of conservative amino acid substitutions, as described herein.
The terms “protein” and “polypeptide” are used herein to describe any chain of amino acids, regardless of length or post-translational modification (for example, glycosylation or phosphorylation). Thus, the term “polypeptide” includes full-length, naturally occurring proteins as well as recombinantly or synthetically produced polypeptides that correspond to a full-length naturally occurring protein or to particular domains or portions of a naturally occurring protein. The term also encompasses mature proteins which have an added amino-terminal methionine (to facilitate expression in prokaryotic cells).
The term “purified” as used herein refers to a nucleic acid or peptide that is substantially free of cellular material, viral material, or culture medium when produced by recombinant DNA techniques, or chemical precursors or other chemicals when chemically synthesized.
Polypeptides or other compounds of interest are said to be “substantially pure” when they are within preparations that are at least 60% by weight (dry weight) the compound of interest. Preferably, the preparation is at least 75%, more preferably at least 90%, and most preferably at least 99%, by weight the compound of interest. Purity can be measured by any appropriate standard method, for example, by column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis.
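The purity thresholds recited above can be expressed as a simple classification. The percentages are those given in the text; the function name and the tier labels returned are hypothetical and serve only to illustrate how the thresholds nest.

```python
def purity_tier(percent_by_dry_weight):
    """Classify a preparation by the percentage (dry weight) that is the compound of interest."""
    if not 0 <= percent_by_dry_weight <= 100:
        raise ValueError("purity must be between 0 and 100 percent")
    if percent_by_dry_weight >= 99:
        return "most preferred"
    if percent_by_dry_weight >= 90:
        return "more preferred"
    if percent_by_dry_weight >= 75:
        return "preferred"
    if percent_by_dry_weight >= 60:
        return "substantially pure"
    return "not substantially pure"
```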
A polypeptide or nucleic acid molecule is “substantially identical” to a reference polypeptide or nucleic acid molecule if it has a sequence that is at least 85%, preferably at least 90%, and more preferably at least 95%, 98%, or 99% identical to the sequence of the reference polypeptide or nucleic acid molecule.
Where a particular polypeptide is said to have a specific percent identity to a reference polypeptide of a defined length, the percent identity is relative to the reference peptide. Thus, a peptide that is 50% identical to a reference polypeptide that is 100 amino acids long can be a 50 amino acid polypeptide that is completely identical to a 50 amino acid long portion of the reference polypeptide. It might also be a 100 amino acid long polypeptide which is 50% identical to the reference polypeptide over its entire length. Of course, many other polypeptides will meet the same criteria.
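The two 50% cases described above can be checked with a minimal calculation. This sketch is a simplification: it assumes an ungapped placement of the candidate at a known offset along the reference, whereas sequence analysis software generally computes gapped alignments. The function name and the example sequences are hypothetical.

```python
def percent_identity(reference, candidate, offset=0):
    """Percent identity relative to the length of the reference sequence.

    The candidate is laid against the reference, without gaps, starting at
    position `offset` of the reference (an assumption of this sketch).
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")
    matches = sum(
        1
        for i, residue in enumerate(candidate)
        if 0 <= offset + i < len(reference) and reference[offset + i] == residue
    )
    # The denominator is the reference length, not the candidate length.
    return 100.0 * matches / len(reference)


# Hypothetical 100-residue reference used only for illustration.
reference = "ACDEFGHIKL" * 10

# Case 1: a 50-residue peptide identical to a 50-residue portion of the reference.
fragment = reference[25:75]

# Case 2: a 100-residue peptide matching the reference at every other position.
half_matching = "".join(r if i % 2 == 0 else "X" for i, r in enumerate(reference))
```

Both `percent_identity(reference, fragment, offset=25)` and `percent_identity(reference, half_matching)` evaluate to 50.0, because in each case 50 of the 100 reference positions are matched.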
In the case of polypeptide sequences which are less than 100% identical to a reference sequence, the non-identical positions are preferably, but not necessarily, conservative substitutions for the reference sequence. Conservative substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine and glutamine; serine and threonine; lysine and arginine; and phenylalanine and tyrosine.
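The substitution groups listed above lend themselves to a simple lookup. In the following sketch the groups are those recited in the text, written in standard one-letter amino acid codes; the function name and the choice to treat an unchanged residue as "not a substitution" are assumptions.

```python
# Groups from the text, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # glycine and alanine
    {"V", "I", "L"},  # valine, isoleucine, and leucine
    {"D", "E"},       # aspartic acid and glutamic acid
    {"N", "Q"},       # asparagine and glutamine
    {"S", "T"},       # serine and threonine
    {"K", "R"},       # lysine and arginine
    {"F", "Y"},       # phenylalanine and tyrosine
]


def is_conservative_substitution(original, replacement):
    """Return True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # an unchanged residue is not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```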
For polypeptides, the length of the reference polypeptide sequence will generally be at least 16 amino acids, preferably at least 20 amino acids, more preferably at least 25 amino acids, and most preferably 35 amino acids, 50 amino acids, or 100 amino acids. For nucleic acids, the length of the reference nucleic acid sequence will generally be at least 50 nucleotides, preferably at least 60 nucleotides, more preferably at least 75 nucleotides, and most preferably 100 nucleotides or 300 nucleotides.
Sequence identity can be measured using sequence analysis software (for example, the Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705), with the default parameters as specified therein.
The nucleic acid molecules of the invention can be inserted into a vector, as described below, which will facilitate expression of the insert. The nucleic acid molecules and the polypeptides they encode can be used directly as diagnostic or therapeutic agents, or can be used (directly in the case of the polypeptide or indirectly in the case of a nucleic acid molecule) to generate antibodies that, in turn, are clinically useful as a therapeutic or diagnostic agent. Accordingly, vectors containing the nucleic acid of the invention, cells transfected with these vectors, the polypeptides expressed, and antibodies generated, against either the entire polypeptide or an antigenic fragment thereof, are among the preferred embodiments.
As used herein, the term “transformed cell” means a cell into which (or into an ancestor of which) has been introduced, by means of recombinant DNA techniques, a nucleic acid molecule encoding a polypeptide of the invention.
The invention also features antibodies, e.g., monoclonal, polyclonal, and engineered antibodies, which specifically bind proteins and polypeptides of the invention, e.g., gene 036 protein. By “specifically binds” is meant an antibody that recognizes and binds to a particular antigen, e.g., a gene 036 polypeptide of the invention, but which does not substantially recognize or bind to other molecules in a sample, e.g., a biological sample.
The invention also features antagonists and agonists of gene 036 protein that can inhibit or enhance one or more of the functions or activities of gene 036 protein or other proteins of the invention, respectively. Suitable antagonists can include small molecules (i.e., molecules with a molecular weight below about 500), large molecules (i.e., molecules with a molecular weight above about 500), antibodies that bind and “neutralize” gene 036 protein (as described below), polypeptides which compete with a native form of gene 036 protein for binding to a protein which naturally interacts with gene 036 protein, and nucleic acid molecules that interfere with transcription of a gene of the invention (for example, antisense nucleic acid molecules and ribozymes). Useful agonists also include small and large molecules, and antibodies other than “neutralizing” antibodies.
The invention also features molecules which can increase or decrease the expression of a gene of the invention (e.g., by influencing transcription or translation). These include small molecules (i.e., molecules with a molecular weight below about 500), large molecules (i.e., molecules with a molecular weight above about 500), and nucleic acid molecules that can be used to inhibit the expression of a gene of the invention (for example, antisense and ribozyme molecules) or to enhance its expression (for example, expression constructs that place nucleic acid sequences encoding proteins of the invention, e.g., gene 036 protein, under the control of a strong promoter system), as well as transgenic animals that express a gene 036 transgene.
The invention also includes nucleic acid molecules, preferably DNA, that hybridize to the DNA sequences (a) through (c), above, of a differentially expressed gene. Hybridization conditions can be highly stringent or moderately stringent, as described above. In instances wherein the nucleic acid molecules are deoxyoligonucleotides (“oligos”), highly stringent conditions are defined as washing in 6×SSC/0.05% sodium pyrophosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base oligos). These nucleic acid molecules can act as target gene antisense molecules, useful in target gene regulation, or as antisense primers in amplification reactions of target, fingerprint, and/or pathway gene nucleic acid sequences. Further, such sequences can be used as part of ribozyme and/or triple helix sequences, also useful for target gene regulation. Still further, such molecules can be used in diagnostic methods to detect tumors and cancers, e.g., colon cancer, and a patient's predisposition towards tumors or cancers.
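The highly stringent wash temperatures given above for oligos of particular lengths can be held in a small lookup table. The temperatures and lengths are those recited in the text; the names are hypothetical, and the sketch deliberately does not interpolate for lengths the text does not list.

```python
# Wash in 6xSSC/0.05% sodium pyrophosphate; oligo length (bases) -> temperature (deg C).
OLIGO_WASH_TEMP_C = {14: 37, 17: 48, 20: 55, 23: 60}


def stringent_wash_temperature_c(oligo_length):
    """Return the highly stringent wash temperature for a listed oligo length."""
    try:
        return OLIGO_WASH_TEMP_C[oligo_length]
    except KeyError:
        # No value is guessed for lengths the text does not recite.
        raise ValueError(
            f"no wash temperature is recited for {oligo_length}-base oligos"
        ) from None
```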
The invention also encompasses (a) DNA vectors that contain any of the foregoing coding sequences and/or their complements (i.e., antisense); (b) DNA expression vectors that contain any of the foregoing coding sequences operatively associated with a regulatory element that directs the expression of the coding sequences; and (c) genetically engineered host cells that contain any of the foregoing coding sequences operatively associated with a regulatory element that directs the expression of the coding sequences in the host cell. As used herein, “regulatory elements” include, but are not limited to, inducible and non-inducible promoters, enhancers, operators, and other elements known to those skilled in the art that drive and regulate expression. The invention includes fragments of any of the DNA sequences disclosed herein.
A “detectable” RNA expression level, as used herein, means a level that is detectable by the standard techniques of differential display, RT (reverse transcriptase)-coupled polymerase chain reaction (PCR), Northern, and/or RNase protection analyses. The degree to which expression differs need only be large enough to be visualized via standard characterization techniques, such as, for example, the differential display technique described below.
Based on the expression patterns in the paradigm results described below (e.g., Table 1), the following genes 029, 030, 036 (095), 038 (102), 056, 075, 082, 092, 096 (105), and 101 are expressed at a higher level in normal colon tissues than in cancerous colon tissues. Specifically, the data show a correlation between an increase in the expression level of these genes and a decrease in a colon cell's tumor potential. In other words, a reduction of the expression level of these genes in a cell may induce or predispose the cell to become cancerous. Hence, methods that increase the level of expression of these genes may inhibit or slow the progression to tumors and cancers, e.g., colon cancer.
On the other hand, further based on the expression patterns in the paradigm results described below (e.g., Table 1), the following genes 048, 083, 090, 093, and 097 are expressed at a higher level in colon tumor tissues than in normal colon tissues. Specifically, the data show a correlation between an increase in the expression level of these genes and an increase in a colon cell's cancer potential. In other words, a reduction of the expression level of these genes in a cell may induce or predispose that cell to remain normal. Hence, methods that decrease the level of expression of these genes may inhibit or slow the progression to tumors and cancers, e.g., colon cancer.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.
FIGS. 1a to 1p are a series of DNA sequence fragments (SEQ ID NOs:1 to 16) from genes detected by the paradigms described herein.
FIG. 2 is a DNA sequence (SEQ ID NO:17) from gene 082 and the amino acid sequence (SEQ ID NO:18) encoded by gene 082.
FIG. 3 is a DNA sequence (SEQ ID NO:19) from gene 048.
FIG. 4 is a DNA sequence (SEQ ID NO:20) from gene 090.
FIG. 5 is a DNA sequence (SEQ ID NO:21) from gene 093.
FIG. 6 is a DNA sequence (SEQ ID NO:22) from gene 101.
FIG. 7 is the DNA (SEQ ID NO:23) of a gene 036 cDNA and the amino acid sequence (SEQ ID NO:24) encoded by the gene 036 cDNA.