The invention relates generally to the fields of molecular biology, genomics, bioinformatics, pathology, and medicine. More particularly, the invention relates to a new utility of a gene whose expression is modulated in select cancers.
Recent efforts to sequence the entire human genome have resulted in the identification of tens of thousands of genes. See, e.g., Venter et al., Science, 291:1304-51, 2001. Despite this achievement, many of these identified genes have yet to be functionally characterized. As the functions of these genes are elucidated, they should prove useful for identifying new diagnostic and therapeutic targets for a variety of different diseases.
The invention relates to the discovery of specific polynucleotide sequences that are expressed at higher levels in select cancer cells than in non-diseased cells. The polynucleotide sequences were identified using a modified datamining tool referred to herein as DDDM (for Digital Differential Display tool, Modified) to analyze the Cancer Genome Anatomy Project (CGAP) database of the National Cancer Institute. In particular, DDDM was used to identify several expressed sequence tags (ESTs) more prevalent in cancer tissue libraries than in corresponding non-cancerous tissue libraries. The identified ESTs were then used to identify specific UniGenes associated with cancer. Based on the identified polynucleotide sequences, a gene termed SIM2 (for Single Minded homolog 2), whose expression is selectively upregulated in colon, prostate, and pancreas tumors, was identified.
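As a rough illustration of the kind of comparison a digital differential display performs, the sketch below tests whether an EST is more prevalent in a pool of cancer libraries than in a pool of normal libraries, using a one-sided Fisher exact test. The EST names, counts, significance cutoff, and function names are hypothetical. This is not the DDDM tool itself, whose internals are not described here.

```python
from math import comb


def fisher_exact_greater(hits_a, total_a, hits_b, total_b):
    """One-sided Fisher exact p-value that pool A is enriched versus pool B.

    hits_a / hits_b: number of sequences matching the EST in each pool.
    total_a / total_b: total number of sequences in each pool.
    """
    hits = hits_a + hits_b
    total = total_a + total_b
    upper = min(hits, total_a)
    # Sum the hypergeometric probabilities of seeing hits_a or more in pool A.
    favorable = sum(
        comb(total_a, k) * comb(total_b, hits - k)
        for k in range(hits_a, upper + 1)
    )
    return favorable / comb(total, hits)


def enriched_ests(counts, alpha=0.05):
    """Return the names of ESTs whose enrichment p-value is below alpha.

    counts maps a name to (cancer_hits, cancer_total, normal_hits, normal_total).
    alpha is an assumed cutoff chosen for illustration only.
    """
    return sorted(
        name
        for name, (ca, ct, na, nt) in counts.items()
        if fisher_exact_greater(ca, ct, na, nt) < alpha
    )


if __name__ == "__main__":
    # Hypothetical counts, not data from the CGAP database.
    example = {
        "EST_example_1": (12, 10000, 1, 10000),
        "EST_example_2": (5, 10000, 5, 10000),
    }
    print(enriched_ests(example))
```

In practice a correction for testing many ESTs at once would normally be applied as well. It is omitted here to keep the sketch short.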
The native human SIM2 gene has previously been cloned and sequenced. Chrast et al., Genome Res. 7: 615-624, 1997. Northern blot analyses indicated that several different species of mRNA are expressed from the SIM2 gene, including those of 2.7, 3, 4.4, and 6 kb. The multiple mRNAs are believed to be due to alternative splicing, overlapping transcription, or different utilization of 5′ or 3′ untranslated sequences. At least two different forms of the SIM2 gene have been characterized. The long form (GenBank ACC# U80456; SEQ ID NO: 1) is 3901 bp and codes for a protein of 667 amino acids with an apparent molecular weight of 74 kD. The short form (GenBank ACC# U80457; SEQ ID NO: 2) is 2859 bp and codes for a protein of 570 amino acids with an apparent molecular weight of 64 kD. The N-termini of both forms of the SIM2 protein show extensive sequence identity to each other as well as to another member of the family, SIM1. The N-terminus of each of these proteins contains four recognized domains, namely, bHLH, PAS1, PAS2 and HST. These domains are often seen in transcription factors. The C-terminal ends of the proteins show some similarity, but also contain unique sequences.
SIM2 has previously been associated with Down's Syndrome, but not cancer.
Accordingly, the invention features a method for detecting a cancer in a tissue sample. This method includes the steps of: (a) providing the tissue sample; and (b) analyzing the tissue sample for the presence of a SIM2 marker. The presence of the SIM2 marker in the tissue sample indicates that the tissue sample contains a cancer. In this method, the tissue sample can be a colon tissue sample, a prostate tissue sample, or a pancreas tissue sample.
SIM2 markers utilized within the invention can be, e.g., a SIM2 nucleic acid such as a SIM2 mRNA or a native SIM2 nucleic acid. The native SIM2 nucleic acid can have the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 2. The SIM2 marker can also be a SIM2 protein such as a native SIM2 protein, e.g., one having an amino acid sequence of SEQ ID NO: 3 or SEQ ID NO: 4.
In the foregoing method, the step of providing a tissue sample can include obtaining the tissue sample from a human subject; and the step of analyzing the tissue sample can include isolating RNA from the tissue sample, generating cDNAs from the isolated RNA, amplifying the cDNAs by PCR to generate a PCR product, and electrophoretically separating the PCR product to yield an electrophoretic pattern. The step of amplifying the cDNAs by PCR can be performed using an oligonucleotide primer, e.g., one that includes a nucleotide sequence of SEQ ID NO: 7, 8, 15, or 16. Also in this method, the step of amplifying the cDNAs by PCR can be performed using a first oligonucleotide primer and a second oligonucleotide primer. The first oligonucleotide primer can include the nucleotide sequence of SEQ ID NO: 7 or 15. The second oligonucleotide primer can include the nucleotide sequence of SEQ ID NO: 8 or 16. In a particular embodiment of this method, the presence of a 472 base pair nucleic acid in the electrophoretic pattern indicates that the tissue sample contains a cancer.
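The size of the band expected from a primer pair can be predicted from the template sequence. The following is a minimal sketch of such an in-silico check, assuming exact primer matches on a single-stranded cDNA sequence. The toy template and primers are hypothetical and are not the sequences of SEQ ID NOs: 7, 8, 15, or 16, which are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward_primer, reverse_primer):
    """Predict the PCR product length in base pairs, or None if no product.

    template: sense-strand cDNA sequence, 5' to 3'.
    forward_primer: matches the sense strand.
    reverse_primer: matches the antisense strand, so its reverse complement
        is searched for on the sense strand downstream of the forward primer.
    Only exact matches are considered, and only the first site of each primer.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    site = template.find(reverse_site, start + len(forward_primer))
    if site == -1:
        return None
    return site + len(reverse_site) - start


if __name__ == "__main__":
    # Hypothetical toy sequences chosen only to exercise the function.
    toy_template = "GG" + "ATGCCGTA" + "T" * 20 + "CCGGAATT" + "AA"
    print(amplicon_length(toy_template, "ATGCCGTA", "AATTCCGG"))
```

Applied to a real cDNA and primer pair, the returned length is the value that would be compared against the band observed in the electrophoretic pattern.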
Also in the foregoing method, the step of analyzing the tissue sample for the SIM2 nucleic acid can include contacting the tissue sample with an oligonucleotide probe that hybridizes under stringent hybridization conditions to a polynucleotide having a nucleic acid sequence of SEQ ID NO: 1, SEQ ID NO: 2, the complement of SEQ ID NO: 1, or the complement of SEQ ID NO: 2. For example, the oligonucleotide probe can include the nucleic acid of SEQ ID NO: 9. The oligonucleotide probe of this method can also include a detectable label.
In a variation of the foregoing method, the SIM2 marker is a SIM2 protein such as a native SIM2 protein (e.g., one having an amino acid sequence of SEQ ID NO: 3 or SEQ ID NO: 4). In this variation, the step of providing a tissue sample can include obtaining the tissue sample from a human subject, and the step of analyzing the tissue sample can include contacting at least a portion of the tissue sample with a probe that specifically binds to the SIM2 protein. The probe can include a detectable label and/or an antibody (e.g., an antibody that specifically binds to the peptide of SEQ ID NO: 14). In another variation of the method, the tissue sample includes a cell isolated from feces, urine, or peripheral blood.
In another aspect, the invention features a method of modulating SIM2 gene expression. This method includes the steps of: (a) providing a cell that expresses a SIM2 gene; and (b) introducing into the cell an agent that modulates the expression of the SIM2 gene in the cell. The agent can be an oligonucleotide such as an antisense oligonucleotide. For example, an antisense oligonucleotide that hybridizes under stringent hybridization conditions to a polynucleotide that encodes a SIM2 protein can be used, as can an antisense oligonucleotide that is at least 18 nucleotides in length and includes a sequence that is a complement of a nucleic acid that encodes the SIM2 protein. For instance, the antisense oligonucleotide can include a nucleic acid sequence of SEQ ID NO: 11 or 12.
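The relationship between a target sequence and an antisense oligonucleotide complementary to it can be sketched as follows. The target segment and function names are hypothetical, and the segment is an arbitrary sequence rather than part of SEQ ID NO: 1, 2, 11, or 12. Only the 18-nucleotide minimum is taken from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MIN_ANTISENSE_LENGTH = 18  # minimum length stated in the text


def antisense_oligo(target_segment):
    """Return a DNA antisense oligonucleotide, written 5' to 3'.

    target_segment: a stretch of the sense (coding) strand, 5' to 3'.
    The result is the reverse complement, which can base-pair with the
    corresponding mRNA region.
    """
    segment = target_segment.upper()
    if set(segment) - set("ACGT"):
        raise ValueError("target must contain only A, C, G, and T")
    if len(segment) < MIN_ANTISENSE_LENGTH:
        raise ValueError("target segment is shorter than 18 nucleotides")
    return segment.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Arbitrary 20-nucleotide example, not a SIM2 sequence.
    print(antisense_oligo("ACGTTGCAAGCTTAGGCTAC"))
```

Complementarity alone does not establish that an oligonucleotide modulates expression. Candidates would still need to be evaluated in cells.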
Also within the invention is a method of identifying a test compound that modulates expression of a SIM2 gene in a cell. This method includes the steps of: (a) providing a cell expressing a SIM2 gene; (b) contacting the cell with the test compound; and (c) detecting a modulation in the expression of the SIM2 gene. Detecting the modulation indicates that the test compound modulates expression of the SIM2 gene. In this method, the cell can be derived from a colon tissue sample, a prostate tissue sample, or a pancreas tissue sample. Also in this method, the step of detecting the modulation in the expression of the SIM2 gene can include analyzing the cell for a change in the intracellular concentration of a SIM2 marker.
The invention additionally features a method for reducing the growth rate of a cancer that includes a cell expressing a SIM2 protein. This method includes the step of contacting the cell with an agent that inhibits the expression of the SIM2 protein in the cell.
The agent can be an oligonucleotide such as an antisense oligonucleotide. For example, an antisense oligonucleotide that hybridizes under stringent hybridization conditions to a polynucleotide that encodes a SIM2 protein can be used, as can an antisense oligonucleotide that is at least 18 nucleotides in length and includes a sequence that is a complement of a nucleic acid that encodes the SIM2 protein. For instance, the antisense oligonucleotide can include a nucleic acid sequence of SEQ ID NO: 11 or 12.
In variations of this method, the cancer can be a colon cancer, a prostate cancer, or a pancreas cancer. The cancer can also be in an animal such as a mammal.
In still another aspect, the invention features a kit for modulating expression of a SIM2 gene in a cell. The kit can include: an agent that modulates the expression of the SIM2 gene in the cell and instructions for using the agent to modulate the expression of the SIM2 gene in the cell.
Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Commonly understood definitions of molecular biology terms can be found in Rieger et al., Glossary of Genetics: Classical and Molecular, 5th edition, Springer-Verlag: New York, 1991; and Lewin, Genes V, Oxford University Press: New York, 1994.
By the term “gene” is meant a nucleic acid molecule that codes for a particular protein, or in certain cases, a functional or structural RNA molecule. For example, the SIM2 gene encodes the SIM2 protein.
As used herein, a “nucleic acid” or a “nucleic acid molecule” means a chain of two or more nucleotides such as RNA (ribonucleic acid) and DNA (deoxyribonucleic acid). A “purified” nucleic acid molecule is one that is substantially separated from other nucleic acid sequences in a cell or organism in which the nucleic acid naturally occurs (e.g., 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, 100% free of contaminants). The term includes, e.g., a recombinant nucleic acid molecule incorporated into a vector, a plasmid, a virus, or a genome of a prokaryote or eukaryote. Examples of purified nucleic acids include cDNAs, fragments of genomic nucleic acids, nucleic acids produced by polymerase chain reaction (PCR), nucleic acids formed by restriction enzyme treatment of genomic nucleic acids, recombinant nucleic acids, and chemically synthesized nucleic acid molecules. A “recombinant” nucleic acid molecule is one made by an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.
By the terms “SIM2 gene,” “SIM2 polynucleotide,” or “SIM2 nucleic acid” is meant a native SIM2-encoding nucleic acid sequence, e.g., the native SIM2 gene; the native long form SIM2 cDNA (SEQ ID NO: 1); the native short form SIM2 cDNA (SEQ ID NO: 2); a nucleic acid having sequences from which a SIM2 cDNA can be transcribed; and/or allelic variants and homologs of the foregoing. The terms encompass double-stranded DNA, single-stranded DNA, and RNA.
As used herein, “protein” or “polypeptide” means any peptide-linked chain of amino acids, regardless of length or post-translational modification, e.g., glycosylation or phosphorylation. A “purified” polypeptide is one that is substantially separated from other polypeptides in a cell or organism in which the polypeptide naturally occurs (e.g., 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, 100% free of contaminants).
By the terms “SIM2 protein” or “SIM2 polypeptide” is meant an expression product of a SIM2 gene such as the native long form SIM2 protein (SEQ ID NO: 3), the native short form SIM2 protein (SEQ ID NO: 4), or a protein that shares at least 65% (but preferably 75, 80, 85, 90, 95, 96, 97, 98, or 99%) amino acid sequence identity with one of the foregoing and displays a functional activity of a native SIM2 protein. A “functional activity” of a protein is any activity associated with the physiological function of the protein. For example, functional activities of a native SIM2 protein may include DNA-binding activity and selective expression in certain neoplastic tissues.
When referring to a nucleic acid molecule or polypeptide, the term “native” refers to a naturally-occurring (e.g., a “wild-type”) nucleic acid or polypeptide. A “homolog” of a SIM2 gene is a gene sequence encoding a SIM2 polypeptide isolated from an organism other than a human being. Similarly, a “homolog” of a native SIM2 polypeptide is an expression product of a SIM2 gene homolog.
As used herein, a “SIM2 marker” is any molecule whose presence in a sample (e.g., a cell) indicates that a SIM2 gene is expressed in the sample. SIM2 markers include SIM2 nucleic acids and SIM2 proteins. “Expressing a SIM2 gene” or like phrases mean that a sample contains a transcription product (e.g., messenger RNA, i.e., “mRNA”) of a SIM2 gene or a translation product of a SIM2 protein-encoding nucleic acid (e.g., a SIM2 protein). A cell expresses a SIM2 gene when it contains a detectable level of a SIM2 nucleic acid or a SIM2 protein.
A “fragment” of a SIM2 nucleic acid is a portion of a SIM2 nucleic acid that is less than full-length and comprises at least a minimum length capable of hybridizing specifically with a native SIM2 nucleic acid under stringent hybridization conditions. The length of such a fragment is preferably at least 15 nucleotides, more preferably at least 20 nucleotides, and most preferably at least 30 nucleotides of a native SIM2 nucleic acid sequence. A “fragment” of a SIM2 polypeptide is a portion of a SIM2 polypeptide that is less than full-length (e.g., a polypeptide consisting of 5, 10, 15, 20, 30, 40, 50, 75, 100 or more amino acids of a native SIM2 protein), and preferably retains at least one functional activity of a native SIM2 protein.
When referring to hybridization of one nucleic acid to another, “low stringency conditions” means in 10% formamide, 5× Denhardt's solution, 6× SSPE, 0.2% SDS at 42° C., followed by washing in 1× SSPE, 0.2% SDS, at 50° C.; “moderate stringency conditions” means in 50% formamide, 5× Denhardt's solution, 5× SSPE, 0.2% SDS at 42° C., followed by washing in 0.2× SSPE, 0.2% SDS, at 65° C.; and “high stringency conditions” means in 50% formamide, 5× Denhardt's solution, 5× SSPE, 0.2% SDS at 42° C., followed by washing in 0.1× SSPE, and 0.1% SDS at 65° C. The phrase “stringent hybridization conditions” means low, moderate, or high stringency conditions.
As used herein, “sequence identity” means the percentage of identical subunits at corresponding positions in two sequences when the two sequences are aligned to maximize subunit matching, i.e., taking into account gaps and insertions. Sequence identity is present when a subunit position in both of the two sequences is occupied by the same nucleotide or amino acid, e.g., if a given position is occupied by an adenine in each of two DNA molecules, then the molecules are identical at that position. For example, if 7 positions in a sequence 10 nucleotides in length are identical to the corresponding positions in a second 10-nucleotide sequence, then the two sequences have 70% sequence identity. Sequence identity is typically measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705).
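The 7-of-10 example above can be reproduced with a short calculation. The sketch below assumes the two sequences have already been aligned to equal length, with "-" marking gaps, and divides the number of matching positions by the alignment length. The function name, the example sequences, and this choice of denominator are assumptions made for illustration. Sequence analysis packages may treat gaps differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of aligned positions occupied by the same subunit.

    Both inputs must already be aligned, so they have the same length.
    A position counts as identical only if both sequences carry the same
    non-gap character there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # Two hypothetical 10-nucleotide sequences that agree at 7 positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGCCA"))
```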
When referring to mutations in a nucleic acid molecule, “silent” changes are those that substitute one or more base pairs in the nucleotide sequence, but do not change the amino acid sequence of the polypeptide encoded by the sequence. “Conservative” changes are those in which at least one codon in the protein-coding region of the nucleic acid has been changed such that at least one amino acid of the polypeptide encoded by the nucleic acid sequence is substituted with another amino acid having similar characteristics. Examples of conservative amino acid substitutions are ser for ala, thr, or cys; lys for arg; gln for asn, his, or lys; his for asn; glu for asp or lys; asn for his or gln; asp for glu; pro for gly; leu for ile, phe, met, or val; val for ile or leu; ile for leu, met, or val; arg for lys; met for phe; tyr for phe or trp; thr for ser; trp for tyr; and phe for tyr.
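The distinction between silent and conservative changes can be made concrete by translating the codon before and after a mutation. The sketch below uses the standard genetic code and encodes the example substitutions listed above as a lookup table keyed by the replacement amino acid. The function and variable names are hypothetical, and the table covers only the listed examples, not every substitution that might be regarded as conservative.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# "X for Y" in the text: key X is the replacement, value lists originals Y.
CONSERVATIVE = {
    "S": "ATC", "K": "R", "Q": "NHK", "H": "N", "E": "DK", "N": "HQ",
    "D": "E", "P": "G", "L": "IFMV", "V": "IL", "I": "LMV", "R": "K",
    "M": "F", "Y": "FW", "T": "S", "W": "Y", "F": "Y",
}


def classify_codon_change(old_codon, new_codon):
    """Classify a codon change as 'silent', 'conservative', or 'other'."""
    old_aa = CODON_TABLE[old_codon.upper()]
    new_aa = CODON_TABLE[new_codon.upper()]
    if old_aa == new_aa:
        return "silent"
    if old_aa in CONSERVATIVE.get(new_aa, ""):
        return "conservative"
    return "other"


if __name__ == "__main__":
    print(classify_codon_change("CTT", "CTC"))  # Leu to Leu
    print(classify_codon_change("AAA", "AGA"))  # Lys to Arg
    print(classify_codon_change("GGT", "GAT"))  # Gly to Asp
```

Changes that create or remove a stop codon fall into the "other" category in this sketch.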
As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors.”
A first nucleic acid sequence is “operably” linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked nucleic acid sequences are contiguous and, where necessary to join two protein coding regions, in reading frame.
A cell, tissue, or organism into which has been introduced a foreign nucleic acid, such as a recombinant vector, is considered “transformed,” “transfected,” or “transgenic.” A “transgenic” or “transformed” cell or organism also includes progeny of the cell or organism, including progeny produced from a breeding program employing such a “transgenic” cell or organism as a parent in a cross. For example, an organism transgenic for SIM2 is one in which SIM2 nucleic acid has been introduced.
By the term “SIM2-specific antibody” is meant an antibody that binds a SIM2 protein and displays no substantial binding to naturally occurring proteins other than those sharing the same antigenic determinants as the SIM2 protein. The term includes polyclonal and monoclonal antibodies as well as antibody fragments.
As used herein, “bind,” “binds,” or “interacts with” means that one molecule recognizes and adheres to a particular second molecule in a sample, but does not substantially recognize or adhere to other structurally unrelated molecules in the sample. Generally, a first molecule that “specifically binds” a second molecule has a binding affinity greater than about 10^5 to 10^6 moles/liter for that second molecule.
The term “labeled,” with regard to a probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody.
Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. The particular embodiments discussed below are illustrative only and not intended to be limiting.