Not applicable.
The invention relates generally to the fields of molecular biology, genomics, bioinformatics, pathology, and medicine. More particularly, the invention relates to a gene whose expression is modulated in select cancers.
With the recent efforts to sequence the entire human genome, the nucleotide sequences of more than 100,000 human genes are expected to be known within the next few years. See, e.g., Robbins, R. J., J. Computat. Biol., 3: 465-478, 1996; Andrade, M. A. and Sander, C., Curr. Opin. Biotechnol., 8: 675-683, 1997; and Collins et al., Science, 282: 682-689, 1998. Once characterized, these genes are anticipated to be useful for identifying new diagnostic and therapeutic targets for a variety of different diseases. Fannon, M. R., Trends Biotechnol., 14: 294-298, 1996. Already several attempts have been made to identify genes or gene products that are uniquely expressed in diseased tissue. The results of these efforts indicated that pathology correlates more often with the pattern of gene expression in the diseased tissue, rather than simply with the absence or presence of a particular gene.
The invention relates to the discovery of specific polynucleotide sequences that are upregulated in select cancer cells as compared to non-diseased cells. In particular, several expressed sequence tags (ESTs) more prevalent in cancer tissue libraries than in corresponding non-cancerous tissue libraries were identified. These ESTs were then used to identify specific UniGene clusters associated with cancer. See, Schuler, J. Mol. Med. 75(10), 694-698, 1998; Schuler et al., Science 274, 540-546, 1996; and Boguski and Schuler, Nature Genetics 10, 369-371, 1995. Based on the identified polynucleotide sequences, a partial gene sequence termed C4, whose expression is selectively upregulated in colon tumors, was identified. Using this partial sequence, a full-length gene termed CCRG (Colon Carcinoma Related Gene), which contains the C4 sequence, was isolated and sequenced.
An open reading frame of the CCRG gene encodes a polypeptide, i.e., the CCRG protein, which was predicted to have a signal peptide sequence, and putative phosphorylation, myristylation, and glycosylation sites. Based on comparisons to sequences of known function, the nucleotide sequence of CCRG (and C4) was predicted to encode a prokaryotic lipoprotein binding site and a prenylation site. The C-terminus of the CCRG protein is cysteine rich and contains a motif found in ultra high sulphur matrix protein, hair keratin, metallothionein and cation transporters. Using the secondary structure prediction program provided by the ExPASy proteomics server by the Swiss Institute of Bioinformatics (Geneva), CCRG protein was predicted to contain mostly a mixture of alpha helices, beta strands, and coils. The mature CCRG protein has a theoretical molecular weight of 8.62 kDa and a pI of 8.05. These and other analyses indicated that CCRG protein is a colon tumor associated secreted factor.
Accordingly, the invention features a purified nucleic acid present at higher levels in colon cancer cells than in non-cancerous colon cells and includes a nucleotide sequence that encodes a polypeptide sharing at least 80% sequence identity with SEQ ID NO:7 or with a fragment of SEQ ID NO:7 at least 20 residues in length. The nucleotide sequence can be one that defines a polynucleotide whose complement hybridizes under high stringency conditions to the nucleotide sequence of SEQ ID NO:6. The polypeptide encoded by the nucleic acid can have an amino acid sequence consisting of SEQ ID NO:7 or a fragment of SEQ ID NO:7 at least 20 residues in length. The nucleic acid can include a fragment of the polynucleotide sequence of SEQ ID NO:6 at least 50 residues long (e.g., one including the polynucleotide sequence of SEQ ID NO:6).
Also within the invention is a vector including a purified nucleic acid present at higher levels in colon cancer cells than in non-cancerous colon cells, the purified nucleic acid including a nucleotide sequence that encodes a polypeptide sharing at least 80% sequence identity with SEQ ID NO:7 or with a fragment of SEQ ID NO:7 at least 20 residues in length. The nucleic acid contained within this vector can be operably linked to one or more expression control sequences. In another aspect, the invention features a cell including a vector of the invention, the vector including a purified nucleic acid present at higher levels in colon cancer cells than in non-cancerous colon cells.
The invention also provides a probe including an oligonucleotide and a detectable label attached to the oligonucleotide, the oligonucleotide being at least 15 nucleotides in length and hybridizing under high stringency conditions to the nucleotide sequence of SEQ ID NO:7 or a complement of the nucleotide sequence of SEQ ID NO:7.
A kit for detecting a purified nucleic acid including a nucleotide sequence that encodes a polypeptide sharing at least 80% sequence identity with SEQ ID NO:7 or with a fragment of SEQ ID NO:7 at least 20 residues in length in a cell is also within the invention. The kit includes: a first PCR primer including a first nucleic acid molecule including the nucleotide sequence of SEQ ID NO:2 or SEQ ID NO:9, and a second PCR primer including a second nucleic acid molecule including the nucleotide sequence of SEQ ID NO:3 or SEQ ID NO:10.
The invention also features a purified polypeptide expressed at higher levels by colon cancer cells than by non-cancerous colon cells. The purified polypeptide includes an amino acid sequence that shares at least 80% sequence identity with SEQ ID NO:7 or a fragment of SEQ ID NO:7 at least 20 residues in length, e.g., one including a fragment of SEQ ID NO:7 at least 20 residues in length or one including residues 31-111 of the amino acid sequence of SEQ ID NO:7. The purified polypeptide can also include the amino acid sequence of SEQ ID NO:7.
A purified antibody that specifically binds to a polypeptide including an amino acid sequence that shares at least 80% sequence identity with SEQ ID NO:7 or a fragment of SEQ ID NO:7 at least 20 residues in length is featured in the invention. This antibody can include a detectable label.
In a further aspect, the invention provides a method of producing a CCRG polypeptide. This method includes the steps of: (a) providing a cell transformed with a purified nucleic acid including a nucleotide sequence that encodes a CCRG polypeptide sharing at least 80% sequence identity with SEQ ID NO:7; (b) culturing the cell under conditions that allow expression of the CCRG polypeptide; and (c) collecting the CCRG polypeptide from the cultured cell.
A screening method for identifying a substance that modulates expression of a gene encoding a CCRG polypeptide sharing at least 80% sequence identity with SEQ ID NO:7 is also within the invention. This method includes the steps of: (a) providing a test cell that includes the gene encoding a CCRG polypeptide sharing at least 80% sequence identity with SEQ ID NO:7; (b) contacting the test cell with a candidate substance; and (c) detecting an increase or decrease in the expression level of the gene encoding the CCRG polypeptide in the presence of the candidate substance, compared to the expression level of the gene encoding CCRG polypeptide in the absence of the candidate substance, as an indication that the candidate substance modulates the level of expression of the gene encoding the CCRG polypeptide.
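The comparison in step (c) of the screening method can be sketched as follows. This is a minimal illustration only: the function name, the use of a single numeric expression level per condition, and the `tolerance` parameter are assumptions made for the example and are not part of the specification.

```python
def classify_modulation(with_substance: float,
                        without_substance: float,
                        tolerance: float = 0.0) -> str:
    """Compare an expression level measured in the presence of a candidate
    substance with the level measured in its absence.

    `tolerance` is a hypothetical margin below which a difference is
    treated as no change; the specification does not define one.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    difference = with_substance - without_substance
    if difference > tolerance:
        return "increase"
    if difference < -tolerance:
        return "decrease"
    return "no change"


# Illustrative values in arbitrary units (not measured data).
print(classify_modulation(2.5, 1.0))
print(classify_modulation(0.4, 1.0))
```

Either an "increase" or a "decrease" result would indicate, in the sense of step (c), that the candidate substance modulates expression of the gene.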
In addition, the invention provides a method for isolating a substance that binds a CCRG polypeptide sharing at least 80% sequence identity with SEQ ID NO:7. This method includes the steps of: (a) providing a sample of the CCRG polypeptide immobilized on a substrate; (b) contacting a mixture containing the CCRG polypeptide-binding substance with the immobilized CCRG polypeptide; (c) separating unbound components of the mixture from bound components of the mixture; and (d) recovering the CCRG polypeptide-binding substance from the immobilized CCRG polypeptide.
A method for detecting the presence of a CCRG nucleic acid or polypeptide in a biological sample is also included within the invention. This method includes the steps of: (a) providing the biological sample; and (b) detecting the presence of the CCRG nucleic acid or polypeptide in the biological sample. In one variation of this method, step (b) of detecting the presence of the CCRG nucleic acid or polypeptide in a biological sample includes: contacting the biological sample with a probe that binds to the CCRG nucleic acid or polypeptide; and detecting binding of the probe to the biological sample. In another variation of this method, step (b) of detecting the presence of the CCRG nucleic acid or polypeptide in a biological sample includes: isolating RNA from the biological sample; generating cDNAs from the isolated RNA; contacting the cDNAs with a first PCR primer that hybridizes to a first portion of a polynucleotide sharing at least 80% sequence identity with SEQ ID NO:6 or a complement of SEQ ID NO:6, and a second PCR primer that hybridizes to a second portion of a polynucleotide sharing at least 80% sequence identity with SEQ ID NO:6 or a complement of SEQ ID NO:6 to form a mixture; subjecting the mixture to reverse transcriptase-polymerase chain reaction to generate PCR amplification products; and analyzing the PCR amplification products by gel electrophoresis.
Also within the invention is a method for detecting the presence of a colon cancer cell in a biological sample. This method includes the steps of: (a) providing the biological sample; and (b) analyzing the biological sample for the presence of a molecule selected from the group consisting of: a nucleic acid at least 15 nucleotides in length that hybridizes under stringent conditions to the nucleic acid of SEQ ID NO:6 or the complement of SEQ ID NO:6, and a polypeptide sharing at least 80% sequence identity with SEQ ID NO:7. Presence of the molecule in the biological sample indicates that the sample contains a colon cancer cell.
The invention also provides a method for detecting the presence of a CCRG protein in a biological sample. This method includes the steps of: (a) providing the biological sample; and (b) analyzing the biological sample for the presence of a polypeptide including an amino acid sequence that shares at least 80% sequence identity with SEQ ID NO:7 or a fragment of SEQ ID NO:7 at least 20 residues in length. Presence of the polypeptide in the biological sample indicates that the sample contains the CCRG protein. In one variation of this method, the step (b) of analyzing the biological sample for the presence of a polypeptide including an amino acid sequence that shares at least 80% sequence identity with SEQ ID NO:7 or a fragment of SEQ ID NO:7 at least 20 residues in length includes contacting the biological sample with an antibody that specifically binds to a polypeptide including an amino acid sequence that shares at least 80% sequence identity with SEQ ID NO:7 or a fragment of SEQ ID NO:7 at least 20 residues in length.
In the foregoing methods, the biological sample can be a cell derived from a colon (e.g., a human colon), feces, urine, blood, plasma, or serum.
Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Commonly understood definitions of molecular biology terms can be found in Rieger et al., Glossary of Genetics: Classical and Molecular, 5th edition, Springer-Verlag: New York, 1991; and Lewin, Genes V, Oxford University Press: New York, 1994.
By the term “gene” is meant a nucleic acid molecule that codes for a particular protein, or in certain cases, a functional or structural RNA molecule. For example, the CCRG gene encodes the CCRG protein.
As used herein, a “nucleic acid” or a “nucleic acid molecule” means a chain of two or more nucleotides such as RNA (ribonucleic acid) and DNA (deoxyribonucleic acid). A “purified” nucleic acid molecule is one that has been substantially separated or isolated away from other nucleic acid sequences in a cell or organism in which the nucleic acid naturally occurs (e.g., 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, 100% free of contaminants). The term includes, e.g., a recombinant nucleic acid molecule incorporated into a vector, a plasmid, a virus, or a genome of a prokaryote or eukaryote. Examples of purified nucleic acids include cDNAs, fragments of genomic nucleic acids, nucleic acids produced by polymerase chain reaction (PCR), nucleic acids formed by restriction enzyme treatment of genomic nucleic acids, recombinant nucleic acids, and chemically synthesized nucleic acid molecules. A “recombinant” nucleic acid molecule is one made by an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.
By the terms “CCRG gene,” “CCRG polynucleotide,” or “CCRG nucleic acid” is meant a native CCRG-encoding nucleic acid sequence, e.g., the native CCRG cDNA (as shown in FIG. 6); a nucleic acid having sequences from which CCRG cDNA can be transcribed; and/or allelic variants and homologs of the foregoing. The terms encompass double-stranded DNA, single-stranded DNA, and RNA.
As used herein, “protein” and “polypeptide” are used synonymously to mean any peptide-linked chain of amino acids, regardless of length or post-translational modification, e.g., glycosylation or phosphorylation. A “purified” polypeptide is one that has been substantially separated or isolated away from other polypeptides in a cell or organism in which the polypeptide naturally occurs (e.g., 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99, 100% free of contaminants).
By the terms “CCRG protein” or “CCRG polypeptide” is meant an expression product of a CCRG gene such as the native CCRG protein of FIG. 7 (SEQ ID NO:7) or FIG. 8 (amino acid residues 31-111 of SEQ ID NO:7) or a protein that shares at least 65% (but preferably 75, 80, 85, 90, 95, 96, 97, 98, or 99%) amino acid sequence identity with the protein of FIG. 7 or FIG. 8 and displays a functional activity of CCRG. A “functional activity” of a protein is any activity associated with the physiological function of the protein. For example, functional activities of CCRG may include selective expression in certain neoplastic tissues. In addition, the expression of CCRG in the small intestine suggests that it may be an autocrine secreted growth factor in the intestine and that its overexpression in the large intestine (colon) may contribute to tumor formation.
When referring to a nucleic acid molecule or polypeptide, the term “native” refers to a naturally-occurring (e.g., a “wild-type”) nucleic acid or polypeptide. A “homolog” of a CCRG gene is a gene sequence encoding a CCRG polypeptide isolated from an organism other than a human being. Similarly, a “homolog” of a native CCRG polypeptide is an expression product of a CCRG homolog.
A “fragment” of a CCRG nucleic acid is a portion of a CCRG nucleic acid that is less than full-length and comprises at least a minimum length capable of hybridizing specifically with a native CCRG nucleic acid under stringent hybridization conditions. The length of such a fragment is preferably at least 15 nucleotides, more preferably at least 20 nucleotides, and most preferably at least 30 nucleotides of a native CCRG nucleic acid sequence. A “fragment” of a CCRG polypeptide is a portion of a CCRG polypeptide that is less than full-length (e.g., a polypeptide consisting of 5, 10, 15, 20, 30, 40, 50, 75, 100 or more amino acids of native CCRG polypeptide), and preferably retains at least one functional activity of native CCRG polypeptide. For example, a polypeptide consisting of amino acids 31-111 of the native CCRG polypeptide (i.e., the polypeptide of SEQ ID NO:7 without the signal peptide) is a fragment of the full length native CCRG polypeptide.
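The residue numbering used for fragments (1-based and inclusive, as in "amino acids 31-111") can be illustrated with a short sketch. The helper name `fragment` is hypothetical, and the 111-residue string below is a placeholder of the right length only; it is not the sequence of SEQ ID NO:7.

```python
def fragment(sequence: str, first: int, last: int) -> str:
    """Return residues first..last of a sequence, numbered from 1, inclusive."""
    if not (1 <= first <= last <= len(sequence)):
        raise ValueError("residue range lies outside the sequence")
    # Python slices are 0-based and end-exclusive, hence the first - 1.
    return sequence[first - 1:last]


# Placeholder only (NOT SEQ ID NO:7): 30 residues standing in for a signal
# peptide, followed by 81 residues standing in for the mature polypeptide.
placeholder = "M" + "A" * 29 + "C" * 81

mature = fragment(placeholder, 31, 111)
print(len(placeholder), len(mature))  # 111 81
```

Under this numbering, a fragment spanning residues 31-111 of a 111-residue polypeptide is 81 residues long.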
When referring to hybridization of one nucleic acid to another, “low stringency conditions” means in 10% formamide, 5×Denhardt's solution, 6×SSPE, 0.2% SDS at 42° C., followed by washing in 1×SSPE, 0.2% SDS, at 50° C.; “moderate stringency conditions” means in 50% formamide, 5×Denhardt's solution, 5×SSPE, 0.2% SDS at 42° C., followed by washing in 0.2×SSPE, 0.2% SDS, at 65° C.; and “high stringency conditions” means in 50% formamide, 5×Denhardt's solution, 5×SSPE, 0.2% SDS at 42° C., followed by washing in 0.1×SSPE, and 0.1% SDS at 65° C. The phrase “stringent hybridization conditions” means low, moderate, or high stringency conditions.
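For convenience, the three sets of conditions defined above can be tabulated as data. The sketch below is illustrative only; the class and field names are assumptions introduced for the example, while the numeric values are those recited in the definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stringency:
    formamide_pct: float   # % formamide in the hybridization solution
    denhardt_x: float      # fold concentration of Denhardt's solution
    sspe_x: float          # fold concentration of SSPE during hybridization
    sds_pct: float         # % SDS during hybridization
    hyb_temp_c: float      # hybridization temperature, degrees C
    wash_sspe_x: float     # fold concentration of SSPE in the wash
    wash_sds_pct: float    # % SDS in the wash
    wash_temp_c: float     # wash temperature, degrees C


CONDITIONS = {
    "low":      Stringency(10, 5, 6, 0.2, 42, 1.0, 0.2, 50),
    "moderate": Stringency(50, 5, 5, 0.2, 42, 0.2, 0.2, 65),
    "high":     Stringency(50, 5, 5, 0.2, 42, 0.1, 0.1, 65),
}

# Ordering by decreasing salt concentration in the wash recovers the
# low -> moderate -> high progression.
ordered = sorted(CONDITIONS, key=lambda name: CONDITIONS[name].wash_sspe_x,
                 reverse=True)
print(ordered)  # ['low', 'moderate', 'high']
```

As the table makes visible, the moderate and high stringency conditions share the same hybridization step and differ only in the wash.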
As used herein, “sequence identity” means the percentage of identical subunits at corresponding positions in two sequences when the two sequences are aligned to maximize subunit matching, i.e., taking into account gaps and insertions. When a subunit position in both of the two sequences is occupied by the same monomeric subunit, e.g., if a given position is occupied by an adenine in each of two DNA molecules, then the molecules are identical at that position. For example, if 7 positions in a sequence 10 nucleotides in length are identical to the corresponding positions in a second 10-nucleotide sequence, then the two sequences have 70% sequence identity. As another example, if 12 positions in a protein sequence 20 amino acids in length are identical to the corresponding positions in a second 20-amino acid sequence, then the two sequences have 60% sequence identity. Preferably, the length of the compared nucleic acid sequences is at least 60 nucleotides, more preferably at least 75 nucleotides, and most preferably 100 nucleotides; and the length of compared polypeptide sequences is at least 15, more preferably at least 25, and most preferably 50 amino acids. Sequence identity is typically measured using sequence analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, University of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, Wis. 53705).
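The two worked examples above can be reproduced with a minimal sketch. It assumes the two sequences have already been aligned to equal length (it does not perform the alignment itself), uses "-" as a hypothetical gap character, and takes the number of aligned positions as the denominator; sequence analysis packages may count gapped positions differently. The example sequences are arbitrary and are not taken from the sequence listing.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned positions holding the same subunit in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    # A position counts as identical only if both sequences carry the same
    # subunit there; a gap aligned with a gap is not counted as a match.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


# 7 of 10 nucleotide positions identical
print(percent_identity("ACGTACGTAC", "ACGTACGAGT"))  # 70.0

# 12 of 20 amino acid positions identical
print(percent_identity("MKTLLVAGCCSSPQWERTYH", "MKTLLVAGCCSSAAAAAAAA"))  # 60.0
```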
When referring to mutations in a nucleic acid molecule, “silent” changes are those that substitute one or more base pairs in the nucleotide sequence, but do not change the amino acid sequence of the polypeptide encoded by the sequence. “Conservative” changes are those in which at least one codon in the protein-coding region of the nucleic acid has been changed such that at least one amino acid of the polypeptide encoded by the nucleic acid sequence is substituted with another amino acid having similar characteristics. Examples of conservative amino acid substitutions are ser for ala, thr, or cys; lys for arg; gln for asn, his, or lys; his for asn; glu for asp or lys; asn for his or gln; asp for glu; pro for gly; leu for ile, phe, met, or val; val for ile or leu; ile for leu, met, or val; arg for lys; met for phe; tyr for phe or trp; thr for ser; trp for tyr; and phe for tyr.
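The listed examples can be encoded as a lookup table, as in the following sketch. The names `CONSERVATIVE` and `classify_change` are hypothetical, glutamine is written with its standard abbreviation gln, and the function works on the amino acids encoded before and after a codon change rather than on the codons themselves. Substitutions outside the listed examples are reported as "other", which does not imply that they could never be regarded as conservative.

```python
# Key: the replacement amino acid. Value: the original amino acids it may
# replace, following the "X for Y" wording of the examples.
CONSERVATIVE = {
    "ser": {"ala", "thr", "cys"},
    "lys": {"arg"},
    "gln": {"asn", "his", "lys"},
    "his": {"asn"},
    "glu": {"asp", "lys"},
    "asn": {"his", "gln"},
    "asp": {"glu"},
    "pro": {"gly"},
    "leu": {"ile", "phe", "met", "val"},
    "val": {"ile", "leu"},
    "ile": {"leu", "met", "val"},
    "arg": {"lys"},
    "met": {"phe"},
    "tyr": {"phe", "trp"},
    "thr": {"ser"},
    "trp": {"tyr"},
    "phe": {"tyr"},
}


def classify_change(original: str, replacement: str) -> str:
    """Classify a codon change by the amino acids encoded before and after it."""
    original, replacement = original.lower(), replacement.lower()
    if original == replacement:
        return "silent"          # encoded amino acid is unchanged
    if original in CONSERVATIVE.get(replacement, set()):
        return "conservative"    # one of the listed example substitutions
    return "other"


print(classify_change("ala", "ser"))  # conservative (ser for ala)
print(classify_change("leu", "leu"))  # silent
print(classify_change("gly", "trp"))  # other
```

The table is kept directional because the examples are worded that way: ser for ala is listed, but ala for ser is not.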
As used herein, the term “vector” refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of preferred vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication. Preferred vectors are those capable of autonomous replication and/or expression of nucleic acids to which they are linked. Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors.”
A first nucleic acid sequence is “operably” linked with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably linked nucleic acid sequences are contiguous and, where necessary to join two protein coding regions, in reading frame.
A cell, tissue, or organism into which has been introduced a foreign nucleic acid, such as a recombinant vector, is considered “transformed,” “transfected,” or “transgenic.” A “transgenic” or “transformed” cell or organism (e.g., a mammal) also includes progeny of the cell or organism. For example, an organism transgenic for CCRG is one in which CCRG nucleic acid has been introduced.
By the term “CCRG-specific antibody” is meant an antibody that binds a CCRG protein (e.g., a protein having the amino acid sequence of SEQ ID NO:7), and displays no substantial binding to other naturally occurring proteins other than those sharing the same antigenic determinants as a CCRG protein. The term includes polyclonal and monoclonal antibodies.
As used herein, “bind,” “binds,” or “interacts with” means that one molecule recognizes and adheres to a particular second molecule in a sample, but does not substantially recognize or adhere to other structurally unrelated molecules in the sample. Generally, a first molecule that “specifically binds” a second molecule has a binding affinity greater than about 10^5 to 10^6 moles/liter for that second molecule.
The term “labeled,” with regard to a probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently labeled streptavidin.
Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. In addition, the particular embodiments discussed below are illustrative only and not intended to be limiting.