This invention pertains to the field of cytogenetics. More particularly this invention pertains to the identification of genes in a region of amplification at about 20q13 in various cancers. The genes disclosed here can be used as probes specific for the 20q13 amplicon as well as for treatment of various cancers.
Chromosome abnormalities are often associated with genetic disorders, degenerative diseases, and cancer. In particular, the deletion or multiplication of copies of whole chromosomes or chromosomal segments, and higher level amplifications of specific regions of the genome are common occurrences in cancer. See, for example, Smith, et al., Breast Cancer Res. Treat., 18: Suppl. 1: 5-14 (1991); van de Vijver and Nusse, Biochim. Biophys. Acta. 1072: 33-50 (1991); Sato, et al., Cancer Res., 50: 7184-7189 (1990). In fact, the amplification and deletion of DNA sequences containing proto-oncogenes and tumor-suppressor genes, respectively, are frequently characteristic of tumorigenesis. Dutrillaux, et al., Cancer Genet. Cytogenet., 49: 203-217 (1990). Clearly, the identification of amplified and deleted regions and the cloning of the genes involved is crucial both to the study of tumorigenesis and to the development of cancer diagnostics.
The detection of amplified or deleted chromosomal regions has traditionally been done by cytogenetics. Because of the complex packing of DNA into the chromosomes, resolution of cytogenetic techniques has been limited to regions larger than about 10 Mb, approximately the width of a band in Giemsa-stained chromosomes. In complex karyotypes with multiple translocations and other genetic changes, traditional cytogenetic analysis is of little utility because karyotype information is lacking or cannot be interpreted. Teyssier, J. R., Cancer Genet. Cytogenet., 37: 103 (1989). Furthermore, conventional cytogenetic banding analysis is time consuming, labor intensive, and frequently difficult or impossible.
More recently, cloned probes have been used to assess the amount of a given DNA sequence in a chromosome by Southern blotting. This method is effective even if the genome is heavily rearranged so as to eliminate useful karyotype information. However, Southern blotting only gives a rough estimate of the copy number of a DNA sequence, and does not give any information about the localization of that sequence within the chromosome.
Comparative genomic hybridization (CGH) is a more recent approach to identify the presence and localization of amplified/deleted sequences. See Kallioniemi, et al., Science, 258: 818 (1992). CGH, like Southern blotting, reveals amplifications and deletions irrespective of genome rearrangement. Additionally, CGH provides a more quantitative estimate of copy number than Southern blotting, and also provides information on the localization of the amplified or deleted sequence in the normal chromosome.
Using CGH, the chromosomal 20q13 region has been identified as a region that is frequently amplified in cancers (see, e.g., Kallioniemi et al., Genomics, 20: 125-128 (1994)). Initial analysis of this region in breast cancer cell lines identified a region of approximately 2 Mb on chromosome 20 that is consistently amplified.
The present invention relates to the identification of a narrow region (about 606 kb) within a 2 Mb amplicon located at about chromosome 20q13 (more precisely at 20q13.2) that is consistently amplified in primary tumors. In addition, this invention provides cDNA sequences from a number of genes which map to this region. These sequences are useful as probes or as probe targets for monitoring the relative copy number of corresponding sequences from a biological sample such as a tumor cell. Also provided is a contig (a series of clones that contiguously spans this amplicon) which can be used to prepare probes specific for the amplicon. The probes can be used to detect chromosomal abnormalities at 20q13.
Thus, in one embodiment, this invention provides a method of detecting a chromosome abnormality (e.g., an amplification or a deletion) at about position FLpter 0.825 on human chromosome 20 (20q13.2). The method involves contacting a chromosome sample from a patient with a composition consisting essentially of one or more labeled nucleic acid probes each of which binds selectively to a target polynucleotide sequence at about position FLpter 0.825 on human chromosome 20 under conditions in which the probe forms a stable hybridization complex with the target sequence; and detecting the hybridization complex. The step of detecting the hybridization complex can involve determining the copy number of the target sequence. The probe preferably comprises a nucleic acid that specifically hybridizes under stringent conditions to a nucleic acid selected from the nucleic acids disclosed here. Even more preferably, the probe comprises a subsequence selected from sequences set forth in SEQ. ID. Nos. 1-10 and 12. The probe is preferably labeled, and is more preferably labeled with digoxigenin or biotin. In one embodiment, the hybridization complex is detected in interphase nuclei in the sample. Detection is preferably carried out by detecting a fluorescent label (e.g., FITC, fluorescein, or Texas Red). The method can further involve contacting the sample with a reference probe which binds selectively to a chromosome 20 centromere.
This invention also provides for two new genes, ZABC1 and 1b1, in the 20q13.2 region that are both amplified and overexpressed in a variety of cancers. ZABC1 is a putative zinc finger protein. Zinc finger proteins are found in a variety of transcription factors, and amplification or overexpression of transcription factors typically results in cellular mis-regulation. ZABC1 and 1b1 thus appear to play an important role in the etiology of a number of cancers.
This invention provides for a new human cyclophilin nucleic acid (SEQ ID NO 13). Cyclophilin nucleic acids have been implicated in a variety of cellular processes, including signal transduction.
This invention also provides for proteins encoded by nucleic acid sequences in the 20q13 amplicon (SEQ. ID. Nos: 1-10 and 12-13) and subsequences, more preferably subsequences of at least 10 amino acids, preferably of at least 20 amino acids and most preferably of at least 30 amino acids in length. Particularly preferred subsequences are epitopes specific to the 20q13 proteins, more preferably epitopes specific to the ZABC1 and 1b1 proteins. Such proteins include, but are not limited to, isolated polypeptides comprising at least 20 amino acids from a polypeptide encoded by the nucleic acids of SEQ. ID No. 1-10 and 12-13 or from the polypeptide of SEQ. ID. No. 11 wherein the polypeptide, when presented as an immunogen, elicits the production of an antibody which specifically binds to a polypeptide selected from the group consisting of a polypeptide encoded by the nucleic acids of SEQ. ID No. 1-10 and 12-13 or from the polypeptide of SEQ. ID. No. 11, where the polypeptide does not bind to antisera raised against a polypeptide selected from the group consisting of a polypeptide encoded by the nucleic acids of SEQ. ID No. 1-10 and 12-13 or from the polypeptide of SEQ. ID. No. 11 which has been fully immunosorbed with a polypeptide selected from the group consisting of a polypeptide encoded by the nucleic acids of SEQ. ID No. 1-10 and 12-13 or from the polypeptide of SEQ. ID. No. 11. In preferred embodiments, the polypeptides of the invention bind to antisera raised against a polypeptide selected from those encoded by SEQ ID NOs. 1-13, where the antisera has been immunosorbed with the most structurally related previously known polypeptide. For example, a polypeptide of the invention binds to antisera raised against a polypeptide encoded by SEQ ID NO. 13, wherein the antisera has been immunosorbed with a rat or mouse cyclophilin polypeptide (Rat cyclophilin nucleic acids are known; see, GenBank™ under accession No. M19533; Mouse cyclophilin nucleic acids are known; see, GenBank™ under accession No. 50620. The mouse and rat cyclophilin cDNAs are about 85% identical to SEQ ID NO. 13).
In another embodiment, the method can involve detecting a polypeptide (protein) encoded by a nucleic acid (ORF) in the 20q13 amplicon. The method may include any of a number of well known protein detection methods including, but not limited to, the protein assays disclosed herein.
This invention also provides cDNA sequences from genes in the amplicon (SEQ. ID. Nos. 1-10 and 12-13). The nucleic acid sequences can be used in therapeutic applications according to known methods for modulating the expression of the endogenous gene or the activity of the gene product. Examples of therapeutic approaches include antisense inhibition of gene expression, gene therapy, monoclonal antibodies that specifically bind the gene products, and the like. The genes can also be used for recombinant expression of the gene products in vitro.
This invention also provides for proteins (e.g., SEQ. ID. No. 11) encoded by the cDNA sequences from genes in the amplicon (e.g., SEQ. ID. Nos. 1-10 and 12-13). Where the amplified nucleic acids include cDNA which are expressed, detection and/or quantification of the protein expression product can be used to identify the presence or absence or quantify the amplification level of the amplicon or of abnormal protein products produced by the amplicon.
The probes disclosed here can be used in kits for the detection of a chromosomal abnormality at about position FLpter 0.825 on human chromosome 20. The kits include a compartment which contains a labeled nucleic acid probe which binds selectively to a target polynucleotide sequence at about FLpter 0.825 on human chromosome 20. The probe preferably includes at least one nucleic acid that specifically hybridizes under stringent conditions to a nucleic acid selected from the nucleic acids disclosed here. Even more preferably, the probes comprise one or more nucleic acids selected from the nucleic acids disclosed here. In a preferred embodiment, the probes are labeled with digoxigenin or biotin. The kit may further include a reference probe specific to a sequence in the centromere of chromosome 20.
Definitions
A “nucleic acid sample” as used herein refers to a sample comprising DNA in a form suitable for hybridization to probes of the invention. The nucleic acid may be total genomic DNA, total mRNA, genomic DNA or mRNA from particular chromosomes, or selected sequences (e.g. particular promoters, genes, amplification or restriction fragments, cDNA, etc.) within particular amplicons disclosed here. The nucleic acid sample may be extracted from particular cells or tissues. The tissue sample from which the nucleic acid sample is prepared is typically taken from a patient suspected of having the disease associated with the amplification being detected. In some cases, the nucleic acids may be amplified using standard techniques such as PCR, prior to the hybridization. The sample may be isolated nucleic acids immobilized on a solid surface (e.g., nitrocellulose) for use in Southern or dot blot hybridizations and the like. The sample may also be prepared such that individual nucleic acids remain substantially intact and comprises interphase nuclei prepared according to standard techniques. A “nucleic acid sample” as used herein may also refer to a substantially intact condensed chromosome (e.g. a metaphase chromosome). Such a condensed chromosome is suitable for use as a hybridization target in in situ hybridization techniques (e.g. FISH). The particular usage of the term “nucleic acid sample” (whether as extracted nucleic acid or intact metaphase chromosome) will be readily apparent to one of skill in the art from the context in which the term is used. For instance, the nucleic acid sample can be a tissue or cell sample prepared for standard in situ hybridization methods described below. The sample is prepared such that individual chromosomes remain substantially intact and typically comprises metaphase spreads or interphase nuclei prepared according to standard techniques.
A “chromosome sample” as used herein refers to a tissue or cell sample prepared for standard in situ hybridization methods described below. The sample is prepared such that individual chromosomes remain substantially intact and typically comprises metaphase spreads or interphase nuclei prepared according to standard techniques.
“Nucleic acid” refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, would encompass known analogs of natural nucleotides that can function in a similar manner as naturally occurring nucleotides.
An “isolated” polynucleotide is a polynucleotide which is substantially separated from other contaminants that naturally accompany it, e.g., protein, lipids, and other polynucleotide sequences. The term embraces polynucleotide sequences which have been removed or purified from their naturally-occurring environment or clone library, and include recombinant or cloned DNA isolates and chemically synthesized analogues or analogues biologically synthesized by heterologous systems.
“Subsequence” refers to a sequence of nucleic acids that comprises a part of a longer sequence of nucleic acids.
A “probe” or a “nucleic acid probe”, as used herein, is defined to be a collection of one or more nucleic acid fragments whose hybridization to a target can be detected. The probe may be unlabeled or labeled as described below so that its binding to the target can be detected. The probe is produced from a source of nucleic acids from one or more particular (preselected) portions of the genome, for example one or more clones, an isolated whole chromosome or chromosome fragment, or a collection of polymerase chain reaction (PCR) amplification products. The probes of the present invention are produced from nucleic acids found in the 20q13 amplicon as described herein. The probe may be processed in some manner, for example, by blocking or removal of repetitive nucleic acids or enrichment with unique nucleic acids. Thus the word “probe” may be used herein to refer not only to the detectable nucleic acids, but to the detectable nucleic acids in the form in which they are applied to the target, for example, with the blocking nucleic acids, etc. The blocking nucleic acid may also be referred to separately. What “probe” refers to specifically is clear from the context in which the word is used.
The probe may also be isolated nucleic acids immobilized on a solid surface (e.g., nitrocellulose). In some embodiments, the probe may be a member of an array of nucleic acids as described, for instance, in WO 96/17958. Techniques capable of producing high density arrays can also be used for this purpose (see, e.g., Fodor et al. Science 767-773 (1991) and U.S. Pat. No. 5,143,854).
“Hybridizing” refers to the binding of two single stranded nucleic acids via complementary base pairing.
“Bind(s) substantially” or “binds specifically” or “binds selectively” or “hybridizes specifically” refer to complementary hybridization between an oligonucleotide and a target sequence and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence. These terms also refer to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA). The term “stringent conditions” refers to conditions under which a probe will hybridize to its target subsequence, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. “Stringent hybridization” and “stringent hybridization wash conditions” in the context of nucleic acid hybridization experiments such as CGH, FISH, Southern and northern hybridizations are sequence dependent, and are different under different environmental parameters. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes part I chapter 2 “overview of principles of hybridization and the strategy of nucleic acid probe assays”, Elsevier, N.Y. Generally, highly stringent hybridization and wash conditions are selected to be about 5° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very stringent conditions are selected to be equal to the Tm for a particular probe.
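The relationship between the Tm and the stringency temperature described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and its parameters are hypothetical, and the Tm itself is taken as an input because the text does not give a formula for it.

```python
def stringent_temperature(tm_celsius: float, offset_celsius: float = 5.0) -> float:
    """Return a hybridization/wash temperature set a given offset below the Tm.

    offset_celsius=5.0 mirrors the "about 5 degrees C lower than the Tm" rule
    for highly stringent conditions; offset_celsius=0.0 mirrors the "equal to
    the Tm" rule for very stringent conditions. Both names are hypothetical.
    """
    if offset_celsius < 0:
        raise ValueError("offset_celsius must be non-negative")
    return tm_celsius - offset_celsius


# Example: a probe whose Tm has been determined (by any method) to be 70 degrees C.
highly_stringent = stringent_temperature(70.0)       # about 5 degrees below the Tm
very_stringent = stringent_temperature(70.0, 0.0)    # equal to the Tm
```

Because the Tm depends on ionic strength and pH, the value passed in is meaningful only for the buffer in which it was determined.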
An example of stringent hybridization conditions for hybridization of complementary nucleic acids which have more than 100 complementary residues on a filter in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42° C., with the hybridization being carried out overnight. An example of stringent wash conditions is a 0.2×SSC wash at 65° C. for 15 minutes (see, Sambrook, supra for a description of SSC buffer). Often, the high stringency wash is preceded by a low stringency wash to remove background probe signal. An example medium stringency wash for a duplex of, e.g., about 100 nucleotides or more, is 1×SSC at 45° C. for 15 minutes. An example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4×SSC at 40° C. for 15 minutes. In general, a signal to noise ratio of 2× (or higher) relative to that observed for an unrelated probe in the particular hybridization assay indicates detection of a specific hybridization.
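The 2× signal to noise criterion can be expressed as a simple check. The sketch below is illustrative only; the function and parameter names are hypothetical, and the two signal values are assumed to have been measured in the same hybridization assay and in the same units.

```python
def is_specific_hybridization(probe_signal: float,
                              unrelated_probe_signal: float,
                              min_ratio: float = 2.0) -> bool:
    """Return True when the probe signal is at least min_ratio times the
    signal observed for an unrelated probe in the same assay."""
    if unrelated_probe_signal <= 0:
        raise ValueError("unrelated_probe_signal must be positive")
    return probe_signal / unrelated_probe_signal >= min_ratio


# Example: a probe signal of 10 units against a background of 4 units.
specific = is_specific_hybridization(10.0, 4.0)
```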
One of skill will recognize that the precise sequence of the particular probes described herein can be modified to a certain degree to produce probes that are “substantially identical” to the disclosed probes, but retain the ability to bind substantially to the target sequences. Such modifications are specifically covered by reference to the individual probes herein. The term “substantial identity” of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 90% sequence identity, more preferably at least 95%, compared to a reference sequence using the methods described below using standard parameters.
Two nucleic acid sequences are said to be “identical” if the sequence of nucleotides in the two sequences is the same when aligned for maximum correspondence as described below. The term “complementary to” is used herein to mean that the complementary sequence is identical to all or a portion of a reference polynucleotide sequence. Nucleic acids which do not hybridize to complementary versions of each other under stringent conditions are still substantially identical if the polypeptides which they encode are substantially identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum codon degeneracy permitted by the genetic code.
Sequence comparisons between two (or more) polynucleotides are typically performed by comparing the sequences over a “comparison window” to identify and compare local regions of sequence similarity. A “comparison window”, as used herein, refers to a segment of at least about 20 contiguous positions, usually about 50 to about 200, more usually about 100 to about 150, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman and Wunsch J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman Proc. Natl. Acad. Sci. (U.S.A.) 85: 2444 (1988), or by computerized implementations of these algorithms.
“Percentage of sequence identity” is determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Another indication that nucleotide sequences are substantially identical is if two molecules hybridize to the same nucleic acid sequence under stringent conditions.
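The percentage calculation described above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been optimally aligned by one of the algorithms cited above and are supplied as equal-length strings spanning the comparison window, with "-" marking a gap. The function name and the gap symbol are hypothetical conventions chosen for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an already-aligned comparison window.

    Matched positions are those where both sequences carry the same residue
    (a gap never counts as a match). The denominator is the total number of
    positions in the window, as in the definition above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Example: one mismatch in a window of eight aligned positions.
identity = percent_identity("ACGTACGT", "ACGTACGA")
```

Under the 90% threshold given for “substantial identity”, the example pair would not qualify. A realistic comparison would also use a window of at least about 20 positions, as the definition of the comparison window requires.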
“Conservatively modified variations” of a particular nucleic acid sequence refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given polypeptide. For instance, the codons CGU, CGC, CGA, CGG, AGA, and AGG all encode the amino acid arginine. Thus, at every position where an arginine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are “silent variations,” which are one species of “conservatively modified variations.” Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine) can be modified to yield a functionally identical molecule by standard techniques. Accordingly, each “silent variation” of a nucleic acid which encodes a polypeptide is implicit in each described sequence. Furthermore, one of skill will recognize that individual substitutions, deletions or additions which alter, add or delete a single amino acid or a small percentage of amino acids (typically less than 5%, more typically less than 1%) in an encoded sequence are “conservatively modified variations” where the alterations result in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following six groups each contain amino acids that are conservative substitutions for one another:
1) Alanine (A), Serine (S), Threonine (T);
2) Aspartic acid (D), Glutamic acid (E);
3) Asparagine (N), Glutamine (Q);
4) Arginine (R), Lysine (K);
5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and
6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
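The six groups above can be encoded as a lookup for testing whether a given amino acid substitution is conservative. The following Python sketch is illustrative only; the names `CONSERVATIVE_GROUPS` and `is_conservative_substitution` are hypothetical, and amino acids outside the six groups (e.g., glycine, proline, cysteine, histidine) are simply treated as having no conservative partner.

```python
# One-letter codes for the six conservative substitution groups listed above.
CONSERVATIVE_GROUPS = (
    frozenset("AST"),   # 1) Alanine, Serine, Threonine
    frozenset("DE"),    # 2) Aspartic acid, Glutamic acid
    frozenset("NQ"),    # 3) Asparagine, Glutamine
    frozenset("RK"),    # 4) Arginine, Lysine
    frozenset("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    frozenset("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
)


def is_conservative_substitution(original: str, replacement: str) -> bool:
    """Return True if replacing `original` with `replacement` stays within
    one of the six groups. Identical residues are not a substitution."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


# Example: aspartic acid to glutamic acid falls within group 2.
conservative = is_conservative_substitution("D", "E")
```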
The term “20q13 amplicon protein” is used herein to refer to proteins encoded by ORFs in the 20q13 amplicon disclosed herein. Assays that detect 20q13 amplicon proteins are intended to detect the level of endogenous (native) 20q13 amplicon proteins present in the subject biological sample. However, exogenous 20q13 amplicon proteins (from a source extrinsic to the biological sample) may be added to various assays to provide a label or to compete with the native 20q13 amplicon protein in binding to an anti-20q13 amplicon protein antibody. One of skill will appreciate that a 20q13 amplicon protein mimetic may be used in place of exogenous 20q13 protein in this context. A “20q13 protein”, as used herein, refers to a molecule that bears one or more 20q13 amplicon protein epitopes such that it is specifically bound by an antibody that specifically binds a native 20q13 amplicon protein.
As used herein, an “antibody” refers to a protein consisting of one or more polypeptides substantially encoded by immunoglobulin genes or fragments of immunoglobulin genes. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as the myriad immunoglobulin variable region genes. Light chains are classified as either kappa or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.
The basic immunoglobulin (antibody) structural unit is known to comprise a tetramer. Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one “light” (about 25 kD) and one “heavy” chain (about 50-70 kD). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively.
Antibodies may exist as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases. Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)′2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)′2 may be reduced under mild conditions to break the disulfide linkage in the hinge region thereby converting the F(ab)′2 dimer into an Fab′ monomer. The Fab′ monomer is essentially an Fab with part of the hinge region (see, Fundamental Immunology, W. E. Paul, ed., Raven Press, N.Y. (1993) for a more detailed description of other antibody fragments). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such Fab′ fragments may be synthesized de novo either chemically or by utilizing recombinant DNA methodology. Thus, the term antibody, as used herein also includes antibody fragments either produced by the modification of whole antibodies or synthesized de novo using recombinant DNA methodologies.
The phrase “specifically binds to a protein” or “specifically immunoreactive with”, when referring to an antibody, refers to a binding reaction which is determinative of the presence of the protein in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind to a particular protein and do not bind in a significant amount to other proteins present in the sample. Specific binding to a protein under such conditions may require an antibody that is selected for its specificity for a particular protein. For example, antibodies can be raised to a 20q13 amplicon protein that bind to the 20q13 amplicon protein and not to any other proteins present in a biological sample. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity.