This invention relates to the field of DNA repair. Specifically, a novel human gene, its encoded enzyme and methods of use thereof are disclosed. The gene may be used beneficially as a marker for genetic screening, mutational analysis and for assessing drug resistance in transformed cells. The encoded enzyme may be used to advantage in glycosylase assays.
Several publications are referenced in this application in order to more fully describe the state of the art to which this invention pertains. The disclosure of each of these publications is incorporated by reference herein. Mismatch repair stabilizes the cellular genome by correcting DNA replication errors and by blocking recombination events between divergent DNA sequences. The mechanism responsible for strand-specific correction of mispaired bases has been highly conserved during evolution. Eukaryotic homologs of bacterial MutS and MutL, which are believed to play key roles in mismatch repair recognition and initiation of repair, have been identified in yeast and mammalian cells. Inactivation of genes encoding these activities results in large increases in spontaneous mutability, and in the case of humans and rodents, predisposition to tumor development.
Lynch syndrome or hereditary nonpolyposis colon cancer (HNPCC) is an autosomal dominant disease, which accounts for approximately 1-5% of all colorectal cancer cases. In this syndrome, colorectal tumors are frequently associated with extracolonic malignancies, such as cancers of the endometrium, stomach, ovary, brain, skin and urinary tract. Tumors from HNPCC patients harbor a genome-wide DNA replication/repair defect. Due to the lack of pathognomonic morphological or biomolecular markers, HNPCC has traditionally posed unique problems to clinicians and geneticists alike, both in terms of diagnosis and clinical management.
Recent breakthroughs in molecular biology have partially elucidated the pathogenic mechanism of this syndrome. Germline mutations in any one of five genes encoding proteins that participate in a specialized DNA mismatch repair system give rise to a predisposition for cancer development in HNPCC families. Patients affected by HNPCC carry these mutations in genes which are involved in DNA mismatch repair. The DNA mismatch repair mechanism contributes to mutational avoidance and genetic stability, thus performing a tumor suppressor function. Loss or inactivation of the wild type allele in somatic cells leads to a dramatic increase of the spontaneous mutation rate. This, in turn, results in the accumulation of mutations in other tumor suppressor genes and oncogenes, ultimately leading to neoplastic transformation.
Microsatellites are repeating sequences that are distributed throughout the human genome, most commonly (A)n/(T)n and (CA)n/(GT)n. Their function is unknown, but they are useful in genetic linkage studies because of their high degree of polymorphism and normally stable inheritance. Several of the genes responsible for HNPCC have been identified using analysis of mutation rate in DNA microsatellites. Mutations of mismatch repair genes can be detected in a subset of sporadic colonic and extracolonic cancers which exhibit variability in the length of microsatellite sequences. This variability is often referred to as microsatellite instability.
Investigators in the field (Peltomaki et al., (1993) Science 260:810-812) have discovered that most colorectal cancers from HNPCC patients show microsatellite instability. These studies revealed that the length of microsatellite DNA at different loci varies between tumor DNA and non-tumor DNA from the same patient. The phrase “replication error positive” (RER+) has been used to describe such tumors. It should be noted that only about 70% of HNPCC cases and only about 65% of sporadic tumors with microsatellite instability carry mutations in the known mismatch repair genes (hMSH2, hMLH1, hPMS2, hMSH6 and hPMS1) (Liu et al., (1996) Nature Medicine 2:169-174). The remaining 30-35% of the cases have an as yet unidentified mismatch repair genetic defect. Thus, there is a pressing need to identify the other active components in the DNA mismatch repair pathway, as mutations in these genes may result in an increased propensity for cancer.
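The tumor-versus-normal length comparison described above can be sketched in a few lines of Python. This is an illustration only and not the method of the cited studies: the function name `unstable_loci`, the locus names and the allele lengths are all hypothetical, and the rule applied (a locus is flagged when the tumor DNA shows a repeat length that is absent from the matched non-tumor DNA) is a simplifying assumption.

```python
from typing import Dict, List, Set


def unstable_loci(normal: Dict[str, Set[int]], tumor: Dict[str, Set[int]]) -> List[str]:
    """Return loci whose tumor DNA shows allele lengths absent from normal DNA.

    Each mapping goes from a locus name to the set of repeat-tract lengths
    (in base pairs) observed at that locus. Only loci typed in both samples
    are compared.
    """
    flagged = []
    for locus in sorted(normal.keys() & tumor.keys()):
        novel = tumor[locus] - normal[locus]  # lengths seen only in the tumor
        if novel:
            flagged.append(locus)
    return flagged


# Hypothetical data: locus names and lengths are invented for illustration.
normal_dna = {"locus_A": {120, 124}, "locus_B": {98}, "locus_C": {150, 152}}
tumor_dna = {"locus_A": {120, 124}, "locus_B": {94, 98}, "locus_C": {146, 152}}

print(unstable_loci(normal_dna, tumor_dna))  # ['locus_B', 'locus_C']
```

How many flagged loci would justify calling a tumor RER+ is not stated in this passage, so the sketch reports the shifted loci without applying any cutoff.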
The Fragile X or Martin Bell syndrome is the most common single recognized form of inherited mental retardation. Fifty percent of all X-linked mental retardation may be attributable to the Fragile X syndrome. The disorder is found in all ethnic groupings with a frequency of 0.3-1 per 1000 males and 0.2-0.6 per 1000 females. The full clinical syndrome, which is found in approximately 60% of affected males, consists of moderate mental retardation with an IQ typically in the range 35-50, elongated facies with large everted ears, and macroorchidism. This syndrome is unusual in that it is associated with the appearance of a fragile site on the long arm of the X chromosome at Xq27.3 (Sutherland, G. R., (1977) Science 197:256-266). This can be visualized cytogenetically in metaphase chromosomes prepared from lymphocytes of affected individuals which have been cultured under conditions of folate deficiency or thymidine stress. The study of the segregation of polymorphic markers within fragile X families has confirmed that the mutation lies in the same region of the X-chromosome as that exhibiting cytogenetic fragility.
There is an imbalance of penetrance of the phenotype associated with this syndrome in the different generations of kindreds in which the mutation is segregating. The likelihood of developing mental impairment depends on an individual's position in the pedigree. As the mutation progresses through the generations, the risk of mental impairment increases. These observations are not consistent with classical X linkage and are collectively known as the Sherman paradox. Hypotheses based on these observations have suggested that the mutation exists in two forms: a premutation and a full mutation form. Nonpenetrant individuals are said to carry a premutation chromosome, that is, a chromosome which has no abnormal phenotypic effect but which is capable of progressing to a fully penetrant mutation on passage through female oogenesis.
Two alterations in the DNA at the fragile X site have been identified: abnormal amplification of a CpG-rich DNA sequence (a CpG island) and hypermethylation of such sequences. The molecular basis of the amplification is the expansion of a CGG triplet microsatellite into large arrays. In individuals expressing the full clinical phenotype, the DNA in this region becomes hypermethylated, leading to the transcriptional shut down of the gene FMR-1 (fragile X mental retardation 1) which is transcribed across this region. The clinical phenotype is likely caused by a loss of gene expression. It has been postulated that in Fragile X syndrome, expansion of the (CGG)n repeat from premutation to full mutation may be related to an aberrant (misdirected) DNA mismatch repair event. This may be favored by the transient lack of multiple methyl signals in the CGG repeat as well as in flanking single copy sequences during early stages of embryonal development. Similar to Fragile X syndrome, defective DNA mismatch repair may play a role in the expansion of triplet repeats associated with several disorders such as myotonic dystrophy, Huntington's disease, spino-cerebellar ataxias and Kennedy's disease.
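As a rough illustration of what expansion of a (CGG)n repeat means at the sequence level, the following Python sketch counts the repeat units in the longest uninterrupted CGG tract of a DNA string. The function name `longest_cgg_run` and both example sequences are hypothetical. No repeat-number cutoff separating normal, premutation and full mutation alleles is applied, since none is given in this text.

```python
import re


def longest_cgg_run(sequence: str) -> int:
    """Return the number of units in the longest uninterrupted (CGG)n tract."""
    runs = re.findall(r"(?:CGG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical sequences, invented for illustration only.
short_allele = "ATGC" + "CGG" * 6 + "TTAGC"
long_allele = "ATGC" + "CGG" * 40 + "TTAGC"

print(longest_cgg_run(short_allele), longest_cgg_run(long_allele))  # 6 40
```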
The isolation of nucleic acids and proteins which, when mutated, give rise to these various disorders, enables the development of diagnostic and prognostic kits for assessing patients at risk. The biochemical characterization of the genes encoding the components of the DNA mismatch repair system may ultimately facilitate gene replacement therapies for use in the treatment of malignancy and other inherited genetic disorders.
This invention provides biological molecules useful for identification, detection, and/or regulation of components in the complex DNA damage recognition/repair pathway. According to one aspect of the invention, an isolated nucleic acid molecule is provided which includes a sequence encoding a methyl CpG binding protein of a size between about 60 and 75 kilodaltons. The encoded protein, referred to herein as MED1 (methyl-CpG binding endonuclease 1; also referred to in the literature as MBD4), comprises a tripartite structure including an amino terminal methyl-CpG binding domain with significant homology to the rat protein, MeCP2 and the human protein, PCM1, a central region rich in positively-charged amino acids which contains nuclear localization signals, and a carboxy terminal catalytic domain which shares homology with several bacterial endonucleases involved in DNA repair. The protein demonstrates significant binding affinity for hMLH1 and mMSH2. In a preferred embodiment of the invention, an isolated nucleic acid molecule is provided that includes a cDNA encoding a human MED1 protein. In a particularly preferred embodiment, the human MED1 protein has an amino acid sequence the same as Sequence I.D. No. 2. An exemplary nucleic acid molecule of the invention comprises Sequence I.D. No. 1.
According to another aspect of the present invention, an isolated nucleic acid molecule is provided, which has a sequence selected from the group consisting of: (1) Sequence I.D. No. 1; (2) a sequence specifically hybridizing with preselected portions or all of the complementary strand of Sequence I.D. No. 1; (3) a sequence encoding preselected portions of Sequence I.D. No. 1; and (4) a sequence encoding part or all of a polypeptide having amino acid Sequence I.D. No. 2. Such partial sequences are useful as probes to identify and isolate homologues of the MED1 gene of the invention. Accordingly, isolated nucleic acid sequences encoding natural allelic variants of Sequence I.D. No. 1 are also contemplated to be within the scope of the present invention. The term natural allelic variants will be defined hereinbelow.
In yet another embodiment of the invention, isolated genomic DNA molecules are provided which encode the Med-1 protein of the invention. These nucleic acids (SEQ ID NO: 21 and 22) may be used to advantage in screening assays which identify germline and somatic mutations in the DNA encoding Med-1.
The present invention also provides MED1 genomic nucleic acid of mouse or human origin having a sequence substantially the same as that contained in phage stocks as deposited on Jul. 28, 1998 at the American Type Culture Collection, 10801 University Blvd, Manassas, Virginia 20110-2209 USA, under the terms of the Budapest Treaty with accession numbers: 203073 and 203074.
MED1 polypeptide may conveniently be obtained by introducing expression vectors into host cells in which the vector is functional, culturing the host cells so that the MED1 polypeptide is produced and recovering the MED1 polypeptide from the host cells or the surrounding medium. Vectors comprising nucleic acid according to the present invention and host cells comprising such vectors or nucleic acid form further aspects of the present invention.
According to another aspect of the present invention, an isolated human methyl CpG binding protein is provided which has a deduced molecular weight of between about 60 kDa and 75 kDa. The protein comprises an amino-terminal methyl-CpG binding domain with significant homology to the rat protein MeCP2 and the human protein PCM1, a central region rich in positively-charged amino acids which contains nuclear localization signals, and a carboxy terminal catalytic domain which shares homology with several bacterial endonucleases involved in DNA repair. In a preferred embodiment of the invention, the protein is of human origin, and has an amino acid sequence the same as Sequence I.D. No. 2. In a further embodiment the protein may be encoded by natural allelic variants of Sequence I.D. No. 1. Inasmuch as certain amino acid variations may be present in a MED1 protein encoded by a natural allelic variant, such proteins are also within the scope of the invention.
According to another aspect of the present invention, antibodies immunologically specific for the proteins described hereinabove are provided.
In yet a further aspect of the invention, assays are provided for assessing the glycosylase activity of MED1. Also provided are methods employing the MED1 protein to detect transition single-nucleotide polymorphisms at CpG sites. Also provided are methods wherein polymerase chain reaction/single strand conformation polymorphism are utilized to detect mutations in the MED1 gene. Methods employing loss of heterozygosity (LOH) analysis are also disclosed which may be used to advantage in mutational screening assays for possible MED1 mutations.
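To make the phrase "transition single-nucleotide polymorphisms at CpG sites" concrete, the following Python sketch compares two aligned sequences and reports transitions that fall on a CpG dinucleotide of the reference. It is a purely computational illustration and does not represent the MED1-based detection methods of the invention. The function name `cpg_transitions` and the example sequences are hypothetical. The sketch reports C to T changes at the cytosine of a CpG, together with G to A changes at the guanine, which are the same event read from the opposite strand.

```python
from typing import List, Tuple


def cpg_transitions(reference: str, sample: str) -> List[Tuple[int, str]]:
    """List transition differences that fall on reference CpG dinucleotides.

    Returns (0-based position, "ref>alt") pairs. Only C>T at the C of a CpG
    and G>A at the G of a CpG are reported.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    ref, alt = reference.upper(), sample.upper()
    hits = []
    for i in range(len(ref) - 1):
        if ref[i:i + 2] != "CG":
            continue  # not a CpG site in the reference
        if alt[i] == "T":
            hits.append((i, "C>T"))
        if alt[i + 1] == "A":
            hits.append((i + 1, "G>A"))
    return hits


# Hypothetical aligned sequences, invented for illustration only.
ref_seq = "AACGTTCGGA"
alt_seq = "AATGTTCAGA"

print(cpg_transitions(ref_seq, alt_seq))  # [(2, 'C>T'), (7, 'G>A')]
```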
Various terms relating to the biological molecules of the present invention are used hereinabove and also throughout the specifications and claims. The terms “specifically hybridizing,” “percent similarity” and “percent identity (identical)” are defined in detail in the description set forth below.
With reference to nucleic acids of the invention, the term “isolated nucleic acid” is sometimes used. This term, when applied to DNA, refers to a DNA molecule that is separated from sequences with which it is immediately contiguous (in the 5′ and 3′ directions) in the naturally occurring genome of the organism from which it originates. For example, the “isolated nucleic acid” may comprise a DNA or cDNA molecule inserted into a vector, such as a plasmid or virus vector, or integrated into the DNA of a prokaryote or eukaryote.
With respect to RNA molecules of the invention, the term “isolated nucleic acid” primarily refers to an RNA molecule encoded by an isolated DNA molecule as defined above. Alternatively, the term may refer to an RNA molecule that has been sufficiently separated from RNA molecules with which it would be associated in its natural state (i.e., in cells or tissues), such that it exists in a “substantially pure” form (the term “substantially pure” is defined below).
With respect to protein, the term “isolated protein” or “isolated and purified protein” is sometimes used herein. This term refers primarily to a protein produced by expression of an isolated nucleic acid molecule of the invention. Alternatively, this term may refer to a protein which has been sufficiently separated from other proteins with which it would naturally be associated, so as to exist in “substantially pure” form.
The term “substantially pure” refers to a preparation comprising at least 50-60% by weight the compound of interest (e.g., nucleic acid, oligonucleotide, protein, etc.). More preferably, the preparation comprises at least 75% by weight, and most preferably 90-99% by weight, the compound of interest. Purity is measured by methods appropriate for the compound of interest (e.g. chromatographic methods, agarose or polyacrylamide gel electrophoresis, HPLC analysis, and the like).
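The purity tiers in the preceding definition can be written as a small Python helper. This is a sketch under stated assumptions: the definition gives ranges ("at least 50-60%", "90-99%"), and taking the lower bound of each range as the cutoff is an interpretive choice made here. The function name `purity_tier` and the tier labels are likewise hypothetical.

```python
def purity_tier(percent_by_weight: float) -> str:
    """Map a measured purity (percent by weight) to a tier of the definition.

    Cutoffs of 50, 75 and 90 are the lower bounds of the stated ranges;
    using the lower bounds is an assumption of this sketch.
    """
    if not 0.0 <= percent_by_weight <= 100.0:
        raise ValueError("purity must be between 0 and 100 percent")
    if percent_by_weight >= 90.0:
        return "most preferred"
    if percent_by_weight >= 75.0:
        return "more preferred"
    if percent_by_weight >= 50.0:
        return "substantially pure"
    return "not substantially pure"


print(purity_tier(82.5))  # more preferred
```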
With respect to antibodies of the invention, the term “immunologically specific” refers to antibodies that bind to one or more epitopes of a protein of interest (e.g., MED1), but which do not substantially recognize and bind other molecules in a sample containing a mixed population of antigenic biological molecules.
With respect to oligonucleotides, the term “specifically hybridizing” refers to the association between two single-stranded nucleotide molecules of sufficiently complementary sequence to permit such hybridization under pre-determined conditions generally used in the art (sometimes termed “substantially complementary”). In particular, the term refers to hybridization of an oligonucleotide with a substantially complementary sequence contained within a single-stranded DNA or RNA molecule of the invention, to the substantial exclusion of hybridization of the oligonucleotide with single-stranded nucleic acids of non-complementary sequence.
The present invention also includes active portions, fragments, derivatives and functional mimetics of the MED1 polypeptide or protein of the invention.
An “active portion” of MED1 polypeptide means a peptide which is less than said full length MED1 polypeptide, but which retains its essential biological activity, e.g., methyl-CpG DNA binding and/or endonuclease activity and/or glycosylase activity.
A “fragment” of the MED1 polypeptide means a stretch of amino acid residues of at least about five to seven contiguous amino acids, often at least about seven to nine contiguous amino acids, typically at least about nine to thirteen contiguous amino acids and, most preferably, at least about twenty to thirty or more contiguous amino acids. Fragments of the MED1 polypeptide sequence, antigenic determinants or epitopes are useful for raising antibodies to a portion of the MED1 amino acid sequence.
A “derivative” of the MED1 polypeptide or a fragment thereof means a polypeptide modified by varying the amino acid sequence of the protein, e.g. by manipulation of the nucleic acid encoding the protein or by altering the protein itself. Such derivatives of the natural amino acid sequence may involve insertion, addition, deletion or substitution of one or more amino acids, without fundamentally altering the essential activity of the wildtype MED1 polypeptide.
“Functional mimetic” means a substance which may not contain an active portion of the MED1 amino acid sequence, and probably is not a peptide at all, but which retains the essential biological activity of natural MED1 polypeptide.
The nucleic acids, proteins/polypeptides, peptides and antibodies of the present invention may be used to advantage as markers for diagnosis and prognosis of those at risk for colon and other cancers. The molecules may also be useful in the diagnosis and/or treatment of Fragile X syndrome and other diseases characterized by triplet repeat expansion. The MED1 molecules of the invention may also be used as research tools in DNA modification/DNA analysis technologies and will facilitate the elucidation of the mechanistic action of the novel genetic and protein interactions involved in the maintenance of DNA fidelity.
Thus, the present invention also provides nucleic acid molecules, polypeptides and/or antibodies as mentioned above for use in medical treatment.
Further, the present invention provides use of a nucleic acid molecule, polypeptide and/or antibody in the preparation of a medicament for treating cancer, in particular, colorectal cancer.
In a further aspect of the present invention, there is provided a kit for detecting mutations in the MED1 gene associated with cancer, or a susceptibility to cancer, the kit comprising one or more nucleic acid probes capable of binding and/or detecting a mutated MED1 nucleic acid. Alternatively, the kit may comprise one or more antibodies capable of specifically binding and/or detecting a mutated MED1 nucleic acid or amino acid sequence, or a pair of oligonucleotide primers having sequences corresponding to, or complementary to, a portion of the nucleic acid sequence set out in Sequence I.D. No. 1 or 5 for use in amplifying a MED1 nucleic acid sequence or mutant allele thereof.
In yet another aspect of the invention, transgenic animals are provided which are useful for elucidating the role of MED1 in growth and development. Isolation of the mouse genomic DNA also facilitates the production of MED1 knock-out mice.
Aspects and embodiments of the present invention will now be illustrated, by way of example, with reference to the accompanying figures. Further aspects and embodiments will be apparent to those skilled in the art. All documents mentioned in this text are incorporated herein by reference.