The present invention involves DNA mismatch repair genes. In particular, the invention relates to identification of mutations and polymorphisms in DNA mismatch repair genes, to identification and characterization of DNA mismatch-repair-defective tumors, and to detection of genetic susceptibility to cancer.
In recent years, with the development of powerful cloning and amplification techniques such as the polymerase chain reaction (PCR), in combination with a rapidly accumulating body of information concerning the structure and location of numerous human genes and markers, it has become practical and advisable to collect and analyze samples of DNA or RNA from individuals who are members of families which are identified as exhibiting a high frequency of certain genetically transmitted disorders. For example, screening procedures are routinely used to screen for genes involved in sickle cell anemia, cystic fibrosis, fragile X chromosome syndrome and multiple sclerosis. For some types of disorders, early diagnosis can greatly improve the person's long-term prognosis, for example, by allowing adoption of an aggressive diagnostic routine and/or lifestyle changes, where appropriate, to either prevent or prepare for an anticipated problem.
Once a particular human gene mutation is identified and linked to a disease, development of screening procedures to identify high-risk individuals can be relatively straightforward. For example, after the structure and abnormal phenotypic role of the mutant gene are understood, it is possible to design primers for use in PCR to obtain amplified quantities of the gene from individuals for testing. However, initial discovery of a mutant gene, i.e., its structure, location and linkage with a known inherited health problem, requires substantial experimental effort and creative research strategies.
One approach to discovering the role of a mutant gene in causing a disease begins with clinical studies on individuals who are in families which exhibit a high frequency of the disease. In these studies, the approximate location of the disease-causing locus is determined indirectly by searching for a chromosome marker which tends to segregate with the locus. A principal limitation of this approach is that, although the approximate genomic location of the gene can be determined, it does not generally allow actual isolation or sequencing of the gene. For example, Lindblom et al.3 reported results of linkage analysis studies performed with SSLP (simple sequence length polymorphism) markers on individuals from a family known to exhibit a high incidence of hereditary non-polyposis colon cancer (HNPCC). Lindblom et al. found a "tight linkage" between a polymorphic marker on the short arm of human chromosome 3 (3p21-23) and a disease locus apparently responsible for increasing an individual's risk of developing colon cancer. Even though 3p21-23 is a fairly specific location relative to the entire genome, it represents a huge DNA region relative to the probable size of the mutant gene. The mutant gene could be separated from the markers identifying the locus by millions of bases. At best, such linkage studies have only limited utility for screening purposes because in order to predict one person's risk, genetic analysis must be performed with tightly linked genetic markers on a number of related individuals in the family. It is often impossible to obtain such information, particularly if affected family members are deceased. Also, informative markers may not exist in the family under analysis. Without knowing the gene's structure, it is not possible to sample, amplify, sequence and determine directly whether an individual carries the mutant gene.
Another approach to discovering a disease-causing mutant gene begins with design and trial of PCR primers, based on known information about the disease, for example, theories for disease state mechanisms, related protein structures and function, possible analogous genes in humans or other species, etc. The objective is to isolate and sequence candidate normal genes which are believed to sometimes occur in mutant forms rendering an individual disease prone. This approach is highly dependent on how much is known about the disease at the molecular level, and on the investigator's ability to construct strategies and methods for finding candidate genes. Association of a mutation in a candidate gene with a disease must ultimately be demonstrated by performing tests on members of a family which exhibits a high incidence of the disease. The most direct and definitive way to confirm such linkage in family studies is to use PCR primers which are designed to amplify portions of the candidate gene in samples collected from the family members. The amplified gene products are then sequenced and compared to the normal gene structure for the purpose of finding and characterizing mutations. A given mutation is ultimately implicated by showing that affected individuals have it while unaffected individuals do not, and that the mutation causes a change in protein function which is not simply a polymorphism.
Another way to show a high probability of linkage between a candidate gene mutation and disease is by determining the chromosome location of the gene, then comparing the gene's map location to known regions of disease-linked loci such as the one identified by Lindblom et al. Coincident map location of a candidate gene in the region of a previously identified disease-linked locus may strongly implicate an association between a mutation in the candidate gene and the disease.
There are other ways to show that mutations in a gene candidate may be linked to the disease. For example, artificially produced mutant forms of the gene can be introduced into animals. Incidence of the disease in animals carrying the mutant gene can then be compared to animals with the normal genotype. Significantly elevated incidence of disease in animals with the mutant genotype, relative to animals with the wild-type gene, may support the theory that mutations in the candidate gene are sometimes responsible for occurrence of the disease.
One type of disease which has recently received much attention because of the discovery of disease-linked gene mutations is Hereditary Nonpolyposis Colon Cancer (HNPCC).1,2 Members of HNPCC families also display increased susceptibility to other cancers including endometrial, ovarian, gastric and breast. Approximately 10% of colorectal cancers are believed to be HNPCC. Tumors from HNPCC patients display an unusual genetic defect in which short, repeated DNA sequences, such as the dinucleotide repeat sequences found in human chromosomal DNA ("microsatellite DNA"), appear to be unstable. This genomic instability of short, repeated DNA sequences, sometimes called the "RER+" phenotype, is also observed in a significant proportion of a wide variety of sporadic tumors, suggesting that many sporadic tumors may have acquired mutations that are similar (or identical) to mutations that are inherited in HNPCC.
Genetic linkage studies have identified two HNPCC loci thought to account for as much as 90% of HNPCC. The loci map to human chromosome 2p15-16 (2p21) and 3p21-23. Subsequent studies have identified human DNA mismatch repair gene hMSH2 as being the gene on chromosome 2p21, in which mutations account for a significant fraction of HNPCC cancers.1, 2, 12 hMSH2 is one of several genes whose normal function is to identify and correct DNA mispairs including those that follow each round of chromosome replication.
The best defined mismatch repair pathway is the E. coli MutHLS pathway, which promotes a long-patch (approximately 3 kb) excision repair reaction that is dependent on the mutH, mutL, mutS and mutU (uvrD) gene products. The MutHLS pathway appears to be the most active mismatch repair pathway in E. coli and is known both to increase the fidelity of DNA replication and to act on recombination intermediates containing mispaired bases. The system has been reconstituted in vitro, and requires the mutH, mutL, mutS and uvrD (helicase II) proteins along with DNA polymerase III holoenzyme, DNA ligase, single-stranded DNA binding protein (SSB) and one of the single-stranded DNA exonucleases, Exo I, Exo VII or RecJ. hMSH2 is homologous to the bacterial mutS gene. A similar pathway in yeast includes the yeast MSH2 gene and two mutL-like genes referred to as PMS1 and MLH1.
With the knowledge that mutations in a human mutS type gene (hMSH2) sometimes cause cancer, and the discovery that HNPCC tumors exhibit microsatellite DNA instability, interest in other DNA mismatch repair genes and gene products, and their possible roles in HNPCC and/or other cancers, has intensified. It is estimated that as many as 1 in 200 individuals carry a mutation in either the hMSH2 gene or other related genes which encode for other proteins in the same DNA mismatch repair pathway.
An important objective of our work has been to identify human genes which are useful for screening and identifying individuals who are at elevated risk of developing cancer. Other objects are: to determine the sequences of exons and flanking intron structures in such genes; to use the structural information to design testing procedures for the purpose of finding and characterizing mutations which result in an absence of or defect in a gene product which confers cancer susceptibility; and to distinguish such mutations from "harmless" polymorphic variations. Another object is to use the structural information relating to exon and flanking intron sequences of a cancer-linked gene, to diagnose tumor types and prescribe appropriate therapy. Another object is to use the structural information relating to a cancer-linked gene to identify other related candidate human genes for study.
Based on our knowledge of DNA mismatch repair mechanisms in bacteria and yeast including conservation of mismatch repair genes, we reasoned that human DNA mismatch repair homologs should exist, and that mutations in such homologs affecting protein function, would be likely to cause genetic instability, possibly leading to an increased risk of developing certain forms of human cancer.
We have isolated and sequenced two human genes, hPMS1 and hMLH1, each of which encodes a protein involved in DNA mismatch repair. hPMS1 and hMLH1 are homologous to mutL genes found in E. coli. Our studies strongly support an association between mutations in DNA mismatch repair genes and susceptibility to HNPCC. Thus, DNA mismatch repair gene sequence information of the present invention, namely, cDNA and genomic structures relating to hMLH1 and hPMS1, makes possible a number of useful methods relating to cancer risk determination and diagnosis. The invention also encompasses a large number of nucleotide and protein structures which are useful in such methods.
We mapped the location of hMLH1 to human chromosome 3p21-23. This is a region of the human genome that, based upon family studies, harbors a locus that predisposes individuals to HNPCC. Additionally, we have found a mutation in a conserved region of the hMLH1 cDNA in HNPCC-affected individuals from a Swedish family. The mutation is not found in unaffected individuals from the same family, nor is it a simple polymorphism. We have also found that a homologous mutation in yeast results in a defective DNA mismatch repair protein. We have also found a frameshift mutation in hMLH1 of affected individuals from an English family. Our discovery of cancer-linked mutations in hMLH1, combined with the gene's map position, which is coincident with a previously identified HNPCC-linked locus, plus the likely role of the hMLH1 gene in mutation avoidance, makes the hMLH1 gene a prime candidate for underlying one form of common inherited human cancer, and a prime candidate to screen and identify individuals who have an elevated risk of developing cancer.
hMLH1 has 19 exons and 18 introns. We have determined the location of each of the 18 introns relative to hMLH1 cDNA. We have also determined the structure of all intron/exon boundary regions of hMLH1. Knowledge of the intron/exon boundary structures makes possible efficient screening regimes to locate mutations which negatively affect the structure and function of gene products. Further, we have designed complete sets of oligonucleotide primer pairs which can be used in PCR to amplify individual complete exons together with surrounding intron boundary structures.
We mapped the location of hPMS1 to human chromosome 7. Subsequent studies by others39 have confirmed our prediction that mutations in this gene are linked to HNPCC.
The most immediate use of the present invention will be in screening tests on human individuals who are members of families which exhibit an unusually high frequency of early onset cancer, for example HNPCC. Accordingly, one aspect of the invention comprises a method of diagnosing cancer susceptibility in a subject by detecting a mutation in a mismatch repair gene or gene product in a tissue from the subject, wherein the mutation is indicative of the subject's susceptibility to cancer. In a preferred embodiment of the invention, the step of detecting comprises detecting a mutation in a human mutL homolog gene, for example, hMLH1 or hPMS1.
The method of diagnosing preferably comprises the steps of 1) amplifying a segment of the mismatch repair gene or gene product from an isolated nucleic acid; 2) comparing the amplified segment with an analogous segment of a wild-type allele of the mismatch repair gene or gene product; and 3) detecting a difference between the amplified segment and the analogous segment, the difference being indicative of a mutation in the mismatch repair gene or gene product which confers cancer susceptibility.
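The comparing and detecting steps can be illustrated with a minimal Python sketch. It assumes the amplified segment has already been sequenced and has the same length as the wild-type segment, so only base substitutions are reported; insertions and deletions would need an alignment step that is not shown. The function name `find_substitutions` and the two short sequences are hypothetical illustrations and are not taken from hMLH1, hPMS1 or any SEQ ID NO of the invention.

```python
from typing import List, Tuple


def find_substitutions(wild_type: str, amplified: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, wild-type base, observed base) for each mismatch.

    Assumption of this sketch: both segments are the same length, so a
    position-by-position comparison is meaningful.
    """
    if len(wild_type) != len(amplified):
        raise ValueError("equal-length segments only; indels need an alignment step")
    pairs = zip(wild_type.upper(), amplified.upper())
    return [(i + 1, w, a) for i, (w, a) in enumerate(pairs) if w != a]


# Toy sequences for illustration only (hypothetical, not real gene sequence).
wild_type_segment = "ATGTCGTTCGTGGCAGGG"
patient_segment = "ATGTCGTTCTTGGCAGGG"

differences = find_substitutions(wild_type_segment, patient_segment)
for position, wt_base, observed in differences:
    print(f"position {position}: {wt_base} -> {observed}")
```

A reported difference is only a candidate: as stated elsewhere in this description, it must still be distinguished from a harmless polymorphism.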
Another aspect of the invention provides methods of determining whether the difference between the amplified segment and the analogous wild-type segment causes an affected phenotype, i.e., whether the sequence alteration affects the individual's ability to repair DNA mispairs.
The method of diagnosing may include the steps of 1) reverse transcribing all or a portion of an RNA copy of a DNA mismatch repair gene; and 2) amplifying a segment of the DNA produced by reverse transcription. An amplifying step in the present invention may comprise: selecting a pair of oligonucleotide primers capable of hybridizing to opposite strands of the mismatch repair gene, in an opposite orientation; and performing a polymerase chain reaction utilizing the oligonucleotide primers such that nucleic acid of the mismatch repair gene intervening between the primers is amplified to become the amplified segment.
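The geometry of the amplifying step, two primers hybridizing to opposite strands in opposite orientation with the intervening nucleic acid becoming the amplified segment, can be sketched in Python as a simple string search. This is an idealized model that assumes exact primer matches and ignores annealing temperature, mismatches and secondary structure. The names `reverse_complement` and `amplify`, the template and both primers are hypothetical and do not correspond to any sequence of the invention.

```python
from typing import Optional

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def amplify(template: str, forward_primer: str, reverse_primer: str) -> Optional[str]:
    """Return the segment bounded by the two primers, or None if either fails to bind.

    The forward primer matches the top strand directly; the reverse primer
    hybridizes to the top strand, so its reverse complement is searched for
    downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy top-strand template and primers (hypothetical, for illustration only).
template = "GGATCCATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGAATTC"
forward = "ATGTCGTT"
reverse = "CAGCCGCC"

amplicon = amplify(template, forward, reverse)
print(amplicon)
```

The returned string includes both primer sites, which mirrors the fact that primers are incorporated into the ends of the PCR product.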
In preferred embodiments of the methods summarized above, the DNA mismatch repair gene is hMLH1 or hPMS1. The segment of DNA corresponds to a unique portion of a nucleotide sequence selected from the group consisting of SEQ ID NOS: 6-24. "First stage" oligonucleotide primers selected from the group consisting of SEQ ID NOS: 44-82 are used in PCR to amplify the DNA segment. The invention also provides a method of using "second stage" nested primers (SEQ ID NOS: 83-122), for use with the first stage primers to allow more specific amplification and conservation of template DNA.
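The relationship between first stage and second stage (nested) primers can be sketched as two rounds of an idealized PCR, where the second round uses the first round's product as template. The Python below is an illustration under stated assumptions only: primers match exactly, sequences are upper case, and the helper names `revcomp` and `pcr`, the template and all four primers are hypothetical. The actual primers of SEQ ID NOS: 44-122 are not reproduced here.

```python
from typing import Optional


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pcr(template: str, fwd: str, rev: str) -> Optional[str]:
    """Idealized PCR: return the product bounded by the primer pair, or None."""
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(revcomp(rev), start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev)]


# Toy template and primers (hypothetical, for illustration only).
genomic = "GGATCCATGTCGTTCGTGGCAGGGGTTATTCGGCGGCTGGAATTC"
outer_fwd, outer_rev = "GGATCCAT", "GAATTCCA"    # "first stage" pair
nested_fwd, nested_rev = "ATGTCGTT", "CAGCCGCC"  # "second stage" pair

first_stage = pcr(genomic, outer_fwd, outer_rev)
second_stage = pcr(first_stage, nested_fwd, nested_rev)

print(len(first_stage), len(second_stage))
```

Because the nested pair binds inside the first stage product, the second round yields a shorter product contained within the first, which is the basis for the added specificity described above.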
Another aspect of the present invention provides a method of identifying and classifying a DNA mismatch repair defective tumor comprising detecting in a tumor a mutation in a mismatch repair gene or gene product, preferably a mutL homolog (hMLH1 or hPMS1), the mutation being indicative of a defect in a mismatch repair system of the tumor.
The present invention also provides useful nucleotide and protein compositions. One such composition is an isolated nucleotide or protein structure including a segment sequentially corresponding to a unique portion of a human mutL homolog gene or gene product, preferably derived from either hMLH1 or hPMS1.
Other composition aspects of the invention comprise oligonucleotide primers capable of being used together in a polymerase chain reaction to amplify specifically a unique segment of a human mutL homolog gene, preferably hMLH1 or hPMS1.
Another aspect of the present invention provides a probe including a nucleotide sequence capable of binding specifically by Watson/Crick pairing to complementary bases in a portion of a human mutL homolog gene; and a label-moiety attached to the sequence, wherein the label-moiety has a property selected from the group consisting of fluorescent, radioactive and chemiluminescent.
We have also isolated and sequenced mouse MLH1 (mMLH1) and PMS1 (mPMS1) genes. We have used our knowledge of mouse mismatch repair genes to construct animal models for studying cancer. The models will be useful to identify additional oncogenes and to study environmental effects on mutagenesis.
Our knowledge of hMLH1 and hPMS1 gene sequences makes it possible to produce monoclonal and polyclonal antibodies for use in tests that detect the presence or absence of DNA mismatch repair protein in a tumor sample. Protein-based testing is receiving significant attention in view of recent research showing that methylation of hMLH1 promoter DNA is the basis for DNA mismatch repair deficiency in some sporadic tumors. In this situation there is usually no detectable mutation in the hMLH1 cDNA, so a screen for hMLH1 cDNA mutations would not show any abnormality. However, an immunoassay for hMLH1 protein shows absence of the protein in tumors with inactivation of the hMLH1 gene by mutation or by promoter methylation, and may be the screening test of choice for some applications. The protein structure information has been used to generate monoclonal antibodies that bind specifically to hMLH1 or hPMS1. The antibodies can be conjugated to labels such as fluorescent compounds, and then used as an immunohistochemical stain to detect DNA mismatch repair protein in a tumor sample.
In addition to diagnostic and therapeutic uses for the genes, our knowledge of hMLH1 and hPMS1 can be used to search for other genes of related function which are candidates for playing a role in certain forms of human cancer.