Chromosome aberration is a hallmark of cancer and provides valuable clues for pinpointing candidate cancer-related genes. Cancer-specific chromosome aberrations often result in alterations in the structure and/or the dosage of cancer-causing genes [Balmain, et al. Nat. Genet. 33 Suppl: 238-244 (2003)]. Examples include the Bcr/Abl fusion gene caused by the translocation between chromosomes 9 and 22 (Philadelphia chromosome) in chronic myeloid leukemia (CML), and the loss of a copy of the Rb gene caused by the deletion of the proximal region of the long arm of chromosome 13 in retinoblastoma. Chromosome rearrangement in cancers may also result in alterations in other genes, which may not be the primary cause of cancer but have an impact on cancer susceptibility and/or cancer progression, and thus have great value in cancer risk assessment, diagnosis and prognosis. Therefore, the genes involved in all clonal and recurrent chromosome abnormalities, and the alterations in those genes, should be carefully characterized.
Most cancer chromosome abnormalities are detected through cancer cytogenetics studies. Once a clonal chromosome structure abnormality is detected, researchers may further define the genomic region(s) involved in the abnormality by FISH or other DNA hybridization-based techniques to “walk through” the abnormality with various cloned genomic sequences. The gene content of the defined region(s) can then be determined by looking up the genes that have been mapped to those regions in the human genome databases. This approach narrows down the candidates for cancer-related genes, from which the actual cancer-causing and cancer-risk genes may be identified by genotype-phenotype correlation and functional studies. Such a strategy is referred to as cytogenetics-based positional cloning and is so far the most successful strategy for identifying cancer-related genes, particularly in hematological cancers. However, this strategy has limitations.
First, it relies heavily on standard chromosome studies and FISH to define the genomic regions that are involved in cancer chromosome aberrations. Due to limited resolution and often poor cancer chromosome morphology, determining the genomic origins of cancer chromosome abnormalities can be particularly difficult. The task is often further complicated when the aberrations are detected in only a small number of cancer cells that are mixed with a large population of cells with an apparently normal karyotype. Because of this limitation, the genomic origins of many observed cancer chromosome abnormalities cannot be identified for further analysis.
Second, cytogenetics-based positional cloning is not a straightforward, high-throughput strategy; its analytical process is cumbersome and labor-intensive. It may take several months for a skillful researcher to fully analyze one abnormality to determine the genomic regions, breakpoints and gene content involved. In addition, such an analysis usually requires a significant amount of cancer specimen, which is not always available. Therefore, only a very small fraction of observed cancer chromosome abnormalities have been thoroughly characterized for positional cloning. Linkage analysis can also be used to reveal candidate loci for positional cloning. However, this method is more commonly used for studying constitutional Mendelian diseases and rare familial cancer cases; its application seems to be limited in the genetic analysis of sporadic cancers, perhaps due to the complexity of cancer genetics and genomics.
Genome-wide screening for genomic DNA copy number imbalance is another strategy for cancer genetic analysis. It includes restriction landmark genome scanning [Hayashizaki, et al., Electrophoresis 14: 251-258 (1993)], comparative genomic hybridization (CGH) [Kallioniemi, et al. Science 258: 818-821 (1992)], high-throughput quantitative PCR [Ginzinger, et al. Cancer Res. 60: 5405-5409 (2000)] and molecular subtraction techniques, such as representational difference analysis [Lisitsyn, et al. Science 259: 946-951 (1993)]. These global screening techniques, particularly array-based CGH [Albertson, et al., Nat. Genet. 25: 144-146 (2000)], are powerful tools for detecting genome-wide DNA sequence dosage changes, including deletion and duplication, in the cancer genome. However, these techniques also have limitations. They usually cannot detect balanced chromosomal rearrangements, which often result in cancer-causing gene fusion and/or gene disruption, such as the Bcr/Abl fusion gene in CML and the MLL gene split in various types of leukemia. In addition, the capability of these techniques to detect DNA copy number imbalance can be complicated or impaired in mixed cell populations. Unfortunately, most cancer specimens are contaminated to some degree with normal cells, and genomic changes in most cancers are heterogeneous; cancer tissues often contain multiple cell lines with multiple clonal and/or non-clonal (random) genomic abnormalities. Furthermore, these techniques may not directly reveal the dosage of individual genes involved in the unbalanced genomic regions. For example, the resolution of current array CGH is about a hundred thousand base pairs of genomic DNA [Albertson, et al., Nat. Genet. 25: 144-146 (2000); Vissers, et al., Am. J. Hum. Genet. 73: 1261-1270 (2003)], which is better than that of cytogenetic analysis but is not enough to determine the dosage alterations in individual genes.
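The effect of normal-cell contamination on copy number detection can be illustrated with a simple mixing calculation. The following Python sketch is not taken from any of the cited methods; it assumes an idealized two-component specimen (one tumor clone plus normal diploid cells), a diploid reference, and no measurement noise. The function name and parameters are hypothetical and chosen only for illustration.

```python
import math


def expected_log2_ratio(tumor_copies: float,
                        tumor_fraction: float,
                        normal_copies: float = 2.0) -> float:
    """Expected log2(test/reference) ratio for one locus in a mixed specimen.

    Assumption (illustrative only): the specimen is a simple mixture of
    tumor cells carrying `tumor_copies` of the locus and normal cells
    carrying `normal_copies`, measured against a diploid reference.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    # Average copy number per cell across the mixed population.
    mixed = (tumor_fraction * tumor_copies
             + (1.0 - tumor_fraction) * normal_copies)
    if mixed <= 0:
        raise ValueError("no copies of the locus remain in the specimen")
    return math.log2(mixed / normal_copies)


if __name__ == "__main__":
    # A single-copy deletion seen at decreasing tumor content.
    for fraction in (1.0, 0.5, 0.2):
        ratio = expected_log2_ratio(tumor_copies=1, tumor_fraction=fraction)
        print(f"tumor fraction {fraction:.1f}: log2 ratio {ratio:.3f}")
```

Under these assumptions, a single-copy deletion gives a log2 ratio of -1 in a pure tumor sample, but the signal shrinks toward zero as the proportion of normal cells grows, which is one way to see why dosage changes carried by a minor clone can be missed.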
Array-based gene expression analysis is also a powerful tool for cancer studies; it reveals the expression patterns of all known or predicted genes in cancers [Hanash, S. Nat. Rev. Cancer 4: 638-644 (2004)]. The expression level of each gene can be measured, and genes with similar expression levels or with similar functions can be grouped for further analysis. Since many genes may show a similar expression level and abnormal expression may or may not represent the primary genetic change in cancer, array expression analysis is better suited to cancer biology studies than to the identification of cancer-causing genes and primary genetic changes. From a technical point of view, the best way to characterize a chromosome rearrangement is to analyze the genomic DNA from the abnormal chromosome directly, which increases the efficiency and accuracy of detecting the genomic content, gene content and possible mutations involved in the regions. As yet, however, most current technical strategies do not have the ability to directly characterize detectable chromosome aberrations.
In addition, none of the techniques described above can directly uncover the genomic features, such as DNA sequence variations, of the abnormal chromosome regions. Such information is an important part of the molecular profile of a cancer and facilitates cancer epidemiology studies, risk assessment, diagnosis and prognosis. Genomic sequence variation, or polymorphism, is an important feature of the genome. The most common type of variation is the single nucleotide polymorphism (SNP), which is defined as a single nucleotide variation at a locus with the frequency of the minor allele greater than 1% in at least one population [Risch, N. J. Nature 405: 847-856 (2000)]. It is estimated that the human genome contains more than 15 million SNPs [Botstein, et al. Nat. Genet. 33 Suppl: 228-237 (2003)]. These polymorphisms are valuable markers for genetic association studies, because they are frequently linked with disease-related genes or traits. SNPs on the same chromosome are not inherited independently of one another; instead, they are often inherited as phased combinations of specific alleles in particular populations. Therefore, analyzing phased SNPs, or SNP haplotypes, provides a more informative approach to studying genetic associations.
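The SNP definition given above (minor allele frequency greater than 1% in at least one population) can be expressed as a short check. The Python sketch below is illustrative only: the function names, the population labels and the allele counts are hypothetical, and the minor allele is taken to be the second most frequent allele observed at the locus.

```python
from typing import Mapping


def minor_allele_frequency(allele_counts: Mapping[str, int]) -> float:
    """Frequency of the second most common allele at one locus.

    `allele_counts` maps an allele (e.g. "A") to the number of
    chromosomes carrying it in one population sample.
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no chromosomes were sampled")
    ordered = sorted(allele_counts.values(), reverse=True)
    # A locus with a single observed allele has no minor allele.
    return ordered[1] / total if len(ordered) > 1 else 0.0


def meets_snp_definition(counts_by_population: Mapping[str, Mapping[str, int]],
                         threshold: float = 0.01) -> bool:
    """True if the minor allele frequency exceeds `threshold`
    in at least one of the sampled populations."""
    return any(minor_allele_frequency(counts) > threshold
               for counts in counts_by_population.values())


if __name__ == "__main__":
    # Hypothetical counts: rare in population P1, common in population P2.
    locus = {"P1": {"A": 995, "G": 5}, "P2": {"A": 970, "G": 30}}
    print(meets_snp_definition(locus))
```

In this made-up example the G allele is below the 1% threshold in one population but above it in the other, so the locus still satisfies the definition.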
In general, a haplotype is a combination of linked polymorphic alleles on a single chromosome. A given homologous chromosome pair in the diploid genome has two haplotypes, representing maternal and paternal origins [The International HapMap Consortium. Nature 426, 789-796 (2003)]. Haplotypes of the human genome appear to be organized as discrete blocks with an average size of 9-18 kb (ranging from less than 1 kb to more than 170 kb) due to linkage disequilibrium (LD) [Gabriel, S. B. et al. Science 296, 2225-2229 (2002)]. Linked polymorphic alleles within each block tend to act as a single multi-site allele with limited haplotype diversity [Wall, J. D. & Pritchard J. K. Nat. Rev. Genet. 4, 587-597 (2003)]. The haplotype blocks reflect the evolution, inheritance and recombination histories of the genome. Thus, analyzing haplotype blocks of densely spaced polymorphic markers, such as SNPs, provides a powerful tool for genetic association studies [Botstein, D. & Risch, N. Nat. Genet. Suppl. 33, 228-237 (2003); Crawford, D. C. et al. Am. J. Hum. Genet. 74, 610-622 (2004); Drysdale, C. M. et al. Proc. Natl. Acad. Sci. USA 97, 10483-10488 (2000)]. Uncovering SNP haplotypes of the abnormal chromosome regions in cancer should also be of great help for tracing the origins of the abnormal alleles, following the progression of the abnormalities, identifying low-penetrance cancer-related traits, and studying drug response and prognosis.
However, unambiguously determining haplotypes of SNP blocks, especially large blocks, is particularly challenging due to technical limitations. There have been two broad categories of tools for unambiguous haplotyping: genotyping family pedigrees and directly genotyping SNPs on a single chromosome of interest [Crawford, D. C. & Nickerson, D. A. Annu. Rev. Med. 56, 303-320 (2005)]. The former is expensive and time-consuming, and it requires DNA samples from several generations, which are not always available. In addition, accurately assigning SNP phase using family-based methods becomes increasingly difficult as more loci are considered. The latter currently relies on enrichment of DNA of single-chromosome origin using complicated methods, such as somatic cell hybrids and multi-step allele-specific PCR, which are also time-consuming and expensive. These limitations make it difficult to apply unambiguous SNP haplotype analysis to either individual- or population-based genetic association studies.
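The underlying difficulty can be illustrated by counting the haplotype pairs that are compatible with an ordinary, unphased diploid genotype. The Python sketch below is an illustration under simplifying assumptions (biallelic single-character alleles, no missing data, no prior information from relatives or population frequencies); the function name and the example genotype are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def compatible_haplotype_pairs(
        genotypes: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List every haplotype pair consistent with an unphased genotype.

    `genotypes` holds one unordered allele pair per SNP site,
    e.g. ("A", "G") for a heterozygous site.
    """
    het_sites = [i for i, (a, b) in enumerate(genotypes) if a != b]
    pairs = set()
    # Keep the first heterozygous site fixed so that the same pair is not
    # produced twice with its two haplotypes merely swapped.
    for flips in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flip_at = dict(zip(het_sites[1:], flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        pairs.add(("".join(hap1), "".join(hap2)))
    return sorted(pairs)


if __name__ == "__main__":
    # Three sites: two heterozygous, one homozygous.
    unphased = [("A", "G"), ("C", "T"), ("G", "G")]
    for pair in compatible_haplotype_pairs(unphased):
        print(pair)
```

With k heterozygous sites this enumeration yields 2^(k-1) candidate pairs, so the number of phase configurations that must be excluded doubles with every additional heterozygous SNP. Genotyping a single isolated chromosome avoids the ambiguity because each site then shows only one allele.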
Taken together, effectively identifying disease-causing and disease-risk genes as well as their genomic alterations/variations, particularly from cancer-associated clonal chromosome aberrations and constitutional mosaicism from mixed cell populations, remains a great challenge in genetic research.