A genome map describes the order of genes or other markers and the spacing between them on each chromosome. Human genome maps are constructed on several different scales or levels of resolution. At the coarsest resolution are genetic linkage maps, which depict the relative chromosomal locations of DNA markers (genes and other identifiable DNA sequences) by their patterns of inheritance. A genetic linkage map shows the relative locations of specific DNA markers along the chromosome. Any inherited physical or molecular characteristic that differs among individuals and is detectable in the laboratory is a potential genetic marker. Markers can be expressed DNA regions (genes) or DNA segments that have no known coding function but whose inheritance pattern can be followed. DNA sequence differences are especially useful markers because they are plentiful and easy to characterize precisely.
Markers must be polymorphic to be useful in mapping; that is, alternative forms must exist among individuals so that they are detectable among different members in family studies. Polymorphisms are variations in DNA sequence that occur on average once every 300 to 500 bp. Variations within exon sequences can lead to observable changes, such as differences in eye color, blood type, and disease susceptibility. Most variations occur within introns and have little or no effect on the appearance or function of an organism, yet they are detectable at the DNA level and can be used as markers. Examples of these types of markers include (1) restriction fragment length polymorphisms (RFLPs), which reflect sequence variations in DNA sites that can be cleaved by DNA restriction enzymes, and (2) variable number of tandem repeat sequences, which are short repeated sequences that vary in the number of repeated units and, therefore, in length (a characteristic that is easily measured). The human genetic linkage map is constructed by observing how frequently two markers are inherited together.
Two markers located near each other on the same chromosome will tend to be passed together from parent to child. During the normal production of sperm and egg cells, DNA strands occasionally break and rejoin in different places on the same chromosome or on the other copy of the same chromosome (i.e., the homologous chromosome). This process (meiotic recombination) can result in the separation of two markers originally on the same chromosome. The closer the markers are to each other (that is, the more tightly linked they are), the less likely it is that a recombination event will fall between and separate them. Recombination frequency thus provides an estimate of the distance between two markers.
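By way of illustration only, the relationship between recombination frequency and map distance can be sketched as follows. The function names and the counts used are hypothetical and do not come from this specification. The conversion assumes the conventional approximation that 1% recombination corresponds to about 1 centimorgan, which holds only for closely linked markers.

```python
def recombination_frequency(recombinant_offspring: int, total_offspring: int) -> float:
    """Fraction of scored offspring in which the two markers were separated."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and total_offspring")
    return recombinant_offspring / total_offspring


def approx_map_distance_cm(rf: float) -> float:
    """Approximate genetic distance in centimorgans.

    Assumption: 1% recombination is treated as 1 cM. This is reasonable only
    for small recombination frequencies, because double crossovers cause the
    observed frequency to underestimate distance between distant markers.
    """
    if not 0.0 <= rf <= 0.5:
        raise ValueError("recombination frequency is expected to lie in [0, 0.5]")
    return rf * 100.0


if __name__ == "__main__":
    # Hypothetical counts: 12 recombinant offspring out of 200 scored.
    rf = recombination_frequency(12, 200)
    print(rf, approx_map_distance_cm(rf))
```

A frequency near 0.5 indicates that the markers assort independently, so no distance estimate can be drawn from it.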
The value of the genetic map is that an inherited disease can be located on the map by following the inheritance of a DNA marker present in affected individuals (but absent in unaffected individuals), even though the molecular basis of the disease may not yet be understood nor the responsible gene identified. Genetic maps have been used to find the exact chromosomal location of several important disease genes, including those for cystic fibrosis, sickle cell disease, Tay-Sachs disease, fragile X syndrome, and myotonic dystrophy.
Current approaches to identifying genes influencing genic functions within a genome have two common characteristics: firstly, non-coding sequence-variant markers are employed to reveal chromosomal regions containing candidate genes; and secondly, genome-wide analyses seek associations between sequence-variant markers and a phenotype of interest. Gene discovery by genome-wide association analysis, also known as linkage disequilibrium mapping, is the subject of U.S. Pat. No. 5,851,762.
While genomic information is useful in linkage studies, information on the haplome is considered more useful for identifying markers of genomic regions of interest in defining disease-associated gene function. The use of haplomic sequences reduces error in linkage studies because there is no need to consider the involvement of the second copy of the gene (as provided on the homologous chromosome) with the trait under consideration.
In 2003, a National Institutes of Health-funded international consortium commenced a three-year, US$100,000,000 project to create a human genome haplotype map (termed the “HapMap”) in order to facilitate gene discovery by haplotype-based, marker allele-disease gene association in complex diseases. This strategy of linkage disequilibrium allele-association mapping (‘association mapping’) involves unrelated, individual patients, and is the more recent alternative to the traditional approach of family (pedigree) linkage analysis when families are not available.
An assumption upon which HapMap is based is that haplotype inference from genotyping diploid DNA is sufficiently resolving for association mapping to warrant continuation as the strategy for genome-wide gene discovery.
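The difficulty of inferring haplotypes from diploid genotype data can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each site of a genotype is an unordered pair of single-character alleles; the function name is likewise an illustration and is not taken from the HapMap project or from this specification. The sketch enumerates every pair of haplotypes consistent with an unphased genotype.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List all unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of (allele, allele) tuples, one per site, with
    single-character alleles (a hypothetical encoding).
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # The orientation of the first heterozygous site is held fixed so that
    # (h1, h2) and (h2, h1) are not counted as different configurations.
    for flips in product([False, True], repeat=max(len(het_sites) - 1, 0)):
        flip_at = dict(zip(het_sites[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.add(("".join(h1), "".join(h2)))
    return sorted(pairs)


if __name__ == "__main__":
    # Two heterozygous sites and one homozygous site: two possible phasings.
    for pair in compatible_haplotype_pairs([("A", "G"), ("C", "T"), ("T", "T")]):
        print(pair)
```

With n heterozygous sites there are 2^(n-1) candidate phasings, so the genotype alone does not determine which alleles lie together on the same chromosome once two or more sites are heterozygous.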
A further assumption is that analysis of SNPs (single nucleotide polymorphisms) of a few hundred individuals from 4 populations (West African Nigerians, Japanese, Chinese, American) will be adequate to identify redundant SNPs, and thereby to identify haplotype-marking single SNPs, or minimum sets of SNPs, sufficient to characterize haplotype blocks in all, including admixed, populations. The HapMap consortium also proposes that only common haplotypes (>5-10%) will be important in common, multigenic diseases and in drug reactivity, and that these common haplotypes will be identifiable in a study of around 200 individuals.
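One common way to judge whether two SNPs are redundant is the squared correlation (r²) between their alleles across a sample of phased haplotypes, where a value of 1 means that one SNP fully predicts the other. The following is a minimal sketch under stated assumptions: the haplotypes are hypothetical strings with one character per biallelic SNP, and the function name is illustrative. It is not presented as the procedure actually used by the HapMap consortium.

```python
def ld_r_squared(haplotypes, i, j):
    """Squared allelic correlation between sites i and j of phased haplotypes.

    haplotypes: list of equal-length strings, one character per biallelic SNP
    (a hypothetical encoding). The allele carried by the first haplotype at
    each site is used as the reference allele for that site.
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n = len(haplotypes)
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]
    p_a = sum(h[i] == ref_i for h in haplotypes) / n
    p_b = sum(h[j] == ref_j for h in haplotypes) / n
    p_ab = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical samples: the first shows complete association between the
    # two sites, the second shows none.
    print(ld_r_squared(["AC", "AC", "GT", "GT"], 0, 1))
    print(ld_r_squared(["AC", "AT", "GC", "GT"], 0, 1))
```

Because the value is computed from the sampled haplotypes only, a pair of SNPs that appears redundant in one population sample need not be redundant in another.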
A further assumption is that the haplome is organized into discrete “blocks” with each block being identifiable with a unique SNP “tag”. It is currently accepted in the field of genetics that use of the minimum essential SNPs revealed by the HapMap project will identify sufficient common haplotypes in any population for detection of excess haplotype sharing in disease-gene searches and drug-affective pharmacogenomics.
Thus, the present state of the art is that the HapMap will be definitive, and capable of providing more than sufficient information for linkage studies. However, closer inspection of the HapMap project suggests that the project will achieve only limited information at best, and may be fundamentally flawed in its application to multi-genic disease discovery. For example, SNP identification will detect only a proportion of extant haplotypes, perhaps not even all commonly occurring haplotypes. Uncommon haplotypes (that may not be detectable in the HapMap) may also contribute to genic functional differences between individuals.
The assumed block structure of the haplome may also lead to errors. Recombinations and other rearrangements can be expected to affect haplotype block structure in admixed populations that may not be revealed by a limited analysis of ‘core’ populations.
The resolving power of inferred haplotypes can be expected to be challenged where two or more genes, having different modes of inheritance (recessive, dominant, co-dominant), differing functions (disease-predisposing, disease-protective), acting at different stages of disease progression, occur within a single chromosomal region. Resolving power will be most challenged when risk is contributed by both chromosomes as in compound heterozygous recessive diseases, and where trans as well as cis co-dominant interactions occur with co-dominant inherited genes such as those of the HLA complex. The significance of these doubts has not been recognized in the art.
A critical test of the utility of haplotyping association mapping is the ability to identify genic regions of interest already identified by pedigree linkage analyses. In an important test case, association mapping failed to identify the 6p21.3 (HLA) region of genetic risk (RR: lower bound 20-upper bound infinity) in nasopharyngeal carcinoma identified by haplotype sharing linkage analysis. This points to another problem in the art: non-coding based strategies have insufficient resolving power to detect even the strongest genetic association with any common human cancer.
Many diseases are known or suspected to be multigenic. Indeed, it is thought that most diseases are multigenic, and that monogenic diseases are the exception. The identification of genes with involvement in multigenic diseases is complicated in the methods of the prior art due to the patterns of inheritance of the genes from the maternal and paternal genotypes. Thus, while the prior art methods of mapping and gene discovery have been useful in identifying genes having simple modes of inheritance and simple involvement in disease, there remains a clear need for more powerful methods to unravel gene involvement in complex diseases.
It is advantageous to provide alternative methods to investigate the haploid state of a cell. Where a given method is unsuitable for reasons of economy, ease of use, accuracy, availability of equipment or any other reason, an alternative method is available.
Accordingly, it is an aspect of the present invention to overcome or at least alleviate a problem of the prior art. In particular, the present invention aims to provide alternative methods for more accurately mapping a gene using haploid information.
The discussion of documents, acts, materials, devices, articles and the like is included in this specification solely for the purpose of providing a context for the present invention. It is not suggested or represented that any or all of these matters formed part of the prior art base or were common general knowledge in the field relevant to the present invention as it existed before the priority date of each claim of this application.