Analysis of the genetic population structure of any organism at the molecular level requires a thorough understanding of the nature and distribution of DNA sequences among its component individuals and populations. Approaches used in such studies have included analysis of allele frequency data (Bowcock, 1987), restriction fragment length polymorphisms (Botstein et al., 1980), and association analysis among loci for mapping both simple (Collins, 1995) and complex diseases (Lander et al., 1989). Similarly, a wealth of data has emerged from studies on the maintenance and evolution of DNA sequences in specific regions of the Drosophila genome, such as Adh (Kreitman, 1983), Xdh (Riley et al., 1989) and amylase (Aquadro et al., 1991). A detailed analysis of genomic regions and of the genealogy of these regions across populations, species and genera, using innovative techniques to construct high-density haplotypes and to sequence specific regions, would yield information not only on genome diversity among populations, their aggregates and species, but also on the significance of that diversity at the phenotypic level.
The study of haplotype (haploid genotype) diversity has been recognized as an important tool for studying evolutionary lineages among populations (Templeton et al., 1987) and for establishing associations and linkage (gametic) disequilibria among loci. Since 1989, several methodologies for haplotyping individual genetic markers at specified loci using strictly molecular means have been investigated. Linkage disequilibrium, defined as the non-random association of alleles among loci, plays an important role in mapping genes of interest in population, anthropological and medical research. In addition, polymorphic short tandem repeat (STR) markers have been employed to obtain informative haplotypes for linkage analysis (Weber & May, 1989; Dubovsky et al., 1995). However, the use of such haplotype systems is compromised by frequent ambiguity of linkage phase in population samples. Although genotyping of pedigrees can often establish linkage phase, material on informative families is often unavailable or inadequate for many populations of medical and anthropological interest. Furthermore, even robust statistical methods for estimating haplotype frequencies (Excoffier & Slatkin, 1995) can mis-identify rare haplotypes and occasionally generate spurious ones (Tishkoff et al., 1996b). Hence, new and accurate molecular methods for generating haplotypes are urgently needed.
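To make the phase problem concrete, the following Python sketch (an illustration only, not part of the described method) enumerates every pair of haplotypes consistent with an unphased genotype: with k heterozygous loci there are 2^(k-1) candidate phasings. It also computes the standard two-locus disequilibrium coefficient D = p(AB) - p(A)p(B), which requires phased haplotype counts. The function names `compatible_haplotype_pairs` and `disequilibrium_d` are hypothetical.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Return the set of unordered haplotype pairs consistent with an
    unphased multi-locus genotype.

    `genotype` is a sequence of (allele1, allele2) tuples, one per locus;
    the order inside each tuple carries no phase information.
    """
    pairs = set()
    # For every locus, choose which of its two alleles goes on haplotype 1.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(alleles[c] for alleles, c in zip(genotype, choice))
        hap2 = tuple(alleles[1 - c] for alleles, c in zip(genotype, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) are counted once.
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs


def disequilibrium_d(haplotypes, allele_a, allele_b):
    """Two-locus coefficient D = p(AB) - p(A) * p(B), computed from
    phased haplotypes given as (locus-1 allele, locus-2 allele) tuples."""
    n = len(haplotypes)
    p_ab = sum(1 for h in haplotypes if h[0] == allele_a and h[1] == allele_b) / n
    p_a = sum(1 for h in haplotypes if h[0] == allele_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == allele_b) / n
    return p_ab - p_a * p_b


# Three heterozygous loci: 2**(3 - 1) = 4 possible phase assignments.
print(len(compatible_haplotype_pairs([("A", "a"), ("B", "b"), ("C", "c")])))  # 4
# Complete association of A with B gives the maximum D for these frequencies.
print(disequilibrium_d([("A", "B"), ("A", "B"), ("a", "b"), ("a", "b")], "A", "B"))  # 0.25
```

Because the number of candidate phasings doubles with each additional heterozygous locus, multi-locus haplotypes quickly become impossible to read off unphased genotypes, which is why phase must be inferred statistically or determined directly.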
Methods to isolate genes and specific loci fall into two broad categories: construction of genomic DNA libraries and the polymerase chain reaction (PCR). First, production and maintenance of genomic libraries is not only labor intensive but also requires at least threefold over-sampling to give a reasonable probability of recovering a specific locus. Occasional under-representation of particular regions, owing to variation in library construction methods, cloning vectors and the stochasticity of biological systems, further increases the uncertainty of recovering a well-defined region of interest. Thus, library construction and screening to study the molecular genetic diversity of a large number of individuals across several populations and species is a formidable task.
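The threefold over-sampling figure can be checked against the classic Clarke-Carbon estimate for random libraries, N = ln(1 - P) / ln(1 - f), where f is the insert size divided by the genome size, together with its Poisson approximation P = 1 - e^(-c) for c-fold coverage. The Python sketch below is illustrative only; the insert size (about 40 kb) and genome size (about 3 Gb) are assumed values, and the function names are hypothetical.

```python
import math


def clones_needed(prob, insert_size, genome_size):
    """Clarke-Carbon estimate of the number of clones N needed for a given
    locus to be present in a random library with probability `prob`:
    N = ln(1 - P) / ln(1 - f), where f = insert_size / genome_size."""
    f = insert_size / genome_size
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


def prob_locus_present(fold_coverage):
    """Poisson approximation P = 1 - exp(-c) for a library whose total
    insert length is c times the genome size."""
    return 1 - math.exp(-fold_coverage)


# Assumed illustrative values: ~40 kb inserts, ~3 Gb haploid genome.
n = clones_needed(0.95, 40_000, 3_000_000_000)
print(n, round(n * 40_000 / 3_000_000_000, 2))  # about 3.0-fold coverage
print(round(prob_locus_present(3), 3))  # 0.95
```

Even at threefold coverage, roughly one locus in twenty is expected to be absent from a given library, before any construction bias is taken into account.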
Alternatively, while PCR methods offer rapid and efficient analysis of specific loci, they limit the size of the region that can be sampled. Current methods can amplify up to 35 kb using cloned DNA as template (Barnes, 1994), but only 25 kb from complex genomic DNA (Cheng et al., 1994). Additionally, the need to optimize conditions for long-range PCR of genomic DNA, coupled with the introduction of sequence errors during amplification, poses serious problems for comparative analysis of DNA sequence variation in a specific region among individuals and populations. Directly cloning the desired region from native genomic DNA would provide an effective alternative to both library construction and PCR.
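Why amplification errors matter more as amplicons grow can be seen from a common first-order estimate: if a polymerase misincorporates at rate r per base per duplication, a product of length L after d doublings carries on average r·L·d errors, so under a Poisson assumption the error-free fraction is about e^(-r·L·d). The Python sketch below uses an assumed, illustrative error rate of 1e-5 per base per duplication and 20 doublings; neither figure comes from the cited studies, and the function name is hypothetical.

```python
import math


def fraction_error_free(error_rate, length_bp, doublings):
    """Estimated fraction of PCR products with no polymerase-introduced
    error, treating errors as Poisson with mean error_rate * length_bp * doublings."""
    return math.exp(-error_rate * length_bp * doublings)


# Assumed illustrative error rate of 1e-5 errors per bp per duplication,
# with 20 effective doublings.
for length in (1_000, 25_000):
    print(length, round(fraction_error_free(1e-5, length, 20), 4))
```

Under these assumptions about 82% of 1 kb products are error-free, but fewer than 1% of 25 kb products are, so sequence differences observed among long amplicons from different individuals cannot be assumed to be genuine variants.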
One object of the present invention relates to methods for generating collections of a single genetic locus from various sources.
Another object of the present invention relates to genome anthologies, that is, collections of a specific locus, including, for example, a gene or group of genes from multiple sources.
Another object of the present invention relates to the generation of genome anthologies from all members of a gene family from one source or from multiple sources.
Yet another object of the present invention relates to the use of such genome anthologies in a method for identifying specific haplotyping targets.
A further object of the present invention relates to novel methods for haplotyping individual genetic markers at specified loci.
Yet another object of the present invention relates to harvesting human DNA variants to generate targets for drug discovery.
A further object of the present invention relates to the use of haplotyping to screen individuals for sensitivities to specific drugs or treatment regimens.
Another object of the invention is the development of molecular haplotyping kits for several loci distributed throughout the human genome.
Another object of this invention is to collect multiple variants of a single complete gene from different members of a population in a manner that is not only efficient but also yields a permanent, replicable, expressible, fully manipulable, individually identifiable collection of hemizygous entities.
Another object of this invention is to use genetic variation 1) to enhance the efficacy of therapeutics by tailoring them to the genetic variation of specific population groups, 2) to reduce the costs of developing new drugs, and 3) to increase the chances that a new drug will be successful in clinical trials and thereby gain FDA approval.