Biotechnology/biopharmaceutical companies have found significant commercial success in business methods wherein a biotechnology company partners with a large pharmaceutical company in pursuit of a particular scientific discovery. For example, it is common for biotechnology companies to engage in various discovery processes (e.g. drug “target” discovery processes) whereby they retain downstream intellectual property rights and/or royalty streams. It is also common for biopharmaceutical companies to collaborate with pharmaceutical companies for purposes of drug discovery, wherein the biopharmaceutical companies use one of several methods to identify regions of the genome that play a role in a particular disease.
The progress of a drug from the point it is discovered in the laboratory to its launch in the marketplace, if successful, is referred to as the “drug development pipeline.” On average, such a process takes 10-15 years and costs about $359 million, and it is estimated that only about one in up to several thousand compounds that enter preclinical testing ultimately makes it to the market as a pharmaceutical. In the U.S., a drug pipeline generally comprises eight stages. The first is discovery, in which a candidate compound is synthesized, isolated, and characterized. Next, biological testing is performed as an initial screening for potential activity, toxicity and stability. Preclinical or animal testing follows and includes an extensive series of in vivo and in vitro studies to evaluate safety and biological activity against the targeted phenotype (e.g., disease, susceptibility, etc.). Once it is determined that human trials are warranted, an IND (Investigational New Drug application) filing is submitted to the Food and Drug Administration (FDA). If approved, a Phase I study is performed in which the drug is administered to a small number (~20-80) of healthy volunteers to determine safe dosage ranges, absorption and metabolism of the compound. Following a successful Phase I study, a Phase II study is performed to evaluate the efficacy and adverse events in approximately 100-300 volunteers with the targeted phenotype. Finally, a Phase III study is performed to further evaluate the efficacy and long-term adverse events in approximately 1,000-3,000 volunteers with the targeted phenotype. If the compound “passes” the Phase III study, then preregistration of the compound takes place through the filing of an NDA (New Drug Application) with the FDA.
The FDA reviews the research findings and other scientific information contained within the NDA and determines whether or not to approve the drug, and at what dosage and for what specific indication(s) it is to be used, thereby “registering” the drug. The final stage in the drug pipeline is the post-marketing stage in which any adverse reactions or quality control issues are reported to the FDA. The FDA may also require Phase IV studies. Even at this stage, the drug may be recalled or withdrawn from the market. Only about one out of every five drugs that enters clinical trials is approved for patient use. Failure at the end of a clinical trial could nullify many years of work and millions of dollars spent by a research institution or pharmaceutical company.
The DNA that makes up human chromosomes provides the instructions that direct the production of all proteins in the body. These proteins carry out vital functions of life. Variations in DNA are directly related to almost all human diseases, including infectious diseases, cancers, inherited disorders, and autoimmune disorders. Variations in DNA contributing to a phenotypic change, such as a disease or a disorder, may be, e.g., a single variation that disrupts the complex interactions of several genes, any number of mutations within a single gene, any number of mutations in a plurality of loci in the genome, or a combination thereof. For example, Type I and II diabetes have been linked to multiple genes, each with its own pattern of mutations. In contrast, cystic fibrosis can be caused by any one of over 300 different mutations in a single gene. Phenotypic changes may also result from variations in non-coding regions of the genome. For example, a single nucleotide variation in a regulatory region can upregulate or downregulate gene expression or otherwise alter gene activity.
Recent technological developments in the field of human genomics have enabled the development of pharmacogenomics, the use of human DNA sequence variability in the development and prescription of drugs. Pharmacogenomics is based on the correlation or association between a given genotype and a resulting phenotype. Since the first correlation study over half a century ago linking adverse drug response with amino acid variations in two drug-metabolizing enzymes (plasma cholinesterase and glucose-6-phosphate dehydrogenase), other correlation studies have linked sequence polymorphisms within drug metabolism enzymes, drug targets and drug transporters with compromised levels of drug efficacy or safety.
Pharmacogenomic data is especially useful in clinical settings where correlation information is used to prevent drug toxicities. For example, patients are often screened for genetic differences in the thiopurine methyltransferase gene that cause decreased metabolism of 6-mercaptopurine or azathioprine. However, only a small percentage of observed drug toxicities have been explained adequately by the set of pharmacogenomic markers available to date. In addition, “outlier” individuals, or individuals experiencing unanticipated effects in clinical trials (when administered drugs that have previously been demonstrated to be both safe and efficacious), cause substantial delays in obtaining FDA drug approval and may even cause certain drugs to come off the market, though such drugs may be efficacious for a majority of recipients.
The various biotechnological methods used to date to identify target genomic regions include, for example, differential gene expression, which essentially looks for differences in gene expression between control and case samples; protein-protein interaction maps, which are used to identify drug receptors and their immediate effectors; and mining human sequence databases for sequences similar to known disease-related, pharmacokinetic or pharmacodynamic regulators. In comparison, association studies that correlate and validate genomic regions with a particular phenotypic trait rely on population genetics and robust statistical metrics. Association studies provide a powerful tool to obtain greater amounts of information in a shorter amount of time, thus reducing the costs of research and development efforts.
Because all humans are 99.9% identical in their genetic makeup, the DNA sequence of any two individuals is nearly identical. Variations between individuals include, for example, deletions or insertions of DNA sequences, variations in the number of repetitive DNA elements in non-coding regions and changes in a single nitrogenous base position, or “single nucleotide polymorphisms” (SNPs). It is estimated that there are ~7 million common SNPs that have a minor allele frequency of at least 0.1 (i.e., the minor allele of the SNP occurs in at least 10 percent of people). These common SNPs do not occur independently but are inherited from generation to generation in tandem with other SNPs, forming patterns across the genome. Such groups of SNPs (including the genomic region in which they lie) are referred to herein as SNP haplotype blocks.
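The minor allele frequency threshold mentioned above can be illustrated with a minimal Python sketch. It assumes a biallelic SNP and genotypes given as two-letter strings, and it counts frequency over allele copies, which is the usual population-genetics convention rather than something the text specifies. The function names and the sample genotypes are hypothetical and serve only as illustration.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the frequency of the less common allele at one biallelic SNP.

    `genotypes` is a list of two-letter strings such as "AG", one per
    individual (an assumed input format, chosen for illustration).
    """
    # Count every allele copy: each individual contributes two.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("expected at most two distinct alleles")
    if len(counts) < 2:
        return 0.0  # only one allele observed, so there is no minor allele
    return min(counts.values()) / sum(counts.values())


def is_common_snp(genotypes, threshold=0.1):
    """Apply the 0.1 cut-off used in the text to call a SNP 'common'."""
    return minor_allele_frequency(genotypes) >= threshold


# Hypothetical genotypes for five individuals at a single SNP.
sample = ["AA", "AG", "AA", "GG", "AA"]
sample_maf = minor_allele_frequency(sample)
print(f"MAF = {sample_maf:.2f}, common = {is_common_snp(sample)}")
```

In this sample the G allele accounts for 3 of the 10 allele copies, so the SNP clears the 0.1 threshold.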
Common SNPs are useful for conducting whole-genome association studies. Whole genomes of individuals with and without a phenotypic trait of interest (e.g., resistance to a disease, toxicity from a drug) are scanned and a correlation (or “association”) is made between the SNPs and the phenotypic trait. Such whole-genome analyses provide a fine degree of genetic mapping and can pinpoint specific regions of linkage. Methods for whole genome analysis are described in, e.g., U.S. Ser. No. 60/327,006, filed Oct. 5, 2001, entitled “Identifying Human SNP Haplotypes, Informative SNPs and Uses Thereof,” assigned to the assignee of the present invention, and U.S. Ser. No. 10/106,097, filed Mar. 26, 2002, now U.S. Pat. No. 6,969,586, entitled “Methods For Genomic Analysis,” both incorporated herein by reference for all purposes. Further, the identity of SNPs and SNP haplotype blocks across one representative chromosome, e.g., Chromosome 21, is disclosed in U.S. Ser. No. 60/323,059, filed Sep. 18, 2001, entitled “Human Genomic Polymorphisms,” assigned to the assignee of the present invention, and U.S. Ser. No. 10/284,444, filed Oct. 31, 2002, entitled “Human Genomic Polymorphisms,” incorporated herein by reference for all purposes. See also Patil, N. et al., “Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21,” Science 294, 1719-1723 (2001), disclosing SNPs and haplotype structure of Chromosome 21.
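The correlation between a SNP and a phenotypic trait can be sketched, for a single SNP, as a test on allele counts in individuals with the trait (cases) and without it (controls). The text does not name a particular statistic; the example below assumes a Pearson chi-square test on a 2x2 allele-count table, one common choice, and is not the method of the referenced applications. The function name and the counts are hypothetical.

```python
import math


def allelic_chi_square(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    Rows are cases and controls; columns are minor and major allele counts.
    Returns the chi-square statistic and its p-value.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must contain at least one count")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls, two alleles each.
chi2, p_value = allelic_chi_square(60, 140, 30, 170)
print(f"chi-square = {chi2:.2f}, p = {p_value:.5f}")
```

A whole-genome scan would repeat such a test for every SNP, so a real analysis would also need a correction for multiple testing, which this sketch omits.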
Genomic studies may be used to discover genetic variation between different organisms, for example, bacterial strains. This knowledge could be valuable in identifying the source of an outbreak, whether natural or a result of bioterrorism. When a bioterror attack occurs, it is important to quickly identify the agent, find its source and apprehend the perpetrators. Similarly, prompt identification of the source of natural, food-borne disease outbreaks can limit the number of affected individuals, thereby saving lives. As such, tools are needed to rapidly and uniquely identify different organisms, such as bacterial strains.