The physical makeup of an individual is determined by his or her genes. Genes are composed of DNA, which in turn consists of four nucleotides known as adenine (A), thymine (T), cytosine (C), and guanine (G). A particular series of nucleotides is known as a gene sequence. Each gene sequence codes for a protein. A defective or mutant gene sequence will not produce a working protein: the protein may not perform its purpose, it may carry out a different purpose than intended, too much or too little of it may be made, or it may not be made at all. If the protein is essential to one or more functions of the body, disease will result.
Mutant gene sequences are either inherited or acquired. An inherited gene sequence is received from an individual's parents, while an acquired gene sequence results from an event in the individual's lifetime which changes the original gene sequence.
A classic example of an inherited mutant gene sequence is the sickle cell anemia gene. Sickle cell anemia is caused by the substitution of a single nucleotide (A to T) in the gene sequence of an individual. This single substitution results in the substitution of a single amino acid (glutamic acid to valine) in the resulting hemoglobin protein. The mutant hemoglobin protein produces crescent-shaped or sickled red blood cells in affected individuals, causing a decrease in the amount of oxygen that can be transported throughout the body. The lack of oxygen often results in kidney and heart failure, paralysis, and rheumatism, which are common symptoms in individuals with the disease.
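The effect of the single A-to-T substitution described above can be illustrated with a minimal Python sketch. The codon table holds only the five standard codons needed here, and the fifteen-nucleotide fragment, the function names, and the substitution position are assumptions chosen for illustration; none of them comes from the sources discussed in this section.

```python
# Minimal codon table: only the standard codons used in this illustration.
CODON_TABLE = {
    "CTG": "Leu",
    "ACT": "Thr",
    "CCT": "Pro",
    "GAG": "Glu",  # glutamic acid
    "GTG": "Val",  # valine
}

def translate(dna):
    """Translate a coding-strand DNA string into a list of amino acid names."""
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]

def substitute(dna, position, new_base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + new_base + dna[position + 1:]

# Illustrative fragment (assumption): the fourth codon is GAG (glutamic acid).
normal = "CTGACTCCTGAGGAG"
# Replace the middle base of the fourth codon: GAG becomes GTG.
mutant = substitute(normal, 10, "T")

normal_protein = translate(normal)
mutant_protein = translate(mutant)
```

Comparing `normal_protein` with `mutant_protein` shows that only the fourth amino acid differs, changing from glutamic acid to valine while the rest of the fragment is unchanged.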
An example of a disease caused by acquired mutant gene sequences is malignant melanoma, a form of skin cancer. Cancer results when normal cells in an individual's body either lose or gain certain functions, resulting in the unchecked growth of non-normal cells. These non-normal cells often form tumors and spread throughout the body, disrupting normal cell functions. A cancer such as malignant melanoma is caused when the original gene sequence in epidermal cells is changed or mutated by an environmental factor, such as UV radiation. Cells contain repair mechanisms to fix such damage, but over time the gene sequences in epidermal cells acquire more and more mutations. Mutant proteins are then produced, cellular functions are disrupted, and the individual develops skin cancer.
Although an individual's environment generally precipitates the development of cancer, many individuals have been found to have a predisposition to cancer. These individuals have gene sequences which are more likely to become mutated over a shorter period of time. Examples of such gene sequences are the BRCA1 and BRCA2 genes. Women carrying these gene sequences have a higher probability of developing breast and ovarian cancer than women who carry normal gene sequences. Thus, although the affected women's original gene sequences may not be mutated, they are more likely to become mutated due to their sequence or location on a chromosome.
Another factor that should be considered when discussing genetic diseases is whether they are monogenic or polygenic in nature. Sickle cell anemia and cystic fibrosis are examples of monogenic diseases, as they are caused by a single gene sequence. Most types of cancer, asthma, and diabetes are examples of polygenic diseases, as they are caused by a variety of genes. Polygenic diseases are also more likely to be influenced by an individual's environment. Not surprisingly, polygenic diseases are more difficult to diagnose and treat. Thus, the use of gene sequences in developing new drugs is dependent on the monogenic or polygenic nature of genetic diseases.
Typically, individuals with diseases caused by inherited or acquired gene sequences have only their symptoms treated. Diabetes patients receive insulin shots to regulate their blood glucose levels, asthma patients use inhalers to allow normal respiratory functions, and cancer patients undergo chemotherapy and radiation therapy to remove cancerous tumors. Although these treatments are often able to alleviate or eliminate the symptoms, they are unable to remove the genetic bases of the diseases.
The genetic bases of many diseases were discovered in the 1940s by scientists such as Beadle and Tatum, who discovered that each gene codes for a protein. Researchers then reasoned that study of the relevant gene sequences could lead to effective drug treatments for genetic diseases. The technology was inadequate, however, until the 1970s and 1980s, when Boyer and Cohen cloned DNA; Maxam, Gilbert, and Sanger worked out how to sequence DNA; and Mullis developed the polymerase chain reaction (PCR) technique to quickly amplify DNA sequences. Using genetics to find drug candidates soon became a practical option.
Before these techniques became available, the pharmaceutical industry's main method of finding new drugs was trial and error. Compounds that were found to mimic the body's natural compounds were tested in vitro, in animal models, and in clinical trials to see if they had a desirable effect in treating disease. This method is still used and has resulted in many well-known drugs, but it is expensive and time-consuming.
With the advent of improved genetic techniques, however, the pharmaceutical industry has begun concentrating on genetics as the most effective route to new drug discovery. Genomics companies can typically be classified into one of two groups.
The first group concentrates on gene sequencing in order to find both drug targets and drug candidates, usually in the form of proteins expressed by the gene sequences. Gene sequencing can take the form of either random discovery, whereby genes are sequenced without regard to their functions, or targeted discovery, whereby a certain region of the genome which is tentatively associated with a disease is sequenced. In random discovery gene sequencing, potentially useful gene sequences are identified and assayed to determine if they can be used in drug development. One problem with random discovery gene sequencing is that the majority of the human genome contains introns, or gene sequences which do not code for proteins. One way to circumvent this problem is to sequence complementary DNA (cDNA) instead. cDNA is produced from messenger RNA (mRNA). mRNA, in turn, is transcribed from DNA and processed by certain enzymes which remove the introns. cDNA sequences thus code for uninterrupted proteins.
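The relationship between genomic DNA, processed mRNA, and cDNA described above can be sketched in Python. This is a simplified illustration under stated assumptions: the toy gene, its exon coordinates, and the function names are hypothetical, and the cDNA is written in the same orientation as the mRNA rather than as the physically complementary strand.

```python
def transcribe(dna):
    """Write a coding-strand DNA sequence as RNA (T becomes U)."""
    return dna.replace("T", "U")

def splice(pre_mrna, exons):
    """Join the exon segments, given as half-open (start, end) pairs, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)

def to_cdna(mrna):
    """Write the mRNA sequence back as DNA (U becomes T), in the same orientation."""
    return mrna.replace("U", "T")

# Hypothetical toy gene: two exons separated by one 12-nucleotide intron.
exon1 = "ATGGCC"
intron = "GTAAGTTTTCAG"
exon2 = "GAGTAA"
gene = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]  # exon coordinates within the toy gene

pre_mrna = transcribe(gene)
mrna = splice(pre_mrna, exons)
cdna = to_cdna(mrna)
```

In this sketch `cdna` contains only the joined exon sequences, which is why sequencing cDNA avoids reading through non-coding introns.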
Targeted discovery gene sequencing is typically used with positional cloning, comparative gene expression, and functional cloning techniques, which are described in the next group.
The second group of genomics companies takes a more epidemiological approach by first researching families or groups of individuals having a similar disease, and then isolating the relevant genes. In this method, also known as positional cloning, blood samples are taken from the individuals and analyzed. The blood samples contain DNA, which is studied to identify certain regions of the genome which appear to be associated with the disease. Linking a region of the genome with a disease is known as linkage analysis or genetic linkage mapping. Once a region of the genome has been identified, it is sequenced via targeted discovery gene sequencing.
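Linkage analysis is commonly summarized with a LOD (logarithm of odds) score, which compares the likelihood of the observed inheritance pattern at a given recombination fraction against the likelihood under no linkage (a recombination fraction of 0.5). The Python sketch below assumes the simplest textbook case of fully informative, phase-known meioses; the function name and the example counts are hypothetical and are not taken from any company's method described here.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD score for a marker and a disease locus at recombination fraction theta.

    Assumes phase-known, fully informative meioses (a simplifying assumption).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # log10 likelihood of the data if the loci are linked at distance theta
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    # log10 likelihood of the data if the loci are unlinked (theta = 0.5)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical family data: 1 recombinant observed in 10 meioses.
score_close = lod_score(1, 10, 0.1)
score_far = lod_score(1, 10, 0.3)
score_null = lod_score(1, 10, 0.5)
```

A higher score at a small recombination fraction suggests that the marker region and the disease are inherited together; by convention a LOD score of 3 or more is usually treated as evidence of linkage.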
The second group of genomics companies also uses comparative gene expression to discover disease gene sequences. In comparative gene expression, mRNA from both healthy and diseased tissue is isolated. The mRNA is then used to produce cDNA, which is sequenced using targeted discovery gene sequencing. The gene sequences from both the healthy and diseased tissue are then compared. In addition, the identification of genes associated with disease can be made by studying the level of expression of genes in both the healthy and diseased tissue.
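The expression-level comparison described above can be sketched as a log fold-change calculation between healthy and diseased tissue. The gene names, the expression counts, the pseudocount, and the threshold in this Python sketch are all hypothetical values chosen for illustration.

```python
import math

def log2_fold_changes(healthy, diseased, pseudocount=1.0):
    """Return log2(diseased / healthy) for every gene measured in both tissues.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    shared = healthy.keys() & diseased.keys()
    return {
        gene: math.log2((diseased[gene] + pseudocount) / (healthy[gene] + pseudocount))
        for gene in shared
    }

def differentially_expressed(healthy, diseased, threshold=1.0):
    """List genes whose expression changes by at least 2**threshold in either direction."""
    changes = log2_fold_changes(healthy, diseased)
    return sorted(gene for gene, lfc in changes.items() if abs(lfc) >= threshold)

# Hypothetical expression counts per gene in each tissue.
healthy_counts = {"geneA": 99, "geneB": 49, "geneC": 199}
diseased_counts = {"geneA": 399, "geneB": 49, "geneC": 49}

changes = log2_fold_changes(healthy_counts, diseased_counts)
candidates = differentially_expressed(healthy_counts, diseased_counts)
```

Here `geneA` is more strongly expressed in the diseased tissue and `geneC` less strongly, so both are flagged as candidates, while `geneB` is unchanged and is not.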
Another similar technique is functional cloning. Mutant or non-functional proteins in metabolic pathways are studied and identified. The proteins are sequenced using targeted discovery gene sequencing and these sequences are used to figure out the corresponding DNA gene sequences. Once the disease gene sequences have been identified, they can be used in drug development.
Genomics companies in the first group include Incyte Pharmaceuticals (Palo Alto, Calif.). Incyte uses random discovery gene sequencing to produce its LifeSeq™ and LifeSeq FL™ databases. These databases contain the sequences of hundreds of human genes. These databases are licensed to drug development companies who use the sequences to produce new drugs. Databases covering animals (ZooSeq™), plants (PhytoSeq™), and bacteria and fungi (PathoSeq™) are also available. Incyte has also developed bioinformatics software, which provides sequence analysis and data management for their databases. In addition, Incyte offers cDNA libraries of the gene sequences in their databases, which can be directly used in drug development.
Human Genome Sciences (Rockville, Md.) also concentrates on random discovery gene sequencing, and has sequenced an estimated 90% of the 100,000 genes in the human body. In addition to collaborating with drug development companies who use their gene sequences, HGS also has its own drug discovery and development division. A number of therapeutic proteins which appear effective in animal models are under study.
Hyseq, Inc. (Sunnyvale, Calif.) has its HyX Platform, which is capable of processing and sequencing millions of blood and DNA samples. The HyX Platform includes DNA arrays of samples and probes, software-driven modules, industrial robots for screening DNA probes against DNA samples, and bioinformatic software to analyze the genetic information. Through the use of its HyX Platform, Hyseq believes it can carry out a variety of techniques, such as gene identification, gene expression level determination, gene interaction studies (for polygenic diseases), and genetic mapping.
Affymetrix, Inc. (Santa Clara, Calif.) has a GeneChip system consisting of disposable DNA probe arrays containing gene sequences on a chip, instruments to process the probe arrays, and software to analyze and manage the genetic information in the probe. The GeneChip system thus allows pharmaceutical and biotechnology companies to collect gene sequences and apply them to drug development.
On the other hand, the pharmaceutical industry has a number of genomics companies who first identify the genes which are likely to cause disease. After the genes are identified, they are sequenced and the gene sequences are used in drug development. Likewise, proteins implicated in disease can be identified and sequenced. The sequences can be used to discover the gene sequences, which are then used in drug development.
Myriad Genetics, Inc. (Salt Lake City, Utah) targets families with a history of genetic disease and collects their genetic material in order to identify hereditary disease-causing genes. Myriad is able to identify these genes by using positional cloning and protein interaction studies in combination with targeted discovery gene sequencing. Using these techniques, Myriad has been able to locate and identify eight disease-related gene sequences, including BRCA1 and BRCA2. These gene sequences are used by Myriad's pharmaceutical partners to develop new therapeutics.
Another genomics company which uses disease inheritance patterns together with gene sequencing is Sequana (La Jolla, Calif.). Sequana uses DNA collection from individuals with inherited diseases, genotyping and linkage analysis, physical mapping, and gene sequencing to find disease gene sequences. Sequana also has a proprietary bioinformatics system which includes data mining tools to automatically sort and organize much of its data. Like Myriad, Sequana has a number of alliances with drug development companies which license Sequana's gene sequences.
Millennium Pharmaceuticals, Inc. (Cambridge, Mass.) employs a broader range of technologies than Myriad and Sequana. In addition to positional cloning and targeted discovery gene sequencing, Millennium uses a number of other non-genetic techniques. cDNA libraries are prepared from mouse tissues and expressed using rapid expression of differential gene expression (RARE) technology. Different patterns of cDNA gene expression allow researchers to identify possible disease targets. Millennium also uses functional cloning techniques in order to identify the gene sequences of interesting proteins. Once a potentially useful gene sequence has been identified, biological assays and bioinformatics are used as additional analyses.
Genome Therapeutics Corporation (Waltham, Mass.) uses a combination of positional cloning techniques and targeted discovery gene sequencing, as well as random discovery gene sequencing to isolate and identify disease gene sequences. In addition, Genome Therapeutics also has pathogen programs, which sequence pathogen genomes. As many non-genetic human diseases result from infection by pathogens, Genome Therapeutics hopes to eliminate pathogens by developing drugs and vaccines using the pathogens' genomes.
Gene Logic, Inc. (Columbia, Md.) has an accelerated drug discovery system which emphasizes its restriction enzyme analysis of differentially expressed sequences (READS) technology. READS is similar in nature to comparative gene expression technology. In READS, normal and diseased tissues are compared in order to identify gene expression differences between the two. Genes which appear to be important in the diseased tissue are then analyzed. Restriction enzymes, which cut gene sequences at specific sites, are used to produce gene fragments. The gene fragments from the normal and diseased tissues will differ and can be compared. Gene Logic also has a Flow-thru Chip and genomic databases, which it licenses to drug development companies.
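The restriction enzyme step described above can be illustrated with a short Python sketch that cuts two sequences at a recognition site and compares the resulting fragment lengths. This is not Gene Logic's READS protocol; it is a generic digest using the well-known EcoRI site (GAATTC, cut after the first G), and both sequences are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site where the cut is made
    (1 for EcoRI, which cuts between G and AATTC).
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(sequence[start:])
    return fragments

# Hypothetical sequences: the diseased sample has lost the second EcoRI site.
normal_seq = "AAGAATTCCCGGGAATTCTT"
diseased_seq = "AAGAATTCCCGGGAGTTCTT"

normal_fragments = digest(normal_seq)
diseased_fragments = digest(diseased_seq)
normal_lengths = [len(f) for f in normal_fragments]
diseased_lengths = [len(f) for f in diseased_fragments]
```

The two samples yield different fragment length patterns, which is the kind of difference that a fragment comparison between normal and diseased tissue is designed to expose.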
Progenitor (Columbus, Ohio) focuses on developmental biology. Growing cells and tissues are analyzed for their level of expression of certain genes. Study of growing cells and tissues may help discover treatments for diseases characterized by abnormal cell growth, such as cancer and osteoporosis. Progenitor also uses bioinformatics, gene mapping, and gene sequencing to isolate, identify, and sequence relevant gene sequences.
OncorMed, Inc. (Gaithersburg, Md.) has focused on the development of medical services using genetic information. OncorMed offers a number of tests for hereditary diseases such as breast and colon cancers and malignant melanoma. The medical services include measurements of replication error rates in tumors, molecular profiling of tumor suppressor genes, and gene sequencing. In addition, OncorMed has a genomics repository containing known cancer gene sequences.
U.S. Pat. No. 5,642,936 issued to Evans and assigned to OncorMed describes a method for identifying human hereditary disease patterns. According to the method, data is collected on individuals having a history of disease within their families. Factors related to each disease are given weights, and the weights for each individual are summed. If the sum is above a certain predetermined threshold value, the individual is deemed to have a hereditary risk for the disease. Records from a number of individuals having a hereditary risk for a disease are collected to form a database.
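The weighted-sum scheme summarized above can be sketched in Python as follows. The factor names, weights, threshold value, and record layout are hypothetical placeholders; the actual factors and weights used in the cited patent are not given in this section.

```python
def risk_score(factors_present, weights):
    """Sum the weights of the disease-related factors recorded for one individual."""
    return sum(weights[factor] for factor in factors_present)

def has_hereditary_risk(factors_present, weights, threshold):
    """An individual is flagged when the summed weight is above the threshold."""
    return risk_score(factors_present, weights) > threshold

# Hypothetical factor weights and threshold for a single disease.
WEIGHTS = {
    "affected_parent": 3,
    "affected_sibling": 2,
    "early_onset_in_family": 2,
    "affected_distant_relative": 1,
}
THRESHOLD = 4

# Hypothetical records: individual identifier mapped to the factors present.
records = {
    "individual_1": ["affected_parent", "affected_sibling"],
    "individual_2": ["affected_distant_relative"],
    "individual_3": ["affected_parent", "affected_distant_relative"],
}

# Collect the individuals deemed at hereditary risk into a simple database.
risk_database = {
    person: factors
    for person, factors in records.items()
    if has_hereditary_risk(factors, WEIGHTS, THRESHOLD)
}
```

With these placeholder values only `individual_1` exceeds the threshold; `individual_3` scores exactly 4, which is not above it, so that record is not added to the database.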
The methods used by the above companies all focus on the genetic aspect of hereditary disease. Gene sequencing and positional cloning represent the two approaches generally taken. However, very little emphasis is put on the environmental aspect of hereditary disease. An individual's environment is defined as his or her physical surroundings, geographical location, diet, lifestyle, etc. For many diseases which are genetic in origin, such as most cancers, an individual's environment plays a large role in determining whether or not the individual eventually develops the disease. Some individuals who have disease gene sequences develop diseases, while others who carry the exact same disease gene sequences do not. One purpose of collecting environmental data about individuals whose gene sequences are studied is to effectively rule out any non-genetic causes of disease. Another purpose is to discover if any individuals who are carrying disease gene sequences but who do not develop the disease have other compensatory gene sequences or factors which enable them to live disease-free.
To a certain extent, the second group of genomics companies does take into account a small amount of environmental data when selecting individuals whose DNA is used for positional cloning analyses. The environmental data is usually collected in the form of a questionnaire or survey. However, the data is typically limited in scope to lifestyle questions, and is used only to help narrow the search for the specific disease gene in question.
In addition, most genomics companies are reluctant to share their data on individuals with others, even with those genomics companies which are studying the same gene sequences. As a result, each genomics company must gather its own data on individuals having a certain disease. For example, Sequana sent its own researcher to the island of Tristan da Cunha to study hereditary asthma, while Myriad is located in Salt Lake City to take advantage of the detailed family trees of the Mormons. For genomics companies searching for gene sequences, gathering environmental data on individuals is often an expensive, time-consuming, but necessary step. Genomics companies could potentially spend more of their time and money on actual disease gene isolation if they were able to obtain necessary environmental data from another source.
Another problem lies in the fact that when genomics companies do gather environmental data on the individuals whose gene sequences are studied, the environmental data represents only a small time frame of an individual's life. Few genomics companies continually collect data over a long period of time, and as a result they are not able to definitively rule out certain environmental factors which may affect disease progression. In addition, such data collections are unlikely to provide leads for factors which may prevent the formation of disease.