Currently, most innovations in diagnosis and in therapy remain within the framework of morphology (e.g. the study of tumor shapes), physiology (the study of organ function), and chemistry.
With the advent of molecular biology and molecular genetics, medicine and pharmacology have entered the information age. Information technology, which has been so widely applied to the understanding of human intelligence (artificial intelligence, neural networks), telecommunications, and the Internet, should be applicable to the study of the program of life.
Disease used to be understood as the intrusion of foreign agents (e.g., bacteria) that should be eliminated, or as a chemical imbalance that should be compensated. In the genomic era, diseases are interpreted as a deficiency of the genetic program to adapt to its environment, caused by missing, lost, exaggerated or corrupted genetic information. We are moving towards an age when disease and disease susceptibility will be described and remedied not only in terms of their symptoms (phenotype), but in terms of their cause: external agents and genetic malfunction (genotype).
A great deal of effort in the pharmaceutical industry is presently being directed toward detecting the genetic malfunction (diagnosis) and correcting it (cure), using the tools of modern genomics and biotechnology. Correcting a genetic malfunction can occur at the DNA level using gene therapy. The replacement of tissues destroyed by, e.g., arthrosis, heart disease, or neuro-degeneration could be achieved by activating natural regeneration processes, following a mechanism similar to that of embryonic development.
Most genes, when activated, yield the production of one or several specific proteins. Acting on proteins is projected to be the domain of modern drug therapy. There are two complementary ways of acting on proteins: (1) the concentration of proteins soluble in serum can be modified by using them directly as drugs; (2) chemical compounds that interact selectively with given proteins can be used as drugs.
It has been estimated that between 10,000 and 15,000 human genes code for soluble proteins. If only a small percentage of these proteins have a therapeutic effect, a considerable number of new medicinal substances based on proteins remain to be found. Presently, approximately 100 proteins are used as medicines.
All of today's drugs that are known to be safe and effective are directed at approximately 500 target molecules. Most drug targets are either enzymes (22%) or receptors (52%). Enzymes are proteins that act as catalysts, activating certain chemical reactions. Enzyme inhibitors can, for example, halt cell reproduction in order to fight bacterial infection. The inhibition of enzymes is one of the most successful strategies for finding new medicines; one example is the use of reverse transcriptase inhibitors to fight infection by the retrovirus HIV. Receptors can be defined as proteins that form stable bonds with ligands such as hormones or neurotransmitters. Receptors can serve as “docking stations” for toxic substances to selectively poison parasites or tumor cells (chemotherapy). In the pharmacological definition, receptors are transceivers of stimuli or signals. Blocking a receptor such as a neurotransmitter receptor, a hormone receptor or an ion channel alters the functioning of the cell. Since the 1950s, many successful drugs that function as receptor blockers have been introduced, including psycho-pharmaceuticals, beta-blockers, calcium antagonists, diuretics, new anesthetics, and anti-inflammatory preparations.
It can be estimated that about one thousand genes are involved in common diseases. The proteins associated with these genes may not all be good drug targets, but among the dozens of proteins that participate in each regulatory pathway, one can assume that at least three to five represent good drug targets. According to this estimate, 3,000 to 5,000 proteins could become the targets of new medicines, which is an order of magnitude greater than what is known today.
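The arithmetic behind this estimate can be reproduced in a few lines. The sketch below uses only the figures quoted in the text (about 1,000 disease genes, three to five good targets per gene, about 500 known targets); the variable names are illustrative and the per-gene range is the text's own assumption, not a measured quantity.

```python
# Figures quoted in the text; the per-gene range is an assumption of the estimate.
disease_genes = 1000          # genes involved in common diseases
targets_per_gene = (3, 5)     # assumed good drug targets per gene's pathway
known_targets = 500           # targets addressed by today's drugs

# Projected number of drug targets (low and high ends of the range).
low, high = (disease_genes * n for n in targets_per_gene)

# How many times larger the projection is than today's target count.
ratio_low, ratio_high = low / known_targets, high / known_targets

print(f"projected targets: {low} to {high}")
print(f"increase over known targets: {ratio_low:.0f}x to {ratio_high:.0f}x")
```

Under these assumptions the projection is six to ten times the current number of targets, which is what the text rounds to an order of magnitude.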
With a typical drug development process costing about $300 to $500 million per drug, providing a better ranking of potential leads is of the utmost importance. With the recent completion of the first draft of the human genome, which revealed its 30,000 genes, and with the new microarray and combinatorial chemistry technologies, the quantity and variety of genomics data are growing at a significantly more rapid pace than the informatics capacity to analyze them.
The emphasis of molecular biology is shifting from a hypothesis-driven model to a data-driven model. Previously, years of intense laboratory research were required to collect data and test hypotheses regarding a single system or pathway and to study the effect of one particular drug. The new data-intensive paradigm relies on a combination of proprietary data and data gathered and shared worldwide from tens of thousands of simultaneous miniaturized experiments. Bioinformatics is playing a crucial role in managing and analyzing these data.
While drug development will still follow its traditional path of animal experimentation and clinical trials for the most promising leads, it is expected that the acquisition of data from arraying technology and combinatorial chemistry followed by proper data analysis will considerably accelerate drug discovery and cut down the development cost.
Additionally, completely new areas will develop, such as personalized medicine. As is known, a mix of genetic and environmental factors causes diseases. Understanding the relationships between such factors promises to considerably improve disease prevention and to yield significant health care cost savings. With genomic diagnosis, it will also be possible to prescribe a well-targeted drug, adjust the dosage and monitor treatment.
Following the challenge of genome sequencing, it is generally recognized that the two most important bioinformatics challenges are microarray data analysis (with the analysis of tens of thousands of variables) and the construction of decision systems that integrate data analysis from different sources. The essence of the problem of designing a good cost-effective diagnostic test or determining good drug targets is to establish a ranking among candidate genes or proteins, the most promising ones coming at the top of the list. To be truly effective, such a ranked list must incorporate knowledge from a great variety of sources, including genomic DNA information, gene expression, protein concentration, and pharmacological and toxicological data. Challenges include: analyzing data sets with few samples but very large numbers of inputs (thousands of gene expression coefficients from only 10-20 patients); using data of poor quality or incomplete data; combining heterogeneous data sets; visualizing results; incorporating the assistance of human experts; complying with rules and checks for safety requirements; satisfying economic constraints (e.g., selecting only the one or two best leads to be pursued); in the case of an aid to decision makers, providing justifications of the system's recommendations; and in the case of personalized medicine, making the information easily accessible to the public.
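To make the ranking problem concrete, the following is a minimal sketch of one common way to order candidate genes when there are few patients and many genes: score each gene independently by how well its expression separates two patient classes, then sort. The score used here is a signal-to-noise ratio (absolute difference of class means divided by the sum of class standard deviations). It is offered only as an illustration of the idea of a ranked list, not as the method of this document. The function names, the 0/1 label encoding, the gene names, and the expression values are all hypothetical.

```python
from statistics import mean, stdev

def separation_score(pos, neg, eps=1e-9):
    """Absolute mean difference over summed standard deviations for one gene."""
    return abs(mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg) + eps)

def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminative.

    expression: dict mapping gene name -> list of values, one per patient
    labels: list of 0/1 class labels, one per patient (hypothetical encoding)
    """
    scores = {}
    for gene, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = separation_score(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: 6 patients, 3 hypothetical genes (all values are made up).
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],  # clearly separates the classes
    "gene_b": [2.0, 3.5, 1.0, 2.2, 3.1, 1.4],  # mostly noise
    "gene_c": [3.0, 3.4, 2.8, 2.1, 2.5, 1.9],  # moderate separation
}

ranking = rank_genes(expression, labels)
print(ranking)  # most promising candidates first
```

Scoring each gene on its own keeps the number of fitted quantities per gene small, which matters when only 10-20 patients are available. It ignores interactions between genes and does not address the other listed challenges, such as combining heterogeneous sources or justifying recommendations.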
Thus, the need exists for a system capable of analyzing combined data from a number of sources of varying quantity, quality and origin in order to produce useful information.