Many biological functions are accomplished by altering the expression of various genes through transcriptional control (e.g., through control of initiation, provision of RNA precursors, RNA processing, etc.) and/or translational control. For example, fundamental biological processes such as the cell cycle, cell differentiation, and cell death are often characterized by variations in the expression levels of groups of genes.
Gene expression is also associated with pathogenesis. For example, the lack of sufficient expression of functional tumor suppressor genes and/or the overexpression of oncogenes/proto-oncogenes could lead to tumorigenesis (Marshall, Cell, 64: 313-326 (1991); Weinberg, Science, 254: 1138-1146 (1991), incorporated herein by reference for all purposes). Thus, changes in the expression levels of particular genes (e.g., oncogenes or tumor suppressors) serve as signposts for the presence and progression of various diseases.
The study of gene expression in the art has generally concentrated on the regulatory regions of the gene of interest and on the relationships among a few genes. A number of transcription factors/DNA binding proteins have been identified and a limited number of regulatory pathways have been discovered. However, the expression of a particular gene is frequently regulated by the expression of a large number of other genes. The expression of those regulatory genes may also be under the control of additional genes. This complex regulatory relationship among genes constitutes a genetic network. The function and regulation of a particular gene can be best understood in the context of this genetic network. As the Human Genome Project and commercial genome research progress at a great rate, most, if not all, of the expressed genes will be partially sequenced in the near future. Understanding the functions and regulatory relationships among the large number of genes is becoming a difficult task with traditional tools. Therefore, there is a need in the art to develop a systematic approach to understand the complex regulatory relationships among large numbers of genes.
This invention provides methods, compositions, and apparatus for studying the complex regulatory relationships among genes. In some of its specific applications, this invention provides methods, compositions, and apparatus for detecting mutations of upstream regulatory genes by monitoring the expression of down-stream genes. In some embodiments, gene expression monitoring is used to determine certain functions of a gene by identifying its down-stream regulated genes. Similar embodiments use gene expression to discern the effect of specific mutations of upstream genes. Gene expression is also used to identify upstream regulatory genes in some embodiments. By combining these approaches, this invention can be used to interrogate the genetic regulatory network and to construct a map indicating regulatory relationships.
Specifically, in one aspect of the invention, gene expression monitoring is used to decipher the complex regulatory relationship among genes. In such embodiments, the expression of more than 10 genes, preferably more than 100 genes, more preferably more than 1,000 genes and most preferably more than 5,000 genes is monitored in a large number of samples of cells. In some embodiments, each of the samples has an expression pattern different from that of other samples. In preferred applications, a plurality of independent samples are assayed. The expression data can be analyzed to understand the complex relationships among genes. Ultimately, the expression data are analyzed to develop a map describing such complex relationships.
The invention provides methods to obtain biological samples representing a large number of independent states of gene expression. In some embodiments, antisense oligonucleotides or antisense genes are used to block the expression of specific genes. In other embodiments, homozygous knock-out techniques are used to specifically suppress the expression of genes. In other embodiments, transfection of regulatory genes is used to alter the expression profile of a cell. In some additional embodiments, antisense oligonucleotides of random sequence are introduced into cells to block the expression of genes.
In one such embodiment, expression data are analyzed to generate cluster maps indicating a correlation among genes. In some preferred embodiments, such cluster maps are then analyzed using statistical methods to generate a map consisting of regulatory pathways describing the complex relationship among the genes. Many statistical methods are suitable for building such maps. The LISREL method is particularly useful in such applications. In some embodiments, the structure of the map is refined as more data become available. Thus, the map is dynamic and updated automatically as new data sets are entered.
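By way of illustration only, the correlation step that underlies a cluster map might be sketched as follows. The sketch assumes expression levels are available as one list of measurements per gene across the same samples, and it groups genes whose Pearson correlation has an absolute value at or above an assumed cutoff. The function names, the cutoff of 0.9, and the gene labels are hypothetical. The sketch shows simple correlation grouping only; it is not the LISREL method and does not infer regulatory direction.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def cluster_genes(expression, threshold=0.9):
    """Group genes linked by |r| >= threshold (assumed cutoff).

    Genes are merged transitively with a small union-find structure, so a
    cluster is a connected component of the correlation graph.
    """
    genes = sorted(expression)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for a, b in combinations(genes, 2):
        if abs(pearson(expression[a], expression[b])) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical expression levels for four genes across four samples.
example = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [5.0, 1.0, 4.0, 2.0],
}
print(cluster_genes(example))
```

The absolute value is used so that anticorrelated genes, which may reflect repression, fall into the same cluster as positively correlated ones; a different grouping rule could equally be chosen.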
Such a gene network map has a wide variety of applications, such as in the fields of diagnostics, drug discovery, gene therapy, and biological research. For example, an investigator interested in a particular gene may consult such a map to find putative upstream and down-stream genes with statistical confidence. The investigator can then focus further research on those genes.
In another aspect of the invention, gene expression monitoring is used to detect potential malfunction of regulatory genes. In some embodiments, the expression of a subset of genes of interest in a diseased tissue is analyzed to obtain a diseased expression pattern. The subset contains at least one gene, or more than 5, 10, 20, 25, 50, 75, 100, 150, 200, 250, 300, 400, 500, 750, 1,000, 1,250, 1,500, 3,000, 4,550, or 6,000 of the known genes. The expression of the same genes in a normal tissue can be similarly analyzed to generate a normal gene expression pattern. A difference in the expression of genes indicates an abnormality of regulation in the diseased tissue. In some embodiments, a data filter is used to identify those genes whose expression is significantly altered. By using a data filter, only those genes whose expression is enhanced or reduced in the diseased tissue by more than, e.g., 3-, 5-, or 10-fold are identified as altered.
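A data filter of the kind described above might, as a minimal sketch, be written as follows. The sketch assumes that expression levels for the diseased and normal tissues are held in dictionaries keyed by gene. The function name, the gene labels, and the `floor` parameter (an assumed lower bound that avoids division by very small values) are hypothetical.

```python
def fold_change_filter(diseased, normal, min_fold=3.0, floor=1.0):
    """Return genes enhanced or reduced by more than min_fold.

    diseased, normal: dicts mapping gene name -> expression level.
    floor: assumed lower bound applied to both levels before the ratio is
           taken (an illustrative choice, not taken from the text).
    """
    altered = {}
    for gene in sorted(diseased.keys() & normal.keys()):
        d = max(diseased[gene], floor)
        n = max(normal[gene], floor)
        if d / n > min_fold:
            altered[gene] = ("enhanced", d / n)
        elif n / d > min_fold:
            altered[gene] = ("reduced", n / d)
    return altered


# Hypothetical expression levels.
diseased_levels = {"g1": 900, "g2": 100, "g3": 50, "g4": 210}
normal_levels = {"g1": 100, "g2": 120, "g3": 400, "g4": 100}
print(fold_change_filter(diseased_levels, normal_levels, min_fold=3.0))
```

Raising `min_fold` to 5 or 10 makes the filter more stringent, corresponding to the larger fold thresholds mentioned above.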
Once the expression of a gene is found to be altered in the diseased tissue, the upstream regulatory gene of the altered gene is indicated as a candidate malfunctioning gene. In some embodiments, an upstream gene is identified as a candidate malfunctioning gene only if the expression of two or more of its down-stream genes is affected. The candidate malfunctioning gene is then sequenced to check whether a mutation is present, or whether the malfunction is due to epigenetic or nongenetic effectors. In some cases a mutation may not be present in the genome, and yet the product of the regulatory gene appears to be malfunctioning. For example, p53 can be functionally rather than genetically inactivated by binding to viral proteins, such as E1B and large T antigen. By assaying for the ability of a regulatory protein to activate or repress the expression of other genes, both genetic and phenotypic inactivation can be assessed.
In yet another aspect of the invention, the function of a particular mutation in a regulatory gene can be determined by gene expression monitoring. In some specific embodiments, the expression profiles of cells containing the specific mutation and control cells lacking the mutation are compared to determine whether the mutation affects the expression of down-stream genes. Similarly, the function of a particular gene may be determined. In such embodiments, the expression of a large number of genes is monitored in biological samples in which the target gene is expressed to produce a control expression profile. The expression of the target gene is then suppressed to produce a target expression profile. By comparing the two expression profiles, one can identify potential regulated down-stream genes from affected genes.
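The selection of candidate malfunctioning genes described above can be illustrated with a short sketch. It assumes that a map from each upstream regulatory gene to its known down-stream genes is already available and that the set of altered genes has been determined separately. The function name, the map, and the gene labels are hypothetical.

```python
def candidate_regulators(network, altered_genes, min_targets=2):
    """Return upstream genes with at least min_targets altered targets.

    network: dict mapping an upstream regulatory gene to an iterable of
             its down-stream genes (assumed to be known in advance).
    altered_genes: genes whose expression was found to be altered.
    """
    altered = set(altered_genes)
    candidates = {}
    for regulator, targets in network.items():
        affected = sorted(altered.intersection(targets))
        if len(affected) >= min_targets:
            candidates[regulator] = affected
    return candidates


# Hypothetical regulatory map and altered genes.
regulatory_map = {
    "regX": ["g1", "g2", "g3"],
    "regY": ["g3", "g4"],
    "regZ": ["g5"],
}
print(candidate_regulators(regulatory_map, ["g1", "g3"]))
```

Setting `min_targets=1` reproduces the broader rule in which any altered down-stream gene indicates its upstream regulator as a candidate.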
In one specific embodiment, p53 activated and repressed genes are monitored to detect loss of wild-type p53 function. In another specific embodiment, gene expression monitoring is used to detect the in-cell function of p53.
According to yet another embodiment, loss of function of a nucleic acid encoding a regulatory molecule in a test cell can be determined. A first nucleic acid molecule encoding a regulatory molecule is selected for analysis. A set of second nucleic acid molecules whose expression is induced or repressed by the regulatory molecule in normal cells is compiled or selected. A transcription indicator of a test cell is hybridized to a set of nucleic acid probes. The transcription indicator is selected from the group consisting of mRNA, cDNA and cRNA. Each member of the set of nucleic acid probes comprises a portion of a nucleic acid molecule which is a member of the set of second nucleic acid molecules which are induced or repressed by the selected regulatory molecule. The amount of transcription indicator which hybridizes to each of said set of nucleic acid probes is determined. A test cell is identified as having lost function of the regulatory molecule if (1) hybridization of the transcription indicator of the test cell to a probe which comprises a portion of a nucleic acid which is induced by the regulatory molecule is lower than hybridization using a transcription indicator from a normal cell, or (2) hybridization of the transcription indicator of the test cell to a probe which comprises a portion of a nucleic acid which is repressed by the regulatory molecule is higher than hybridization using a transcription indicator from a normal cell.
Monitoring of gene expression can be used to help elucidate gene function. For example, one can determine what other genes are coordinately expressed with a gene of interest when the gene of interest is genetically altered or subjected to a chemical which affects its expression. Conversely, one can determine which upstream genes control expression of the gene of interest by knocking out putative upstream genes and monitoring expression of the gene. Of particular interest are those genes involved in basic biological processes, such as mitosis, mismatch repair, and apoptosis, those genes involved in development, and those genes involved in infections by viruses and other pathogens. Thus, gene expression monitoring can be used to discern functions and relationships among genes, as well as to dissect signaling pathways and networks.
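The two-part criterion above can be sketched in code as follows. The sketch assumes that hybridization amounts for the test cell and a normal cell are held in dictionaries keyed by probe, and that the probes have already been sorted into those for induced and those for repressed nucleic acids. The function name, the probe labels, and the `margin` parameter (an assumed tolerance, zero by default) are hypothetical.

```python
def lost_regulator_function(test, normal, induced, repressed, margin=0.0):
    """Apply criteria (1) and (2) to hybridization amounts.

    test, normal: dicts mapping probe name -> hybridization amount.
    induced, repressed: probes for nucleic acids induced or repressed by
                        the regulatory molecule in normal cells.
    margin: assumed tolerance; with 0.0 any difference in the stated
            direction counts, as in the literal criterion.
    Returns (lost, evidence), where evidence lists the supporting probes.
    """
    evidence = []
    for probe in induced:
        # Criterion (1): an induced target hybridizes less than in normal cells.
        if test[probe] < normal[probe] - margin:
            evidence.append((probe, "induced target lower than normal"))
    for probe in repressed:
        # Criterion (2): a repressed target hybridizes more than in normal cells.
        if test[probe] > normal[probe] + margin:
            evidence.append((probe, "repressed target higher than normal"))
    return bool(evidence), evidence


# Hypothetical hybridization amounts.
test_cell = {"t1": 20, "t2": 100, "t3": 300}
normal_cell = {"t1": 200, "t2": 100, "t3": 50}
lost, evidence = lost_regulator_function(
    test_cell, normal_cell, induced=["t1", "t2"], repressed=["t3"]
)
print(lost, evidence)
```

In practice a nonzero `margin`, or a fold threshold, would likely be needed to distinguish real differences from measurement noise; the value to use is not specified here.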
Pathological states can be characterized by patterns of gene expression. Thus, diseased and matched normal tissues can be monitored and compared to devise a disease “fingerprint”. Once established, this fingerprint can be used to diagnose other patients. Disease states include cancers, dysplasia, viral infections, parasite infections, bacterial infections, etc.
Gene expression monitoring can be used to establish developmental stages, for example, of a differentiating tissue, of an embryo, or of a fetus. A distinctive pattern of gene expression can be established for a particular stage by comparisons to closely and/or distantly related stages. Such patterns can be used for developmental staging of cells or tissues.
Potential chemotherapeutic agents can be screened or evaluated for their ability to alter a distinctive “fingerprint”, for example, altering a “fingerprint” of a disease state to a “fingerprint” of a non-disease state. Similarly, agents can be screened for the ability to alter a differentiation state (fingerprint) of cells or tissues. Gene expression monitoring can be used for any stage of screening and development of therapeutic agents. It can be used for initial identification of lead compounds. It can be used for monitoring purification of compounds. It can be used for monitoring efficacy of modified forms of therapeutic agents or lead compounds. Expression monitoring is useful in any stage of research and development for discovering therapeutic agents.