The sequencing of the S. cerevisiae genome produced the first complete, ordered set of genes from a eukaryotic organism and revealed the presence of over 6,000 genes on 16 chromosomes (Mewes et al., 1997; Goffeau et al., 1996). The DNA sequence revealed 6275 known and hypothetical open reading frames (ORFs) encoding putative proteins longer than 99 amino acids. Based upon codon usage, which can serve as a predictor of whether or not an ORF is actually expressed, there are currently thought to be 6222 expressed ORFs (Cherry et al., 1997).
The sequence of the roughly 6,000 ORFs in the yeast genome is compiled in the Saccharomyces Genome Database (SGD). The SGD provides Internet access to the complete genomic sequence of S. cerevisiae, ORFs, and the putative polypeptides encoded by these ORFs. The SGD can be accessed via the World Wide Web at Stanford's website for Saccharomyces and at the Munich Information Center for Protein Sequences, Saccharomyces cerevisiae group. A gazetteer and genetic and physical maps of S. cerevisiae are found in Mewes et al., 1997 (incorporated herein by reference). References therein also contain the sequence of each chromosome of S. cerevisiae (incorporated herein by reference).
Having the complete DNA sequence of yeast available creates an opportunity to take a collectivist, rather than a reductionist, view on biology. We have developed a new technology that enables the simultaneous measurement of gene expression across an entire genome. The GENOME REPORTER MATRIX™ system (GRM) is a matrix of units comprising living yeast cells, the cells in each unit containing one yeast reporter fusion (GRM construct) representative of essentially every known and hypothetical ORF of S. cerevisiae. See U.S. Pat. Nos. 5,569,588 and 5,777,888. A GRM construct comprises the promoter, the 5′ upstream untranslated region and, generally, the first four amino acids from one hypothetical ORF fused to a gene encoding an easily assayed reporter, such as green fluorescent protein (GFP), luciferase, or β-galactosidase. The GRM constructs are able to reveal changes in transcription for each hypothetical ORF in response to specific stimuli. In addition, the GRM constructs are able to reveal changes in mRNA splicing, translation and protein stability in those cases in which the N-terminus of the protein is sufficient for regulation.
The GRM provides an unprecedented view into the compensatory changes a cell makes in the face of a changing environment. Such environmental changes may be in the form of pH, salinity, temperature, osmotic pressure, nutrient availability, as well as biochemical perturbations caused by xenobiotics, pharmaceutical compounds and mutation. Identifying the compensatory changes a cell makes in response to exposure to a chemical can provide insight into the biological target of the chemical. For example, treatment of the GRM with the cholesterol-lowering drug lovastatin causes the cells to become depleted for sterols and non-sterol isoprenoids. The yeast cells respond by significantly up-regulating the genes encoding sterol biosynthetic enzymes and thus synthesizing more of the enzymes that make sterols. One may identify those genes that are involved in sterol biosynthesis or in related metabolic pathways by assaying the GRM. Because natural selection operates on a selected outcome rather than on a particular molecular mechanism, gene expression profiling strategies that detect regulatory changes through several molecular mechanisms contribute to a fuller view of how regulatory circuits have evolved.
An understanding of the regulatory circuits of yeast serves two purposes. On the one hand, yeast is an ideal model system for eukaryotic cells, including mammalian cells. Therefore, an understanding of the metabolic pathways of yeast can be used to design or discover drugs for use in plants and animals, including humans. On the other hand, yeast possess certain metabolic pathways and genes which are unique to yeast. An understanding of the differences between yeast and higher eukaryotes will permit the design and discovery of antifungal drugs that target genes and metabolic pathways specific to yeast. See U.S. Ser. No. 60/127,272, filed concurrently herewith.
Yeast cells are eukaryotic and have many pathways that are similar or identical to those of mammalian cells. However, because yeast cells are unicellular, they are easier to manipulate experimentally and the results of such manipulations are easier to determine. Thus, yeast serves as an ideal model system for eukaryotic cells, including mammalian cells. The deduced protein sequences of the yeast genome display a significant amount of sequence identity with mammalian proteins. About one-third of the yeast ORFs, when aligned with their mammalian counterparts, produce a P-value score of less than 1×10⁻¹⁰ (Botstein et al., 1997). This number may in fact be a significant underestimate because the alignments were done with GENBANK® entries that make up only about 10–20% of the unique human protein sequences thought to exist.
The evolutionary conservation between yeast and humans is not limited to sequence identity. The list of human genes that can functionally substitute for their yeast counterparts is extensive. For example, H-Ras (Kataoka et al., 1985), HMG-CoA reductase (Basson et al., 1988) and the heme A:farnesyltransferase (Glerum and Tzagoloff, 1994) have been shown to functionally replace their yeast counterparts. Researchers have utilized this evolutionary conservation to clone mammalian genes through their ability to complement the corresponding yeast mutants. Two examples include CDC2 (Lee and Nurse, 1987) and CDK2 (Elledge and Spottswood, 1991).
Functional conservation between yeast and humans may be best illustrated by the notable lack of antifungal therapeutic agents available for safely treating systemic infections in humans. Antifungal agents certainly exist, but they are characterized by profound side effects likely caused by inhibition of the mammalian counterparts of the yeast target. L659,699, lovastatin, and zaragozic acid inhibit different steps in the yeast sterol pathway (HMG-CoA synthase, HMG-CoA reductase, and squalene synthase, respectively). These inhibitors are also potent inhibitors of the corresponding mammalian enzymes (Correll and Edwards, 1994). In addition, we have found that in experiments with over 100 pharmaceutical agents used to treat a variety of distinct clinical indications in mammals, approximately 80% produced significant changes in gene expression in the GRM, indicating that there is substantial overlap in drug specificity between mammalian and yeast systems.
Yeast also contain genes that encode proteins that do not have plant and/or animal homologs. These non-homologous genes may be used as targets for the design and discovery of highly specific antifungal agents for use in plants and animals, including humans. The GRM may be used to identify genes that are expressed in particular metabolic pathways. Non-homologous genes in a pathway of interest may be used as targets for design and discovery of antifungal agents, for instance. See, e.g., U.S. Ser. No. 60/127,272, filed concurrently herewith.
One metabolic pathway of interest for identification of both homologous and non-homologous genes is the pathway for synthesis of isoprenoids. Eukaryotic cells utilize a group of structurally related compounds, the isoprenoids, for a vast array of cellular processes. These processes include structural composition of the lipid bilayer, electron transport during respiration, protein glycosylation, tRNA modification, and protein prenylation. All isoprenoids are synthesized via a pathway known variously as the isoprenoid pathway, mevalonate pathway, or sterol biosynthetic pathway. Although the bulk end product of the pathway is sterols, there are several branches of the pathway that lead to non-sterol isoprenoids. Due to the involvement of isoprenoids in a variety of physiologically and medically important processes, a comprehensive understanding of the regulation of this pathway would offer many scientific and practical benefits.
The regulation of the isoprenoid biosynthetic pathway is known to be complex in all eukaryotic organisms examined, including S. cerevisiae. The overriding principle for the regulation of this pathway is multiple levels of feedback inhibition. This feedback regulation is keyed to multiple intermediates and appears to act at numerous steps of the pathway, involving changes in transcription, translation and protein stability. Additionally, the availability of molecular oxygen, required for sterol and heme biosynthesis, regulates the expression of genes at key steps of the pathway. The emerging picture is that the isoprenoid pathway has numerous points of regulation that act to control overall flux through the pathway as well as the relative flux through various branches of the pathway.
Given the complexity of the isoprenoid pathway, it can be difficult to understand the regulation of any one step of this pathway, unless it is viewed within the context of the entire pathway. Thus, the GRM is ideal for understanding the regulation of the isoprenoid pathway because one may observe the regulation of all the yeast genes involved in the isoprenoid pathway at one time by using the GRM. In addition, analysis of the gene expression provided by the GRM (preferably using software described below) may provide information about which particular genes in the isoprenoid pathway are important regulatory genes in the pathway, those which are important indicator genes of the isoprenoid pathway, and those which are suitable targets to regulate isoprenoid synthesis.
Today we have the luxury of reflecting upon the wealth of information that has come from decades of research into the cell biology and genetics of yeast. Still, less than 20% of the hypothetical ORFs discovered by the yeast genome project had been previously identified through basic research (Goffeau et al., 1996). Additionally, 25% of the yeast ORFs with obvious human homologs have no known function (Botstein et al., 1997). The situation will likely be the same when the human genome sequence is completed.
Several research groups have created software programs that enable the comparison of both chemical and genetic expression profiles to identify related gene expression response patterns, as shown, for example, in FIG. 38. In addition, expression changes of individual genes in response to any given treatment can often be accessed through hypertext links. Currently, our software will: 1) normalize expression data; 2) rank changes in individual genes' expression relative to a particular treatment; 3) rank similarities between genomic expression profiles as a result of a chemical or genetic treatment; and 4) determine the correlation coefficient for an individual gene's expression relative to that of all other genes to identify regulons, or groups of genes that share the same regulatory programs. See U.S. application Ser. No. 09/076,668, now pending; Eisen et al. (1998); and Tamayo et al. (1999).
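The four software functions listed above can be illustrated with a minimal sketch. It is not the software of the cited application. It assumes, for illustration only, that normalization is a log2 ratio of treated to untreated reporter signal and that similarity is scored with the Pearson correlation coefficient. The ORF names, treatment names, signal values and the 0.9 threshold are all hypothetical.

```python
import math

def log_ratios(treated, control):
    """1) Normalize (assumed method): log2(treated / control) per reporter."""
    return {orf: math.log2(treated[orf] / control[orf]) for orf in treated}

def rank_changes(profile):
    """2) ORFs ordered by magnitude of change, largest first."""
    return sorted(profile, key=lambda orf: abs(profile[orf]), reverse=True)

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def profile_similarity(profiles, query):
    """3) Other treatments ranked by correlation with the query profile."""
    orfs = sorted(profiles[query])
    q = [profiles[query][o] for o in orfs]
    scores = [(t, pearson(q, [p[o] for o in orfs]))
              for t, p in profiles.items() if t != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)

def coregulated(profiles, orf, threshold=0.9):
    """4) ORFs whose response across treatments correlates with `orf`."""
    treatments = sorted(profiles)
    target = [profiles[t][orf] for t in treatments]
    hits = []
    for other in sorted(profiles[treatments[0]]):
        if other == orf:
            continue
        series = [profiles[t][other] for t in treatments]
        if pearson(target, series) >= threshold:
            hits.append(other)
    return hits

# Hypothetical reporter signals (arbitrary units); not measured data.
control = {"ORF_A": 100.0, "ORF_B": 100.0, "ORF_C": 100.0, "ORF_D": 100.0}
raw = {
    "drug_1":   {"ORF_A": 800.0, "ORF_B": 200.0, "ORF_C": 100.0, "ORF_D": 25.0},
    "drug_2":   {"ORF_A": 400.0, "ORF_B": 200.0, "ORF_C": 100.0, "ORF_D": 50.0},
    "mutant_1": {"ORF_A": 50.0,  "ORF_B": 100.0, "ORF_C": 400.0, "ORF_D": 100.0},
}
profiles = {name: log_ratios(signal, control) for name, signal in raw.items()}

print(rank_changes(profiles["drug_1"]))
print(profile_similarity(profiles, "drug_1"))
print(coregulated(profiles, "ORF_A"))
```

With only three treatments, the correlation in step 4 is far too weakly constrained to define a regulon. In practice, many treatments per reporter would be needed before grouping ORFs by shared regulation.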
The ability to assign ORFs to functional groups based upon their expression patterns will provide valuable information pertaining to the function of proteins from model organisms as well as their mammalian counterparts. Analysis of genomic expression patterns may also reveal upstream regulatory sequences, including promoters, with great utility for regulated or constitutive expression of recombinant genes. Such regulated sequences can be used for making reporter constructs for any selected process intrinsic to a given genome.
These functional genomics studies will provide a great deal of information that can implicate yeast genes, as well as their mammalian counterparts, in a variety of cellular functions. Associations of particular genes with specific biological pathways will be made by virtue of the genes' patterns of regulation under numerous conditions.
One particular problem in the prior art has been identifying genes whose expression is representative of a specific biological (e.g., metabolic) pathway. One would like to be able to measure the expression of a gene or its encoded protein to indicate the effect of a particular treatment on a specific pathway. Thus, there is a need for various pathway indicator genes for the various metabolic pathways.
A second problem in the prior art has been identifying genes and their encoded proteins which can be efficient targets within a specific biochemical pathway or set of associated pathways. Once good targets have been identified, pharmaceutical compounds and treatments may be designed or discovered to regulate the expression or activity of the target gene or protein.