The present invention comprises a general procedure applicable in virtually any cell type for identification of nucleic acid sequences that perturb specific biochemical pathways within a cell.
Genetic methods have played a major role in efforts to understand the molecular basis for biological phenomena. For example, genetic analysis of the fruit fly, D. melanogaster, provided the entry point for isolation of numerous genes that regulate the formation of the fly body. These genes in turn served as probes for isolation of mammalian homologs that have been the primary tools in molecular studies of vertebrate development.
A variety of genetic and biochemical studies have shown that virtually any biological process (e.g., cell behaviors and the like) can be broken down into components. This reductionist approach to biological inquiry aims to understand the greater part of life's complexity in the relatively simple chemical terms of molecules and molecular interactions. In the middle part of the twentieth century, several scientists, perhaps most notably George Beadle, showed that metabolism can be understood as a series of enzymes that act sequentially to convert precursor compounds into the final metabolic products. This insight gave rise to the notion of genetic or biochemical pathways that control cellular processes. More complicated cellular behaviors such as differentiation have recently been defined in terms of genetic programs and pathways. Even disease processes can be thought of in such terms. For example, cancer is a disease characterized by loss of cellular growth control. An effective strategy to study cancer involves the elucidation of cellular growth regulation pathways. Many genes involved in growth control have been identified and substantial progress has been made in understanding the genetic/biochemical circuitry of these component genes.
Some organisms are especially tractable in genetic studies. These organisms typically are either unicellular, or have short life cycles, small genomes, and a variety of other useful features. Other organisms, such as humans, are less tractable. For tractable experimental organisms, two basic approaches to mutant isolation are available. The first method, termed screening, involves the sometimes painstaking inspection of thousands of individual organisms or clones of cells. Those that have the appropriate mutant phenotype are separated from the others and permitted to grow in isolation. In this manner, homogeneous populations of mutants can be grown and analyzed. The second approach involves growth of organisms under conditions that favor the survival of variant phenotypes over the wild type phenotype. In the case of microorganisms, the selection conditions often involve nutritional requirements or resistance to drugs.
The classical models for genetic studies include E. coli, S. cerevisiae, D. melanogaster, and M. musculus. These organisms share certain features that facilitate genetic studies. First, they can be used to screen and/or select for interesting phenotypic variants (mutants). Second, they can be manipulated in such a way that the underlying genes responsible for specific mutant phenotypes can be localized and isolated by molecular cloning methods. These features permit the analysis of genes in cases where detailed biochemical information about the process under study is unavailable. All that is required at the outset is a tractable experimental organism and a phenotype that can be scored or selected.
In certain organisms such as humans which are of great interest, but in which classical genetic methods of selective breeding cannot be applied, it is still possible to use genetic analysis to identify genes. The techniques are somewhat different and involve retrospective phenotypic and genotypic analysis of kindreds that segregate traits of interest. Such kindreds can be used to determine the approximate location of genes that affect the trait of interest. This approach relies heavily on aspects of heredity that involve sexual reproduction, segregation, and recombination. From rough mapping information, the responsible gene(s) can often be isolated (Miki Y., Swensen J., et al., Science 266: 66-71 (1994)).
Cultured cells from multicellular organisms, as well as single-celled organisms, offer the great advantage that genetic studies can be performed on the simplest unit of life, the cell. In many microorganisms, genetic methods are suitably advanced so that detailed genetic analysis of a wide variety of phenotypic traits is possible. In other organisms such as humans, however, genetic studies in cultured cells are still very difficult. Though cultured somatic cells have provided the route to identification of several important human genes, somatic cells have traits that seriously limit their utility. They are diploid; hence mutants with a recessive phenotype are rarely observed. They reproduce clonally; hence it is generally not possible to map interesting mutations. They are often heterogeneous; hence, each cell in a supposedly identical population of cells may differ slightly in phenotype from another cell for a variety of genetic and epigenetic reasons. They do not lend themselves to a large variety of selection schemes. Genetic methods that can mitigate these problems in human cells would be particularly valuable.
Genes regulate some of the most medically and commercially important processes in biology. Many human diseases are caused by mutations or malfunctions of specific genes. Cancer may be the most familiar example, as it involves the sequential alteration of proto-oncogenes and tumor suppressor genes as tumors progress through stages of malignancy (Fearon E. R. and Vogelstein B., Cell 61: 759-767 (1990)). Methods capable of identifying the underlying genes that regulate important biological processes such as tumor progression would thus be of great value.
For the foregoing reasons, a general method of genetic analysis in cultured cells is needed. The method should be simple, rapid, and permit identification of components of genetic pathways that regulate traits of interest. It should circumvent many of the obstacles that have interfered with genetic analysis in certain cells and organisms. It should not require an understanding of the detailed basis of a particular phenotype, or the mechanisms that underlie specific cellular behaviors. The method should be generally applicable to a great variety of cells, including cells cultured from somatic tissues of multicellular organisms, and it should sidestep certain disadvantages of somatic cell genetics, including the diploid character of most cells, the difficulty of isolating mutant genes once mutations have been induced, and the heterogeneity of many cell populations.
The present invention is directed to a method of genetic analysis that satisfies the need for a simple, rapid, and general way to identify components of genetic pathways that regulate traits of interest. The method involves the use of three basic tools: (1) a reporter gene that reflects the phenotypic state of a particular cell; (2) a selection device or method that permits rapid quantitative measurement of the expression levels of the reporter molecule on a cell-by-cell basis; and (3) an expression library, preferably of proteins, protein fragments, or peptides (“perturbagens”), that can be introduced into the chosen cell population (host cells). The reporter gene is typically contained in a construct that places it under the control of a specific cis regulatory element whose activity correlates with the trait of interest. This construct is introduced into a population of host cells such that it is stably maintained and expressed. A genetic library constructed in a second expression vector is introduced into the host cells that harbor the reporter gene construct. This second expression library generates perturbagens in the host cells. The host cells are analyzed using a method or device that quantitatively detects reporter expression levels. Cells with reporter gene expression levels that are decreased or increased relative to the expression observed in cells that contain only the stably expressed reporter, without the perturbagens, are selected and their library inserts are isolated.
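By way of illustration only, the comparison against control cells described above can be sketched in Python. The sketch assumes that per-cell reporter levels are already available as numbers, and it places the gate at the 1st and 99th percentiles of the control population; the function names, the data layout, and the percentile cutoff are hypothetical choices made for this example and are not specified by the method.

```python
from statistics import quantiles


def gate_bounds(control_levels, tail_percent=1):
    """Return (low, high) reporter thresholds from control cells.

    control_levels: reporter levels of cells carrying only the reporter.
    tail_percent: assumed percentile cutoff for each tail (hypothetical).
    """
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = quantiles(control_levels, n=100, method="inclusive")
    return cuts[tail_percent - 1], cuts[100 - tail_percent - 1]


def select_variants(library_cells, low, high):
    """Split library cells into those below and above the control gate.

    library_cells: iterable of (cell_id, reporter_level) pairs.
    """
    selected = {"decreased": [], "increased": []}
    for cell_id, level in library_cells:
        if level < low:
            selected["decreased"].append(cell_id)
        elif level > high:
            selected["increased"].append(cell_id)
    return selected


# Toy data: control levels 100..200, three library cells.
control = list(range(100, 201))
low, high = gate_bounds(control)
hits = select_variants([("a", 50), ("b", 150), ("c", 250)], low, high)
print(low, high, hits)
```

In this toy run, cell "a" falls below the control range and cell "c" above it, so both would be carried forward for insert recovery, while cell "b" is indistinguishable from control cells and is discarded.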
The reporter serves as a surrogate for the cellular phenotype and thus must be chosen carefully to reflect the relevant phenotypic state as closely as possible. The reporter may be an endogenous gene, preferably encoding a cell surface marker, expressed by cells with the phenotype of interest, or it may be a foreign gene placed under the control of a cell-type-specific or cell-state-specific promoter that is active in the cells under study. The reporter is expressed in the host cells at a level sufficient to permit its rapid and quantitative determination.
Perturbagens are molecules that act in a transdominant mode to interfere with the function of endogenous cellular components. In the present invention, perturbagens are typically proteinaceous (proteins, protein fragments, or peptides), though perturbagens may also be nucleic acids. By expressing perturbagens in cells, it is possible to disrupt specific normal interactions, thus generating a “phenocopy” of a mutant phenotype; that is, although no mutations are created by the method, the function of specific cellular constituents is affected as if the genes encoding these proteins were altered by mutation. Perturbagen genetic libraries are introduced into the host cells that harbor the reporter expression construct in such a way that a single type of perturbagen (or a small number of different perturbagens) is expressed in each host cell.
The selection device or method is used to screen rapidly through millions of cells that harbor the reporter gene construct for variants that express altered levels of the reporter and to sort (or select) those variant cells away from the majority of cells that express normal levels. This selected population that expresses altered levels of the reporter is used in turn to isolate the resident perturbagens by, e.g., PCR (Ausubel F. M., Brent R., et al., Current Protocols in Molecular Biology, John Wiley and Sons, New York (1996)). The selection procedure results in enrichment of the initial population of cells harboring the perturbagen library for cells that contain perturbagen fragments that affect reporter gene expression. The sub-library of perturbagen fragments that influence reporter gene expression can be reintroduced into the host cells and the process of screening/selection can be repeated. The whole cycle is repeated as many times as necessary to obtain a relatively pure sub-library of perturbagen-encoding inserts which, when introduced into the host cells, cause altered reporter gene expression. Each of these perturbagen fragments can be isolated and studied individually.
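The effect of repeating the screening/selection cycle can be illustrated with simple arithmetic. The following Python sketch is a hypothetical model, not a description of any measured result: it assumes that each cycle retains cells carrying an active perturbagen with probability hit_rate and retains cells carrying an inactive insert with probability false_positive_rate, and that the recovered sub-library reflects the sorted cells in proportion. All parameter names and numerical values are assumptions chosen for the example.

```python
def enrich(fraction_active, hit_rate, false_positive_rate):
    """Fraction of active inserts in the sub-library after one sort cycle."""
    kept_active = fraction_active * hit_rate
    kept_inactive = (1.0 - fraction_active) * false_positive_rate
    return kept_active / (kept_active + kept_inactive)


def enrichment_history(fraction_active, hit_rate, false_positive_rate,
                       target=0.9, max_cycles=20):
    """Repeat sort cycles until the target purity or max_cycles is reached.

    Returns the list of active fractions, starting with the initial library.
    """
    history = [fraction_active]
    while history[-1] < target and len(history) <= max_cycles:
        history.append(enrich(history[-1], hit_rate, false_positive_rate))
    return history


# Assumed values: 1 active insert per 100,000, 80% of true variants
# recovered per sort, 1% of unaffected cells carried over per sort.
history = enrichment_history(1e-5, hit_rate=0.8, false_positive_rate=0.01)
for cycle, fraction in enumerate(history):
    print(f"cycle {cycle}: active fraction {fraction:.4g}")
```

Under these assumed values the odds of an insert being active improve by a factor of 80 per cycle, so a library that begins with one active insert in 100,000 exceeds 90% purity after four cycles. Less stringent sorting would require more cycles, which is consistent with repeating the cycle as many times as necessary.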
Because the selection occurs at the population level, and further enrichment cycles are simple to perform, the time associated with gene isolation is greatly reduced. In addition, this approach diminishes the chance that a particular perturbagen isolated according to the methods described herein acts idiosyncratically in a minority of host cells. Screens/selections for virtually any phenotype are possible, limited only by the fidelity with which the reporter represents the cell phenotype of interest.
Perturbagen fragments isolated in this manner produce phenocopies; i.e., they generate the equivalent of genetic mutations. Each fragment encodes a perturbagen that affects expression of the reporter. In principle, any component of the genetic pathway that leads to reporter gene expression is vulnerable to perturbagen disruption. For example, the reporter gene may be expressed only in the presence of a specific transcription factor. If the perturbagen sequesters this factor, or acts upstream of the factor to reduce its activity, reporter gene expression will be reduced. The present invention also can be used to generate a perturbagen disruption that causes a phenotypic transformation such that the original cell type is converted into a different cell type in which the reporter gene is not expressed. Such a perturbagen identifies a master switch: a single molecule capable of dictating the phenotype of the cell.
A cloned perturbagen-encoding sequence may rapidly give direct and indirect information about the pathway it affects. If the perturbagen is derived from a gene or gene fragment, it may be related to a previously identified component of the pathway and its sequence may reveal its identity. The target of the perturbagen may be a second component of that pathway, whose identity can be inferred. Alternatively, the target molecule can be identified by techniques known in the art such as the yeast two-hybrid screen (see Fields S. and Song O. -K., U.S. Pat. No. 5,283,173) or by “suppressor” perturbagen methods outlined infra (Jarvik J. and Botstein D., Proc. Natl. Acad. Sci. (USA) 72: 2738-2742 (1975)). Thus, a few selection experiments performed on several millions of cells should enable identification of most or all of the components of a particular pathway which are vulnerable to this type of disruption. Finally, if these components are involved in a process of commercial significance, the perturbagen provides a tool to develop valuable reagents either directly, or as a substrate for screening.