The present invention comprises procedures for identifying, characterizing, and evolving cis-acting nucleic acid sequences that act in a cell-type specific manner to stimulate or repress the expression of linked genes or other neighboring sequences.
A variety of cis-acting nucleic acid sequences influence expression levels of genes in prokaryotic and eukaryotic cells. These sequences act at the level of mRNA transcription, mRNA stability, or mRNA translation (Alberts B., Bray D., et al. (Eds.), Molecular Biology of the Cell, Second Edition, Garland Publishing, Inc., New York and London, (1989)). In the cases of RNA stability and translation, the cis sequences are present on the RNA molecules themselves. In the case of transcription, the cis sequences may be present either on the transcribed sequences or they may reside nearby in regions of the gene that are not transcribed.
In prokaryotes that have been studied, most of the transcriptional control sequences lie immediately upstream of the RNA start site in an area called the promoter. In the case of E. coli promoters, for example, the consensus promoter sequence consists of two regions, one located about 10 basepairs upstream of the start site, and one located about 35 bases upstream. These sequences coordinate the binding of RNA polymerase, the principal enzyme involved in transcription. Other sequences also influence the level of transcription of E. coli genes. These sequences include repressor-binding sites and other sites that bind ancillary factors that regulate interaction between RNA polymerase and the promoter.
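By way of illustration only, the two-region structure of the E. coli promoter can be expressed as a simple sequence scan. The following Python sketch is not part of the disclosed method. It assumes the commonly cited consensus hexamers TTGACA (the region about 35 bases upstream) and TATAAT (the region about 10 basepairs upstream) and a spacer of 16 to 18 basepairs between them. None of these specifics is stated in the text above, and the function name and mismatch tolerance are hypothetical choices.

```python
# Illustrative scan for E. coli-like promoter candidates.
# Assumptions (not from the text): consensus hexamers TTGACA / TATAAT,
# a 16-18 bp spacer, and a per-hexamer mismatch tolerance.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(seq, max_mismatch=2, spacer_range=(16, 18)):
    """Return (pos_35, pos_10, total_mismatches) tuples, 0-based positions."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            m10 = mismatches(seq[j:j + 6], MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, j, m35 + m10))
    return hits


# Hypothetical test sequence: exact hexamers separated by a 17 bp spacer.
example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGGGGG"
candidates = find_promoter_candidates(example, max_mismatch=0)
```

A scan of this kind only flags candidate sites. As noted above, repressor-binding sites and ancillary factors also influence transcription, so a match to the consensus does not by itself establish promoter activity.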
In prokaryotes such as E. coli, little regulation is exerted at the level of transcript stability, probably because the cell division cycle is typically very short. Thus, transcript half-lives are generally only a few minutes. However, considerable control is exercised at the level of translation. In E. coli, sequences immediately upstream of the translational start site (Shine-Dalgarno sequences) mediate the binding of mRNA molecules to the ribosome, and hence, the efficacy of translation.
In eukaryotes, the control of gene expression is more complex but some of the same principles are involved. Gene expression levels are influenced not only by cis sequences that bind transcription regulatory factors, but also by sequences that affect the overall conformation of the DNA in the vicinity of the gene in question. These effects on chromatin structure are less well understood, but are likely to be very significant. It is thought that structural components such as histones and other proteins pack or unpack in a regulated fashion to affect the global and local conformations of DNA, and thus the accessibility of cis regulatory elements in or near genes.
The promoter regions of eukaryotic genes are also more complex than prokaryotic promoters and generally involve binding sites for numerous factors in addition to the RNA polymerase holoenzyme. Certain sequences are involved specifically in the process of transcription initiation, such as the TATA box (Myers R M, Tilly K, and Maniatis T., Science 232: 613-618 (1986)), whereas other sequences act to influence the rate of initiation. These latter sequences have been called enhancers, and they have the property of being relatively insensitive to position in the promoter (Wasylyk B., Wasylyk C., and Chambon P., Nucleic Acids Res. July 25; 12: 5589-5608 (1984)). Many enhancers are located several kilobasepairs away from the gene whose expression level they regulate.
Because cell generation times in eukaryotes are typically longer than in prokaryotes, transcript stability is an important mode of regulation. For instance, some transcripts such as c-Fos have half-lives on the order of minutes, while others have half-lives on the order of hours. Sequences located at a variety of sites within the transcript influence the susceptibility of specific mRNA molecules to degradation by RNases within the cell (Ross J., Microbiol Rev.: 423-450 (1995)).
Translational regulation also plays a significant role in eukaryotic gene expression. Secondary structure in particular transcripts can influence translation rates, as can codon usage. In addition, the sequence composition surrounding the translational start site (the Kozak consensus sequence) is an important factor in translational efficiency (Kozak M., Cell January 31; 44: 283-292 (1986)).
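For illustration, the dependence of translational efficiency on start-site context can be sketched as a rough classifier. The Python example below is not part of the disclosure. It assumes the widely used reading of the Kozak consensus in which a purine at position -3 and a G at position +4 (counting the A of the start codon as +1) are the most influential positions. The three labels and the function name are hypothetical conventions adopted here.

```python
def kozak_context(mrna, atg_index):
    """Classify the start-codon context as 'strong', 'adequate' or 'weak'.

    Assumption (not from the text): only position -3 (purine) and
    position +4 (G) are scored; atg_index is the 0-based index of the A
    of the start codon.
    """
    seq = mrna.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no start codon at atg_index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("need at least 3 bases upstream and 1 downstream")
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"
```

Such a classifier ignores secondary structure and codon usage, both of which are noted above as additional influences on translation rate.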
In both prokaryotes and eukaryotes, the activity of many promoters is regulated according to the state of the cell. In metazoans, the situation can be much more complex because certain promoters may be active only in specific cell lineages. Thus, their activity must be regulated according to the particular time in development of the organism and the specific cell type.
Genetic screens and selections allow identification of regulatory elements in genes. If a powerful genetic selection or screen is enforced on a population of cells, it is possible to identify variants that have properties worthy of further study. Multiple rounds of selection or screening may permit the ultimate identification of variants in cases where a single round of selection/screen is not sufficient to enrich the population of desired variants. Genetic selections typically involve conditions whereby wild type cells die or grow slowly compared to variant cells in the population. Such conditions may be forced upon a culture of cells or a population of organisms. An equivalent process may involve a “screen and pluck” approach, where interesting variants are identified from the population, separated, and allowed to replicate in isolation. Such a process ultimately leads to an enrichment in the selected population for variants with the desired phenotypic traits, and a diminution of cells or organisms with the parental phenotype.
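The benefit of multiple rounds can be made concrete with a small calculation. The Python sketch below is illustrative only and rests on a simplifying assumption not stated above: each round multiplies the odds of desired variants to parental cells by a constant enrichment factor. The function names and the numerical values in the usage note are hypothetical.

```python
def fraction_after_rounds(initial_fraction, enrichment_per_round, rounds):
    """Fraction of desired variants after a number of selection rounds.

    Assumes a constant per-round enrichment factor acting on the odds
    (variants : parental), which is an idealization.
    """
    if not 0.0 < initial_fraction < 1.0:
        raise ValueError("initial_fraction must lie strictly between 0 and 1")
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= enrichment_per_round ** rounds
    return odds / (1.0 + odds)


def rounds_needed(initial_fraction, enrichment_per_round, target_fraction,
                  max_rounds=100):
    """Smallest number of rounds reaching target_fraction, or None."""
    for n in range(max_rounds + 1):
        reached = fraction_after_rounds(initial_fraction,
                                        enrichment_per_round, n)
        if reached >= target_fraction:
            return n
    return None
```

Under these assumptions, a variant present at one in a million with a hypothetical 100-fold enrichment per round makes up about half of the population after three rounds and about 99% after four, which is why a single round may not suffice.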
Numerous approaches have been applied to the identification and study of cis regulatory sequences. However, in general the approaches have been relatively labor intensive and slow. In addition, the approaches have generally been aimed at the study of the behavior of cis sequences in the natural setting; i.e., the intention has been to study the normal regulation of such sequences in the cell.
In certain cases, cis sequences have been deliberately engineered to control expression of particular genes in desirable ways. For example, it is useful to regulate tissue specificity and levels of exogenous genes using defined regulatory elements. This may involve fine control over tissue specificity, e.g., as in expression of the SV40 T antigen (TAg) in pancreatic islet beta cells by linking the TAg gene to the insulin promoter (Hanahan D., Nature 315: 115-122 (1985)), or it may involve efforts to maximize expression, e.g., as in the use of viral regulatory sequences such as the CMV enhancer (Wilkinson G. W., and Akrigg A., Nucleic Acids Res. May 11; 20: 2233-2239 (1992)), or it may involve efforts to modulate expression levels from low to high, e.g., as in the LacSwitch (Fieck A., Wyborski D. L., and Short J. M., Nucleic Acids Res. 20: 1785 (1992)) and TetSwitch systems (Iida A., Chen S. T., et al., J. Virol 70: 6054-6059 (1996)).
A variety of techniques have been used to identify cis sequences that regulate gene expression. These include biochemical methods that identify sites of interaction with protein factors, comparative sequence analysis, characterization of regulatory mutations in genes, and assay of deliberately constructed sequence variants for their effects on gene expression (Latchman David S., Eukaryotic Transcription Factors Second Edition, Academic Press, London (1996); McKnight S. L., and Yamamoto K. R. (Eds.), Transcriptional Regulation, CHSL Press, New York (1992)). Such methods have the drawback that they often require some a priori knowledge of the nucleic acid sequence of the regions of interest. In addition, several methods have been employed to “trap” cis sequences that have promoter activity. In prokaryotes, this often involves insertion of reporter constructs (involving, e.g., the LacZ gene) into the vicinity of genes such that the reporter is brought under the control of specific promoters. Screening or selecting for expression of the reporter permits the identification of promoters that have particular properties; for example, promoters that are active only under conditions of stress in the cell (Kenyon C. J., and Walker G. C., Proc. Natl. Acad. Sci. USA May; 77: 2819-2823 (1980)). Similar methods have been applied in metazoans, particularly in Drosophila melanogaster, to identify genes with interesting expression patterns, and hence, promoters/enhancers. Such methods often fall under the rubric of “enhancer trap” or “promoter trap” screens (Bellen H. J., O'Kane C. J., et al., Genes Dev 3: 1288-1300 (1989)). Such methods suffer from the limitations of being slow and labor intensive. In addition, they are generally intended to identify natural sequences that have specific regulatory properties in vivo, as opposed to artificial sequences with preselected behavior.
In mammalian cells, a variety of methods have been used to identify interesting regulatory sequences by genetic screens or selections. In the general approach used for identification of cis regulatory elements through genetics, reporter constructs or selectable markers are used. Reporter genes that have been used include the chloramphenicol acetyltransferase (CAT) gene (Thiel G., Petersohn D., and Schoch S., Gene February 12; 168: 173-176 (1996)), the LacZ gene from E. coli (Shapiro S. K., Chou J., et al., Gene November; 25: 71-82 (1983)), the green fluorescent protein (GFP) gene from jellyfish (Chalfie M. and Prasher D. C., U.S. Pat. No. 5,491,084), and numerous others. Genes that function as selectable markers (i.e., conditions can be chosen such that cells lacking the marker die) can also be used. Such selectable markers include genes that encode resistance to hygromycin, mycophenolic acid, neomycin, and other agents (Ausubel F. M., Brent R., et al. (Eds.), Current Protocols in Molecular Biology, John Wiley and Sons, New York (1996)).
In one type of enhancer trap screen used for identifying cis sequences from mammalian cells, retroviruses that include reporter genes are used to infect cells. Depending on the more-or-less random integration of the virus in particular cells, the reporter construct is placed in a position where it can respond to specific cis sequences present in the host cell chromosome. This approach is exemplified by Ruley H. E. and von Melchner H., U.S. Pat. No. 5,364,783. In other approaches, selection schemes can be designed which allow identification of cis sequences that respond in a defined manner; e.g., they mediate induction or suppression by glucocorticoids (Harrison R. W., and Miller J. C., Endocrinology July; 137:2758-2765 (1996)). Limitations of these methods include the inability to easily select for cis sequences that control gene expression in a cell-type dependent manner and the reliance of such methods on the capacity of a vector to integrate into the host cell genome.
Control of gene expression is an exceedingly important issue in the detection and treatment of human disease. Many diseases can be viewed as defects in proper regulation of gene expression. One of the clearest illustrations is cancer, a heterogeneous disease caused by accumulated mutations that result in loss of cellular growth control. A combination of inactivation of tumor suppressor genes and activation of oncogenes produces the cancer cell phenotype. Thus, disease detection and prognosis may be facilitated by methods that permit the analysis of gene expression profiles in cells, and by strategies that take advantage of the tendency of specific cell types to express certain genes. Information relevant to such strategies for diagnosis may also be relevant to therapy. For example, sequences that ensure proper regulation of particular gene therapeutics are valuable in controlling side effects of the therapeutic agent.
A simple method is needed for identification and characterization of cis sequences that control gene expression in a cell-type dependent manner. This method should permit identification of sequences that allow specific expression; that is, high expression in one cell type, and low expression in another. The method should be general, i.e. it should be applicable to nearly all cell types; it should be rapid; and it should be useful for evolving cis sequences from natural or synthetic building blocks into sequences with characteristics that may differ from cis regulatory sequences found in nature. In addition, the method should allow the mechanism of this specific expression to be directly elucidated. Cis sequences with such defined properties would have tremendous potential value in the diagnosis and treatment of diseases.
In the case of diagnosis, cell-type specific cis sequences would offer the possibility of developing an assay based on gene expression for detection of particular diseased tissues or pathogens. For instance, a cis sequence linked to a reporter could be introduced into biopsy samples and the expression of the reporter could be monitored by a colorimetric assay or by the polymerase chain reaction (PCR) (Ausubel F., Brent R., et al., 1996). If a tumor-specific cis sequence were linked to the reporter, a positive result of the assay (i.e., expression of the reporter gene) would indicate the presence of malignant cells in the biopsy. Thus, cis sequences that regulate gene expression in a cell-specific manner open up novel opportunities for potentially very sensitive and general diagnostic testing.
In the case of therapy, it is often advantageous, even essential, to confine the expression of a transgene (a gene introduced into germline or somatic tissue) to a particular cell type. For example, if a cis sequence were found that conferred expression of linked genes only in tumor cells and not in normal cells, this sequence would be useful as a mechanism for directing selective expression of genes in tumor cells. Normal cells that inadvertently picked up the gene would not be affected because the gene would remain silent. Another example involves virus-infected cells. If a cis regulatory sequence were identified that was active only in infected cells, these cells could be targeted for elimination by an appropriate construct that included such a sequence. Finally, if cell-type specific cis sequences were identified, they would be useful in creating reporter constructs that could detect and serve as a surrogate for the phenotypic state of a specific cell type.
The invention comprises a combination of tools that together allow cis sequences with cell-type specific effects on gene expression to be identified. The tools include a reporter gene, an appropriate expression vector, a genetic library, and a method for screening or selecting cells based on reporter expression level.
In a preferred embodiment, the expression vector is designed so that reporter expression is completely disabled or occurs at a low level unless appropriate cis sequences are located in the expression construct to activate transcription. Such cis sequences may be promoters, enhancers, or both. This “dead” expression vector may be used as a cloning vehicle for nucleic acid fragments derived from a variety of sources, such as genomic DNA, mRNA, cDNA, or from oligonucleotide synthesis. The fragments may range in size from a few base pairs up to several kilobasepairs, depending on the objective of the particular experiment. These fragments are inserted into the vector to generate a library of cloned fragments. The library is introduced into one type of host cell (e.g., a tumor cell) and after a period of time sufficient to allow expression of the reporter, the cells are screened to select cells that express the reporter. In a preferred embodiment, GFP or any molecule capable of being labeled directly or indirectly with a fluorophore is used as the reporter and selection may be accomplished using a flow sorter device such as a fluorescence-activated cell sorter (FACS) by measuring the fluorescence signal from the reporter and collecting positive (“bright”) cells (Autofluorescent proteins; AFPs™, Quantum Biotechnologies, Inc.; Robinson P. J., Darzynkiewicz Z., et al., Current Protocols in Cytometry, Published in Affiliation with the International Society for Analytical Cytology (1997)). These cells contain expression vectors harboring library fragments that have brought the previously dead construct to life; e.g., promoters active in the particular cell type used for the experiment. Present-day FACS machines can easily sort 10^7 to 10^8 cells per hour. Thus millions of sequences can be screened in a short period of time to identify positive or negative cells.
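As a rough illustration of the throughput figures given above, the sorting time for a library can be estimated from its size. The Python sketch below is not part of the disclosure. The library size, the desired coverage, the fraction of cells carrying a construct, and the Poisson sampling approximation are all assumptions introduced here. Only the sort rate of 10^7 to 10^8 cells per hour comes from the text.

```python
import math


def sort_hours(library_size, coverage, cells_per_hour,
               transfected_fraction=1.0):
    """Hours of sorting needed to pass `coverage` cells per library member.

    transfected_fraction is a hypothetical correction for cells that
    carry no construct at all.
    """
    cells_needed = library_size * coverage / transfected_fraction
    return cells_needed / cells_per_hour


def prob_clone_sampled(coverage):
    """Probability that a given clone is seen at least once.

    Assumes clones are sampled independently and uniformly, so counts
    are approximately Poisson with mean equal to the coverage.
    """
    return 1.0 - math.exp(-coverage)
```

For example, under these assumptions a hypothetical library of 10^6 fragments sorted at 10-fold coverage requires one hour at 10^7 cells per hour, and 3-fold coverage already samples any given clone with a probability of about 95%.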
To recover cell-specific cis sequences, a counterscreening step is performed. In one embodiment of this step, the sub-library of fragments that activate transcription is moved from a first host cell into a second host cell (e.g., a non-tumor cell). In a preferred embodiment, the second host cell is passed through a FACS, but this time negative (“dim”) cells are recovered. In some circumstances it may be helpful to include an independent reporter to ensure that the dim cells contain the expression construct. The sub-library of fragments contained in this fraction of cells is retrieved. The sub-library of fragments retrieved from the recovered second host cells may be moved back into the first host cells and the screening and counterscreening procedure can be repeated several times to ensure that fragments are recovered from the experiment which are selectively active in one cell type and not the other. These fragments can be characterized individually for activity and also by nucleic acid sequence analysis.
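The logic of the screening and counterscreening cycle may be easier to follow as a small simulation. The Python sketch below is a toy model only and does not describe any actual instrument or protocol. The fragment names, the reporter signal values, the single fluorescence threshold, and the Gaussian measurement noise are all hypothetical.

```python
import random

# Hypothetical library: mean reporter signal of each fragment in two hosts.
library = {
    "frag_tumor_specific": {"tumor": 900.0, "normal": 20.0},
    "frag_ubiquitous": {"tumor": 850.0, "normal": 800.0},
    "frag_inactive": {"tumor": 15.0, "normal": 10.0},
    "frag_normal_specific": {"tumor": 30.0, "normal": 700.0},
}


def sort_gate(lib, host, gate, threshold, noise=0.0, rng=None):
    """Keep fragments whose measured signal in `host` falls in the gate.

    gate is "bright" (signal >= threshold) or "dim" (signal < threshold).
    """
    if gate not in ("bright", "dim"):
        raise ValueError("gate must be 'bright' or 'dim'")
    rng = rng or random.Random(0)
    kept = {}
    for name, activity in lib.items():
        signal = activity[host] + rng.gauss(0.0, noise)
        if (signal >= threshold) == (gate == "bright"):
            kept[name] = activity
    return kept


def screen_and_counterscreen(lib, first_host, second_host, threshold,
                             cycles=1, noise=0.0, seed=0):
    """Bright gate in the first host, then dim gate in the second host."""
    rng = random.Random(seed)
    sub = dict(lib)
    for _ in range(cycles):
        sub = sort_gate(sub, first_host, "bright", threshold, noise, rng)
        sub = sort_gate(sub, second_host, "dim", threshold, noise, rng)
    return sub


selected = screen_and_counterscreen(library, "tumor", "normal", 100.0)
```

With no measurement noise a single cycle suffices in this toy model. With noise, some fragments pass a gate by chance, which is one way to see why repeating the cycle several times helps ensure that the recovered fragments are selectively active in one cell type and not the other.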
In another preferred embodiment, the process begins with a “live” expression vector and reverses the selection criteria so that first host cells that do not express the reporter are selected in the screening step and second host cells that do express the reporter are selected in the counterscreening step. This embodiment of the method can be used to identify cell-type specific sequences that act as repressors of gene expression. In addition, it is possible to evolve novel cell-type specific sequences by mutation in vitro, by recombination in vitro, or by other mechanisms.
Comparisons of the nucleic acid sequences of cell-type specific cis sequences identified according to the methods of the invention with existing databases may allow identification of known promoter elements. Finally, the sequences identified in accordance with the methods of the invention can be used in subsequent biochemical experiments to identify factors from the two host cell types that are responsible either for activation of expression, or for repression. For example, cell extracts of the two host cell types can be incubated with the fragment, and the bound factors can be characterized. It may be possible to identify the bound proteins by using mass spectrometry to measure the masses of peptide fragments derived from them and comparing those masses to the EST database (Shevchenko A., Jensen O. N., et al., Proc. Natl. Acad. Sci. USA December 10; 93:14440-14445 (1996)). Thus, an underlying mechanism for the behavior of the cis sequences can be readily determined.