This invention relates to natural and synthetic cell- or tissue-specific transcriptional regulatory regions that regulate gene transcription in particular cells or tissues. In addition, this invention also relates to the methods for the selection, identification and evaluation of the synthetic cell- or tissue-specific transcriptional regulatory regions. None of the information described herein is admitted to be prior art to the present invention, but is provided solely to assist the understanding of the reader.
Cell- or tissue-specific gene expression plays a central role in the proliferation and differentiation of cells. As the first step of gene expression, transcription is a major point of regulation. The study of transcriptional regulatory regions is one of the major fields in modern biology. Transcriptional regulatory regions are also very important for applications in biotechnology, such as gene therapy and the production of recombinant proteins.
Transcriptional regulatory regions generally have two portions: transcription initiation sites, and enhancers, which are capable of regulating the level of transcription at a distance from the initiation sites. The binding of transcription factors to the regulatory regions is necessary for the regulatory regions to regulate transcription. The regulatory regions fall into several categories: general regulatory regions, which regulate transcription in all cells of an organism; inducible regulatory regions, which regulate transcription only in response to certain signals; and cell- or tissue-specific regulatory regions, which regulate transcription only in certain cells.
Several methods have been used to identify the regulatory regions. One of these methods is the analysis of regions that are important for the proper expression of cloned genes. The first step is usually to identify rough boundaries of the regulatory regions using deletion and mutation analysis of the cloned genes. These regions include the 5′ upstream regions, 3′ downstream regions, and sometimes introns or coding sequences within the gene itself. Most studies are performed using chimeric constructs containing a reporter gene such as β-galactosidase (β-gal), chloramphenicol acetyltransferase (CAT), luciferase or growth hormone (GH). The regions that actually bind protein factors can be more accurately defined using DNA footprinting techniques followed by mutation analysis. The sequences that bind protein transcription factors are often referred to as transcription factor binding sites.
Consensus sequences for a number of common binding sites have been determined. One example is the binding site recognized by the family of basic-helix-loop-helix (bHLH) transcription factors. The consensus sequence of binding sites for bHLH proteins is 5′-CANNTG-3′, where “N” can be any nucleotide. This binding site is called the “E box” and is found in the regulatory regions of a number of genes that are expressed in diverse cell types, including lymphocytes, muscle cells and fibroblasts. Some bHLH proteins are common to most or all cells while others are cell-specific. In addition, bHLH proteins form heterodimers and the interaction of some of these dimers with DNA is cell-specific. The binding of different bHLH proteins to specific regulatory regions appears to be affected by the variable dinucleotide sequence within the core consensus sequence and the sequence adjacent to the core sequence (Sun, et al., Cell 64: 459-470 (1991)).
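The E box consensus described above can be illustrated with a short sequence scan. The following Python sketch is only an illustration and is not part of any cited method: the function name `find_e_boxes`, the `flank` parameter and the example sequence are hypothetical. It reports each 5′-CANNTG-3′ match together with the variable central dinucleotide and the adjacent bases, which are the features said to influence which bHLH proteins bind.

```python
import re

# CANNTG written as a regular expression. The lookahead lets overlapping
# sites be reported. Because the reverse complement of CANNTG is also
# CANNTG, scanning one strand finds the site positions on both strands.
E_BOX = re.compile(r"(?=(CA([ACGT]{2})TG))")

def find_e_boxes(seq, flank=2):
    """Return every E box in seq with its central dinucleotide and flanks."""
    seq = seq.upper()
    hits = []
    for m in E_BOX.finditer(seq):
        start = m.start()
        hits.append({
            "start": start,                       # 0-based position of the C
            "site": m.group(1),                   # the six-base E box
            "core_dinucleotide": m.group(2),      # the variable "NN"
            "left_flank": seq[max(0, start - flank):start],
            "right_flank": seq[start + 6:start + 6 + flank],
        })
    return hits

if __name__ == "__main__":
    # Hypothetical test sequence containing two E boxes.
    for hit in find_e_boxes("GGCAGCTGTTACACGTGAA"):
        print(hit)
```

A match to the consensus only marks a candidate site; whether a given bHLH protein binds it, and whether binding affects transcription, still has to be established experimentally.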
Binding sites associated with newly cloned and sequenced genes can also be identified by searching the sequence for homology with the sequences of known binding sites that have been characterized from other, sometimes related, genes.
In addition, several methods have been developed to identify the binding sites of transcription factors without cloning the target genes. The selected and amplified binding site (SAAB) method was used to identify the binding sites for known transcription factors (Blackwell, et al., Science 250: 1104-1110 (1990)). In this method, synthesized templates with random sequences are incubated with purified transcription factors. Templates bound to the transcription factors are isolated by electrophoretic mobility shift assay (EMSA) and then amplified by the polymerase chain reaction (PCR). After repeated rounds of binding and amplification, the selected templates are sequenced to identify the binding site of the transcription factor. The binding site of the transcription factor myc was identified with this method (Blackwell, et al., Science 250: 1149-1151 (1990)).
It is often difficult, however, to identify and purify transcription factors for use in such assays. Indeed, the binding sites are often identified first and are then used to facilitate the identification and purification of the transcription factors that bind to them. Moreover, in many studies it is crucial to understand the characteristics of certain regulatory regions, whereas it is not necessary to know which transcription factors bind to those regions. A method similar to SAAB, the multiplex selection technique (MuST), was therefore developed (Nallur, et al., PNAS 93: 1184-1189 (1996)). In the multiplex selection technique, purified transcription factors are replaced with crude nuclear extract, so that binding sites can be identified without first identifying the transcription factors. The identified binding sites can then be used to identify the corresponding transcription factors.
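Why repeated rounds of binding and amplification are needed can be seen from a simple enrichment calculation. The Python sketch below is a toy model and not a description of the published procedure. The recovery values are hypothetical, PCR is assumed to amplify all recovered templates equally, and the starting fraction of 1/256 is simply the chance that a random six-base window matches a consensus with four fixed positions.

```python
def enrich(initial_fraction, rounds, bound_recovery=0.5, background_recovery=0.01):
    """Fraction of binding-site templates in the pool after each selection round.

    bound_recovery:      assumed share of true binding templates recovered
                         from the shifted band (hypothetical value)
    background_recovery: assumed share of non-binding templates carried over
                         as background (hypothetical value)
    """
    fractions = [initial_fraction]
    f = initial_fraction
    for _ in range(rounds):
        bound = f * bound_recovery
        background = (1.0 - f) * background_recovery
        # Unbiased amplification keeps the recovered proportions unchanged,
        # so only the selection step alters the composition of the pool.
        f = bound / (bound + background)
        fractions.append(f)
    return fractions

if __name__ == "__main__":
    for round_number, fraction in enumerate(enrich(1 / 256, rounds=3)):
        print(f"round {round_number}: {fraction:.4f}")
```

Under these assumptions each round multiplies the ratio of binding to non-binding templates by bound_recovery / background_recovery, so a rare binding sequence comes to dominate the pool only after several rounds.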
The regulatory regions often consist of multiple different binding sites for transcription factors. The characteristics of a regulatory region are determined by the composition and arrangement of the binding sites. In addition to naturally-occurring regulatory regions, synthetic regulatory regions can be constructed through the combination and modification of binding sites.
Available naturally-occurring regulatory regions are not always capable of regulating transcription in a desired manner. In these cases, as well as others, synthetic regulatory regions may be utilized to provide the desired functional characteristics. As an example, synthetic herpes simplex virus (HSV) regulatory regions were constructed by linking the 5′ nontranscribed domain of an HSV α gene to a fragment containing the transcription initiation site and the 5′ transcribed noncoding region from an HSV γ gene (Roizman, PCT 94/14971). The resulting synthetic regulatory regions direct constitutive transcription of a heterologous gene throughout the reproductive cycle of the virus at a high cumulative level. Synthetic regulatory regions were also constructed to achieve high inducible transcription levels and low basal transcription levels (Filmus, et al., PCT 93/20218).
In both of the above cases, the binding sites are well-understood transcription factor response elements. Many binding sites, however, are not well understood, especially those identified without cloning of the corresponding transcription factors. Such binding sites are therefore only potential transcription factor response elements until functional assays confirm that they regulate transcription. These assays are usually laborious and costly. The task is even more complicated for synthetic regulatory regions produced by the combination, modification and rearrangement of various binding sites.