This invention relates to natural and synthetic cell- or tissue-specific transcriptional regulatory regions that regulate gene transcription in particular cells or tissues. In addition, this invention also relates to the methods for the selection, identification and evaluation of the synthetic cell- or tissue-specific transcriptional regulatory regions. None of the information described herein is admitted to be prior art to the present invention, but is provided solely to assist the understanding of the reader.
Cell- or tissue-specific gene expression plays a central role in the proliferation and differentiation of cells. As the first step of gene expression, transcription is an important step for regulation. The study of transcriptional regulatory regions is one of the major fields in modern biology. The transcriptional regulatory regions are also very important for applications in biotechnology, such as in gene therapy and the production of recombinant proteins.
Transcriptional regulatory regions generally have two portions: transcription initiation sites, and enhancers, which are capable of regulating the transcription level at a distance from the initiation sites. The binding of transcription factors to the regulatory regions is necessary for the regulatory regions to regulate transcription. The regulatory regions fall into several categories: general regulatory regions which regulate transcription in all cells of an organism, inducible regulatory regions which only regulate transcription in response to certain signals, and cell- or tissue-specific regulatory regions which only regulate transcription in certain cells.
Several methods have been used to identify the regulatory regions. One of these methods is the analysis of regions that are important for the proper expression of cloned genes. The first step is usually to identify rough boundaries of the regulatory regions using deletion and mutation analysis of the cloned genes. These regions include the 5′ upstream regions, 3′ downstream regions, and sometimes introns or coding sequences within the gene itself. Most studies are performed using chimeric constructs containing a reporter gene such as β-galactosidase (β-gal), chloramphenicol acetyltransferase (CAT), luciferase or growth hormone (GH). The regions that actually bind protein factors can be more accurately defined using DNA footprinting techniques followed by mutation analysis. The sequences that bind protein transcription factors are often referred to as transcription factor binding sites.
Consensus sequences for a number of common binding sites have been determined. One example is the binding site recognized by the family of basic-helix-loop-helix (bHLH) transcription factors. The consensus sequence of binding sites for bHLH proteins is 5′-CANNTG-3′, where “N” can be any nucleotide. This binding site is called the “E box” and is found in the regulatory regions of a number of genes that are expressed in diverse cell types, including lymphocytes, muscle cells and fibroblasts. Some bHLH proteins are common to most or all cells while others are cell-specific. In addition, bHLH proteins form heterodimers and the interaction of some of these dimers with DNA is cell-specific. The binding of different bHLH proteins to specific regulatory regions appears to be affected by the variable dinucleotide sequence within the core consensus sequence and the sequence adjacent to the core sequence (Sun, et al., Cell 64:459-470 (1991)).
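As an illustration only, the E box consensus described above can be located in a sequence with a simple pattern search. The following Python sketch is not part of the described methods; the function name and the example sequence are hypothetical. Because the reverse complement of CANNTG is again CANNTG, scanning one strand is sufficient for this particular consensus.

```python
import re

# Consensus from the text: 5'-CANNTG-3', where N is any nucleotide.
# The lookahead lets the search report overlapping sites as well.
E_BOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_e_boxes(sequence):
    """Return (0-based position, matched site) pairs for each E box found."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in E_BOX.finditer(seq)]

# Hypothetical example sequence containing two E boxes.
print(find_e_boxes("ggCAGCTGttaCACGTGcc"))
# prints [(2, 'CAGCTG'), (11, 'CACGTG')]
```

The two sites in the example differ only in the variable central dinucleotide, which is the part of the core sequence that the text identifies as affecting which bHLH proteins bind.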
Binding sites associated with newly cloned and sequenced genes can also be identified by searching the sequence for homology with the sequences of known binding sites that have been characterized from other, sometimes related, genes.
In addition, several methods have been developed to identify the binding sites of transcription factors without cloning the target genes. The selected and amplified binding site (SAAB) method has been used to identify the binding sites for known transcription factors (Blackwell, et al., Science 250:1104-1110 (1990)). In this method, synthesized templates with random sequences are incubated with purified transcription factors. Templates bound to transcription factors are isolated by electrophoretic mobility shift assay (EMSA) and then amplified by the polymerase chain reaction (PCR). After repeated rounds of binding and amplification, the selected templates are sequenced to identify the binding site of the transcription factor. The binding site of the transcription factor myc was identified with this method (Blackwell, et al., Science 250:1149-1151 (1990)).
It is often difficult, however, to identify and purify transcription factors for use in such assays. Indeed, the binding sites are often identified first and are then used to facilitate the identification and purification of the transcription factors that bind to them. Moreover, in many studies it is crucial to understand the characteristics of certain regulatory regions, whereas it is not necessary to know the transcription factors binding to the regulatory regions. A method similar to SAAB, the multiplex selection technique (MuST), was therefore developed (Nullur, et al., PNAS 93:1184-1189 (1996)). In the multiplex selection technique, purified transcription factors are replaced with crude nuclear extract, so that binding sites can be identified without the identification of transcription factors. The identified binding sites can then be used to identify the corresponding transcription factors.
The regulatory regions often consist of multiple different binding sites for transcription factors. The characteristics of a regulatory region are determined by the composition and arrangement of the binding sites. In addition to naturally-occurring regulatory regions, synthetic regulatory regions can be constructed through the combination and modification of binding sites.
Available naturally-occurring regulatory regions are not always capable of regulating transcription in a desired manner. In these cases, as well as others, synthetic regulatory regions may be utilized to provide the desired functional characteristics. As an example, synthetic herpes simplex virus (HSV) regulatory regions were constructed by linking the 5′ nontranscribed domain of an HSV α gene to a fragment containing the transcription initiation site and the 5′ transcribed noncoding region from an HSV γ gene (Roizman, PCT 94/14971). The resulting synthetic regulatory regions direct constitutive transcription of the heterologous gene throughout the reproductive cycle of the virus at a high cumulative level. Synthetic regulatory regions were also constructed to achieve high inducible transcription levels and low basal transcription levels (Filmus, et al., PCT 93/20218).
In both of the above cases, the binding sites are well-understood transcription factor response elements. Many binding sites, however, are not well understood, especially those identified without the cloning of the corresponding transcription factors. These binding sites are therefore only potential transcription factor response elements until functional assays confirm that they regulate transcription. Such assays are usually laborious and costly. The task is even more complicated for synthetic regulatory regions produced by the combination, modification and rearrangement of various binding sites.
Applicant has designed useful methods to create, identify and evaluate cell- or tissue-specific synthetic regulatory regions. Specifically, the methods include the selection of transcription factor binding sites, the creation of synthetic regulatory regions using the binding sites and/or portions of known regulatory regions, and the evaluation of the synthetic regulatory regions. The synthetic regulatory regions acquired with this method can be used in gene delivery or gene therapy to achieve desired gene expression in targeted cells. The acquired synthetic regulatory regions can also be used to achieve the production of recombinant proteins at high levels.
The present invention utilizes the recognition that the cells themselves contain all the information required to identify the binding sites that are most important or are recognized by the key transcription factors in the cells. The methods described for the selection of binding sites do not require any previous knowledge of the genes that are expressed or the transcription factors that are present in the cells. Thus, these methods bypass the extensive work needed for the purification, identification, and analysis of transcription factors. In addition, these methods eliminate the need to know the tissue-specific transcription factor binding sites. Furthermore, many more potential binding sites can be identified using these methods than using the methods with purified transcription factors. Similarly, the methods for the creation and evaluation of synthetic regulatory regions do not require complete understanding of the binding sites. The binding sites can be linked together in various combinations and with various arrangements, and can then be evaluated to select particular synthetic regulatory regions which are functional in a certain cell line. Therefore, these methods make it possible to create and identify useful synthetic regulatory regions on a large scale.
As indicated above, the methods discussed herein are useful for identifying regulatory region sequences for gene delivery or gene therapy. One of the major obstacles for gene delivery or gene therapy is the difficulty in expressing genes at preferred levels in certain cells or tissues. The difficulties are partly due to the lack of proper regulatory regions to direct the desired gene transcription. The functional synthetic regulatory regions identified from these methods will provide many candidates for the regulatory regions needed in gene delivery or gene therapy. Moreover, these synthetic regulatory regions will also be candidates for the regulatory regions needed in large-scale production of recombinant proteins, which also requires gene transcription at a high level in certain cell lines.
A first aspect of the present invention features a method of identifying binding sites for transcription factors. The method involves identifying the oligonucleotides in protein-oligonucleotide complexes formed between a cellular or nuclear extract from a group of cells and any of a plurality of double-stranded oligonucleotide fragments. Preferably the complexes are separated from free oligonucleotides using size exclusion chromatography. The presence of an oligonucleotide in a complex is indicative that the oligonucleotide includes a binding site.
In preferred embodiments, the double-stranded oligonucleotides are made through the synthesis of single-stranded oligonucleotide and conversion of the single-stranded oligonucleotide to double-stranded oligonucleotide. Also in preferred embodiments, the oligonucleotide fragment has a central random sequence and both restriction sites and primer sequences on both ends. In preferred embodiments, the identifying step includes amplifying, cloning and sequencing the oligonucleotide fragments from the protein-oligonucleotide complexes to identify the binding sites. The amplifying step is preferably performed by polymerase chain reaction.
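The layout of such an oligonucleotide fragment can be sketched in code. The following Python example is a minimal illustration under stated assumptions: the primer sequences are hypothetical placeholders, the EcoRI (GAATTC) and BamHI (GGATCC) recognition sequences are chosen only as familiar examples, and the order primer, restriction site, random core, restriction site, primer is one possible arrangement that the text does not prescribe.

```python
import random

# Hypothetical flanking sequences, chosen only for illustration.
LEFT_PRIMER = "CTGTCAGTGATGCATATG"
RIGHT_PRIMER = "GTACTGACGTAGCTAGCA"
LEFT_SITE = "GAATTC"   # EcoRI recognition sequence (example choice)
RIGHT_SITE = "GGATCC"  # BamHI recognition sequence (example choice)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def make_template(random_length, rng):
    """Build one single-stranded template with a central random sequence."""
    core = "".join(rng.choice("ACGT") for _ in range(random_length))
    return LEFT_PRIMER + LEFT_SITE + core + RIGHT_SITE + RIGHT_PRIMER

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def random_core(template, random_length):
    """Extract the central random sequence from a template."""
    start = len(LEFT_PRIMER) + len(LEFT_SITE)
    return template[start:start + random_length]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
template = make_template(30, rng)
second_strand = reverse_complement(template)  # models the conversion step
print(template)
print(second_strand)
```

Here `reverse_complement` stands in for the conversion of the single-stranded oligonucleotide to its double-stranded form; in practice that step is enzymatic, and a random core may by chance contain one of the restriction sites.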
The oligonucleotide fragments can be of various sizes, but preferably include test sequences between about 5 and 500 bp in length, more preferably between about 5 and 100 bp, still more preferably between 20 and 50 bp.
The term “transcribe” or “transcription” as used herein refers to the synthesis of RNA by RNA polymerase, following a DNA template. Transcription is the first step of gene expression and the most important step for the regulation of gene expression. That is, the regulation of gene expression is achieved mainly through the regulation of transcription.
The term “gene expression” refers to the process in which genetic information flows from DNA to functional molecules, such as proteins or RNA molecules. The regulation of transcription, as a part of gene expression, is achieved through the interaction between the regulatory region of a gene and various transcription factors.
As used herein, the term “transcriptional regulatory regions” or “regulatory regions” refers to the regions of a gene controlling the transcription of the gene. A regulatory region often includes several portions. Some of these portions are in the initiation site for transcription, whereas others are located at a distance from the initiation site. The term thus includes regions commonly referred to as enhancers.
The term “synthetic regulatory regions” as used herein refers to regulatory regions which are artificially made (i.e., made by humans using molecular biology techniques) such as by the creation with one or more modifications, combinations, or rearrangements of various transcription factor binding sites.
The term “transcription factors” as used herein refers to proteins which bind to the elements of regulatory regions and regulate the transcription of the corresponding genes. According to their functions, transcription factors fall into several categories. These include general transcription factors which are needed by most genes in most cells, cell- or tissue-specific transcription factors which only regulate gene transcription in certain cells, and inducible transcription factors which regulate gene transcription in response to certain signals.
The term “transcription factor binding site” or “binding site” refers to any nucleic acid sequence which can bind transcription factors under transcription conditions or conditions approximating intracellular physical conditions.
As used herein, the term “transcription factor response elements” or “response elements” refers to the functional regulatory region components which can bind transcription factors and thereby regulate transcription of the corresponding genes. Thus, binding sites are potential response elements; their regulatory function can readily be tested and characterized.
As used herein, the term “restriction sites” refers to deoxyribonucleic acid sequences at which specific restriction endonucleases can cleave in a sequence-specific manner.
The term “cells” or “cell” as used herein refers to a membrane-enveloped protoplasmic body capable of independent reproduction. Cells can be maintained, or propagated, in vivo, in vitro or in tissue culture and are capable of being transformed by plasmids as discussed herein.
As used herein “tissue” refers to a population consisting of cells of the same kind performing the same function.
The term “nuclear or cellular extract” refers to a preparation containing all or some of the cellular contents from inside the nuclear membrane or the plasma membrane of cells respectively, particularly including protein components. Such an extract is distinguished from a purified transcription factor.
As used in this context, the term “mixing” refers to putting together oligonucleotides and nuclear or cellular extract, such that the oligonucleotides and components of the extract can contact each other. Preferably a nuclear extract is used.
The term “oligonucleotide” as used herein refers to a nucleic acid molecule consisting of same or different individual nucleotides which are covalently linked together. Oligonucleotides can be single-stranded or double-stranded, consisting of two anti-parallel single-stranded oligonucleotides with complementary sequences. For use in the identification of binding sites, each oligonucleotide strand is preferably between about 5 and 500 nucleotides in length, more preferably between 5 and 100, still more preferably between about 7 and 50, and most preferably between about 20 and 50 nucleotides in length.
The term “free oligonucleotide” refers to the oligonucleotides which are not bound to proteins or any other compounds. The term “protein-oligonucleotide complexes” as used herein refers to the complexes comprising oligonucleotides and the proteins bound with the oligonucleotides.
As used in the context of the oligonucleotide fragments, the term “conversion” is used to refer to the synthesis of a single-stranded DNA molecule complementary to another DNA molecule to form a double-stranded DNA molecule.
The term “primer” as used herein refers to a single-stranded oligonucleotide, the 3′ end of which can be used as the initiation site for the DNA synthesis with a DNA polymerase. As used herein, the term “primer sequence” refers to the sequence of the primer or the complementary sequence.
As used herein, the terms “5′” and “3′” refer to the two different ends of a single-stranded DNA molecule respectively in accord with common usage. When used in relation to a coding sequence, the terms refer to being in the 5′ direction from the coding sequence or in the 3′ direction from the coding sequence. For a sequence on a circular nucleic acid molecule, e.g., on a circular plasmid, the terms refer to the direction from a reference sequence but not fully around the chain, and preferably includes a functional relationship. Thus, for example, a regulatory region is 5′ to a coding sequence if it is in a position in which it would be expected to functionally affect transcription if in a 5′ position on a linear molecule. Usually, a 5′ position is closer to the 5′ end of a coding sequence than to the 3′ end.
As used herein, the term “size exclusion chromatography” refers to a technique for the separation of biomolecules. This approach separates molecules into two groups, one which is smaller than the exclusion size of the chromatographic media and another which is larger than the exclusion size. The protein-oligonucleotide complexes are much larger than free oligonucleotides, so they can be readily separated, utilizing an exclusion size greater than the size of the free oligonucleotides and smaller than the size of protein-oligonucleotide complex. In this context, size refers to the effective radius of the molecule or complex. As indicated above, nuclear or cellular extract, which includes many different transcription factors, is used instead of purified transcription factors in the present invention. The protein-oligonucleotide complexes resulting from the mixing of oligonucleotide fragments and nuclear or cellular extract therefore have many different sizes. As a result, size exclusion chromatography provides a more useful separation than electrophoretic mobility shift assay (EMSA) because size exclusion chromatography produces a simple separation of bound and unbound oligonucleotides while EMSA produces a series of bands distributed over a gel. Due to the nature of the gels typically utilized, EMSA generally also requires an extraction step to recover the bound oligonucleotide from the gel for further manipulation.
The term “amplifying” as used herein refers to increasing the numbers of DNA molecules. The approaches for amplifying include, but are not limited to, polymerase chain reaction.
As used herein, the term “sequencing” refers to the process of identifying the nucleotide sequence of DNA molecules. The term “nucleotide sequence” refers to the linear order of nucleotides in a DNA molecule or other nucleic acid molecules. Methods for sequencing of nucleic acid molecules are well-known to those skilled in the art.
A second aspect of the present invention features a method for evaluating a cell- or tissue-specific synthetic regulatory region or regions. This method involves determining whether a cell is selected under selective conditions. The method uses cells which contain different putative transcriptional regulatory regions located in transcriptional regulatory positions to a selective gene. A cell can only be selected if the selective gene is expressed at sufficiently high levels, and the selective gene will be expressed at the sufficiently high level if the putative transcriptional regulatory region is active in the particular cell. The capability of a cell to be selected in response to the selection condition indicates that the nucleic acid test sequence contains a transcriptional regulatory region active in the cell. The selection condition can be adjusted so that only strong regulatory regions will be effective to be selected in the selection condition. In general, the method involves culturing the cell or cells having the putative transcriptional regulatory sequence.
The term “sufficiently high level” refers to a functional level of expression which depends on the type of selection used and the stringency applied to the selection. Thus, for positive selection, the level is sufficient to allow discrimination of a cell expressing the selective gene at a “sufficiently high level” from an otherwise isogenic cell not expressing the gene at a sufficiently high level. For negative selection, a “sufficiently high level” is a level which allows the cell to grow in the presence of the selection condition.
In a preferred embodiment, the selection condition is a positive selection condition. The capability of at least one cell to be selected in the presence of the selective condition is indicative that the nucleic acid test sequence contains a transcriptional region active in the cell. The selection condition can be adjusted so that only strong regulatory regions will be effective to be selected in the selection condition.
In another preferred embodiment, the selection condition is a negative selection condition, i.e., stress condition; and the selective gene is a protective gene. The growth of the cells is inhibited under the stress condition in the absence of high level expression of the protective gene. Growth of at least one cell in the presence of the stress condition is indicative that the nucleic acid test sequence contains a transcriptional region active in the cell. The stress condition can be adjusted so that only strong regulatory regions will be effective to overcome the stress condition.
The term “regulates” or “regulation” as used herein refers to the effect of nucleic acid sequences or other molecules involved in control of a response or action. In particular, this includes the effects of sequences involved in regulating, controlling or affecting the expression level or rate of structural genes. Generally this includes the binding of transcription factors to sequences, affecting transcription rates or other steps in gene expression.
As used in this context, the term “transcriptional regulatory position” refers to the position where functional regulatory regions can influence the transcription of the selective gene. Transcriptional regulatory positions include, but are not limited to, 5′ to the coding sequence of the selective gene, 3′ to the coding sequence of the selective gene, and within the intron or signal sequence of the selective gene. For identification and/or evaluation of synthetic regulatory regions, the region 5′ to the coding sequence of the selective gene is of particular interest; however, other positions are also of interest and can be utilized in this invention.
The term “cell- or tissue-specific transcriptional regulatory region” as used herein refers to a nucleic acid sequence which is involved in controlling transcription through one or more coding sequences in a cell- or tissue-specific manner. As used herein, the term “cell- or tissue-specific transcription” refers to the gene transcription which occurs at a higher level in cells of a group or in certain tissue as compared to other cells or tissue of the corresponding organism generally.
As used herein the term “transfected” or “transfection” refers to the incorporation of foreign DNA into cultured cells by exposing them to such DNA. This would include the introduction of DNA by various delivery methods, e.g., via vectors or plasmids using naked DNA, DNA-cationic lipid complexes, DNA in liposomes. The methods may include techniques to enhance penetration of the cellular membrane, such as electroporation or use of lytic peptides.
The term “cells of a group” as used herein refers to cells which are differentiated into the same or similar stage, and thereby have the same or similar characteristics, e.g., the same or similar characteristics with respect to control of transcription.
As used herein, the term “vector” refers to a DNA construct which can be transfected into cells. Vectors can be of a variety of different types, including plasmids, viral vectors, and others. Various genes can be inserted into a vector so that the gene can be delivered into cells. The term “insert” as used in this context refers to incorporating a nucleic acid sequence into the vector nucleic acid sequence. Vectors can include both linear and circular DNA constructs.
The term “selection condition” refers to conditions under which cells expressing a selective gene show distinguishing features, and thereby can be easily separated from cells not expressing a selective gene. A selection condition can be a positive selection condition, or a negative selection condition, i.e., a stress condition.
The term “positive selection conditions” refers to conditions which distinguish cells expressing the selective gene so that these cells can be easily isolated. The positive selection can be, but is not limited to, Fluorescence Activated Cell Sorting (FACS) and magnetic bead sorting.
The term “selective gene” refers to a gene whose expression confers on its host cells a special feature which allows the host cell to be distinguished from other cells with which the host cell is associated. The selective gene can be, but is not limited to, a gene coding a particular antigen or antibody, or a protective gene.
The term “stress conditions” refers to conditions which either kill the cells or inhibit the division and proliferation of the cells. Such stress conditions include, but are not limited to, 1) elevated temperatures; 2) radiation; and 3) contact with particular biochemical agents.
The term “protective gene” means a gene encoding a protein which is capable of protecting cells from a stress condition. Such protective genes include, but are not limited to, genes for 1) adenosine deaminase; 2) dihydrofolate reductase; and 3) heat shock proteins.
The term “biochemical agents” as used herein refers to compounds which kill certain cells or inhibit the division and proliferation of certain cells. These biochemical agents include, but are not limited to, 1) xylofuranosyl-adenine; 2) methotrexate; 3) xylofuranosyl-adenine and deoxycoformycin; and 4) alanosine, adenosine, and uridine.
As used in connection with binding sites and regulatory regions, the term “combination” refers to linking together two or more of the same or different kinds of oligonucleotides. The term “modification” refers to a change in the sequence of a DNA molecule, which includes, but is not limited to, the substitution of one or a few nucleotides, or the addition or deletion of one or a few nucleotides as compared to a reference sequence. The term “rearrangement” refers to one or more changes in the order of subsequences of a regulatory region, and can include the insertion of a new subsequence or replacement of a subsequence with a new subsequence. This includes combinations of re-ordering, substitution, and insertion of subsequences.
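The notion of combination defined above can be illustrated by enumerating the ordered ways in which a small set of binding sites might be linked. The following Python sketch is illustrative only; the two site sequences, the spacer, and the function names are hypothetical and are not taken from the examples of this specification.

```python
from itertools import product

def link_sites(sites, spacer=""):
    """Join binding sites end to end, optionally separated by a spacer."""
    return spacer.join(sites)

def candidate_regions(sites, copies, spacer=""):
    """Enumerate every ordered combination of `copies` sites, repeats allowed."""
    return [link_sites(combo, spacer) for combo in product(sites, repeat=copies)]

# Two hypothetical binding sites and a hypothetical two-nucleotide spacer.
sites = ["CAGCTG", "CACGTG"]
for region in candidate_regions(sites, copies=2, spacer="AT"):
    print(region)
```

With s distinct sites and n positions the enumeration yields s to the power n candidates, so the number of possible synthetic regions grows quickly. This is one reason a selection-based evaluation is attractive compared with testing each candidate individually.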
A third aspect of the present invention features a method, which combines both of the above aspects, for evaluating a cell- or tissue-specific transcriptional regulatory region. The method involves identifying the oligonucleotides in protein-oligonucleotide complexes formed between a cellular or nuclear extract from a group of cells and any of a plurality of double-stranded oligonucleotide fragments. The presence of an oligonucleotide in a complex is indicative that the oligonucleotide includes a binding site. One or more cells are then cultured under a selection condition. Among the cells, at least one cell, and preferably a plurality of cells, contains a nucleic acid test sequence inserted in a transcriptional regulatory position to a selective gene. The test sequence consists of at least one of the binding sites identified using the cellular or nuclear extract. The capability of at least one cell to be selected in the presence of the selection condition is indicative that the nucleic acid test sequence contains a transcriptional region active in the cell. The selection condition can be adjusted so that only strong regulatory regions will be effective to be selected in the selection condition.
In addition, in another aspect, the invention provides synthetic regulatory regions which include all or portions of the synthetic regulatory regions described in Example 5 and in the Drawings. Preferably the synthetic regulatory region is in a transcriptional regulatory position with respect to a coding sequence of interest. A portion of one of the described regions preferably includes at least 20 contiguous nucleotides, more preferably at least 40 contiguous nucleotides, and still more preferably at least 80 contiguous nucleotides of one of the described synthetic regulatory regions. Preferably the portion is placed at about the same position relative to a coding sequence as it occupied in the plasmids used for analysis as described herein. Thus, the portion is preferably within 100 nucleotides, more preferably within 60 nucleotides, and still more preferably within 30 nucleotides of the position it occupied in a corresponding described synthetic regulatory region.
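Whether a candidate sequence includes a portion of a described region with a given number of contiguous nucleotides can be checked computationally. The following Python sketch is one possible way to do so and is offered as an assumption, not as a procedure from this specification; the function names and the short test sequences are hypothetical.

```python
def longest_shared_run(a, b):
    """Length of the longest contiguous stretch present in both a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the run that ended at the previous position in both.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def includes_portion(candidate, reference, min_contiguous=20):
    """True if candidate shares at least min_contiguous contiguous nucleotides."""
    return longest_shared_run(candidate.upper(), reference.upper()) >= min_contiguous

# Short hypothetical sequences sharing the six-nucleotide stretch CAGCTG.
print(longest_shared_run("AACAGCTGTT", "GGCAGCTGCC"))  # prints 6
```

The default threshold of 20 corresponds to the smallest of the preferred portion lengths stated above; 40 or 80 can be passed for the more preferred lengths.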
Other features and advantages of the invention will be apparent from the following detailed description of the invention in conjunction with the accompanying drawings and from the claims.