Expression of heterologous genes in eukaryotic cells is a fundamental aspect of biotechnology with many academic and commercial applications. Expression of such genes requires transcription by RNA polymerase II (Pol II), which is driven by cis-acting genetic elements known as promoters and enhancers.
In simple terms, promoters are directional elements that act to initiate transcription of sequences situated less than 100 (and usually less than 50) nucleotide base pairs (bp) downstream. They contain a number of short consensus nucleotide sequences that act as binding sites for various proteins that participate in the initiation of transcription and the assembly of a multi-subunit complex known as the pre-initiation complex (McKnight and Tjian, 1987, Cell 46: 795-805). In most genes, this occurs at a very widely conserved sequence known as the TATA box (TATAAA) to which the TATA box-binding protein (TBP, a subunit of the general transcription factor TFIID) binds. There follows an ordered assembly of more than ten further transcription factors to finally form the Pol II holoenzyme complex. RNA transcription actually starts at an initiator site about 25-30 bases downstream (Breathnach and Chambon, 1981, Annu Rev Biochem 50: 349-393) to which TBP also binds.
Most functional promoters contain further upstream promoter elements (UPEs), of which the most highly conserved are the CAAT box (CCAAT, the binding site for the transcription factors CBF, C/EBP and NF-1), about 70-200 bp upstream, and the GC box (GGGCGG, binding site for the general transcription factor Sp-1) a similar distance upstream. Although basal levels of transcription occur from the TATA box alone, for most promoters at least the CAAT and GC boxes are required for optimal levels of transcription.
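By way of illustration only, the consensus elements named above (TATAAA, CCAAT and GGGCGG) can be located in a candidate sequence by simple string matching. The following is a minimal sketch under stated assumptions: the function name `find_motifs` and the example sequence are hypothetical, only exact matches on the given strand are reported, and real promoter elements often deviate from the consensus.

```python
# Consensus motifs taken from the text above; exact-match search only.
MOTIFS = {"TATA box": "TATAAA", "CAAT box": "CCAAT", "GC box": "GGGCGG"}


def find_motifs(seq):
    """Return a dict mapping motif name to a list of 0-based start positions.

    Only the strand supplied is searched; degenerate or reverse-complement
    matches are deliberately ignored in this sketch.
    """
    s = seq.upper()
    hits = {}
    for name, motif in MOTIFS.items():
        positions = []
        i = s.find(motif)
        while i != -1:
            positions.append(i)
            i = s.find(motif, i + 1)  # step by one so overlapping hits are kept
        hits[name] = positions
    return hits


if __name__ == "__main__":
    # Hypothetical sequence, constructed purely for demonstration.
    example = "GGGCGG" + "A" * 10 + "CCAAT" + "A" * 10 + "TATAAA" + "CCCCC"
    for name, positions in find_motifs(example).items():
        print(name, positions)
```

An exact-match scan of this kind only flags candidate elements; it says nothing about whether a given site is functional.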
Enhancers are sequences that act non-directionally to increase transcription from promoters situated locally but not necessarily immediately adjacent (up to several kilobases away; Kadonaga (2004) Cell 116: 247-257). Enhancers contain short (8-12 bp) consensus sequences representing the binding sites for a wide range of transcriptional activator proteins (Ondek et al, 1988, Science 236: 1237-1244), including some, such as NF-1 and SP-1, that are also associated with promoter elements. These sequences are often duplicated in tandem or inverted repeats.
In some natural transcription units, including the very active immediate/early gene transcription units of many DNA viruses such as cytomegalovirus, enhancer and promoter elements may be functionally combined into what is effectively one extended upstream element.
Promoters may be regulated, being responsive to cell type, temperature, metal ions or other factors; or constitutive, giving transcription that is unresponsive to such factors. For many purposes a strong, constitutive promoter giving consistent, high levels of transcription in many, if not all, cell types is highly advantageous. For many years the enhancer/promoter element driving immediate/early gene expression in human cytomegalovirus has been very widely used for driving such expression of heterologous genes in eukaryotic expression vectors (Foecking & Hoffstetter, 1986, Gene 45: 101-105).
Human cytomegalovirus (CMV) is a member of the betaherpesvirus family and is responsible for gastrointestinal and respiratory infections, hepatitis, and retinitis. As with other herpesviruses, CMV can persist in latent infections and can be reactivated in immunocompromised individuals. In cell culture, human CMV replicates productively in terminally differentiated cells such as fibroblast, epithelial, and endothelial cells and in monocyte-derived macrophages (Isomura and Stinski, 2003, J Virol 77: 3602-3614 and references therein).
During productive infection, there is an ordered expression of sets of CMV genes, designated immediate-early (IE), early, or late. The human CMV IE genes are thought to play a critical role in the efficiency of replication (reviewed in Castillo and Kowalik, 2002, Gene 290: 19-34).
The region upstream of the human CMV IE promoter is divided into three regions, the modulator, the unique region, and the enhancer. The enhancer is also divided into a distal and a proximal enhancer. The distal enhancer is necessary for efficient IE gene expression and viral replication at a low MOI. Human CMVs have very strong enhancers for the expression of IE genes. The human CMV enhancer has four 18-bp repeat elements containing an NF-κB or rel binding site, five 19-bp repeat elements containing a CREB or ATF binding site, two AP-1 binding sites, and multiple SP-1 sites (Thomsen et al, 1984, Proc Natl Acad Sci USA 81: 659-663; Meier and Stinski, 1996, Intervirology 39: 331-342). The murine CMV enhancer contains six NF-κB or rel binding sites, one CREB or ATF binding site, and at least seven AP-1 binding sites (Dorsch-Hasler et al, 1985, Proc Natl Acad Sci USA 82: 8325-8329). The different cis-acting elements act individually and synergistically to stabilize the RNA polymerase II transcription initiation complex on the promoter.
A number of cytomegaloviruses predominantly infecting other host species are known, although, in many cases, the exact taxonomy and degree of cross-species relatedness is provisional. Cytomegalovirus-like viruses infecting a number of primate species (including African green monkey, Rhesus monkey and bonobo) and rodents including mouse, rat and guinea pig are recognised. Of these, only the murine and rat promoter-enhancers have been subject to detailed functional analysis. Comparison of these species with human CMV shows that the functions of the IE promoter-enhancers are not directly comparable, probably because of the presence of unrecognised cis-acting elements contributing to downstream transcription in cells of different species (Isomura and Stinski, 2003, J Virol 77: 3602-3614).
However, both human and murine CMV IE promoter-enhancers produce high levels of constitutive expression of heterologous genes in eukaryotic expression vectors and are widely used in biotechnology. Such use of the human CMV promoter was disclosed in U.S. Pat. No. 5,168,062 (Stinski/University of Iowa). Use of the promoter, enhancer and functionally complete 5′ (upstream) untranslated region including the first intron of the human cytomegalovirus major immediate-early gene, wherein this is not directly linked to its natural DNA coding sequence, is claimed by U.S. Pat. No. 5,591,639 (Bebbington/Celltech). Use of the murine CMV IE enhancer is disclosed by U.S. Pat. No. 4,968,615 (Koszinowski et al).
Guinea pig CMV (GPCMV) produces a disease of guinea pigs with many similarities to the pathology of human CMV infections. Attempts to characterise the genome (Isom et al, 1984, J Virology 49: 426-436; Gao and Isom, 1984, J Virology 52: 436-447) suggested that the structural organisation of the genome was unique amongst herpesviruses. Although of a similar size to human and murine CMV, the GPCMV genome was far simpler than that of human CMV and most closely resembled that of murine CMV. However, the GPCMV genome had several unusual features, particularly in the structure of the terminal regions. Later studies of IE gene expression identified an IE region by sequence comparison with human CMV (Yin et al, 1990, J Virol 64: 1537-1548), and the expression and processing of IE transcripts were analysed. However, there was no analysis of the usefulness of the IE promoter-enhancer for the expression of heterologous genes.
The ‘HRv’ (Hind III-EcoRV) immediate-early upstream fragment of the GPCMV genome, containing the 5′ end of the IE1 coding sequence and the upstream promoter/enhancer regions, was sequenced (Yin, 1991, Guinea pig cytomegalovirus immediate-early gene expression, PhD thesis, Pennsylvania State University, USA) and shown to contain a region of repetitive sequences, typical of a CMV IE regulatory region. Three short repeats, GP-1, GP-2 and GP-3, were identified. GP-1 is an 18-bp repeat occurring 9 times (73-100% similarity to a GGCCCGGGACTTTCCA consensus) containing an NF-κB binding site and corresponding to the HCMV 18-bp repeat. GP-2 is a 17-bp repeat occurring 10 times (86-100% similarity to a TGTCCTTTTTGGCAAA consensus) and containing a core sequence similar to the consensus SRE (serum response element). GP-3 is repeated 4 times in the proximal upstream region and contains GTGACTTT, a sequence identified as a binding site for c-jun or GCN4 (Hill et al, 1984, Science 234: 451-457).
Although this work suggested that the GPCMV IE upstream region contained a strong promoter, due to the way the reporter constructs were made certain artefacts could not be excluded. Firstly, the HRv fragment also appears to include the first exon and part of the first intron of the IE1 gene. This intron contains three copies of a putative NF-1 binding site, which may have artificially boosted the apparent strength of the promoter. Secondly, the reporter constructs used to test the GPCMV fragments contained an SV40 promoter (itself a strong viral promoter), so that reporter expression resulted from the effect of a double GPCMV/SV40 promoter. As a result it is not possible to make comparisons of the GPCMV enhancer/promoter alone with other strong promoters generally, or even with other CMV IE enhancer/promoters.
The applicant's co-pending patent application PCT/GB99/02357 (WO 00/05393), incorporated by reference herein, describes elements that are responsible, in their natural chromosomal context, for establishing an open chromatin structure across a locus that consists exclusively of ubiquitously expressed, housekeeping genes. These elements are not derived from a Locus Control Region (LCR) and comprise extended methylation-free CpG islands. The term Ubiquitous Chromatin Opening Element (UCOE) has been used to describe such elements.
In mammalian DNA, the dinucleotide CpG is recognised by a DNA methyltransferase enzyme that methylates cytosine to 5-methylcytosine. However, 5-methylcytosine is unstable and is converted to thymine. As a result, CpG dinucleotides occur far less frequently than one would expect by chance. Some sections of genomic DNA nevertheless do have a frequency of CpG that is closer to that expected, and these sequences are known as “CpG islands”. As used herein, a “CpG island” is defined as a sequence of DNA, of at least 200 bp, that has a GC content of at least 50% and an observed/expected CpG content ratio of at least 0.6 (i.e. a CpG dinucleotide content of at least 60% of that which would be expected by chance) (Gardiner-Green M and Frommer M. J Mol Biol 196, 261-282 (1987); Rice P, Longden I and Bleasby A Trends Genet 16, 276-277 (2000)).
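The three criteria in this definition can be checked computationally. The sketch below is illustrative only: the function names are hypothetical, and it assumes the commonly used estimate of the expected CpG count, namely (number of C × number of G) / sequence length, which the text above does not itself spell out. It evaluates a whole sequence at once and does not perform the sliding-window scan that dedicated tools use.

```python
def cpg_island_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for seq.

    Assumption: expected CpG count = (count of C * count of G) / length.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0, 0.0, 0.0
    c = s.count("C")
    g = s.count("G")
    observed_cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    expected_cpg = (c * g) / n
    ratio = observed_cpg / expected_cpg if expected_cpg else 0.0
    return n, gc_fraction, ratio


def is_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the thresholds stated in the definition above to one sequence."""
    n, gc_fraction, ratio = cpg_island_stats(seq)
    return n >= min_len and gc_fraction >= min_gc and ratio >= min_ratio


if __name__ == "__main__":
    # Artificial sequences used purely to exercise the thresholds.
    print(is_cpg_island("CG" * 100))              # CpG-rich, 200 bp
    print(is_cpg_island("G" * 100 + "C" * 100))   # GC-rich but contains no CpG
```

The second example shows why the observed/expected ratio is needed in addition to GC content: a sequence can be entirely G and C yet contain no CpG dinucleotides.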
Methylation-free CpG islands are well-known in the art (Bird et al (1985) Cell 40: 91-99, Tazi and Bird (1990) Cell 60: 909-920) and may be defined as CpG islands where a substantial proportion of the cytosine residues are not methylated and which usually extend over the 5′ ends of two closely spaced (0.1-3 kb) divergently transcribed genes. These regions of DNA are reported to remain hypomethylated in all tissues throughout development (Wise and Pravtcheva (1999) Genomics 60: 258-271). They are often associated with the 5′ ends of ubiquitously expressed genes, as well as an estimated 40% of genes showing a tissue-restricted expression profile (Antequera, F. & Bird, A. Proc. Natl. Acad. Sci. USA 90, 11995-11999 (1993); Cross, S. H. & Bird, A. P. Curr. Opin. Genet. Dev. 5, 309-314 (1995)) and are known to be localised regions of active chromatin (Tazi, J. & Bird, A. Cell 60, 909-920 (1990)).
An ‘extended’ methylation-free CpG island is a methylation-free CpG island that extends across a region encompassing more than one transcriptional start site and/or extends for more than 300 bp and preferably more than 500 bp. The borders of the extended methylation-free CpG island are functionally defined through the use of PCR over the region in combination with restriction endonuclease enzymes whose ability to digest (cut) DNA at their recognition sequence is sensitive to the methylation status of any CpG residues that are present. One such enzyme is HpaII, which recognises and digests at the site CCGG, which is commonly found within CpG islands, but only if the central CG residues are not methylated. Therefore, PCR conducted on HpaII-digested DNA over a region harbouring HpaII sites does not give an amplification product if the DNA is unmethylated, because HpaII has digested the template. The PCR will only give an amplified product if the DNA is methylated. Consequently, beyond the methylation-free region HpaII will not digest the DNA and a PCR-amplified product will be observed, thereby defining the boundaries of the “extended methylation-free CpG island”.
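The logic of this HpaII/PCR assay can be summarised in a short model. The following is a simplified sketch, not a laboratory protocol: the function names, the example amplicon and the representation of methylation as a set of 0-based positions of methylated cytosines are all assumptions made for illustration. Digestion is treated as complete, and only the internal cytosine of each CCGG site is considered.

```python
def hpaii_sites(seq):
    """Return the 0-based start index of every CCGG site in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 3) if s[i:i + 4] == "CCGG"]


def pcr_product_after_hpaii(amplicon, methylated_c_positions):
    """Return True if the amplicon template survives HpaII digestion.

    A CCGG site is modelled as cut unless its internal cytosine (the C of
    the central CG, at site start + 1) is methylated. A single cut anywhere
    in the amplicon is taken to prevent amplification. An amplicon with no
    CCGG site always amplifies, so it is uninformative in this assay.
    """
    methylated = set(methylated_c_positions)
    for start in hpaii_sites(amplicon):
        if start + 1 not in methylated:
            return False  # unmethylated site: template cut, no product
    return True


if __name__ == "__main__":
    # Hypothetical amplicon with two CCGG sites (starting at indices 3 and 10).
    amplicon = "AAACCGGTTTCCGGAAA"
    print(pcr_product_after_hpaii(amplicon, {4, 11}))  # both sites methylated
    print(pcr_product_after_hpaii(amplicon, set()))    # fully unmethylated
```

Applying such a test to successive amplicons tiled across a locus would, under these assumptions, give no product inside the methylation-free island and a product outside it, which is how the borders are read off.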
International application WO 00/05393 demonstrates that regions spanning methylation-free CpG islands encompassing dual, divergently transcribed promoters from the human TATA binding protein (TBP)/proteasome component-B1 (PSMB1) and heterogeneous nuclear ribonucleoprotein A2/B1 (hnRNPA2)/heterochromatin protein 1Hsγ (HP1Hsγ) gene loci impart enhanced levels of gene expression to operably linked genes.
Methylation-free CpG islands associated with actively transcribing promoters possess the ability to remodel chromatin and are thus thought to be a prime determinant in establishing and maintaining an open domain at housekeeping gene loci.
UCOEs confer an increased proportion of productive gene integration events with improvements in the level and stability of transgene expression. This has important research and biotechnological applications including the generation of transgenic animals and recombinant protein products in cultured cells.
WO 00/05393 discloses functional UCOE fragments of approximately 4.0 kb, in particular, the ‘5.5 RNP’ fragment defined by nucleotides 4102 to 8286 of FIG. 21 (as disclosed on p 11, lines 6 and 7). The same application discloses a ‘1.5 kb RNP’ fragment (FIGS. 22 and 29, derivation described on p 51, lines 1 to 5). However, this fragment is actually a 2165 bp BamHI-Tth111I fragment of the ‘5.5 RNP’ fragment described above, consisting of nucleotides 4102 to 6267 of FIG. 21 of that application.
A further application, WO 02/24930, discloses artificially-constructed UCOEs composed of fragments of naturally-occurring CpG islands. A third application, WO 04/067701, describes polynucleotides comprising small functional fragments of UCOEs. Such polynucleotides comprise methylation-free CpG islands of no more than approximately 2 kb, or fragments of larger such islands, of not more than approximately 2 kb.
Given the importance of recombinant protein expression in biotechnology, there remains a need for improved expression vectors comprising novel promoter/enhancer combinations.