1. Technical Field of the Invention
The present invention relates to DNA sequence elements that augment the expression of recombinant proteins in eukaryotic cells.
2. Description of the Related Art
The development of expression systems for production of recombinant proteins is important for developing a source of a given protein for research or therapeutic use. Expression systems have been developed for both prokaryotic cells, such as E. coli, and eukaryotic cells, which include both yeast (e.g., Saccharomyces, Pichia and Kluyveromyces spp.) and mammalian cells. Expression in mammalian cells is often preferred for manufacturing of therapeutic proteins, since post-translational modifications in such expression systems are more likely to resemble those found in a mammal than are the post-translational modifications that occur in microbial (prokaryotic) expression systems.
Transcription of eukaryotic genes is regulated by a variety of cis- and trans-acting regulatory elements (reviewed by Dillon and Grosveld, Trends Genet. 9:134; 1993). Two of the best characterized cis elements are promoters and enhancers. Promoters are DNA sequences immediately 5′ to the coding sequence of the gene and encompass multiple binding sites for trans-acting transcription factors, forming the basal transcription apparatus. Enhancers are also composed of multiple binding sites for trans-acting transcription factors but can be found far upstream or downstream of coding sequences or even within introns. Enhancers can also act in an orientation-independent manner. Promoters and enhancers are active in transient expression systems and contain elements that may or may not be tissue specific; they are vulnerable to position effects when studied in stable cell lines or transgenic animals.
Another category of cis-regulatory elements comprises those believed to regulate chromatin structure, including locus control regions (LCR; Grosveld F., et al., Cell 51:975, 1987), matrix attachment regions (MAR; Phi-Van et al., Mol Cell Biol 10:2302, 1990), scaffold attachment regions (SAR; Gasser and Laemmli, Trends Genet 3:16, 1987), and insulator elements (Kellum and Schedl, Cell 64:941, 1991). These elements are similar to enhancers in that they are able to act over long distances, but are unique in that their effects are only detectable in stably transformed cell lines or transgenic animals. LCRs are also dissimilar to enhancers in that they are position and orientation dependent, and are active in a tissue-specific manner. In addition, LCR and SAR sequences are characterized by A boxes, T boxes and topoisomerase II sites, which are not typically found in enhancer or promoter sequences (Gasser and Laemmli, supra; Klehr D., et al., Biochemistry 30:1264, 1991).
Internal ribosome entry sites (IRES) are another type of regulatory element that can be found in several viruses and cellular RNAs (reviewed in McBratney et al., Current Opinion in Cell Biology 5:961, 1993). IRES are useful in enhancing translation of a second gene product in a bicistronic eukaryotic expression cassette (Kaufman R. J., et al., Nucleic Acids Res 19:4485, 1991).
Another class of regulators is the HMG-I(Y) family of proteins. The HMG-I(Y) family of “high mobility group” nonhistone chromatin proteins are founding members of a new category of mammalian gene trans-regulatory proteins called “architectural transcription factors” (Grosschedl, et al., Trends Genet. 10:94-100 (1994); Bustin and Reeves, Prog. Nucleic Acid Res. Mol. Biol. 54:35-100 (1996)). In contrast to most transcription factors, which bind to specific nucleotide recognition sites in the major groove, architectural transcription factors are characterized by their ability to recognize and modulate DNA and chromatin structure and typically bind to the minor groove of DNA substrates. The HMG-I(Y) family consists of three closely related proteins, HMG-I, HMG-Y and HMG-IC. Each possesses three independent DNA-binding domains called “A.T-hooks” because of their ability to recognize and bind to the narrow minor groove of stretches of A.T-rich nucleotides. A.T-hooks also recognize distorted DNA structures such as those present on synthetic four-way junctions (Hill and Reeves, Nucleic Acids Res. 25:3523-31 (1997); Hill et al., Nucleic Acids Res. 27:2135-44 (1999)), supercoiled plasmids (Nissen and Reeves, J. Biol. Chem. 270:4344-4360 (1995)), and the surface of nucleosome core particles (Reeves and Wolffe, Biochemistry 35:5063-74 (1996)).
Several vectors are available for expression in mammalian hosts, each containing various combinations of cis- and, in some cases, trans-regulatory elements to achieve high levels of recombinant protein in a minimal time frame. However, despite the availability of numerous such vectors, the level of expression of a recombinant protein achieved in mammalian systems is often lower than that obtained with a microbial expression system. Moreover, developing a transformed cell line that expresses high levels of a desired protein often requires time-consuming cloning and amplification. Accordingly, there is a need in the art to refine and improve expression in mammalian cells, and to identify elements that can augment expression of recombinant proteins and facilitate the use of mammalian cells in recombinant protein production.
Novel regulatory sequences, termed expression augmenting sequence elements (EASE), that facilitate high expression of recombinant proteins in mammalian host cells in a short time period are disclosed. One embodiment of the invention is an EASE that facilitates high expression of recombinant proteins in mammalian host cells in a short time period, is not active in transient expression systems, does not exhibit characteristics of DNAs that encode a protein, and does not exhibit nucleotide sequence characteristics found in LCR, MAR or SAR, such as clusters of A and T boxes and topoisomerase II sites. The EASE of the instant invention may nevertheless contain certain putative MARs as defined by Singh et al. (Nucleic Acids Res. 25:1419-25 (1997)). A preferred embodiment of the invention is an EASE that was obtained from Chinese hamster ovary (CHO) cell genomic DNA, proximal to a unique integration site for a recombinant mammalian protein.
In a preferred embodiment of the invention, the EASE is selected from the group consisting of DNAs comprising nucleotides 46 through 14507 of a nucleotide sequence set forth in SEQ ID NO:1, nucleotides 5980 through 14507 of a nucleotide sequence set forth in SEQ ID NO:1, nucleotides 8671 through 14507 of the nucleotide sequence set forth in SEQ ID NO:1, nucleotides 8673 through 12274 of the nucleotide sequence set forth in SEQ ID NO:1, nucleotides 8671 through 10516 ligated to nucleotides 12592 through 14507 of the nucleotide sequence set forth in SEQ ID NO:1, nucleotides 8671 through 10516 ligated to nucleotides 14291 through 14507 of the nucleotide sequence set forth in SEQ ID NO:1, nucleotides 9277 through 10516 ligated to nucleotides 14291 through 14507 of the nucleotide sequence set forth in SEQ ID NO:1, fragments of the foregoing DNAs that have expression augmenting activity, DNAs complementary to the foregoing DNAs, and combinations of the foregoing DNAs that have expression augmenting activity.
Particularly preferred embodiments comprise EASE sequences from the box III region. For example, the present invention provides EASE sequences selected from the group consisting of DNAs comprising nucleotides 11538 through 12165 of a nucleotide sequence set forth in SEQ ID NO:1, nucleotides 11538 through 11692 of a nucleotide sequence set forth in SEQ ID NO:1, and nucleotides 11813 through 12165 of a nucleotide sequence set forth in SEQ ID NO:1, as well as a ligated form of nucleotides 11538 through 11692 and nucleotides 11813 through 12165, which is herein referred to as EASE45. Additionally, EASE sequences may be selected from the group consisting of DNAs comprising nucleotides 11538 through 11760 of a nucleotide sequence set forth in SEQ ID NO:1, and nucleotides 11899 through 12165 of a nucleotide sequence set forth in SEQ ID NO:1, as well as a ligated form of nucleotides 11538 through 11760 and nucleotides 11899 through 12165, which is herein referred to as EASE12. In yet further embodiments, EASE sequences may comprise nucleotides 11673 through 12165 of a nucleotide sequence set forth in SEQ ID NO:1, which is herein referred to as EASE3. Of course, it is understood that any or all of the aforementioned EASE sequences may be used alone or in any combination.
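The nucleotide ranges recited above are 1-based and inclusive, and several embodiments are ligations of two non-adjacent ranges. As an illustration only, the following minimal Python sketch shows how such ranges could be pulled out of a sequence string and joined. The function names (`extract`, `ligate`, `fragment_length`) and the dictionary `EASE_FRAGMENTS` are hypothetical helpers introduced here; only the coordinates are taken from the text, and SEQ ID NO:1 itself is not reproduced.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, end), 1-based and inclusive, as recited in the text


def extract(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError(f"range {start}-{end} lies outside a sequence of length {len(seq)}")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return seq[start - 1:end]


def ligate(seq: str, *ranges: Range) -> str:
    """Join the given ranges of seq end to end, in the order listed."""
    return "".join(extract(seq, s, e) for s, e in ranges)


def fragment_length(ranges: List[Range]) -> int:
    """Total number of nucleotides covered by a list of inclusive ranges."""
    return sum(e - s + 1 for s, e in ranges)


# Coordinates relative to SEQ ID NO:1, copied from the paragraph above.
EASE_FRAGMENTS: Dict[str, List[Range]] = {
    "EASE45": [(11538, 11692), (11813, 12165)],
    "EASE12": [(11538, 11760), (11899, 12165)],
    "EASE3": [(11673, 12165)],
}
```

Given the full sequence as a string `seq_id_no_1` (a hypothetical variable holding SEQ ID NO:1), `ligate(seq_id_no_1, *EASE_FRAGMENTS["EASE45"])` would return the ligated form referred to as EASE45. The range check guards against off-by-one errors when converting between the inclusive numbering used here and Python's half-open slices.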
Expression vectors comprising the novel EASE are able to transform CHO cells to high expression of recombinant proteins. Thus, another embodiment of the invention is an expression vector comprising an EASE. In a preferred embodiment, the expression vector further comprises a eukaryotic promoter/enhancer driving the expression of a protein of interest. In a most preferred embodiment, the expression vector consists of a bicistronic plasmid wherein a first exon encodes the gene of interest and a second exon encodes an amplifiable dominant selectable marker. A preferred marker is dihydrofolate reductase (DHFR); other amplifiable markers are also suitable for use in the inventive expression vectors. The expression vector may further comprise an IRES sequence between the two exons.
Mammalian host cells can be transformed with the inventive expression vectors, and will produce high levels of recombinant protein in a short period of time. Accordingly, another embodiment of the invention provides a mammalian host cell transformed with the inventive expression vector. In a most preferred embodiment, the host cells are CHO cells.
The invention also provides a method for obtaining a recombinant protein, comprising transforming a host cell with an inventive expression vector, culturing the transformed host cell under conditions promoting expression of the protein, and recovering the protein. In a preferred application of this invention, transformed host cell lines are selected with two selection steps, the first to select for cells expressing the dominant amplifiable marker, and the second step for high expression levels and/or amplification of the marker gene as well as the gene of interest. In a most preferred embodiment, the selection or amplification agent is methotrexate, an inhibitor of DHFR that has been shown to cause amplification of endogenous DHFR genes and transfected DHFR sequences.
Moreover, the invention provides a method of identifying additional expression augmenting sequence elements, for example, from other transformed cell lines. Such cell lines will exhibit high levels of expression that are not attributable to high gene copy number. The inventive techniques will be useful in identifying and isolating such EASE, as well as EASE present in non-transformed cells (for example, by hybridization studies or sequence analysis).
Further, high levels of EASE activity have been associated with sequences containing higher numbers of HMG-I(Y) binding sites. Thus, this invention provides a method of identifying expression augmenting sequence elements by identifying sequences that contain high numbers of HMG-I(Y) binding sites.
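By way of illustration only, a first-pass computational screen for candidate HMG-I(Y) binding sites might count stretches of consecutive A and T nucleotides, since A.T-hooks bind the minor groove of A.T-rich stretches. The sketch below rests on assumptions not stated in the text: a candidate site is taken to be any run of at least `min_run` consecutive A or T residues, and the default of 5 is an arbitrary placeholder, not a measured binding requirement. The function names are hypothetical, and this is a crude proxy rather than the method used by the inventors.

```python
import re
from typing import List, Tuple


def at_runs(seq: str, min_run: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) of each maximal run of A/T at least min_run long.

    Coordinates are 1-based and inclusive. min_run is an assumed threshold.
    """
    pattern = re.compile(r"[AT]{%d,}" % min_run)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]


def at_run_density(seq: str, min_run: int = 5) -> float:
    """Number of qualifying A/T runs per 1000 nucleotides of seq."""
    if not seq:
        return 0.0
    return 1000.0 * len(at_runs(seq, min_run)) / len(seq)
```

Comparing `at_run_density` across candidate fragments would give a rough ranking by the number of A.T-rich stretches per kilobase. Any such ranking would still need to be confirmed experimentally, for example by testing expression augmenting activity in stably transformed cells.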