The 1980's and 1990's have witnessed a burgeoning science in the area of recombinant DNA. Recombinant DNA processes involve the identification and isolation of desired gene sequences from natural sources. Various methods and biochemical tools have been developed in order to specifically incorporate such sequences into in vitro expression systems. Such systems are intended to produce large amounts of the protein product coded for by the gene sequences.
Expression of such genes in bacterial expression systems is widely used for research applications, but suffers from a number of drawbacks when applied to the production of gene products such as therapeutic proteins. In particular, it has been found that the protein product of bacterial expression systems does not undergo certain biochemical modifications thought to be necessary for activity of the product as a therapeutic agent. As a result, numerous efforts have been directed at the development of mammalian expression systems. It is generally thought that such expression systems are more likely to result in protein products suitably modified for mammalian use.
Expression in either bacterial (procaryotic) or mammalian (eucaryotic) systems typically involves the incorporation of the gene of choice into a vector, e.g., a plasmid. The vector, or expression construct as it is often called, is then introduced into a host cell in such a manner as to enable the host cell to transcribe the gene of choice and thereby produce protein. High level expression is typically a prime concern in developing such systems. See generally, pages 16.3-16.28, "Expression of Proteins", in Chapter 16, "Expression of Cloned Genes in Cultured Mammalian Cells", in Molecular Cloning: A Laboratory Manual, J. Sambrook ed., 2nd ed. (1989).
Transcriptional control regions occur throughout eucaryotic genes. Such genes can, in fact, be divided into three classes on the basis of the specific RNA polymerase that transcribes them into RNA. In particular, specific nucleotide sequences that flank the gene coding region are common among expression systems. A common sequence, likely to be important for proper transcription by RNA polymerase II, is known as the "TATA" box, which occurs about 30 base pairs ("bp") upstream of the transcriptional start site. Other conserved sequences have been found roughly 50 to 100 bp upstream of the start site, among them a GC-rich sequence and the sequence CCAAT. These sequences provide the recognition sites for specific proteins that serve as transcription factors.
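The spacing of these conserved sequences relative to the start site can be illustrated with a short Python sketch. It is an illustration only: the function name `find_promoter_motifs`, the simplified TATA pattern (TATA followed by A or T), and the synthetic test sequence are assumptions made for this example and are not taken from the cited references. Real promoter elements are more variable than these exact-match patterns suggest.

```python
import re

# Simplified, assumed motif patterns; real consensus sequences are more degenerate.
MOTIF_PATTERNS = {
    "TATA": r"TATA[AT]",
    "CCAAT": r"CCAAT",
}

def find_promoter_motifs(sequence, tss_index):
    """Return (motif name, position relative to the start site) pairs.

    Only the region upstream of tss_index is scanned, so every reported
    position is negative (e.g. -30 means 30 bp upstream).
    """
    upstream = sequence.upper()[:tss_index]
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        # A lookahead is used so that overlapping matches are also reported.
        for match in re.finditer(f"(?=({pattern}))", upstream):
            hits.append((name, match.start() - tss_index))
    return sorted(hits, key=lambda hit: hit[1])

# Synthetic 120 bp upstream region: CCAAT at -80 and a TATA box at -30.
upstream_region = "G" * 40 + "CCAAT" + "G" * 45 + "TATAAA" + "G" * 24
example = upstream_region + "ATGGCC"
print(find_promoter_motifs(example, tss_index=len(upstream_region)))
# [('CCAAT', -80), ('TATA', -30)]
```

Reporting positions relative to the start site, rather than as raw indices, keeps the output directly comparable to the approximate distances given above.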
A typical mammalian expression construct consists of a complementary DNA sequence (otherwise known as a "cDNA" or mini-gene) functionally linked to a promoter region and to a polyA signal. A promoter region is a nucleotide segment that is recognized by an RNA polymerase molecule in order to begin RNA synthesis (i.e., transcription). In the initiation step the two chains of the nucleic acid double helix come apart, with only one of the two strands at any start site being copied into its RNA complement. The promoter region is itself derived from a viral or mammalian source (constitutive or inducible). These elements are analogous to those necessary for propagation in bacteria. Most mammalian expression constructs also contain a gene useful for selection in mammalian cells.
Ideally, the selectable gene is capable of being co-transfected with the cDNA in order to provide an easily detectable protein product. The presence of the detectable protein product is used to identify transfected cells that have successfully incorporated the construct. The selectable gene is commonly of bacterial origin and is itself usually flanked by a constitutive mammalian promoter and polyA signal. The promoter driving the selectable marker is often of the same type as the promoter used to express the cDNA. This is desirable since vigorous expression of the selectable gene is considered necessary to establish permanent cell lines. See, for instance, R. J. Kaufman, "Selection and Coamplification of Heterologous Genes in Mammalian Cells", in Methods in Enzymology, 185:537-566 (1990).
Gene expression in eucaryotic systems carries its own unique attributes and considerations. In procaryotes, mixing purified RNA polymerase with a template having a promoter region and the necessary reagents and buffers is generally sufficient to obtain specific gene transcription.
Purified eucaryotic RNA polymerase, however, initiates transcription very poorly in vitro, by a process that is essentially random. It is believed that a multiprotein transcription complex is assembled in eucaryotic systems, in order to enable the polymerase to bind to a promoter. The assembly includes both gene-specific and general factors. An example of a general factor is a protein called TFIID, which binds to the above-described "TATA" sequence, common to many promoters.
Expression of recombinant genes in mammalian cells is typically performed in one of two ways. The first approach involves the temporary introduction of DNA into a host cell, under conditions in which the protein is expressed transiently, i.e., on a short term basis. A common example is the transfection of COS cells with simian vacuolating virus 40 ("SV40")-based vectors, where the SV40 origin of replication produces many copies of the expression vector. A selectable gene is not considered necessary for transient expression.
A second approach to mammalian expression involves the establishment of a permanent mammalian cell line by the stable integration of an expression construct, usually at random, in the host cell genome. The cell line most often used in the past for stable expression has been the Chinese Hamster Ovary ("CHO") cell line, which is a fibroblast-derived cell line. The expression levels for stable CHO cell lines derived from a single transfection can vary from undetectable levels to levels as high as 0.05 micrograms/ml or more. The variation between cell lines is largely a function of the respective insertion sites for the construct, i.e., of position effects.
A number of factors can influence the expression of cDNA constructs in stable cell lines. Among the more important factors appears to be the location, i.e., site of insertion, of the construct within the host cell genome. Expression variability that results from the different insertion sites is often referred to as the "position effect". The integration techniques commonly employed result in random positioning, meaning that position effects can be detected but not controlled or predicted.
In this second approach, the expression of cell lines providing detectable expression levels can be increased further by a process known as gene amplification. Such a process involves the stepwise selection for growth of cultured cells in the presence of increasing concentrations of a substance toxic to the cells. The toxic substance, in turn, can only be inactivated by a corresponding increase in the expression of the product of a gene that is co-transfected with the expression construct. (R. J. Kaufman, Methods in Enzymology, 185:537-566 (1990)).
A long time course is typically associated with the use of such an amplification process. The time course is lengthened even further by virtue of the relatively slow growth rate of CHO cells and by the need to isolate clonal cell lines at each step. As a result, the characterization of different variants is a long and tedious process, often taking six to nine months. Using CHO cells, for instance, the entire process can often require a year or more to achieve optimized levels.
Integration of a construct into a chromosomal site that is transcriptionally active, together with the use of a strong promoter, can often produce cell lines having expression levels that are comparable to those achieved by the amplification of a low expression cell line. Often, however, the additional screening necessary to identify clones having naturally high expression is so laborious that amplification of a lower expression cell line is preferred.
It is presently quite difficult to specifically and reproducibly integrate a construct into a transcriptionally active site. It is for this reason that research to date has generally focused on the search for stronger promoters. The strength of the promoter used in the expression of the cDNA is considered one of the more important aspects of mammalian expression.
"Strength", in this respect, refers to the ability of the promoter to activate high level expression of its respective gene in a particular system. Some of the earliest promoters characterized for mammalian expression were of viral origin, e.g., SV40 (early and late promoters), the adenovirus major late promoter, the Rous Sarcoma Virus ("RSV") promoter, the Cytomegalovirus ("CMV") immediate-early promoter, and the major immediate-early ("MIE") promoters.
As described above, the process of identifying transfected cells has been fostered by the development of selectable markers that are capable of being co-transfected with the gene of interest. Selectable markers commonly employed for the establishment of stable cell lines include various mammalian genes (such as dhfr), as well as bacterial genes, e.g., Neo (G418 resistance), and the E. coli gpt gene, driven by a mammalian promoter. (See Kaufman, above).
A mammalian cell typically obtains its supply of GMP, which is a necessary purine nucleotide, either by de novo synthesis from IMP or by salvaging guanine from the culture medium. Guanine salvage can be blocked by using cells that lack hypoxanthine-guanine phosphoribosyltransferase (HPRT), leaving de novo synthesis as the only pathway. Mycophenolic acid, when present in the growth medium, blocks the natural conversion of IMP into XMP by inhibiting IMP dehydrogenase, and therefore inhibits the de novo synthesis of GMP.
The gpt (xanthine-guanine phosphoribosyltransferase) gene can be used for selection in the presence of mycophenolic acid ("MPA"). The use of the gpt gene as a selectable marker in mammalian cells was first developed by Mulligan and Berg (Proc. Natl. Acad. Sci., 78:2072-2076 (1981)). The gpt gene can be used as a dominant selection system that can be applied to any type of HPRT-negative cell.
In use, only cells expressing the E. coli gpt gene are able to use xanthine to make XMP and then GMP, and cells that do not express gpt do not survive. Vectors expressing gpt, when integrated into the genome, are therefore able to provide wild-type mammalian cells with the ability to grow in medium containing adenine, xanthine, and the inhibitor mycophenolic acid. The selection can be made more efficient by the addition of aminopterin, which blocks the endogenous pathway of purine biosynthesis. (See, for instance, M. Pauly, et al., Nucleic Acids Research, 20:975-982 (1992)).
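The logic of the gpt selection described above can be summarized as a small boolean model, sketched here in Python. This is a deliberately simplified illustration, not a quantitative description of purine metabolism: the function `can_make_gmp` and its parameter names are hypothetical, and the sketch ignores the roles of adenine and aminopterin as well as any dependence on concentration.

```python
def can_make_gmp(has_hprt, expresses_gpt, mpa_present,
                 guanine_in_medium, xanthine_in_medium):
    """Return True if the cell has at least one route to GMP.

    Three routes are modeled, following the text:
      1. de novo synthesis (IMP -> XMP -> GMP), blocked by mycophenolic acid
      2. guanine salvage, which requires HPRT and guanine in the medium
      3. xanthine salvage (xanthine -> XMP -> GMP), which requires gpt
         and xanthine in the medium
    """
    de_novo = not mpa_present
    guanine_salvage = has_hprt and guanine_in_medium
    xanthine_salvage = expresses_gpt and xanthine_in_medium
    return de_novo or guanine_salvage or xanthine_salvage

# Selection medium: MPA and xanthine present, no guanine supplied.
selection_medium = dict(mpa_present=True,
                        guanine_in_medium=False,
                        xanthine_in_medium=True)

# An untransfected wild-type cell has no remaining route to GMP.
print(can_make_gmp(has_hprt=True, expresses_gpt=False, **selection_medium))  # False

# A cell that has integrated a gpt-expressing vector survives.
print(can_make_gmp(has_hprt=True, expresses_gpt=True, **selection_medium))   # True
```

Under these assumptions the presence of HPRT does not rescue an untransfected cell in the selection medium, because no guanine is supplied; only gpt expression restores a route to GMP.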
An alternative to the use of selectable genes such as gpt is to incorporate antibiotic resistance into the transfected cells. Resistance, however, usually shows a threshold effect, in that a minimum concentration is needed to inhibit wild-type cells. Varying levels of antibiotic might also affect the plating efficiency of each cell line as the minimum or maximum levels are approached. The advantage of the gpt selection is that adjustment of the mycophenolic acid concentration is not necessary for different selectable markers and promoters.
Most of the efforts that have been aimed at improving expression levels have therefore focused on increasing the strength of the promoter and/or on gene amplification schemes. Other approaches to circumventing position effects have involved the creation of dicistronic vectors in which the selectable marker is positioned as the second gene and is inefficiently translated. See generally, R. J. Kaufman, "Vectors Used for Expression in Mammalian Cells", in Methods in Enzymology, 185:487-511 (1990).
Although present techniques for evaluating and optimizing expression levels are useful, to this day they remain largely time-consuming, laborious, expensive and unpredictable. It would be highly desirable to have a system for generating and screening constructs in a manner that provides an improved combination of such aspects as time, labor and cost.