Recombinant DNA techniques have made it possible to prepare proteins and polypeptides that were heretofore unavailable in significant amounts, such as interferons and various hormones, by culturing host cells transformed with a DNA sequence coding for those proteins or polypeptides and isolating the protein produced. Although much progress has been made in producing significant amounts of proteins at relatively low cost, there is considerable room for improvement.
The level of production of a protein in a host cell is governed by three major factors: the number of copies of its gene within the cell, the efficiency with which those gene copies are transcribed and the efficiency with which the resultant messenger RNA ("mRNA") is translated. Optimization of each of these factors is desirable if a protein is to be made available at reasonable cost.
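As a rough first-order illustration only, and assuming (a simplification not stated in the text) that the three factors act independently and multiplicatively, the steady-state protein level P may be related to them as follows, where N_gene is the gene copy number per cell, ε_tx the transcription efficiency (mRNA made per gene copy) and ε_tl the translation efficiency (protein made per mRNA):

```latex
P \;\propto\; N_{\mathrm{gene}} \times \varepsilon_{\mathrm{tx}} \times \varepsilon_{\mathrm{tl}}
```

Under this assumed model, improving any one factor by a given ratio improves P by the same ratio, while a poor value of any single factor limits output regardless of the other two.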
An expression system for producing a desired protein usually consists of seven basic elements:
(i) A recombinant DNA molecule (e.g., a plasmid) containing a region necessary for stable replication and copy number control (the replicon region).
(ii) A selectable marker, such as a gene conferring antibiotic resistance on the host.
(iii) A promoter for transcription initiation and control.
(iv) A ribosome binding site for translation initiation at the appropriate ATG triplet (trinucleotide) sequence.
(v) DNA sequences compatible with efficient translation of mRNA.
(vi) An appropriate host.
(vii) Appropriate growth conditions.
Each of these elements, both independently and in combination with the others, can affect expression. For example, the properties of the protein to be expressed, such as its size, the number of cysteines, its folding properties and its solubility in the host cell environment or in the culture medium, may have a significant effect on the functioning of an expression system. Even the antibiotic resistance marker, which might be expected to have very little to do with the level of expression of an unrelated gene product, can lead to plasmid instability and thereby ultimately affect expression. Also, since a DNA sequence coding for a desired protein, and the mRNA sequence derived therefrom, will generally code for a large number (e.g., more than 100) of amino acids in sequence, the efficiency of translation of the mRNA will have a substantial effect on expression. For the foregoing reasons, it is important to design the expression system with the best combination of the aforementioned seven elements.
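Why chain length magnifies the effect of translation efficiency can be seen with a deliberately simple, hypothetical model: assume each codon is read to completion with the same independent probability p (no premature termination or ribosome drop-off). The fraction of initiated chains that reach full length is then p raised to the number of codons. The function name `full_length_fraction` and the parameter values below are illustrative assumptions, not measured data.

```python
def full_length_fraction(per_codon_efficiency: float, n_codons: int) -> float:
    """Fraction of initiated ribosomes expected to complete a full-length chain.

    Hypothetical model: every codon is read independently with the same
    success probability, so the overall yield is p ** n.
    """
    if not 0.0 <= per_codon_efficiency <= 1.0:
        raise ValueError("per_codon_efficiency must be between 0 and 1")
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return per_codon_efficiency ** n_codons


if __name__ == "__main__":
    # Illustrative values only; a longer chain compounds small per-codon losses.
    for p in (0.999, 0.99, 0.95):
        for n in (100, 300):
            print(f"p={p}, n={n}: {full_length_fraction(p, n):.3f}")
```

Under these assumptions, a per-codon efficiency of 0.99 over 100 codons leaves only about 37% of chains completed, which is why even small differences in translation efficiency matter for proteins of more than 100 amino acids.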
Efficiency of transcription and translation (which together comprise expression) depends in part upon the nucleotide sequences normally situated ahead of (upstream of) the desired coding sequence or gene, the sequences within the coding sequence and the sequences following (downstream of) it. For example, the upstream nucleotide sequences, or expression control sequences, define, inter alia, the location at which RNA polymerase interacts (the promoter sequence) to initiate transcription of mRNA and the location at which ribosomes bind (the ribosome binding site) and interact with the mRNA (the product of transcription) to initiate translation. Sequences within the coding sequence and downstream of it may also modulate the level of expression, presumably because of secondary structures formed in the mRNA.
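The three sequence regions referred to above (upstream, coding and downstream) can be illustrated with a small sketch that splits a DNA sequence at an ATG initiation codon and at the first in-frame stop codon. The helper `split_regions` and the test sequence are hypothetical; by default the sketch simply takes the first ATG, which is a simplification, since in practice the initiation codon actually used depends on the ribosome binding site and its spacing.

```python
from typing import NamedTuple, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


class GeneRegions(NamedTuple):
    upstream: str    # would carry expression control sequences (promoter, RBS)
    coding: str      # from the ATG through the stop codon, inclusive
    downstream: str  # sequences following the coding sequence


def split_regions(seq: str, start: Optional[int] = None) -> GeneRegions:
    """Split seq into upstream, coding and downstream parts (hypothetical helper)."""
    seq = seq.upper()
    if start is None:
        start = seq.find("ATG")  # simplification: first ATG is assumed to be the start
        if start == -1:
            raise ValueError("no ATG found")
    elif seq[start:start + 3] != "ATG":
        raise ValueError("start does not point at an ATG")
    # Read codon by codon in frame until a stop codon is reached.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3
            break
    else:
        raise ValueError("no in-frame stop codon found")
    return GeneRegions(seq[:start], seq[start:end], seq[end:])


if __name__ == "__main__":
    # Made-up sequence: an upstream stretch, a short ORF, and a downstream tail.
    print(split_regions("ttAGGAGGttaaATGAAATAGcc"))
```

Keeping the three regions separate in this way mirrors the point above that sequences in each region, not only the upstream control sequences, can influence the level of expression.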