This invention is directed to novel DNA sequences and to cloning vectors (vehicles) useful in the production of protein products.
Masayori Inouye and various of his co-workers have carried out extensive studies involving gene sequences coding for outer membrane proteins of gram-negative bacteria, in particular, the lipoprotein. These investigations have demonstrated that lipoproteins are present in relatively large quantities in bacterial cells. For example, there are approximately 7.2 × 10⁵ molecules of the lipoprotein of the Escherichia coli outer membrane per cell. Moreover, since it appears that there is only one structural gene for the lipoprotein in the E. coli chromosome, its transcription machinery must be highly efficient.
Recent efforts of Inouye and associates have been directed to expression of lipoprotein using appropriately formulated plasmids in suitably transformed microorganisms and to determining and analyzing DNA sequences of various lipoprotein genes (lpp). Thus, in Nakamura and Inouye, Cell 18, 1109-1117 (1979), the DNA sequence for the outer membrane lipoprotein of E. coli is reported. An analysis of the promoter region of this sequence demonstrated some interesting features. First, it was noted that the segment of 261 base pairs (bp) preceding the transcription initiation site (-1 to -261) has a very high AT content (70%) in contrast to 53% for the 322 bp mRNA region, 44% for the segment of 127 bp after the transcription termination site, and 49% for the average AT content of the E. coli chromosome. Secondly, it was noted that the first 45 bp upstream from the transcription initiation site (-1 to -45) contained 36 bases (80%) which are A or T. Thirdly, a heptanucleotide sequence analogous to the "Pribnow box" is present eight bases from the transcription initiation site. Fourthly, a sequence analogous to the "RNA polymerase recognition site" is present on both strands between positions -27 and -39. Fifthly, a long dyad symmetry is centered at the transcription initiation site.
It is postulated by Inouye and associates that these features, either separately or in combination, are responsible for the high degree of lpp promoter strength. In particular, it is postulated that the high AT content in the promoter sequence tends to destabilize the helix structure of the DNA and thereby facilitates the strand unwinding that is essential for initiation of transcription.
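The AT-content figures quoted above (for example, 36 of 45 bases, or 80%) are simple base-composition ratios. As a minimal illustrative sketch, such a ratio could be computed as follows. The function name `at_content` and the stand-in sequence are assumptions made for illustration only; the stand-in is not the actual lpp promoter sequence.

```python
def at_content(seq: str) -> float:
    """Return the fraction of bases in seq that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    # Count A and T bases and divide by the total length.
    return sum(base in "AT" for base in seq) / len(seq)


# Hypothetical 45-base stand-in containing 36 A/T bases.
# It is NOT the real lpp promoter; it only mirrors the 36-of-45 ratio.
example = "AT" * 18 + "GC" * 4 + "G"

print(len(example), at_content(example))
```

The same function could be applied to any region of interest (promoter, mRNA region, or downstream segment) to compare compositions, provided the actual sequences are supplied.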
The Inouye group further has shown that a high degree of homology exists with respect to lipoprotein gene sequences of other, perhaps all, gram-negative bacteria. Thus, an analysis of the DNA sequence of the Serratia marcescens lipoprotein gene and comparison with that of the E. coli lpp gene shows a high degree of homology [Nakamura and Inouye, Proc. Natl. Acad. Sci. U.S.A. 77, 1369-1373 (1980)]. In particular, they showed that the promoter region is highly conserved (84% homology), having an extremely high A and T content (78%) just as in E. coli (80%). Moreover, the 5' untranslated region of the lipoprotein mRNA is also highly conserved (95% homology).
More recently, in Yamagata, Nakamura, and Inouye, J. Biol. Chem. 256, 2194-2198 (1981), the DNA sequence of the lipoprotein gene of Erwinia amylovora was analyzed and compared with those of E. coli and S. marcescens. This study again confirms the high degree of homology existing in the lpp genes. Thus, the promoter region (-45 to -1) is highly conserved (87% relative to E. coli and 93% to S. marcescens). An extremely high A and T content (80%) exists, just as in E. coli (80%) and S. marcescens (78%). Moreover, the sequence of the untranslated region of the mRNA is highly conserved (97% relative to E. coli and 92% to S. marcescens).
The high level of constitutive transcription observed for the lipoprotein gene, based upon Inouye's studies, recommends it as a vehicle for expression of exogenous DNA fragments. Moreover, the work of Inouye et al. suggests that any of a wide range of lipoprotein genes of gram-negative bacteria may be so employed, including, for example, those of Escherichia coli, Shigella dysenteriae, Salmonella typhimurium, Citrobacter freundii, Klebsiella aerogenes, Enterobacter aerogenes, Edwardsiella tarda, Erwinia amylovora, Serratia marcescens, and the like.
Most recently, the suitability of the lipoprotein gene for product expression has been demonstrated by Inouye et al. [C. Lee, Nakamura, and Inouye, J. Bacter. 146, 861-866 (1981)]. In this work, the S. marcescens lipoprotein gene was cloned in a lambda phage vector and then recloned in plasmid vectors pBR322 and pSC101. Both vectors carrying the S. marcescens lpp gene were used to transform E. coli cells. The evidence establishes normal expression, albeit at a level somewhat reduced relative to vectors containing the E. coli lpp gene. In any event, it has been established in the literature that vectors containing the lpp gene promoter and 5' untranslated regions can be employed to achieve significant levels of lipoprotein expression.
By the term "vector" as used herein is meant a plasmid, phage DNA, or other DNA sequence (1) that is able to replicate in a host cell, (2) that is able to transform a host cell, and (3) that contains a marker suitable for use in identifying transformed cells.
It is to a specific class of cloning vectors that this invention is directed. It has been discovered that significantly high levels of expression of exogenous protein can be achieved using cloning vectors constructed to contain, in tandem, a nucleotide sequence defining the lipoprotein promoter region, a nucleotide sequence defining the lipoprotein 5' untranslated region, and a sequence coding for an exogenous protein product, the sequence coding for such product being connected via a translation start signal codon and a nucleotide sequence coding for an enterokinase cleavage site to the 3' terminal of the 5' untranslated region of the lipoprotein gene. Cloning vectors containing such elements therefore represent the subject matter of this invention.
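The tandem arrangement described above (promoter region, 5' untranslated region, translation start codon, enterokinase cleavage site, and product coding sequence) can be pictured as a simple ordered concatenation. The following is a minimal sketch under stated assumptions, not the construction method of the invention: the class `ExpressionCassette`, the short placeholder sequences, and the particular codons chosen for the enterokinase site are all hypothetical. The site is assumed here to be the commonly cited Asp-Asp-Asp-Asp-Lys recognition sequence, which the text above does not itself specify.

```python
from dataclasses import dataclass

START_CODON = "ATG"  # translation start signal codon

# Assumption: Asp-Asp-Asp-Asp-Lys encoded as GAT GAT GAT GAT AAA.
# The codon choice is illustrative only.
ENTEROKINASE_SITE = "GATGATGATGATAAA"


@dataclass(frozen=True)
class ExpressionCassette:
    """Hypothetical container for the elements arranged in tandem."""
    promoter: str      # lipoprotein promoter region
    utr5: str          # lipoprotein 5' untranslated region
    product_cds: str   # sequence coding for the exogenous protein

    def assemble(self) -> str:
        # Order: promoter, 5' UTR, start codon, cleavage site, product.
        return (self.promoter + self.utr5 + START_CODON
                + ENTEROKINASE_SITE + self.product_cds)


# Placeholder sequences; NOT real lpp or product sequences.
cassette = ExpressionCassette(
    promoter="TTTAAATTT",
    utr5="GCTACA",
    product_cds="GGTTCTTAA",
)
construct = cassette.assemble()
print(construct)
```

Keeping the start codon and cleavage site as fixed elements between the 5' untranslated region and the product sequence reflects the connection described in the paragraph above; the real sequences would be substituted for the placeholders.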