Protein production in Escherichia coli is a fundamental activity for a large fraction of academic, pharmaceutical and industrial research laboratories. Maximum production levels are usually sought, as this reduces costs and facilitates downstream purification steps.
A rate-limiting step in protein synthesis is translation initiation. In this process the ribosome must bind to the mRNA at a region called the Ribosome Binding Site (RBS). The RBS is approximately 35 nucleotides long and contains three discrete domains: (1) the Shine-Dalgarno (SD) sequence, (2) a spacer region, and (3) the first five to six codons of the Coding Sequence (CDS) (FIG. 1).
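As an illustration of the three-domain layout described above, the following minimal Python sketch splits a vector/CDS junction into its SD sequence, spacer, and leading CDS codons. The function name `split_rbs`, the SD motif, and both sequences are hypothetical and chosen only for illustration. The sketch assumes the SD is located by a simple exact-match search in the 5′UTR, which is a simplification.

```python
from typing import NamedTuple


class RbsDomains(NamedTuple):
    shine_dalgarno: str
    spacer: str
    cds_head: str


def split_rbs(utr: str, cds: str, sd_motif: str, n_codons: int = 6) -> RbsDomains:
    """Split a 5'UTR/CDS junction into SD, spacer and the first codons of the CDS."""
    # Use the last occurrence of the motif, i.e. the one closest to the start codon.
    sd_start = utr.rfind(sd_motif)
    if sd_start < 0:
        raise ValueError("SD motif not found in the 5'UTR")
    sd_end = sd_start + len(sd_motif)
    return RbsDomains(
        shine_dalgarno=utr[sd_start:sd_end],
        spacer=utr[sd_end:],           # everything between the SD and the start codon
        cds_head=cds[: 3 * n_codons],  # first n_codons codons, start codon included
    )


# Hypothetical sequences, for illustration only (assumption, not from the source).
vector_utr = "TTTAAGAAGGAGATATACAT"
cds = "ATGAAACGCCTGACCGGTGAACTGTAA"
domains = split_rbs(vector_utr, cds, sd_motif="AAGGAG")
print(domains)
```

Because the spacer and the CDS head come from different molecules (vector and insert), changing either one changes the RBS as a whole.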
To enable maximum production levels, coding sequences (CDSs) to be expressed recombinantly are typically cloned into high-copy number vectors that contain optimised genetic elements, such as promoters selected for high-level transcription and Shine-Dalgarno sequences selected for efficient translation.
When a CDS is cloned into an expression vector, the natural 5′ untranslated region (UTR) is typically replaced by a vector-derived 5′UTR and a new RBS is formed. The nature of this RBS is dictated by the newly formed junction between the vector and the CDS. Thus, a vector that works for one coding sequence might not work for another [1,2]. This problem is known as context dependence, and it results in unpredictable expression levels.
Nucleotide changes in the RBS are known to modulate protein production. Prior art in this area includes a number of studies showing that nucleotide changes in the spacer [3,4] or in the 5′ end of the CDS [5-9] can influence protein production. There is also a body of work showing that inserting domains into the RBS can influence the levels of protein production. Whilst these studies demonstrate how nucleotide changes in the RBS affect protein production, they do not solve the problem of context dependence, because they do not consider the vector and the CDS at the same time.
The best available tool for designing an optimal RBS is the RBS calculator [2]. This calculator is a prediction tool that uses a thermodynamic model of translation initiation (i.e. free-energy or ΔG calculations) to design an SD and linker region that is optimal for maximum production of the CDS of interest. It therefore considers context. However, the calculator is based on a bioinformatics prediction and its reliability is calculated to be around 47% (within two-fold of a target expression level). The calculator does not consider synonymous codon choice in the 5′ end of the CDS.
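To make concrete what "synonymous codon choice in the 5′ end of the CDS" refers to, the following Python sketch enumerates every nucleotide sequence that encodes the same first few amino acids. It is an illustration only, not the RBS calculator or any method from the cited work. The function names and the example CDS are hypothetical, and the sketch assumes the standard genetic code and that the start codon is kept fixed.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def synonymous_heads(cds: str, n_codons: int = 6) -> list[str]:
    """Return all CDS variants whose first n_codons codons are synonymous."""
    if len(cds) < 3 * n_codons:
        raise ValueError("CDS is shorter than the requested number of codons")
    codons = [cds[i:i + 3] for i in range(0, 3 * n_codons, 3)]
    choices = [[codons[0]]]  # assumption: the start codon is left unchanged
    for codon in codons[1:]:
        aa = CODON_TABLE[codon]
        choices.append(sorted(c for c, a in CODON_TABLE.items() if a == aa))
    tail = cds[3 * n_codons:]
    return ["".join(combo) + tail for combo in product(*choices)]


# Hypothetical CDS encoding Met-Lys-Arg-stop, for illustration only.
example_cds = "ATGAAACGCTAA"
variants = synonymous_heads(example_cds, n_codons=3)
print(len(variants))
```

Even for two variable codons (Lys with two synonymous codons, Arg with six) there are twelve nucleotide variants of the same protein, and the number grows multiplicatively with each additional codon.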
The method disclosed in WO2009112587A3, like the RBS calculator, matches an SD and linker region with a CDS of interest. The described solution uses a two-step cloning approach that modulates the nucleotide sequence of the SD and the linker, but not the CDS. The increases in protein production observed are, however, marginal compared to those observed with the present method.
The problem of unpredictable and often undesirably low expression levels is of a fundamental nature and was recognized in the early days of molecular biology, but a general experimental solution has not been presented. Therefore, an object of the present invention is to provide an improved experimental method for optimizing the RBS of a DNA construct for recombinant protein expression.