Quantitative description of the factors that determine protein expression levels is central to understanding natural systems, designing synthetic systems, and the biotechnology of heterologous gene expression. Protein expression is a complex, multi-step process involving transcription, mRNA stability, translation, post-translational processing, and physical and biological protein stability. Although much of the information controlling expression levels is encoded in the untranslated regions of bacterial genes, sequence variation in the open reading frames (ORFs) can also have profound effects on protein expression levels. Within the context of a constant amino acid sequence, considerable nucleic acid sequence variation can be achieved by altering four factors: nucleotide composition, levels of RNA secondary structure, codon identity, and the presence or absence of recognition sequences for stimulatory or inhibitory factors.
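By way of illustration only, the following minimal Python sketch shows how two isocoding sequences (the same amino acid sequence, different codons) can differ in nucleotide composition. The two short sequences, the partial codon table, and the function names `translate` and `gc_fraction` are hypothetical and chosen solely for this example; they are not taken from any study cited herein.

```python
# Partial codon table: only the codons used in this example (standard genetic code).
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G",
    "GAA": "E", "GAG": "E",
}


def translate(orf: str) -> str:
    """Translate an ORF codon by codon using the partial table above."""
    return "".join(CODON_TO_AA[orf[i:i + 3]] for i in range(0, len(orf), 3))


def gc_fraction(orf: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    return sum(base in "GC" for base in orf) / len(orf)


# Two hypothetical isocoding sequences encoding Met-Lys-Gly-Glu.
seq_a = "ATGAAAGGTGAA"
seq_b = "ATGAAGGGCGAG"

if __name__ == "__main__":
    print(translate(seq_a), translate(seq_b))
    print(gc_fraction(seq_a), gc_fraction(seq_b))
```

Both sequences translate to the same peptide, yet their GC content differs because only synonymous codons were exchanged. The other three factors (RNA secondary structure, codon identity, and recognition sequences) vary in the same way and are not modeled in this sketch.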
Even though considerable experimental and bioinformatic evidence has accumulated for the importance of each of these factors, it has not been possible to distill a unifying quantitative description. Systematic, simultaneous examination of these factors is difficult because of the challenges in constructing the requisite large number of isocoding sequences. A recent study of heterologous expression levels in a combinatorial library of green fluorescent protein genes determined that high-frequency codon choice was not the dominant factor, but rather that the degree of secondary structure in the ribosome binding site at the 5′ region of the ORF was inversely related to expression levels. See Kudla et al., Science 324: 255-8 (2009). Even so, this study identified a quantitative link between this factor and protein expression levels for only approximately 50% of the experimentally observed population.
Biomolecular function is most often the consequence of interactions between molecules (enzymes with substrates, inhibitors, or activators; receptors with ligands; protein-protein networks; protein-DNA; protein-RNA). Such functional interactions affect protein stability by virtue of a thermodynamic linkage relationship between the Gibbs free energy of folding (ΔGfold) and the free energy of ligand binding (ΔGbind) to the native (N) or denatured (D) states.
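One commonly used form of this linkage can be sketched as follows, under stated assumptions: a two-state folding equilibrium with K_fold = [N]/[D] and ΔGfold = −RT ln K_fold, and a single ligand L at free concentration [L] that binds the native and denatured states with dissociation constants K_d^N and K_d^D, respectively. The symbols ΔGfold^app, [L], K_d^N, and K_d^D are introduced here for illustration and are not defined elsewhere in this text.

```latex
\Delta G_{\mathrm{fold}}^{\mathrm{app}}
  = \Delta G_{\mathrm{fold}}
  - RT \ln\!\left( \frac{1 + [L]/K_d^{N}}{1 + [L]/K_d^{D}} \right),
\qquad
\Delta G_{\mathrm{bind}}^{X} = RT \ln K_d^{X}, \quad X \in \{N, D\}
```

Under these assumptions, a ligand that binds preferentially to the native state (K_d^N smaller than K_d^D) makes the apparent free energy of folding more negative, that is, it stabilizes the protein, whereas preferential binding to the denatured state destabilizes it.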
Macromolecular stability is therefore one of the fundamental thermodynamic measures in biochemistry as it can quantitatively report on structure-function relationships and provide a universal monitor for biochemical function.
There are two distinct approaches for determining protein stability. The first measures the free energy of protein folding/unfolding (hereinafter “(un)folding”) under equilibrium conditions by assessing the fraction of the native state using spectroscopy, hydrodynamic observations, functional assays, or calorimetry. The second exploits the relationship between protein dynamics and stability by monitoring the differential reactivity of internal chemical groups in native and unfolded states. This approach measures conformational free energies which, under appropriate conditions, correspond to global protein stability. Amide proton exchange is commonly used to monitor such differential reactivity, but its widespread use to assess biological function is often limited by the need for specialized instrumentation and relatively large amounts of protein. Recently, cysteine reactivity has emerged as another means to determine rates of protein (un)folding and estimate protein stabilities. See, e.g., Ha et al., Nat. Struct. Biol. 5: 730-737 (1998); Feng et al., J. Mol. Biol. 314: 153-166 (2001); Sridevi et al., Biochemistry 41: 1568-1578 (2002); Jha et al., J. Biol. Chem. 282: 37479-37491 (2007); and Silverman et al., J. Mol. Biol. 324: 1031-1040 (2002). Nevertheless, existing methods do not employ the high-sensitivity detection that is required for miniaturization of the assays (e.g., using protein at picomole levels), do not fully develop the theory that establishes at what temperatures the linkage between cysteine reactivity and protein stability is valid, and do not present how the linkage between stability and ligand binding can be established to determine affinities using these methods. Given the state of the art, additional methods are needed for improving and optimizing protein expression and for assessing protein stability, including methods that allow the use of small quantities (e.g., picomoles) of protein.
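The equilibrium approach described above rests on the relationship between the fraction of protein in the native state and the free energy of folding. The following minimal Python sketch illustrates that relationship, assuming a simple two-state (N and D only) model in which K_fold = [N]/[D] = exp(−ΔGfold/RT). The function names and the numerical values are hypothetical and serve only as an example.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def fraction_native(dg_fold_kj: float, temp_k: float) -> float:
    """Fraction of native protein for a two-state model.

    dg_fold_kj is the free energy of folding in kJ/mol
    (negative values favor the native state).
    """
    # [D]/[N] = 1/K_fold = exp(+dG_fold / RT)
    ratio_d_to_n = math.exp(dg_fold_kj / (R_KJ * temp_k))
    return 1.0 / (1.0 + ratio_d_to_n)


def dg_fold_from_fraction(f_native: float, temp_k: float) -> float:
    """Invert the two-state relation: dG_fold = -RT ln(f_N / (1 - f_N))."""
    return -R_KJ * temp_k * math.log(f_native / (1.0 - f_native))


if __name__ == "__main__":
    temp = 298.15  # K, assumed temperature for the example
    for dg in (-20.0, -5.0, 0.0, 5.0):  # hypothetical stabilities in kJ/mol
        print(dg, fraction_native(dg, temp))
```

At ΔGfold = 0 the native and denatured states are equally populated. In practice, the fraction native is measurable with useful precision only near this midpoint, which is one reason equilibrium methods typically perturb the protein with denaturant or temperature.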