Recent years have seen the development of a number of methods designed to allow the incorporation of unnatural amino acids into proteins. These approaches include in vitro protein expression, site-specific protein modification and protein total synthesis. Although powerful, each of these techniques carries practical or synthetic limitations that have to some extent restricted its widespread application. Total chemical synthesis, which provides unparalleled freedom to manipulate protein structure, has been dominated in recent years by the use of chemical ligation techniques (25-31). Among these, Kent's “native-chemical ligation” approach has proven a particularly powerful route to synthetic proteins (32). In this process, an N-terminal cysteine-containing peptide is chemically ligated to a peptide possessing a C-terminal thioester group, with the resultant formation of a peptide bond at the ligation site. Despite the generality of the ligation chemistry, the strategy has been constrained by the need to generate the peptide building blocks using stepwise solid phase peptide synthesis (SPPS). The size limitations imposed by this requirement have restricted the application of native chemical ligation to the study of small proteins and protein domains.
Protein semi-synthesis, in which synthetic peptides and protein cleavage fragments are linked together, offers an attractive route to the generation of large protein analogs containing unnatural amino acids (33). The utility of existing semi-synthesis strategies is, however, tempered by the need to have unique chemical or enzymatic cleavage sites at the appropriate position within the protein of interest. A more general protein semi-synthesis approach in which synthetic peptides are directly chemically ligated to a recombinant protein without the need to carry out such initial fragmentation steps would be useful. Central to this strategy would be the ability to generate recombinant proteins bearing C-terminal α-thioesters, thereby facilitating the use of native chemical ligation.
The ability to alter protein structure and function by the insertion of unnatural amino acids has great potential to enhance our understanding of proteins, generate new tools for biomedical research, and create novel therapeutic agents. The current challenge was therefore to devise a method of generating the requisite α-thioester group in recombinant proteins.
Protein splicing, the process in which a protein undergoes an intramolecular rearrangement resulting in the extrusion of an internal sequence (intein) and the joining of the lateral sequences (exteins), has been shown to involve the intermediacy of a thioester (7, 8). A mutant version of the splicing protein has been demonstrated to be defective in completion of the splicing reaction but still capable of thioester intermediate formation (7, 8). The commercially available IMPACT™-type vectors for E. coli protein expression (such as the pCYB and pTYB vectors) result in the generation of α-thioesters: a protein of interest can be expressed in frame, fused with an intein-chitin binding domain (CBD) sequence (8). In the standard experiment, the protein of interest is cleaved from the intein-CBD with dithiothreitol or mercaptoethanol by a transthioesterification reaction while the chimera is bound to a chitin column.
Many large cellular and extracellular proteins are composed of independently folded protein modules, each with distinct biochemical properties, specific recombinations of which provide the overall functional character of the complete protein in vivo (1, 2). Consequently, there is interest in understanding the structural and functional interplay that occurs between such domains in the context of the multi-domain protein. Experimentally, this can be achieved by manipulating the spatial and functional organization of the domains using standard recombinant DNA techniques. An alternative protein engineering strategy would involve the in vitro assembly of multi-domain proteins from individual ‘off-the-shelf’ protein domains. Advantages include the ability to prepare a large number of chimeric proteins from a small number of pre-made building blocks, the ability to prepare fused proteins which are cytotoxic from individually expressed domains which are not, the potential incorporation of non-natural residues through an efficient combination of in vivo and chemosynthetic approaches, and the labeling of one segment of a protein for structural or biochemical investigation.
For a protein of length n residues, there is a practical limit for structure determination in solution by nuclear magnetic resonance (NMR) spectroscopy (3). This is due to the loss of signal resolution, both from increased line widths at longer rotational correlation times and from the increased number of signals of similar chemical shift overlapping with each other. Both effects are proportional to n.
Isotopic labeling can be used for the selection of coupled nuclei pairs, the perturbation of relaxation of complex or isochronous spin systems, and for the observation of low sensitivity nuclei (specifically 13C and 15N). Its application to proteins is well exploited (e.g. (4, 5)). While early examples of highly tailored isotopic syntheses of peptides by chemical means (e.g. (6)) were useful, that approach was subsumed by the more general ability to uniformly label proteins by over-expression in isotopically substituted media. However, labeling a segment of a protein remains an important goal generally, and especially in connection with the study of multi-domain or modular proteins (e.g. (7, 8)). Labeling a segment permits the assignment of that segment in a direct manner, because of the reduced spectral complexity. Moreover, in cases where the subdomains are individually folded, segmental labeling permits the structural determination of the independent segment, and possible comparison of the structure in isolated and multi-domain forms. Segmental labeling also permits simplified observation of the individual subdomain for spin relaxation, residual dipolar coupling analysis (9), or study of ligand binding by chemical shift perturbation/SAR-by-NMR (10).
In principle, selectively labeled proteins can be obtained by joining labeled and unlabeled recombinant proteins together in vitro. Along these lines, Yamazaki et al. exploited protein splicing in trans (11-13) to generate a segmentally labeled protein for NMR analysis (14). Using a genetically dissected protein splicing system, they were able to join labeled and unlabeled peptides derived from the α-subunit of E. coli RNA polymerase. Although elegant, this strategy resulted in the insertion of five unwanted amino acids at the splice junction, and required a chemical denaturation step. These features, along with the moderate yields often associated with the trans-splicing process (11), reduce the general applicability of this approach.
Accordingly, ligation of native expressed recombinant proteins, protein domains and protein segments is highly desirable, as is domain and segmental protein labeling. Such applications are particularly useful in NMR.
The novel protein engineering approach for expressed protein ligation described herein allows synthetic peptides to be chemically ligated to the C-terminus of recombinant proteins through a normal peptide bond (15, 16). Briefly, the recombinant protein to be ligated is first expressed as an N-terminal intein-CBD fusion, where the intein is a modified protein splicing element (17) and CBD is a chitin binding domain. Other affinity binding domains may be used. Following affinity purification on chitin beads, the immobilized fusion protein is exposed to an aqueous solution containing the synthetic peptide and a catalytic amount of thiophenol at pH 7.0. Under these conditions near quantitative ligation of the peptide to the protein is observed (15, 16). Expressed protein ligation is useful to generate semi-synthetic proteins (15, 16, 18) and can be extended to allow two recombinant, folded proteins to be ligated together. Such an extension permits segmental isotopic labeling of multi-domain proteins for use in multidimensional NMR analysis. In addition, expressed protein ligation has uses in combinatorial chemistry with protein domains.
High throughput screening is a highly desirable and well-described approach for both diagnostic screening and for identification of novel, useful compounds for treatment of various ailments and diseases. High throughput screens require easy robotic manipulation, small sample size and rapid processing capabilities. Generally, such screens require binding of the sample to a solid phase support. One problem associated with such high-throughput systems is the tendency of the bound sample to diffuse in space with time unless physically delimited, such as in “sample wells.” In addition, rigorous washing conditions, necessary to ensure screening specificity, tend to reduce or eliminate the screening signal. The protein chip compositions described herein solve these problems and provide a stable means for high-throughput diagnostic screening for the presence of proteins, antigens and antibodies. Moreover, the protein chip compositions described herein provide a means for introducing specific protein sequences which may include unnatural amino acids or analogs thereof. The availability of solid phase supports with amine groups available for peptide binding facilitates production of the protein chip compositions of the present invention comprising ligated expressed proteins produced by the novel methods described herein.