Protein engineering methodologies have proven invaluable for generating protein-based tools for application in basic research, diagnostics and drug discovery, and as protein therapeutics. The ability to manipulate the primary structure of a protein in a controlled manner opens up many new possibilities in the biological and medical sciences. As a consequence, there is a concerted effort to develop methodologies for the site-specific modification of proteins and their subsequent application.
The two main approaches to generating proteins are recombinant methods and chemical synthesis. To date, the two methods have proved to be complementary. Recombinant methodologies enable proteins of any size to be generated but, in general, are restricted to the assembly of the proteinogenic amino acids. Thus the introduction of labels and probes into recombinant proteins generally has to be implemented post-translationally and does not allow modifications to the protein backbone.
The most common methods for labelling a recombinant protein use an amino- or thiol-reactive version of the label that will react covalently with a lysine side chain/Nα amino group or a cysteine side chain within the protein, respectively. For such labelling methods to be site-specific, an appropriate derivative of the protein must be engineered to contain a unique reactive functionality at the position to be modified. This requires all the other naturally occurring reactive functionalities within the primary sequence to be removed through amino acid mutagenesis. In the case of protein amino functionalities, this is essentially impossible due to the abundance of lysine residues within proteins and the presence of the amino functionality at the N-terminus of the sequence. Likewise, for cysteine this process is laborious and is often detrimental to the function of the protein.
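The scale of this problem can be illustrated by counting the reactive groups in a sequence. The following Python sketch is illustrative only and forms no part of the methods described here; the function name and the example sequence are hypothetical. It assumes a single chain written in one-letter code with a free N-terminus and with every cysteine present as a free thiol.

```python
def count_label_sites(sequence: str) -> dict:
    """Count groups that an amino- or thiol-reactive label could modify.

    Assumptions (not from the source text): one-letter amino acid code,
    a free N-terminal amino group, and no cysteine tied up in a disulphide.
    """
    seq = sequence.upper()
    lysines = seq.count("K")    # lysine side-chain amino groups
    cysteines = seq.count("C")  # cysteine side-chain thiols
    return {
        # every lysine plus the single N-terminal amino group
        "amine_sites": lysines + (1 if seq else 0),
        "thiol_sites": cysteines,
    }


# Hypothetical 11-residue sequence used purely for illustration.
sites = count_label_sites("MKTCAYKLCGK")
print(sites)
```

A label is site-specific only when the relevant count is exactly one, which is why every additional lysine or cysteine has to be removed by mutagenesis before such labelling can be directed to a single position.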
The production of proteins having site-specific modifications and/or labels is more readily achievable using chemical synthesis methods. The chemical synthesis of proteins enables multiple modifications to be incorporated into both side-chain and backbone moieties of the protein in a site-specific manner, but, in general, the maximum size of sequence that can be synthesised and isolated is circa 50-100 amino acids.
Protein Ligation
A further approach to the generation of proteins is protein/peptide ligation. In this approach, mutually reactive chemical functionalities are incorporated at the N- and C-termini of unprotected polypeptide fragments such that, when they are mixed, they react in a chemoselective manner to join the two sequences together (Cotton G J and Muir T W. Chem. Biol., 1999, 6, R247-R254). These functionalities are orthogonal to the chemistry of the naturally occurring amino acids, i.e. they react by chemistries that are mutually exclusive with the reactions of the reactive moieties of the naturally occurring amino acids. The principle of chemical ligation is shown schematically in FIG. 1.
A number of chemistries have been utilised for the ligation of two synthetic peptides where a diverse range of different chemical functionalities can be incorporated into the termini of polypeptides using solid phase peptide synthesis. These include the reaction between a thioacid and a bromo-alkyl group to form a thioester (Schnolzer M and Kent S B H, Science, 1992, 256, 221-225), reaction of an aldehyde with an N-terminal cysteine or threonine to form a thiazolidine or oxazolidine respectively (Liu C-F and Tam J P. Proc. Natl. Acad. Sci. USA, 1994, 91, 6584-6588), reaction between a hydrazide and an aldehyde to form a hydrazone (Gaertner H F et al. Bioconj. Chem., 1992, 3, 262-268), reaction of an aminoxy group and an aldehyde to form an oxime (Rose K. J. Am. Chem. Soc., 1994, 116, 30-33), reaction of azides and aryl phosphines to form an amide bond (Staudinger ligation) (Nilsson B L, Kiessling L L, and Raines R T. Org. Lett., 2001, 3, 9-12, Kiick et al. Proc. Natl. Acad. Sci. USA, 2002, 99, 19-24), and the reaction of a peptide C-terminal thioester and an N-terminal cysteine peptide to form a native amide bond (Dawson et al. Science, 1994, 266, 776) (Native chemical ligation U.S. Pat. No. 6,184,344, EP 0832 096 B1). This native chemical ligation method is an extension of studies by Wieland and coworkers, who showed that the reaction of ValSPh and CysOH in aqueous buffer yielded the dipeptide ValCysOH (Wieland T et al. Liebigs Ann. Chem., 1953, 583, 129-149).
Although the native chemical ligation method has proved popular, it requires an N-terminal cysteine containing peptide for the reaction and thus, if a cysteine is not present at the appropriate position in the protein, a cysteine needs to be introduced at the ligation site. However, the introduction of extra thiol groups into a protein sequence may be detrimental to its structure/function, especially since cysteine has a propensity to form disulfide bonds which may disrupt the folding pathway or compromise the function of the folded protein.
As a consequence of the difficulties and problems associated with known ligation techniques, the ligation of two synthetic fragments generally only enables proteins of circa 100-150 amino acids to be chemically synthesised. Although larger proteins have been synthesised by ligating together more than two fragments, this has proved to be technically difficult (Camarero et al. J. Pept. Res., 1998, 54, 303-316, Canne L E et al, J. Am. Chem. Soc., 1999, 121, 8720-8727).
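The two constraints discussed above, an N-terminal cysteine on the C-terminal fragment and a practical limit on the length of each synthetic fragment, can be combined in a short illustrative sketch. The Python function below is hypothetical and forms no part of any published method. Its default limit of 50 residues is an assumption taken from the lower end of the circa 50-100 residue range given earlier. It lists the positions at which a two-fragment native chemical ligation could be attempted without introducing a new cysteine.

```python
from typing import List


def candidate_ncl_junctions(sequence: str, max_fragment_len: int = 50) -> List[int]:
    """Return 0-based indices i such that sequence[:i] and sequence[i:]
    could be joined by native chemical ligation.

    Assumed criteria (illustrative only):
      * residue i is a cysteine, so the C-terminal fragment starts with Cys;
      * both fragments are non-empty and no longer than max_fragment_len.
    """
    seq = sequence.upper()
    junctions = []
    for i in range(1, len(seq)):  # i = 0 would leave an empty N-terminal fragment
        if seq[i] != "C":
            continue
        n_fragment_len = i
        c_fragment_len = len(seq) - i
        if n_fragment_len <= max_fragment_len and c_fragment_len <= max_fragment_len:
            junctions.append(i)
    return junctions


# Hypothetical 80-residue sequence with cysteines at indices 10 and 40.
example = "A" * 10 + "C" + "A" * 29 + "C" + "A" * 39
print(candidate_ncl_junctions(example))
```

An empty result corresponds to the situation described above: either a cysteine must be introduced at a suitable position, or more than two fragments must be ligated.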
Protein Semi-Synthesis
Protein ligation technologies that enable both synthetic and recombinantly derived protein fragments to be joined together have been described. This enables large proteins to be constructed from combinations of synthetic and recombinant fragments, allowing proteins to be site-specifically modified with both natural and unnatural entities. By utilising such so-called protein semi-synthesis, many different synthetic moieties can be site-specifically incorporated at multiple different sites within a target protein.
In order to utilise recombinant proteins in ligation strategies, the recombinant fragments must contain the appropriate reactive functionalities to facilitate ligation. One approach to introducing a unique reactive functionality into a recombinant protein has been through the periodate oxidation of N-terminal serine-containing sequences. Such treatment converts the N-terminal serine into a glyoxyl moiety, which contains an N-terminal aldehyde. Synthetic hydrazide-containing peptides have then been ligated to the N-terminus of these proteins in a chemoselective manner through hydrazone bond formation with the protein N-terminal glyoxyl group (Gaertner H F et al. Bioconj. Chem., 1992, 3, 262-268, Gaertner H F et al. J. Biol. Chem., 1994, 269, 7224-7230). Another approach has been to generate recombinant proteins with N-terminal cysteine residues. Synthetic peptides containing C-terminal thioesters have then been site-specifically attached to the N-terminus of these proteins via amide bond formation in a manner analogous to ‘native chemical ligation’ (Cotton G J and Muir T W. Chem. Biol., 2000, 7, 253-261). However, as with the ligation of synthetic peptides using native chemical ligation techniques, the technology requires a cysteine to be introduced at the ligation site if the primary sequence does not contain one at the appropriate position.
Protein Splicing Techniques
Recently, technologies have been developed which enable recombinant proteins containing C-terminal thioester groups to be generated. The C-terminal thioester functionality provides a unique reactive chemical group within the protein that can be utilised for protein ligation. Recombinant C-terminal thioester proteins are produced by manipulating a naturally occurring biological phenomenon known as protein splicing (Paulus H. Annu. Rev. Biochem., 2000, 69, 447-496). Protein splicing is a post-translational process in which a precursor protein undergoes a series of intramolecular rearrangements which result in precise removal of an internal region, referred to as an intein, and ligation of the two flanking sequences, termed exteins (FIG. 2). While there are generally no sequence requirements in either of the exteins, inteins are characterised by several conserved sequence motifs, and well over a hundred members of this protein domain family have now been identified.
The first step in protein splicing involves an N→S (or N→O) acyl shift in which the N-extein unit is transferred to the side-chain SH or OH group of a conserved Cys/Ser/Thr residue, always located at the immediate N-terminus of the intein. Insights into this mechanism have led to the design of a number of mutant inteins which can only promote the first step of protein splicing (Chong et al. Gene, 1997, 192, 271-281, Noren et al. Angew. Chem. Int. Ed. Engl., 2000, 39, 450-466). Proteins expressed as in-frame N-terminal fusions to one of these engineered inteins can be cleaved by thiols via an intermolecular transthioesterification reaction, to generate the recombinant protein C-terminal thioester derivative (FIG. 3) (Chong et al. Gene, 1997, 192, 271-281, Noren et al. Angew. Chem. Int. Ed. Engl., 2000, 39, 450-466) (New England Biolabs Impact System WO 00/18881, WO 0047751). Peptide sequences containing an N-terminal cysteine residue can then be specifically ligated to the C-termini of such recombinant C-terminal thioester proteins (Muir et al. Proc. Natl. Acad. Sci. USA, 1998, 95, 6705-6710, Evans Jr et al. Prot. Sci., 1998, 7, 2256-2264), in a procedure termed expressed protein ligation (EPL) or intein-mediated protein ligation (IPL).
The chemoselective ligation of N-terminal cysteine-containing peptides to C-terminal thioester-containing peptides, be they synthetic or recombinant, is typically performed at slightly basic pH and in the presence of a thiol cofactor. The strategy also requires a cysteine to be introduced at the ligation site if one is not suitably positioned within the primary sequence. These requirements have the potential to alter the structure and/or function of both the protein ligation product and the initial reactants.
For example, the chemokine RANTES is unstable in a buffer of 100 mM NaCl, 100 mM sodium phosphate pH 7.4 containing 100 mM 2-mercaptoethanesulfonic acid (MESNA), a buffer typically used for the ligation of C-terminal thioester molecules to N-terminal cysteine-containing molecules (expressed protein ligation and native chemical ligation). RANTES contains two disulphide bonds critical for maintaining the structure and function of the protein. In the typical ligation buffer described above, the folded protein was found to be converted within 48 hours to a mixture of the reduced protein and MESNA-protein adducts. The majority of the protein mixture subsequently formed a precipitate, presumably reflecting the unfolded nature of these species (Cotton, unpublished).
Accordingly, the inventors believe that ligation reactions that require thiol-containing buffers are, in general, not suitable for maintaining the integrity of disulphide bond-containing proteins, such as antibodies, antibody fragments and antibody domains, cytokines, growth factors etc. Thus there is a requirement for ligation approaches that can be performed in the absence of thiols. For example, when monitored over a number of days, RANTES was found to be stable both in 100 mM NaCl, 100 mM sodium phosphate buffer pH 7.4 and in 100 mM sodium acetate buffer pH 4.5 (inventors' unpublished results). Ligation reactions that can be performed under such conditions should therefore be applicable to both disulphide-containing and non-disulphide-containing proteins.
Protein Labelling
Historically, protein ligation has meant the joining together of two peptide/protein fragments, but this is equivalent to protein labelling in which the label is a peptide or derivatised peptide. Equally, if a small non-peptidic synthetic molecule contains the necessary reactive chemical functionality for protein ligation, then ligation of the synthetic molecule directly to either the N- or C-terminus of the protein affords site-specific labelling of the protein. Thus technologies developed for the ligation of protein fragments can also be used for the direct labelling of either the N- or C-termini of peptides or proteins in a site-specific manner irrespective of their sequence.
Recombinant proteins containing N-terminal glyoxyl functions (generated through periodate oxidation of the corresponding N-terminal serine protein) have been site-specifically labelled at the N-terminus through reaction with hydrazide or aminoxy derivatives of the label (Geoghegan K F and Stroh J G. Bioconj. Chem., 1992, 3, 138-146, Alouni S et al. Eur. J. Biochem., 1995, 227, 328-334). Recombinant proteins containing N-terminal cysteine residues have also been N-terminally labelled through reaction with labels containing thioester functionalities, the label being the acyl substituent of the thioester (Schuler B and Pannell L K. Bioconjug. Chem., 2002, 13, 1039-43), or aldehyde functionalities (Zhao et al. Bioconj. Chem., 1999, 10, 424-430), to form amides and thiazolidines respectively.
Though a number of methods for the ligation of proteins exist, each one has its potential drawbacks. There is thus a need for novel ligation methodologies, especially those that are compatible with both synthetic and recombinant fragments and that may be used in the ligation of disulphide bond-containing proteins as well as non-disulphide bond-containing proteins. Such methodologies will complement the existing technologies and add another string to the protein engineer's bow.