Chemical methods for site-specific functionalization of proteins and peptides are useful in a variety of research and biomedical applications. For example, the site-specific attachment of a chromophore such as a fluorescent dye to a target protein can enable detection of that protein in a complex mixture, or tracking of its expression and localization within a cell or living organism. Similarly, site-specific functionalization of a protein with an affinity tag can be used to facilitate protein isolation, purification, and characterization. Site-specific functionalization can also be useful in the preparation of protein microarrays, which in turn can be used to screen protein-ligand, protein-protein, and antigen-antibody interactions. As another example, methods to chemically link a protein, such as a therapeutic protein, to a polymer (e.g., polyethylene glycol), a small-molecule drug, a cell receptor ligand, or another protein or peptide can be valuable to enhance and modulate the pharmacological, pharmacokinetic, or tissue-targeting properties of the therapeutic protein.
Several methods for the functionalization of peptides and proteins are known in the art (see, e.g., Hermanson 1996; Jing and Cornish 2011; Crivat and Taraska 2012). Conventional strategies have taken advantage of nucleophilic side-chain functionalities in certain amino acids (e.g., the thiol group in cysteine, the amino group in lysine) to couple a chemical species to the polypeptide via an electrophilic reagent (Hermanson 1996). An inherent limitation of these approaches is that more than one such amino acid can be present in the target polypeptide, preventing accurate control over the site-selectivity of the reaction. Furthermore, selective labeling of an individual protein in complex biological mixtures (e.g., a cell lysate or the interior of a cell) is not possible with these strategies, owing to the presence of numerous other proteins having similar reactive functionalities.
More recent approaches for protein labeling have involved the genetic fusion of a protein to a protein tag, such as a fluorescent protein (e.g., green fluorescent protein and variants thereof) or an enzyme that can be covalently modified via an irreversible inhibitor to indirectly link a certain chemical species (e.g., a fluorophore or affinity label) to the protein of interest (Jing and Cornish 2011; Crivat and Taraska 2012). Examples of the latter include the so-called SNAP-tag (Keppler, Gendreizig et al. 2003), HaloTag (Los, Encell et al. 2008), and TMP-tag (Calloway, Choob et al. 2007). A common drawback of these approaches, however, is that permanent fusion of the target protein to a non-native protein tag may affect the biological function, dynamics, conformational properties, and/or cellular localization of the protein of interest.
Other approaches in the area of protein labeling have involved the use of short (e.g., 6-20 amino acid-long) peptide sequences that are genetically fused to the protein of interest and serve as recognition sites for enzyme-catalyzed posttranslational modifications. Through the action of these enzymes or engineered variants thereof, and using modified co-substrates, fluorophores or other small-molecule labels have been attached to these peptide sequences, and thus to the target protein. Examples of these strategies include the use of biotin ligase BirA (Chen, Howarth et al. 2005), sortase (Popp, Antos et al. 2007), lipoic acid ligase (Cohen, Zou et al. 2012), and phosphopantetheinyl transferase (PPTase) (Yin, Liu et al. 2004). In this case too, however, the target protein must be permanently fused to a non-native peptide sequence, which can alter its properties. In addition, an auxiliary processing enzyme must be added (or co-expressed) for both in vitro and in vivo applications.
In general, ‘traceless’ methods for protein labeling that involve no modification or extension of the primary sequence of the target protein are highly desirable, as they minimize the risk of altering its structure, function, or cellular localization. In particular, the ability to site-specifically attach new chemical entities to the carboxy-terminus of a protein or enzyme is especially valuable, as the C-terminus is often solvent-exposed and typically not directly involved in binding or catalysis. Thus, efficient methods for C-terminal functionalization of a protein can be of great value for protein labeling or immobilization under non-disruptive conditions.
Recently developed technologies have made possible the generation of recombinant proteins comprising a thioester group at their C-terminal end. The C-terminal thioester group provides a unique reactive chemical functionality within the protein that can be exploited for site-specific labeling of a target protein. Recombinant C-terminal thioester proteins can be generated by exploiting the mechanism of inteins, which are naturally occurring protein segments capable of excising themselves from the internal region of a precursor polypeptide via a posttranslational process known as protein splicing (Paulus 2000). The first step in protein splicing involves an intein-catalyzed N→S (or N→O) acyl transfer, in which the polypeptide chain flanking the intein N-terminus (N-extein) is transferred to the side-chain thiol or hydroxyl group of a conserved cysteine, serine, or threonine residue at the N-terminus of the intein. Further intramolecular rearrangements follow that ultimately lead to the excision of the intein from the precursor polypeptide and the ligation of the N-extein unit to the C-extein unit (i.e., the polypeptide chain flanking the intein C-terminus) via a peptide bond. By genetically fusing a protein of interest to the N-terminus of engineered intein variants that are unable to undergo C-terminal splicing (e.g., via mutation of the conserved asparagine residue at the intein C-terminus or removal of the C-extein unit), it is possible to promote only the first step of protein splicing, thereby producing a recombinant protein with a reactive C-terminal thioester linkage. Sequencing and characterization of several naturally occurring intein-comprising proteins have shown that inteins share a similar mechanism as well as a number of conserved primary sequence regions called ‘intein motifs’, whereas there are generally no specific sequence requirements for the N- and C-extein units. To date, more than 500 experimentally validated and putative intein sequences have been identified.
The ability to generate recombinant C-terminal thioester proteins via the genetic fusion of a protein to the N-terminus of a natural intein, or an engineered (or synthetic or artificial) variant thereof, provides the opportunity to link a chemical entity to the protein C-terminus via nucleophilic substitution at the thioester group. A known methodology in this area involves the reaction of a recombinant C-terminal thioester protein with another polypeptide (i.e., a recombinant or synthetic peptide or protein) comprising an N-terminal cysteine. This procedure, known as Expressed Protein Ligation (Muir, Sondhi et al. 1998), involves an intermolecular transthioesterification reaction followed by an intramolecular S→N acyl shift to give a native peptide bond between the two polypeptide chains. Similarly, cysteine-comprising reagents have been used for labeling/immobilization of recombinant C-terminal thioester proteins (Chattopadhaya, Abu Bakar et al. 2009). Alternatively, and also in the context of protein labeling/immobilization applications, recombinant C-terminal thioester proteins have been functionalized at the C-terminus using hydrazine-, hydrazide-, or oxyamine-comprising chemical reagents, in which the hydrazine, hydrazide, or oxyamine group acts as the nucleophile that mediates C-terminal ligation of the protein of interest to a given chemical species (e.g., a fluorescent dye) (Cotton, U.S. Pat. No. 7,622,552; Raines et al. U.S. Pat. Appl. 20080020942).
Unfortunately, all of the aforementioned methods for protein C-terminal labeling are characterized by slow reaction kinetics, resulting in low labeling efficiencies, particularly at short reaction times. In addition, high concentrations of reagents (the target C-terminal thioester protein, the labeling reagent, or both) are typically required to achieve satisfactory yields of the desired functionalized protein product. Furthermore, thiol catalysts such as thiophenol, mercaptoethanol, or MESNA are typically necessary to accelerate these protein functionalization procedures and/or increase their yields. As a result of these drawbacks, the utility of these methods for protein C-terminal labeling/immobilization remains limited. For example, such reaction conditions can hardly be attained intracellularly, severely limiting the scope of these methods for in vivo protein labeling applications. Furthermore, fast protein labeling procedures are required to enable the detection and isolation of transient or short-lived protein species in proteomic or cell biology studies. Finally, the limited stability of certain proteins may be incompatible with the high reagent or catalyst concentrations these methods require.
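To illustrate why slow kinetics translate into low labeling efficiency at short reaction times and a need for high reagent concentrations, the following Python sketch uses a simple pseudo-first-order model, assuming the labeling reagent is present in large excess over the thioester protein so that the fraction labeled after time t is 1 - exp(-k·[reagent]·t). The rate constant `K_HYPOTHETICAL` and the function name `fraction_labeled` are hypothetical, chosen only for illustration; they are not measured values for any of the methods cited above.

```python
import math

# HYPOTHETICAL second-order rate constant (M^-1 s^-1) for a slow C-terminal
# thioester labeling reaction; an illustrative assumption, not a measured value.
K_HYPOTHETICAL = 1.0


def fraction_labeled(k: float, reagent_conc_m: float, time_s: float) -> float:
    """Return the fraction of thioester protein converted after time_s seconds.

    Assumes the labeling reagent is in large excess, so its concentration
    stays approximately constant and the reaction is pseudo-first-order
    with an observed rate constant k_obs = k * [reagent].
    """
    k_obs = k * reagent_conc_m
    return 1.0 - math.exp(-k_obs * time_s)


if __name__ == "__main__":
    ten_minutes = 600.0  # a "short" reaction time, in seconds
    for conc_mm in (0.1, 1.0, 10.0):
        f = fraction_labeled(K_HYPOTHETICAL, conc_mm / 1000.0, ten_minutes)
        print(f"[reagent] = {conc_mm:5.1f} mM -> {100 * f:5.1f}% labeled after 10 min")
```

Under these assumptions, raising the reagent concentration tenfold raises k_obs tenfold, which is why a slow intrinsic rate constant forces either long reaction times or high reagent concentrations to reach near-complete labeling.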
Citation or identification of any reference in Section 2, or in any other section of this application, shall not be considered an admission that such reference is available as prior art to the present invention.