The site-specific incorporation of bio-orthogonal groups via genetic code expansion provides a powerful general strategy for site-specifically labeling proteins with any probe. However, the slow reactivity of the bio-orthogonal functional groups that can be genetically encoded has limited this strategy's utility.
There is a pressing need for general methods to site-specifically label proteins, in diverse contexts, with user-defined probes.
Current protein labeling methods involve the use of fluorescent protein fusions,[1-4] self-labeling proteins (e.g., SNAPtag, HALOtag, CLIPtag),[5-8] ligases (e.g., biotin ligase, lipoic acid ligase, sortase, and phosphopantetheinyl-transferase),[9-15] and self-labeling tags (e.g., tetracysteine and tetraserine).[16,17] While some of these approaches allow rapid labeling and have had substantial impact on biological studies, they require the use of protein fusions and/or the introduction of additional sequences into the protein of interest. This can disturb the structure and function of the protein and make it challenging to place probes at any position in a protein.
Moreover, the range of probes that can be incorporated by some of these methods is limited.[3,4,18]
Ideal methods for protein labeling would i) allow probes to be easily placed at any position in a protein in diverse cells, ii) be rapid and quantitative, iii) be specific for a user-defined site in a protein, iv) show 'turn on' fluorescence, with minimal off-site or background labeling, and v) allow for labeling with diverse probes. In principle, the genetically encoded, site-specific incorporation of unnatural amino acids bearing bio-orthogonal functional groups would allow the labeling of specific proteins at defined sites with essentially any probe.
Bio-orthogonal groups, including azides, alkynes, ketones, anilines, alkenes, tetrazoles, and 1,2-aminothiols, have been genetically encoded using amber suppressor aminoacyl tRNA synthetase/tRNACUA pairs.[19-29] For established reactions that have been demonstrated on proteins, the rate constants for the corresponding model reactions[30] are in the range of 10⁻² M⁻¹ s⁻¹ to 10⁻⁴ M⁻¹ s⁻¹ (although for emerging approaches higher rates have been reported).[29,31,32]
The rates of established reactions are clearly sufficient to allow useful labeling of metabolically incorporated azido- and keto-bearing glycan analogs presented at high density on the cell surface, and the labeling of amino acid analogs incorporated throughout the proteome.[33-35] However, the sluggishness of established bio-orthogonal reactions often makes it challenging to quantitatively label proteins at defined sites in vitro, and may account for the fact that there are currently no examples of labeling proteins expressed on the mammalian cell surface using genetically encoded unnatural amino acids.
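As a rough illustration of why rate constants in this range make quantitative labeling difficult, the sketch below estimates labeling half-times under pseudo-first-order conditions, that is, with the probe in large excess over the protein. The 100 µM probe concentration and the function names are illustrative assumptions and are not taken from the work described here.

```python
import math

def labeling_half_time(k2, probe_conc):
    """Half-time in seconds for labeling under pseudo-first-order conditions.

    k2: second-order rate constant in M^-1 s^-1.
    probe_conc: probe concentration in M, assumed in large excess
    over the protein so that it stays effectively constant.
    """
    k_obs = k2 * probe_conc  # observed first-order rate constant, s^-1
    return math.log(2) / k_obs

def fraction_labeled(k2, probe_conc, t):
    """Fraction of protein labeled after t seconds (same assumptions)."""
    return 1.0 - math.exp(-k2 * probe_conc * t)

# Assumed probe concentration (illustrative only): 100 micromolar.
PROBE_CONC = 100e-6

# The two ends of the rate-constant range quoted in the text.
for k2 in (1e-4, 1e-2):
    hours = labeling_half_time(k2, PROBE_CONC) / 3600
    print(f"k2 = {k2:g} M^-1 s^-1: half-time of about {hours:.0f} h")
```

Under these assumptions, even the faster end of the quoted range (10⁻² M⁻¹ s⁻¹) gives a half-time on the order of a week, and less than 1% of the protein is labeled after one hour. Densely presented targets, such as cell-surface glycans, can still yield useful signal without quantitative labeling.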
The present invention seeks to overcome problem(s) associated with the prior art.