Site-specific labeling of biomolecules with fluorophores often requires careful choice of labeling chemistry, optimization of the labeling reaction, and characterization of the labeled biomolecules for labeling efficiency, site-specificity, and retention of functionality. The two most commonly used approaches for proteins are based on chemical coupling to sulfhydryl groups or primary amines, which result in different labeling patterns on different proteins as a consequence of the unique content and distribution of cysteine (Cys) and lysine residues in a given protein. The most common method for site-specific labeling of proteins with fluorophores is Cys-specific labeling with thiol-reactive reagents. During this reaction, proteins with surface-exposed Cys residues are covalently modified by maleimide, iodoacetamide, or other reactive conjugates of fluorophores (Waggoner, Methods Enzymol., 246:362 (1995); Haugland, Handbook of Fluorescent Probes and Research Products, 8th ed., 2002; Selvin, Methods Enzymol., 246:300 (1995)). This is a method of choice for small proteins (fewer than about 200 residues) because cysteine is a rare amino acid and can be substituted easily with other amino acids using site-directed mutagenesis (Kunkel et al., Methods Enzymol., 205:125 (1991)).
If a protein of interest has no Cys residues, the site of incorporation of the label is selected after inspection of a high-resolution three-dimensional structure (generated using X-ray crystallography or nuclear magnetic resonance). The site is chosen so that labeling does not perturb the enzymatic activity or the spatial arrangement of the protein sequence (also known as the “protein fold”). Subsequently, an existing amino acid at the site of choice (preferably one having a side chain of charge, size, and hydrophobicity similar to that of Cys) is replaced with Cys using site-directed mutagenesis (Kunkel et al., 1991).
If an unmodified protein has a single preexisting Cys, structural information, along with measurements of the surface accessibility of the Cys side chain (Kapanidis et al., J. Mol. Biol., 312:453 (2001)), can be used to determine whether the existing Cys is suitable for labeling; otherwise, the preexisting Cys can be converted to the structurally similar amino acid serine, and the procedure for Cys-free proteins can be followed.
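The choice among the cases described above (no Cys, or a single preexisting Cys) can be illustrated with a minimal Python sketch. It assumes a protein sequence given in one-letter amino acid code; the function names `cysteine_positions` and `labeling_route` are hypothetical and are introduced here only for illustration. The sketch inspects sequence alone and cannot replace the structural and surface-accessibility analysis discussed above.

```python
def cysteine_positions(sequence: str) -> list[int]:
    """Return the 1-based positions of every Cys ('C') in a one-letter sequence."""
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa == "C"]


def labeling_route(sequence: str) -> str:
    """Suggest a Cys-labeling route based only on the number of Cys residues.

    Illustrative only: the decision mirrors the cases in the text and
    ignores structure, accessibility, and protein function.
    """
    positions = cysteine_positions(sequence)
    if not positions:
        # Cys-free protein: pick a site from the 3D structure, then mutate to Cys.
        return "no Cys: introduce Cys at a chosen site by site-directed mutagenesis"
    if len(positions) == 1:
        # Single Cys: usable if surface-accessible, otherwise convert to Ser.
        return (f"single Cys at position {positions[0]}: label it if accessible, "
                "otherwise mutate to Ser and follow the Cys-free procedure")
    # More than one Cys is not addressed by the cases described in the text.
    return "multiple Cys: not covered by this sketch"


if __name__ == "__main__":
    for seq in ("MKTAYIAKQR", "MKTACIAKQR", "MCKTACIAKQ"):
        print(seq, "->", labeling_route(seq))
```

The sketch deliberately returns a description rather than a decision, since the final choice of labeling site depends on experimental data not captured in the sequence.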
In a recently developed approach referred to as expressed protein ligation (EPL) (Muir, Annu. Rev. Biochem., 72:249 (2003)), proteins are expressed in C-terminal fusion with an intein domain and an affinity tag. The resulting fusion proteins can be separated from the proteins of the expression host on an affinity matrix. Treatment of the immobilized protein with a high concentration of thiol leads to cleavage of the peptide bond between the intein and the target protein. The cleaved protein carries a thioester group at its C-terminus that can be coupled by native chemical ligation to a peptide (or, in fact, any molecule) bearing a Cys at its N-terminus, generating a native peptide bond at the coupling site (Dawson et al., Science, 266:776 (1994)). Although this approach has been used successfully for protein engineering, it has two shortcomings: it requires expression of a large fusion protein, which may influence the solubility and folding of the target protein, and the efficiency of intein splicing varies with the flanking residues of the target protein (Zhang et al., Gene, 275:241 (2001)).
In an alternative strategy, a thioester-conjugated functionality such as a fluorophore is coupled to the N-terminal Cys of a recombinant protein (Schuler et al., Bioconjugate Chem., 13:1039 (2002)). In some cases, a Cys at position 2 becomes N-terminal when the initiator methionine is removed by the aminopeptidase of the expression host, although the efficiency varies among proteins (Gentle et al., Bioconjugate Chem., 15:658 (2004)). In another strategy, an N-terminal Cys is created by self-cleavage of an intein domain fused N-terminally to the target protein. Alternatively, the N-terminal Cys residue can be generated by proteolytic cleavage of a properly engineered protease cleavage site. Proteases that have been shown to tolerate Cys at the +1 position of the cleavage site include factor Xa, PreScission protease, and TEV protease (Cotton et al., Chem. Biol., 7:253 (2000); Tolbert et al., Angew. Chem. Int. Ed., 41:2171 (2002)).
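Two of the routes to an N-terminal Cys described above can be checked at the sequence level with a short Python sketch. It assumes a one-letter sequence of the expressed construct and uses the commonly cited TEV recognition motif ENLYFQ, with cleavage after the Gln; the function names are hypothetical. The sketch says nothing about cleavage efficiency, which, as noted above, varies among proteins and must be established experimentally.

```python
from typing import Optional

# Assumption: canonical TEV recognition motif; cleavage occurs after the final Q,
# so the residue immediately following the motif is the +1 position.
TEV_SITE = "ENLYFQ"


def cys_exposed_by_met_removal(sequence: str) -> bool:
    """True if the construct starts Met-Cys, so Met removal would expose Cys."""
    return sequence.upper().startswith("MC")


def product_after_cleavage(sequence: str, site: str = TEV_SITE) -> Optional[str]:
    """Return the fragment C-terminal to the first cleavage site, or None if absent."""
    seq = sequence.upper()
    start = seq.find(site)
    if start == -1:
        return None
    return seq[start + len(site):]


def cys_exposed_by_protease(sequence: str, site: str = TEV_SITE) -> bool:
    """True if cleavage at the site would leave Cys as the new N-terminal residue."""
    product = product_after_cleavage(sequence, site)
    return bool(product) and product[0] == "C"


if __name__ == "__main__":
    construct = "MHHHHHHENLYFQCGSAKT"  # hypothetical His-tagged construct
    print(product_after_cleavage(construct))
    print(cys_exposed_by_protease(construct))
    print(cys_exposed_by_met_removal("MCKTAY"))
```

Passing a different motif through the `site` parameter would let the same check be applied to another protease, provided its recognition sequence is known.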
Accordingly, there is a need for compounds, compositions, and methods to aid in site-specifically linking chemical groups onto specific sites of proteins or peptides.