This invention relates to compositions and methods for labeling molecules, particularly small, synthetic molecules that can specifically react with target sequences.
Many techniques in the biological sciences require attachment of labels to molecules, such as polypeptides. For example, the location of a polypeptide within a cell can be determined by attaching a fluorescent label to the polypeptide.
Traditionally, labeling has been accomplished by chemical modification of purified polypeptides. For example, the normal procedures for fluorescent labeling require that the polypeptide be covalently reacted in vitro with a fluorescent dye, then repurified to remove excess dye and/or any damaged polypeptide. Using this approach, problems of labeling stoichiometry and disruption of biological activity are often encountered. Furthermore, to study a chemically modified polypeptide within a cell, microinjection can be required. This can be tedious and cannot be performed on a large population of cells.
Thiol- and amine-reactive chemical labels exist and can be used to label polypeptides within a living cell. However, these chemical labels are promiscuous. Such labels cannot specifically react with a particular cysteine or lysine of a particular polypeptide within a living cell that has numerous other reactive thiol and amine groups.
A more recent method of intracellular labeling of polypeptides in living cells has involved genetically engineering fusion polypeptides that include green fluorescent protein (GFP) and a polypeptide of interest. However, GFP is limited in versatility because it cannot reversibly label the polypeptide. The ability to generate a wide range of specifically labeled molecules easily and reliably would be particularly useful.
In a first aspect, the invention features a biarsenical molecule of the following formula: 
and tautomers, anhydrides, and salts thereof;
wherein:
X1 and X2 are each, independently, Cl, Br, I, ORa, or SRa; or
X1 and X2 together with the arsenic atom form a ring having the formula 
Ra is H, C1-C4 alkyl, CH2CH2OH, CH2COOH, or CN;
Z is 1,2-ethanediyl, 1,2-propanediyl, 2,3-butanediyl, 1,3-propanediyl, 1,2-benzenediyl, 4-methyl-1,2-benzenediyl, 1,2-cyclopentanediyl, 1,2-cyclohexanediyl, 3-hydroxy-1,2-propanediyl, 3-sulfo-1,2-propanediyl, or 1,2-bis(carboxy)-1,2-ethanediyl;
Y1 and Y2, independently, are H or CH3; or
Y1 and Y2 together form a ring such that the biarsenical molecule has the formula 
where M is O, S, CH2, C(CH3)2, or NH;
R1 and R2, independently, are ORa, OAc, NRaRb, or H;
R3 and R4, independently, are H, F, Cl, Br, I, ORa, or Ra; or
R1 together with R3, or R2 together with R4, or both, form a ring in which:
(i) when R1 and R3 form the ring, one of them is C2-C3 alkyl and the other is NRa; and
(ii) when R2 and R4 form the ring, one of them is C2-C3 alkyl and the other is NRa;
Rb is H, C1-C4 alkyl, CH2CH2OH, CH2COOH, or CN;
Q is CRaRb, CRaORb, C=O, or a spirolactone having the formula: 
wherein the spiro linkage is formed at C1.
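The substituent definitions above combine "independently" choices with alternative ring closures, which is easy to misread. The following is a minimal sketch, not part of the original disclosure, that encodes those choices as plain-string option sets and checks whether a candidate substitution pattern falls within them. The class `Candidate` and the functions `side_ok` and `in_scope` are hypothetical. Because the ring formulas did not survive extraction, it assumes that Z is the bridge of the ring formed by X1, X2 and the arsenic atom, and that M is the bridging group of the ring formed by Y1 and Y2; Ra and Rb are kept as symbolic labels rather than expanded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Option sets transcribed from the definitions above; entries are plain labels.
X_OPTIONS = {"Cl", "Br", "I", "ORa", "SRa"}
Z_OPTIONS = {
    "1,2-ethanediyl", "1,2-propanediyl", "2,3-butanediyl", "1,3-propanediyl",
    "1,2-benzenediyl", "4-methyl-1,2-benzenediyl", "1,2-cyclopentanediyl",
    "1,2-cyclohexanediyl", "3-hydroxy-1,2-propanediyl", "3-sulfo-1,2-propanediyl",
    "1,2-bis(carboxy)-1,2-ethanediyl",
}
Y_OPTIONS = {"H", "CH3"}
M_OPTIONS = {"O", "S", "CH2", "C(CH3)2", "NH"}
R12_OPTIONS = {"ORa", "OAc", "NRaRb", "H"}              # allowed for R1 and R2
R34_OPTIONS = {"H", "F", "Cl", "Br", "I", "ORa", "Ra"}  # allowed for R3 and R4
# When R1/R3 (or R2/R4) close a ring, one member is C2-C3 alkyl and the other NRa.
RING_PAIRS = {("C2-C3 alkyl", "NRa"), ("NRa", "C2-C3 alkyl")}
Q_OPTIONS = {"CRaRb", "CRaORb", "C=O", "spirolactone"}


@dataclass
class Candidate:
    """Hypothetical record of one substitution pattern."""
    x: Tuple[str, str] = ("", "")      # (X1, X2), used when z is None
    z: Optional[str] = None            # assumed bridge Z when X1, X2 and As form a ring
    y: Tuple[str, str] = ("", "")      # (Y1, Y2), used when m is None
    m: Optional[str] = None            # assumed bridge M when Y1 and Y2 form a ring
    r13: Tuple[str, str] = ("H", "H")  # (R1, R3)
    r24: Tuple[str, str] = ("H", "H")  # (R2, R4)
    q: str = "C=O"


def side_ok(pair: Tuple[str, str]) -> bool:
    """Accept either independently chosen R groups or an allowed ring closure."""
    first, second = pair
    return (first in R12_OPTIONS and second in R34_OPTIONS) or pair in RING_PAIRS


def in_scope(c: Candidate) -> bool:
    """True if every position uses an option listed in the definitions."""
    x_ok = c.z in Z_OPTIONS if c.z is not None else all(v in X_OPTIONS for v in c.x)
    y_ok = c.m in M_OPTIONS if c.m is not None else all(v in Y_OPTIONS for v in c.y)
    return x_ok and y_ok and side_ok(c.r13) and side_ok(c.r24) and c.q in Q_OPTIONS


ring_form = Candidate(z="1,2-ethanediyl", m="O",
                      r13=("ORa", "H"), r24=("ORa", "H"), q="spirolactone")
print(in_scope(ring_form))                               # True
print(in_scope(Candidate(x=("F", "Cl"), y=("H", "H"))))  # False: F is not an X option
```

The check is purely combinatorial; it says nothing about whether a pattern is synthetically accessible or reacts with a target sequence.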
Particularly preferred is a biarsenical molecule where X1 and X2 together with the arsenic atom form a ring having the formula 
Also preferred is a biarsenical where X1 and X2 together with the arsenic atom form a ring having the formula 
In another preferred embodiment of the biarsenical molecule, Q is chosen from the following spirolactones: 
A more preferred embodiment is a biarsenical where Q is 
A particularly preferred biarsenical molecule has the following formula: 
The tautomers, anhydrides and salts of the biarsenical molecule of formula (III) are also included.
Preferably, the biarsenical molecule specifically reacts with a target sequence to generate a detectable signal, for example, a fluorescent signal.
The biarsenical molecule preferably is capable of traversing a biological membrane. The biarsenical molecule preferably includes a detectable group, for example a fluorescent group, luminescent group, phosphorescent group, spin label, photosensitizer, photocleavable moiety, chelating center, heavy atom, radioactive isotope, isotope detectable by nuclear magnetic resonance, paramagnetic atom, or a combination thereof.
For some applications, the biarsenical molecule can be immobilized on a solid phase, preferably by covalent coupling.
In another aspect, the invention features a kit. The kit includes the above-described biarsenical molecule and a bonding partner that includes a target sequence. The target sequence includes one or more cysteines and is capable of specifically reacting with the biarsenical molecule. Preferably, the target sequence includes four cysteines. The target sequence preferably is a cys-cys-X-Y-cys-cys (SEQ ID NO:5) α-helical domain, where X and Y are amino acids. Preferably, X and Y are amino acids with high α-helical propensity. In some embodiments, X and Y are the same amino acids. In other embodiments, X and Y are different amino acids. In particularly preferred embodiments, the target sequence is SEQ ID NO. 1 or SEQ ID NO. 4.
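As a small illustration of the target-sequence definition, the sketch below converts the three-letter listings of SEQ ID No. 1 and SEQ ID No. 4 (given at the end of this section) into one-letter code and scans for the Cys-Cys-X-Y-Cys-Cys pattern with a regular expression. The function names `to_one_letter` and `find_tetracysteine_motifs` are hypothetical. The scan checks primary sequence only; it says nothing about α-helical propensity or whether a biarsenical molecule actually binds.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
TERMINAL_MODS = {"acetyl", "amide"}  # end-group labels used in the sequence listing


def to_one_letter(listing: str) -> str:
    """Convert a hyphen-separated three-letter listing, skipping end-group labels."""
    return "".join(THREE_TO_ONE[t] for t in listing.split("-") if t not in TERMINAL_MODS)


def find_tetracysteine_motifs(seq: str):
    """Return (1-based start, X, Y) for each Cys-Cys-X-Y-Cys-Cys occurrence.

    The lookahead lets overlapping occurrences be reported as well.
    """
    return [(m.start() + 1, m.group(1), m.group(2))
            for m in re.finditer(r"(?=CC(.)(.)CC)", seq)]


SEQ_ID_1 = "acetyl-Trp-Glu-Ala-Ala-Ala-Arg-Glu-Ala-Cys-Cys-Arg-Glu-Cys-Cys-Ala-Arg-Ala-amide"
SEQ_ID_4 = "Ala-Glu-Ala-Ala-Ala-Arg-Glu-Ala-Cys-Cys-Arg-Glu-Cys-Cys-Ala-Arg-Ala"

for listing in (SEQ_ID_1, SEQ_ID_4):
    seq = to_one_letter(listing)
    print(seq, find_tetracysteine_motifs(seq))
# WEAAAREACCRECCARA [(9, 'R', 'E')]
# AEAAAREACCRECCARA [(9, 'R', 'E')]
```

In both listed sequences the motif starts at residue 9 with X = Arg and Y = Glu, so they fall under the embodiments in which X and Y are different amino acids. An unknown three-letter code raises `KeyError` rather than being silently dropped.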
The bonding partner can include a carrier molecule, for example a carrier polypeptide. In some embodiments, the target sequence is heterologous to the carrier polypeptide. In one preferred embodiment, the target sequence specified by SEQ ID NO. 4 is linked by a peptide bond to the carboxy terminal Lys-238 in the cyan mutant of the green fluorescent protein.
In yet another aspect, the invention features a kit that includes the above-described biarsenical molecule and a vector that includes a nucleic acid sequence encoding a target sequence. The target sequence includes one or more cysteines and is capable of specifically reacting with the biarsenical molecule. Preferably, the target sequence includes four cysteines.
In some preferred embodiments, the vector in the kit includes a nucleic acid sequence encoding a carrier polypeptide and a nucleic acid sequence encoding a target sequence. In some embodiments, the carrier polypeptide is heterologous to the target sequence.
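One way to connect the vector language above with the sequence listing is to back-translate the target sequence and look for it in a primer. The sketch below assumes a one-codon-per-residue table copied from the codons that occur in SEQ ID No. 2 itself (the names `reverse_complement`, `back_translate` and `CODON_FOR` are hypothetical, and the table is not a general codon-usage rule). It finds the coding sequence for SEQ ID No. 4, followed by a TAA stop codon, in the reverse complement of SEQ ID No. 2, immediately downstream of an AAG (Lys) codon. This reading is ours; the section does not state which construct SEQ ID No. 2 was used to build, although the arrangement is consistent with linking SEQ ID No. 4 to a C-terminal lysine as described above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string; spaces are ignored."""
    return dna.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


# Illustrative codon choice: one codon per residue, copied from the codons
# that occur in the reverse complement of SEQ ID No. 2 (hypothetical table).
CODON_FOR = {"A": "GCC", "E": "GAG", "R": "AGG", "C": "TGC"}


def back_translate(peptide: str) -> str:
    """Map each residue to its single codon from CODON_FOR."""
    return "".join(CODON_FOR[aa] for aa in peptide)


SEQ_ID_2 = ("CGG CAA TTC TTA GGC CCT GGC GCA GCA CTC CCT GCA GCA GGC CTC CCT "
            "GGC GGC GGC CTC GGC CTT GTA CAG CTC GTC CAT GCC C")
SEQ_ID_4 = "AEAAAREACCRECCARA"  # one-letter form of SEQ ID No. 4

coding = reverse_complement(SEQ_ID_2)
tag_with_stop = back_translate(SEQ_ID_4) + "TAA"
pos = coding.find(tag_with_stop)
print(pos)                  # 22
print(coding[pos - 3:pos])  # AAG, a lysine codon immediately upstream
```

Reading the reverse complement in the same frame further upstream gives the codons for Met-Asp-Glu-Leu-Tyr-Lys before the tag.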
In another aspect, the invention features a complex. The complex includes the above-described biarsenical molecule and a target sequence. In some preferred embodiments, the target sequence is SEQ ID NO. 1 or SEQ ID NO. 4. Preferably, the biarsenical molecule is the biarsenical molecule of formula (III).
In another aspect, the invention features a tetraarsenical molecule. The tetraarsenical molecule includes two biarsenical molecules of the above-described formula. The two biarsenical molecules are coupled to each other through a linking group. In some embodiments, the tetraarsenical molecules have formula VI, VII, or VIII.
“Bonding partner” as used herein refers to a molecule that contains at least the target sequence.
“Heterologous” as used herein refers to two molecules that are not naturally associated with each other.
“Associated” as used herein includes association by covalent as well as non-covalent interactions.
The invention provides biarsenical molecules that can be engineered to exhibit a variety of properties. For example, the biarsenical molecule can be fluorescent. It can have different wavelengths of excitation and emission, e.g., visible or infrared. The biarsenical molecule specifically reacts with the cysteine-containing target sequence. In addition, the relatively small size of both the biarsenical molecule and the target sequence is particularly advantageous.
Other features and advantages of the invention will be apparent from the following detailed description and from the claims.
SEQ ID No. 1: acetyl-Trp-Glu-Ala-Ala-Ala-Arg-Glu-Ala-Cys-Cys-Arg-Glu-Cys-Cys-Ala-Arg-Ala-amide
Comments: The N-terminus is acetylated and the C-terminus is amidated.
SEQ ID No. 2: 5′-CGG CAA TTC TTA GGC CCT GGC GCA GCA CTC CCT GCA GCA GGC CTC CCT GGC GGC GGC CTC GGC CTT GTA CAG CTC GTC CAT GCC C-3′
SEQ ID No. 3: 5′-CGC GGA TCC GCC ACC ATG CAT GAC CAA CTG ACA TGC TGC CAG ATT TGC TGC TTC AAA GAA GCC TTC TCA TTA TTC-3′
SEQ ID No. 4: Ala-Glu-Ala-Ala-Ala-Arg-Glu-Ala-Cys-Cys-Arg-Glu-Cys-Cys-Ala-Arg-Ala
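SEQ ID No. 3 can be read in the forward direction. The following sketch builds the standard genetic code table, translates from the first ATG until a stop codon or the end of the primer, and searches the product for the Cys-Cys-X-Y-Cys-Cys pattern. The function name `translate_from_first_atg` is hypothetical, and starting the reading frame at the first ATG is an assumption, not something the listing states.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    seq = dna.replace(" ", "").upper()
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)


SEQ_ID_3 = ("CGC GGA TCC GCC ACC ATG CAT GAC CAA CTG ACA TGC TGC CAG ATT TGC TGC "
            "TTC AAA GAA GCC TTC TCA TTA TTC")

peptide = translate_from_first_atg(SEQ_ID_3)
motif = re.search(r"CC..CC", peptide)
print(peptide)                             # MHDQLTCCQICCFKEAFSLF
print(motif.start() + 1, motif.group())    # 7 CCQICC
```

The translated stretch contains Cys-Cys-Gln-Ile-Cys-Cys, an instance of the target-sequence pattern in which X and Y are different amino acids.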