Sequence-specific binding of proteins to DNA, RNA, protein and other molecules is involved in a number of cellular processes such as, for example, transcription, replication, chromatin structure, recombination, DNA repair, RNA processing and translation. The binding specificity of cellular binding proteins that participate in protein-DNA, protein-RNA and protein-protein interactions contributes to development, differentiation and homeostasis. Alterations in specific protein interactions can be involved in various types of pathologies such as, for example, cancer, cardiovascular disease and infection.
Increased understanding of the nature and mechanism of protein binding specificity has encouraged the hope that specificity of a binding protein could be altered in a predictable fashion, or that a binding protein of predetermined specificity could be constructed de novo. See, for example, Blackburn (2000) Curr. Opin. Struct. Biol. 10:399-400; Segal et al. (2000) Curr. Opin. Chem. Biol. 4:34-39. To date, the greatest progress in both of these areas has been obtained with a class of binding proteins known as zinc finger proteins.
Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a sequence-specific manner. Zinc fingers were first identified in the transcription factor TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. An exemplary motif characterizing one class of these proteins (C2H2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (SEQ ID NO: 1), where X is any amino acid. A single zinc finger domain is about 30 amino acids in length, and several structural studies have demonstrated that it contains a beta turn (containing the two invariant cysteine residues) and an alpha helix (containing the two invariant histidine residues), which are held in a particular conformation through coordination of a zinc atom by the two cysteines and the two histidines. To date, over 10,000 zinc finger sequences have been identified in several thousand known or putative transcription factors. Zinc finger domains are involved not only in DNA recognition, but also in RNA binding and in protein-protein binding. Current estimates are that this class of molecules will constitute about 2% of all human genes.
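By way of illustration only, the exemplary C2H2 motif quoted above can be transcribed into a regular expression and used to scan an amino acid sequence written in one-letter code. The following is a minimal sketch: the function name `find_c2h2_motifs` is hypothetical, and the pattern tests only the spacing of the cysteine and histidine residues, so a match does not establish that a functional zinc finger is present.

```python
import re
from typing import List, Tuple

# Transcription of -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any residue.
# "." stands in for X; this is a spacing check only, not a structural prediction.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (start index, matched text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(protein.upper())]
```

Because `finditer` reports non-overlapping matches, closely spaced candidate motifs that share residues would be reported only once.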
The x-ray crystal structure of Zif268, a three-finger domain from a murine transcription factor, has been solved in complex with a cognate DNA sequence. Pavletich et al. (1991) Science 252:809-817. The structure suggests that each finger interacts independently with a 3-nucleotide DNA subsite, with side-chains at positions −1, +2, +3 and +6 (with respect to the start of the α-helix) making contacts with bases in a DNA triplet subsite. The amino terminus of Zif268 is situated at the 3′ end of the DNA strand with which it makes most contacts. Some zinc fingers can bind to a fourth base in a target segment. If the strand with which a zinc finger protein makes most contacts is designated the target strand, some zinc finger proteins bind to a three base triplet in the target strand and a fourth base on the non-target strand. The fourth base is complementary to the base immediately 3′ of the three base subsite. See Wolfe et al. (2000) Annu. Rev. Biophys. Biomol. Struct. 29:183-212 for a recent review on DNA recognition by zinc finger proteins.
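The geometry described above can be sketched in code. The example below assumes a target strand written 5′ to 3′ whose length is a multiple of three; it numbers the fingers from the amino terminus, so that finger 1 is assigned the 3′-most triplet, and for each triplet it reports the base on the non-target strand that a finger could contact as a fourth base. The function name `assign_subsites` and the tuple layout are hypothetical, and the fourth-base column is only a candidate, since the text states that some, not all, zinc fingers make that contact.

```python
from typing import List, Optional, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def assign_subsites(target: str) -> List[Tuple[int, str, Optional[str]]]:
    """Map fingers to triplets of a target strand written 5'->3'.

    Returns (finger number, triplet, candidate fourth base) sorted by finger,
    where finger 1 is the amino-terminal finger.
    """
    target = target.upper()
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    n_fingers = len(target) // 3
    result = []
    for k in range(n_fingers):
        start = 3 * k
        triplet = target[start:start + 3]
        # The amino terminus lies at the 3' end, so the 5'-most triplet
        # belongs to the carboxy-terminal (highest-numbered) finger.
        finger = n_fingers - k
        # Candidate fourth base: complement of the base immediately 3' of
        # the triplet, located on the non-target strand.
        nxt = target[start + 3] if start + 3 < len(target) else None
        fourth = COMPLEMENT[nxt] if nxt is not None else None
        result.append((finger, triplet, fourth))
    return sorted(result, key=lambda item: item[0])
```

For the illustrative input `"GCGTGGGCG"`, the function returns `[(1, 'GCG', None), (2, 'TGG', 'C'), (3, 'GCG', 'A')]`; the `None` reflects that no base 3′ of the last triplet is included in the input.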
The structure of the Zif268-DNA complex also suggested that the DNA sequence specificity of a zinc finger protein could be altered by making amino acid substitutions at the four positions (−1, +2, +3 and +6) involved in DNA base recognition. Phage display experiments using zinc finger combinatorial libraries to test this observation were published in a series of papers in 1994. Rebar et al. (1994) Science 263:671-673; Jamieson et al. (1994) Biochemistry 33:5689-5695; Choo et al. (1994) Proc. Natl. Acad. Sci. USA 91:11163-11167. Combinatorial libraries were constructed with randomized amino acid residues in either the first or middle finger of Zif268, and members of the library able to bind to an altered Zif268 binding site (in which the appropriate DNA sub-site was replaced by an altered DNA triplet) were selected. The amino acid sequences of the selected fingers were correlated with the nucleotide sequences of the new binding sites for which they had been selected. In additional experiments, correlations were observed between the nature of mutations introduced into a recognition helix and resulting alterations in binding specificity. The results of these experiments have led to a number of proposed substitution rules for design of ZFPs with altered binding specificity. Most of these substitution rules concern amino acids occupying positions −1, +2, +3 and +6 in the recognition helix of a zinc finger protein, which have been reported to be the principal determinants of binding specificity. Some of these rules are supported by site-directed mutagenesis of the three-finger domain of the transcription factor, Sp-1. Desjarlais et al. (1992a) Proc. Natl. Acad. Sci. USA 89:7345-7349; Desjarlais et al. (1992b) Proteins: Structure, Function and Genetics 12:101-104; Desjarlais et al. (1993) Proc. Natl. Acad. Sci. USA 90:2256-2260.
Two general classes of design rules for zinc finger proteins have been proposed. The first relates one or more amino acids at a particular position in the recognition helix with a nucleotide at a particular position in the target subsite. For example, if the 5′-most nucleotide in a three-nucleotide target subsite is G, certain design rules specify that the amino acid at position +6 of the recognition helix is arginine, and optionally position +2 of an adjacent carboxy-terminal finger is aspartic acid. The second class of design rules relates the sequence of an entire recognition helix with the sequence of a three- or four-nucleotide target subsite. These and related design rules have been elaborated in, for example, U.S. Pat. No. 6,140,081; PCT WO98/53057; PCT WO98/53058; PCT WO98/53059; PCT WO98/53060; PCT WO00/23464; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; Segal et al. (2000) Curr. Opin. Chem. Biol. 4:34-39; and references cited in these publications.
In addition, two strategies for identifying a zinc finger which binds to a specific triplet subsite have emerged. In the first strategy, the sequence of a portion (generally a single finger but, in some cases, one-and-a-half fingers) of a multi-finger protein is randomized (generally at positions −1, +2, +3 and +6 of the recognition helix), and members of the randomized population able to bind to a particular subsite are selected. The second strategy relies on de novo synthesis of a zinc finger specific for a particular subsite, using existing design rules as set forth supra. See, for example, Choo et al. (1997) Curr. Opin. Struct. Biol. 7:117-125; Greisman et al. (1997) Science 275:657-661.
In attempting to construct a ZFP of predetermined specificity that is potentially able to discriminate a target sequence in a eucaryotic genome, it is necessary to join individual zinc fingers into a multi-finger protein. However, because of overlap in the recognition of adjacent subsites in a target sequence by adjacent zinc fingers in a ZFP, and because of cooperativity and synergistic interactions between adjacent fingers, currently existing design and selection methods have been limited largely to zinc fingers which recognize G-rich target subsites; in particular, triplets of the form GNN and, to a lesser extent, TNN. Although certain selection methods not limited to GNN triplets have been devised, they involve construction of multiple libraries; hence they are more difficult to practice and the degree of possible randomization is limited.
Another deficiency of current design rules is that they do not provide zinc finger sequences able to recognize every one of the 64 possible triplet subsites. Moreover, even for those subsites that are covered, the design rules are degenerate, in that they often specify more than one amino acid for recognition of a particular nucleotide at a particular position in a target subsite, with no direction provided for choosing the best possible amino acid from among the alternatives offered. See, for example, Isalan et al. (1998) Biochemistry 37:12026-12033; Wolfe et al. (1999) J. Mol. Biol. 285:1917-1934; Elrod-Erickson et al. (1998) Structure 6:451-464; Choo and Isalan (2000) Curr. Opin. Struct. Biol. 10:411-416. In fact, recent studies have shown that ZFPs whose synthesis was based on rational design were able to discriminate only 5 of 9 (in one case) or 7 of 9 (in another case) nucleotides in their target sequences. Corbi et al. (1997) FEBS Letts. 417:71-74; Corbi et al. (1998) Biochem. Biophys. Res. Comm. 253:686-692.
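The figure of 64 possible triplet subsites follows from four possible bases at each of three positions (4 × 4 × 4 = 64). As a purely arithmetic illustration, the sketch below enumerates those triplets and expands patterns of the form GNN or TNN, where N stands for any base; the helper name `triplets_matching` is hypothetical. The counts describe only how many triplets each pattern spans, not how many are actually recognized by any existing design rule.

```python
from itertools import product
from typing import List

BASES = "ACGT"

# All 4**3 = 64 possible three-base subsites.
ALL_TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]


def triplets_matching(pattern: str) -> List[str]:
    """Expand a triplet pattern in which N stands for any base (e.g. 'GNN')."""
    pattern = pattern.upper()
    if len(pattern) != 3:
        raise ValueError("pattern must have exactly three positions")
    return [
        t for t in ALL_TRIPLETS
        if all(p in ("N", b) for p, b in zip(pattern, t))
    ]
```

Each of the patterns GNN and TNN spans 16 triplets, so the two together span 32 of the 64 possible subsites.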
Additional reasons for the inability of selection and rational design to enable recognition of any possible target sequence by a ZFP include the following. (1) Selection by phage display often yields ZFPs with high affinity but low specificity; i.e., ZFPs that bind tightly to their target sequence, but also bind tightly to related (or even unrelated) sequences. Thus, methods are required which provide ZFPs which not only bind tightly to their target sequence, but also bind weakly to all other sequences, even those which differ from the target sequence by only a single nucleotide. (2) Existing design rules rely solely on amino acid-base interactions; they do not take into account interactions of amino acids in a ZFP with DNA phosphate residues, nor do they account for concerted interactions between different amino acids in a zinc finger. (3) Framework effects (i.e., effects on binding specificity of amino acids other than those located at −1, +2, +3 and +6) are not accommodated by rational design rules. (4) Most design rules fail to take account of context effects; i.e., the fact that a recognition helix may recognize different subsite sequences depending on its location in a multi-finger protein.
Thus, although existing selection methods and design rules provide limited guidance for constructing a zinc finger DNA-binding domain that is potentially capable of recognizing a particular target sequence, it is unlikely that a complete directory, providing one-to-one correspondence between amino acids in the recognition helix and nucleotide bases in the target subsite, will be obtained. See also Pabo et al. (2000) J. Mol. Biol. 301:597-624.
As a result of the limitations accompanying current selection methods and design rules, the probability of being able to generate a protein which will bind specifically and preferentially to a particular target sequence (either nucleotide or amino acid) remains low. Reliable methods for obtaining binding proteins of predetermined specificity would thus represent a significant advance in the art.
Disclosed herein are methods for obtaining binding proteins having a high specificity of binding to a particular target site and a low specificity of binding to non-target sites. In preferred embodiments, the binding protein is a zinc finger protein. In a more preferred embodiment, a zinc finger protein binds to a DNA sequence. In alternative embodiments, a zinc finger protein binds to an RNA sequence or a peptide sequence.
In one aspect, a method of enhancing the binding specificity of a binding protein is provided. The method comprises (a) providing a binding protein designed to bind to a target sequence; (b) determining the specificity of binding of the binding protein to each residue in the target sequence; (c) identifying one or more residues in the target sequence for which the binding protein does not possess the requisite specificity; (d) substituting one or more amino acids at positions in the binding protein that affect the specificity of the binding protein for the residues identified in (c), to make a modified binding protein; (e) determining the specificity of binding of the modified binding protein to each residue in the target sequence; (f) identifying any residues for which the modified binding protein does not possess the requisite specificity; and (g) repeating steps (d), (e) and (f) until the modified binding protein evaluated in step (f) demonstrates the requisite specificity for each residue in the target sequence, thereby obtaining a binding protein with enhanced binding specificity for its target sequence.
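The iterative character of steps (a) through (g) can be sketched as a simple loop. The sketch below is an illustration under stated assumptions, not an implementation of the disclosed method: `measure` and `substitute` are hypothetical callbacks standing in for the laboratory determination of per-residue specificity and for the amino acid substitution, respectively; the default `threshold` of 0.8 is borrowed from the 80% criterion mentioned for certain embodiments; and the `max_rounds` guard is an added safeguard that the text does not describe.

```python
from typing import Callable, Dict, List, TypeVar

P = TypeVar("P")  # whatever object represents a binding protein


def refine(
    protein: P,
    measure: Callable[[P], Dict[int, float]],
    substitute: Callable[[P, List[int]], P],
    threshold: float = 0.8,
    max_rounds: int = 10,
) -> P:
    """Repeat measure -> identify -> substitute until every position passes.

    measure(protein) returns {target position: specificity fraction}.
    substitute(protein, positions) returns a modified protein.
    """
    for _ in range(max_rounds):
        profile = measure(protein)                      # steps (b) and (e)
        weak = [pos for pos, frac in sorted(profile.items())
                if frac < threshold]                    # steps (c) and (f)
        if not weak:
            return protein                              # requisite specificity met
        protein = substitute(protein, weak)             # step (d)
    raise RuntimeError("requisite specificity not reached within max_rounds")
```

Passing the two operations in as callbacks keeps the loop independent of how specificity is measured or how replacement amino acids are chosen.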
In any of the methods or compositions described herein, the target sequence can be, for example, a nucleic acid sequence or an amino acid sequence. The binding protein can be, for example, a DNA-binding protein, such as a zinc finger protein, or an RNA-binding protein. In certain embodiments, the zinc finger protein comprises three zinc fingers, each of which binds a triplet or quartet subsite in the target sequence. In other embodiments, a three-fingered ZFP binding protein is used, wherein at least one finger in the zinc finger protein in step (a) is designed according to a correspondence regime between the identity of bases occupying designated positions in a subsite of the target sequence, and the identity of amino acids occupying designated positions in a zinc finger binding to that subsite. Each of the three fingers can be designed according to a correspondence regime between the identity of bases occupying designated positions in a subsite of the intended target site, and the identity of amino acids occupying designated positions in a zinc finger binding to that subsite. In yet other embodiments, the correspondence regime specifies alternative amino acids for one or more positions in a zinc finger which recognize a target sequence and, additionally, the zinc finger protein in step (a) includes at least one amino acid arbitrarily selected from alternative amino acids specified by the correspondence regime.
In yet other embodiments where the binding protein is a ZFP, the ZFP in step (a) is designed by analysis of a database of existing zinc finger proteins and their respective target sequences. In any of the methods described herein, the substituting of step (d) comprises replacing one or more amino acids with alternative amino acids specified by the correspondence regime, for example, replacing an amino acid at a position of a zinc finger that does not possess the requisite specificity for a base with a consensus amino acid at a corresponding position from a collection of zinc fingers that bind to a subsite of the intended target site.
In yet other embodiments, the site specificity of each nucleotide in the target sequence is determined by contacting the binding protein (e.g., zinc finger protein) with a population of randomized oligonucleotides, selecting oligonucleotides that bind to the zinc finger protein, determining the sequence of the selected oligonucleotides, and determining the percentage of bases occupying each position in the selected oligonucleotides. In certain embodiments, a zinc finger protein does not possess the requisite specificity for a nucleotide at a position if fewer than 80% of selected oligonucleotides contain the nucleotide at the position. In yet other embodiments, a zinc finger does not possess the requisite specificity for the 3′ base of a subsite, and an amino acid at position −1 of the recognition helix is substituted. In other embodiments, a zinc finger does not possess the requisite specificity for the mid base of a subsite and an amino acid at position +3 of the recognition helix is substituted. In other embodiments, a zinc finger does not possess the requisite specificity for the 5′ base of a subsite and an amino acid at position +6 of the recognition helix is substituted. In still other embodiments, a zinc finger does not possess the requisite specificity for the 5′ base of a subsite and an amino acid at position +2 of an adjacent C-terminal zinc finger is substituted. In any of the methods described herein, one or more amino acid(s) is(are) substituted in step (d) and, in certain embodiments, steps (c) and (d) are repeated at least twice.
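The percentage calculation and the 80% criterion described above can be illustrated as follows. This is a minimal sketch which assumes that the selected oligonucleotide sequences have already been aligned and trimmed to the length of the target sequence, and that the target is written 5′ to 3′ as consecutive triplet subsites; the function names are hypothetical. The lookup table records the helix position associated with each base of a subsite in the embodiments above; for the 5′ base, position +2 of the adjacent C-terminal finger is the stated alternative to position +6.

```python
from collections import Counter
from typing import Dict, List

# Index within a 5'->3' triplet -> recognition-helix position to consider.
# Index 0 is the 5' base (+6, or +2 of the adjacent C-terminal finger),
# index 1 the mid base (+3), index 2 the 3' base (-1).
HELIX_POSITION_BY_INDEX = {0: "+6", 1: "+3", 2: "-1"}


def base_fractions(oligos: List[str]) -> List[Dict[str, float]]:
    """Fraction of each base observed at each position of the selected oligos."""
    if not oligos:
        raise ValueError("no selected oligonucleotides supplied")
    length = len(oligos[0])
    if any(len(o) != length for o in oligos):
        raise ValueError("oligonucleotides must be aligned to equal length")
    total = len(oligos)
    return [
        {base: count / total
         for base, count in Counter(o[i].upper() for o in oligos).items()}
        for i in range(length)
    ]


def weak_positions(oligos: List[str], target: str, threshold: float = 0.8) -> List[int]:
    """Target indices at which fewer than `threshold` of oligos carry the target base."""
    fractions = base_fractions(oligos)
    if len(fractions) != len(target):
        raise ValueError("oligos and target must have the same length")
    return [i for i, base in enumerate(target.upper())
            if fractions[i].get(base, 0.0) < threshold]


def helix_position(target_index: int) -> str:
    """Helix position associated with a target index, per the table above."""
    return HELIX_POSITION_BY_INDEX[target_index % 3]
```

For example, if three of five selected oligonucleotides carry the intended mid base of a subsite, the fraction is 0.6, the position is reported as lacking the requisite specificity, and `helix_position` points to position +3.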
In another aspect, a method for identifying a secondary target site for a binding protein, wherein the binding protein is designed to bind a target sequence, is provided. The method comprises: (a) determining the specificity of the binding protein for each residue in the target sequence, thereby identifying one or more secondary target sites bound by the binding protein; and (b) comparing the sequence of the secondary target site with a database of naturally-occurring sequences to identify at least one naturally-occurring sequence comprising the secondary target site. In certain embodiments, the naturally-occurring sequences form all or a portion of the sequence of a genome (e.g., a human genome). The target sequence can be, for example, a nucleotide sequence or an amino acid sequence. Additionally, in certain embodiments, the binding protein is a zinc finger protein and step (a) comprises contacting the zinc finger protein with a population of randomized oligonucleotides to identify a subpopulation of oligonucleotides that bind to the zinc finger protein; one or more of these oligonucleotides or a consensus sequence of these oligonucleotides constituting the one or more secondary target sites.
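Step (b) of the method above, the comparison of a secondary target site with a database of naturally-occurring sequences, can be sketched as an exact-match search. The sketch assumes that the secondary site has already been identified, that it is a DNA sequence, and that the database is available as an in-memory mapping from a sequence name to its sequence; the function names and the decision to search both strands are illustrative assumptions. A genome-scale search would ordinarily use an indexed alignment tool rather than the simple scan shown here.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_site(site: str, database: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (sequence name, start index, strand) for every exact occurrence.

    A "-" hit gives the start index, on the stored strand, of the reverse
    complement of the site.
    """
    site = site.upper()
    queries = [("+", site)]
    rc = reverse_complement(site)
    if rc != site:  # avoid reporting palindromic sites twice
        queries.append(("-", rc))
    hits = []
    for name, seq in database.items():
        seq = seq.upper()
        for strand, query in queries:
            start = seq.find(query)
            while start != -1:
                hits.append((name, start, strand))
                start = seq.find(query, start + 1)  # allow overlapping hits
    return hits
```

Each hit identifies a naturally-occurring sequence comprising the secondary site, which is the output called for by step (b).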
In another aspect, a method of comparing zinc finger proteins that bind to target sequences within a target gene is provided. In certain embodiments, the method comprises (a) determining the binding profile of a first zinc finger protein, designed to bind a first target sequence within the gene, for each base in the first target sequence; (b) determining the binding profile of a second zinc finger protein, designed to bind a second target sequence within the gene, for each base in the second target sequence; and (c) comparing the profiles of the first and second zinc finger proteins as an indicator of relative specificity of binding. In certain embodiments, the first and second target sequences are the same and the method allows for selection of a ZFP which binds with higher specificity to that sequence. In certain embodiments, (a) the binding profile of the first zinc finger protein to the first target sequence is determined by contacting the first zinc finger protein with a population of randomized oligonucleotides to identify a subpopulation of oligonucleotides that bind to the first zinc finger protein, the identity of random segments in the subpopulation providing a profile of the specificity of binding of the first zinc finger protein; and (b) the binding profile of the second zinc finger protein to the second target sequence is determined by contacting the second zinc finger protein with a population of randomized oligonucleotides to identify a subpopulation of oligonucleotides that bind to the second zinc finger protein, the identity of random segments in the subpopulation providing a profile of the specificity of binding of the second zinc finger protein.
In yet another aspect, a method of modulating expression of a gene is provided. In certain embodiments, the method comprises contacting the gene with a zinc finger protein identified by any of the methods described herein, wherein the ZFP has the requisite binding specificity.
In still further embodiments, compositions comprising zinc finger proteins identified by any of the methods described herein and a pharmaceutical excipient are provided.
These and other embodiments will readily occur to those of skill in the art in light of the disclosure herein.