A superfamily of eukaryotic genes encoding potential nucleic-acid-binding proteins contains zinc-finger (ZF) domains of the Cys2-His2 (C2H2) class. Proteins that have these characteristic structural features play a key role in the regulation of gene expression [1-4]. Sequence comparisons, mutational analyses, and a recent crystallographic investigation have revealed that each finger domain, as a rule, interacts with the major groove of B-form DNA through contacts with some or all of the three base pairs within a DNA triplet. These base-specific interactions are mediated through amino acid (AA) side chains at specific positions in the α-helical region [5-10] of the protein domain.
Although the AA sequences of more than 1,300 ZF motifs have been identified, the exact DNA-binding sites are known for only a few proteins. The available information on DNA contact regions concerns mainly guanine-cytosine-rich strands [5-9] and, to a lesser extent, adenine-thymine-rich sites [11,12]. On the basis of experimental data, the first proposals for rules relating ZF sequences to preferred DNA-binding sites have been made [13,14]. However, no general rules for ZF protein-DNA recognition have been proposed. This is likely because neither computer modeling [2,3,5] nor crystallographic analysis [7] has provided enough information on the overall structural variety in the ZF-DNA contact region.
An objective of the work leading to the present invention was to determine a set of general rules for ZF-DNA recognition for the C2H2 class of ZF domains, using physical atomic-molecular models to characterize the steric conditions at the specific contact positions for different ZF-DNA interactions. Once this objective had been reached, the plan was to develop an algorithm, and a computer system using the algorithm, to design effective zinc-finger DNA-binding polypeptides. The achievement of these goals represents a major advance over the state of knowledge in the field, as characterized by the disclosures of Rebar et al. and Beerli et al. [15,16]. Those two disclosures are concerned with the selection, using the phage display system, of specific zinc fingers with new DNA-binding specificities. By contrast, the present disclosure is concerned with the design of DNA-binding proteins for any given DNA sequence.