1. Field of the Invention
This invention relates to development of novel DNA-binding proteins and polypeptides by an iterative process of mutation, expression, selection, and amplification.
2. Information Disclosure Statement
Proteins that bind sequence-specifically to DNA determine which genetic messages will be expressed and in what quantity. The present application deals only with sequence-specific DNA-binding proteins, abbreviated DBP. Numerous proteins are known that bind to specific DNA sequences and much has been written on the nature of the interactions between DNA and proteins. In a few cases, x-ray crystal structures have been determined for segments of DNA bound to protein (ANDE85, JORD88, AGGA88); Matthews has recently reviewed structures of such complexes (MATT88). Physical methods other than x-ray crystallography are also applied to the study of the interaction of proteins and DNA; Boelens et al. (BOEL85) applied NMR to the E. coli lac repressor and Boelens et al. (BOEL87) and Kaptein et al. (KAPT89) used 2D NMR to study the interaction of lac repressor with the lac operator. The three-dimensional (3D) structures of repressors and repressor-operator complexes can often be reconciled with the chemistry of the molecules (BUSH85). In cases where the x-ray structure is unavailable, chemistry can provide some structural information (BRUN87a). The interpretation of structural and chemical data on the interaction of proteins with DNA remains controversial because such experiments are open to multiple interpretations (BENS89). In only a handful of cases, however, have researchers been able to construct or isolate new proteins that recognize different DNA sequences, and none have demonstrated the ability to prepare a suitable DBP specific for an arbitrary predetermined DNA sequence.
The ability to create novel DNA-binding proteins will have far-reaching applications, including, but not limited to, use in: a) treating viral diseases, b) treating genetic diseases, including cancer, c) preparation of novel biochemical reagents, and d) biotechnology to regulate gene expression in cell cultures.
Viral diseases, in particular retroviral diseases, are difficult to treat. Unlike bacterial pathogens, viruses typically comprise very few genes and subvert the biochemical machinery of the infected cell to the production of virus progeny. Because a typical virus has very few genes that code for a few viral proteins, the list of places at which chemotherapeutic agents can specifically block viral metabolism is quite short. Furthermore, retroviruses, such as HIV-1, show high rates of mutation because their means of reverse transcription is error prone. A chemotherapeutic that blocks a strain of a retrovirus will exert an evolutionary pressure on the population of viruses within a patient to evolve a novel variant of the macromolecular target of the therapeutic agent such that a new strain, insensitive to the therapeutic agent, is likely to arise. Such reduction in efficacy for AZT in the treatment of AIDS has been observed (LARD89a, LARD89b, HIRS90, ROOK89). The present invention allows the development of DNA-binding proteins that bind specifically to viral DNA sequences so that transcription of the viral message is blocked or greatly reduced; see Ruscetti and colleagues (MIKO90). In one particularly preferred embodiment, transcription-blocking proteins are developed to several adjacent viral sequences (each of about 17-20 base pairs) so that the virus would need to mutate in each of the binding regions before it escapes from inhibition of transcription. Preferably, the viral sequences picked as targets are either from regions that regulate viral transcription or that correspond to functional portions of viral proteins.
Transcription-blocking proteins developed by means of the present invention can be introduced into cells (e.g. through liposomes), or the genes encoding these proteins can be introduced into the cells (e.g. through an engineered retrovirus) so that each cell makes its own supply of transcription-blocking protein.
Genetic diseases (including cancer) involve: a) the inappropriate expression of a message (e.g. expression of an oncogene), b) the expression of an incorrect message (e.g. sickle-cell disease or cystic fibrosis), or c) the non-expression of a message. The present invention is applicable to genetic diseases that involve excessive inappropriate expression of a gene. The mode of treatment is similar to that for viral diseases. That is, transcription-blocking proteins may be used to prevent expression of an incorrect message or excessive expression of a correct one.
Novel DNA-binding proteins are useful in biotechnology because they allow control of gene expression. DNA-binding proteins that respond to novel chemical or physical signals allow greater freedom in design of strains. Koob et al. (KOOB88) have used sequence-specific DNA-binding proteins to modulate the action of restriction enzymes. The availability of a variety of DNA-binding proteins, each having a different specificity, would augment the utility of this method.
In both prokaryotes and eukaryotes, proteins having affinity for specific sites on DNA modulate transcriptional expression of genes. Through direct interaction with DNA at specific sites in genes, certain proteins called repressors hinder transcription by making the DNA inaccessible to RNA polymerase. Other DNA-binding proteins and some multi-functional repressors are activators which allow RNA polymerase to initiate transcription with increased efficiency.
DNA-binding proteins have been studied to determine in atomic detail how these proteins actually contact the DNA molecule and interact with it to influence gene expression. The best known are a group of proteins primarily studied in prokaryotes that contain the structural motif alpha-helix-turn-alpha-helix (H-T-H) (PABO84, AGGA88). These proteins bind as dimers or tetramers to DNA at specific operator sequences that have approximately palindromic sequences. Contacts made by two adjacent alpha helices of each monomer in and around two sites in the major groove of B-form DNA are a major feature in the interface between DNA and these proteins. Proteins that bind in this manner share sequence similarity in the H-T-H region but vary in the extent of similarity in other regions. This group of proteins includes the temperate bacteriophage repressor proteins and Cro proteins, bacterial metabolic repressor proteins such as GalR, LacI, LexA, and TrpR, bacterial activator protein CAP and dual activator/repressor protein AraC, bacterial transposon and plasmid TetR proteins (PABO84, AGGA88), the yeast mating type regulator proteins MATa1 and MATalpha2 (MILL85) and eukaryotic homeo box proteins (EVAN88).
Interactions between dimeric repressors and approximately palindromic operators have usually been discussed in the literature with attention focused on one half of the operator with the tacit or explicit assumption that identical interactions occur in each half of the complex. Departures from palindromic symmetry allow proteins to distinguish among multiple related operators (SADL83, SIMO84). One must view the DNA-protein interface as a whole. The emphasis in the literature on dyad symmetry is a barrier to determining the requirements for general novel recognition of DNA by proteins.
While single crystals of short segments of DNA, of DBPs, and of DNA-DBP complexes have all been studied by X-ray diffraction and other analytical techniques, it is not yet possible to design a protein to bind strongly and specifically to an arbitrary DNA sequence. As taught by the present invention, however, it is possible to use theoretical considerations to postulate a family of potential DBP mutants and identify one having the desired specificity by other means.
The simple antiparallel double helical idealization of B-DNA having 10.5 regularly spaced base pairs per turn, a rise of 3.4 Angstroms (Å) per base pair, and a helical diameter of 19.0 Å, though still an invaluable generalization, does not fit the details of 3D structures of specific DNA sequences. It has been observed that AT base pairs are wound more tightly than GC pairs (DICK83). An AT base pair may advance the helix by as little as 29° (ANDE87) while a GC pair may advance it by as much as 45°; the average for random DNA is about 36°/base pair (SAEG83).
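As a rough numerical illustration of the idealized parameters quoted above, the following sketch converts base pairs per turn into an average twist and estimates the axial length and number of helical turns spanned by a binding site. The function names are hypothetical, and the length uses the simple approximation of one rise per base pair. Note that 10.5 base pairs per turn corresponds to about 34.3 degrees per base pair, whereas 36 degrees per base pair corresponds to exactly 10 base pairs per turn.

```python
# Idealized B-DNA parameters quoted in the text.
BP_PER_TURN = 10.5     # base pairs per helical turn
RISE_ANGSTROM = 3.4    # axial rise per base pair, in Angstroms


def mean_twist_deg(bp_per_turn: float = BP_PER_TURN) -> float:
    """Average helical twist per base pair, in degrees."""
    return 360.0 / bp_per_turn


def segment_length_angstrom(n_bp: int, rise: float = RISE_ANGSTROM) -> float:
    """Approximate axial length of an n_bp segment (one rise per base pair)."""
    return n_bp * rise


def turns_spanned(n_bp: int, twist_deg: float) -> float:
    """Helical turns spanned by n_bp at a uniform twist per base pair."""
    return n_bp * twist_deg / 360.0


if __name__ == "__main__":
    print(f"mean twist at 10.5 bp/turn: {mean_twist_deg():.1f} degrees")
    for n in (17, 20):
        low = turns_spanned(n, 29.0)   # lowest per-pair twist quoted in the text
        high = turns_spanned(n, 45.0)  # highest per-pair twist quoted in the text
        print(f"{n} bp: about {segment_length_angstrom(n):.1f} Angstroms, "
              f"{low:.2f} to {high:.2f} turns")
```

The spread between the 29 degree and 45 degree extremes shows why a site of 17 to 20 base pairs cannot be assumed to present its bases at fixed angular positions independent of sequence.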
Physical and theoretical methods indicate that both the flexibility and the equilibrium geometry of DNA are functions of the sequence. Hogan and Austin have suggested that AT-rich segments of DNA are more flexible than GC-rich regions and they have resolved the flexibility into bending and torsion (HOGA87). Gartenberg and Crothers (GART88) have shown that certain aspects of DNA flexibility a) are best stated in terms of sections comprising two base pairs, and b) are directional in nature: A or T favors bending toward the minor groove while G or C favors bending toward the major groove. Ulanovsky and Trifonov (ULAN87) have reviewed the significant static effect that AA dinucleotides can induce on the curvature of the DNA axis.
DNA is strongly negatively charged due to the ionization of the phosphate groups. In solution at or near physiological pH, cations become localized between phosphate groups in the minor groove (OHLE85). Ulanovsky and Trifonov (ULAN87) suggest that electrostatic interactions between DNA and proteins will significantly affect the twist, tilt and roll of bases. Entrapped water molecules and ions may also mediate the interaction between the protein and the DNA, as in the case of the binding of Trp repressor (TrpR) to its operator (OTWI88).
Matthews (MATT88), commenting on the current collection of protein-DNA structures, concludes that: a) different H-T-H DBPs use their recognition helices differently, b) there is no simple code that relates particular base pairs to particular amino acids at specific locations in the DBP, and c) "full appreciation of the complexity and individuality of each complex will be discouraging to anyone hoping to find simple answers to the recognition problem."
Other prokaryotic repressors exist that have little or no sequence homology to H-T-H binding proteins and have no H-T-H binding motif. Binding of operators with approximate palindromic sequence symmetry is observed among some proteins of this group, such as Salmonella typhimurium bacteriophage P22 Mnt protein (VERS87a) and E. coli TyrR repressor protein (DEFE86). Others of this group bind to operator sequences that are partially symmetric (S. typhimurium phage P22 Arc protein, VERS87b; E. coli Fur protein, DELO87; plasmid R6K pi protein, FILU85) or non-symmetric (phage Mu repressor, KRAU86).
Recently, Rafferty et al. (RAFF89) have determined the crystal structures of E. coli MetJ repressor, MetJ complexed to S-adenosylmethionine (SAM), and the ternary complex of MetJ-SAM-DNA where the DNA contains the cognate binding sequence for MetJ. This protein binds as a dimer to short palindromic or nearly palindromic DNA sequences. The consensus sequence is 5'-AGACGTCT-3', called a Met Box. Genes regulated by MetJ have two to five tandem Met Boxes and Phillips et al. (PHIL89) have shown that the binding of MetJ is cooperative. MetJ is distinguished from H-T-H proteins in that the recognition portion of the protein consists of two symmetry-related extended strands that form a beta ribbon that can fit into the major groove of B-DNA. Robert Sauer reports that P22 Mnt and P22 Arc are thought to have structures related to MetJ. Mnt and Arc are thought to bind to DNA as tetramers. A tightly entwined Arc dimer binds to one half site, which has an internal approximate dyad. Dimers touch each other so that binding to adjacent half sites is cooperative.
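The notion of a palindromic or approximately palindromic site can be made concrete with a small sketch. A DNA sequence has perfect dyad symmetry when it equals its own reverse complement; counting the positions at which it differs gives a simple measure of departure from symmetry. The helper names below are hypothetical and the mismatch count is only one of several possible measures.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def dyad_mismatches(seq: str) -> int:
    """Count positions where seq differs from its reverse complement.

    Zero means the sequence is perfectly palindromic; a small nonzero
    value means it is only approximately palindromic.
    """
    seq = seq.upper()
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


MET_BOX = "AGACGTCT"  # consensus sequence quoted in the text

if __name__ == "__main__":
    print(reverse_complement(MET_BOX), dyad_mismatches(MET_BOX))
```

Because a change at one position also breaks the match at its symmetry-related partner, a single base substitution in an even-length palindrome is counted as two mismatches by this measure.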
Several transcriptional activation proteins of eukaryotes have been identified that bind to specific DNA sequences; however, there is only limited information on the DNA-binding motif of these proteins.
The ease and rapidity of genetic analysis in prokaryotes has enabled extensive mutational analysis of prokaryotic DNA-binding proteins and their specific nucleic acid sequences, producing a wealth of information on the relationships between structure and function in the molecular complexes. Altered protein or nucleic acid sequences have been obtained via a variety of mutagenesis techniques. Mutant proteins and operators produced by these techniques have been used alone or in combination with native or mutant sequences and proteins to determine relationships among sequence, structure, and binding in protein-DNA complexes.
Mutations that alter the amino acid sequence of a protein show an enormous variety of effects ranging from no observable changes in structure or function to complete loss of function, protein destabilization, and degradation. A general conclusion from the mutational studies of the DNA-binding proteins is that mutations in protein sequence that result in the decrease or loss of protein function fall into two large (and overlapping) classes: 1) those mutations which have destabilizing effects on global protein structure or folding and 2) those which affect protein function by specifically altering protein-protein (in the case of dimerization or activation) or protein-DNA (in the case of DNA-binding) interactions. Analysis of the first class of mutations provides information on the general problem of what factors determine protein folding and stability, while analysis of the second class serves to define the surfaces and interactions involved in the formation and stabilization of the protein-DNA complex. A third class of mutations that yields information concerning protein-DNA interactions is that of mutant operator sequences.
Reidhaar-Olson and Sauer (REID88) have extensively studied amino acid substitutions at positions 84 to 91 in alpha helix 5 of lambda repressor. The authors used cassette mutagenesis to vary residues at two or three positions through all twenty amino acids simultaneously and selected for those combinations that resulted in normal functional N-terminal domains. The authors neither discuss optimization of the number or positions of residues to vary to obtain any particular functionality, nor do they attempt to obtain proteins having alternate dimerization or recognition functions.
Pakula et al. (PAKU86) have used random mutagenesis to generate a large number of altered lambda Cro proteins containing single missense mutations (see Table 1). Twenty mutations were recovered in residues proposed to interact in dimer formation (L7, L23, A34, I40, E54, V55, K56, F58). (We use the single-letter code for amino acids, shown in Table 9. Mutants are indicated by: a) the amino acid of the parent, b) the new amino acid, and c) the position in the protein. Thus FV58 indicates a change from phenylalanine (F) to valine (V) at position 58.) Proteins with substitutions at K56 (which may interact with DNA directly (PAKU86, TAKE86)) do not bind DNA but are present in the cell. All other mutations recovered in these residues drastically decrease both the in vivo levels of the altered proteins and the binding.
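The mutant notation defined above (parent amino acid, new amino acid, position) is regular enough to parse mechanically. The following sketch is illustrative only: the class and function names are hypothetical, and the three-residue sequence in the usage lines is a toy string, not a real protein. Labels naming a residue alone, such as K56, are deliberately rejected because they do not describe a substitution.

```python
import re
from typing import NamedTuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty single-letter codes


class Mutation(NamedTuple):
    parent: str    # amino acid in the parent protein
    new: str       # amino acid in the mutant
    position: int  # 1-based residue number


_LABEL = re.compile(rf"^([{AMINO_ACIDS}])([{AMINO_ACIDS}])(\d+)$")


def parse_mutation(label: str) -> Mutation:
    """Parse a label such as 'FV58' into (parent, new, position)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a parent/new/position label: {label!r}")
    return Mutation(match.group(1), match.group(2), int(match.group(3)))


def apply_mutation(sequence: str, mutation: Mutation) -> str:
    """Return sequence with the substitution applied, checking the parent residue."""
    index = mutation.position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("position lies outside the sequence")
    if sequence[index] != mutation.parent:
        raise ValueError("parent residue does not match the sequence")
    return sequence[:index] + mutation.new + sequence[index + 1:]


if __name__ == "__main__":
    print(parse_mutation("FV58"))
    print(apply_mutation("MKF", parse_mutation("FV3")))  # toy sequence
```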
DNA-binding proteins bind DNA with sequence-specific and sequence-independent interactions. Sequence-independent interactions are thought to occur via electrostatic interactions between the sugar phosphate backbone of the DNA and peptide NH groups or the side groups of appropriately charged or H-bonding residues (viz. N, Q, K, R, T, Y, H, E, D, and S) on the surface of the protein (ANDE87, LEWI83, and TAKE85). Sequence-specific interactions involve H-bonding, nonpolar, or van der Waals contacts between surface exposed residue side groups or peptide NH groups of the polypeptide main chain and base pair edges exposed in the major and minor grooves of the DNA at the binding site. Both non-specific and specific interactions contribute to the binding energy at the binding site. In addition, the long range electrostatic interactions involved in some sequence-independent bonds can kinetically facilitate site-specific binding by allowing the protein to rapidly bind weakly to DNA and then to diffuse long distances along the DNA strand (KIMJ87, TAKE86). This mechanism greatly accelerates the process of protein diffusion to binding sites by reducing the protein search from three dimensions to one dimension (KIMJ87). In the non-specific protein-DNA complex, the protein is thought to be displaced less than 4 Å outward from the DNA axis and to lack the major groove contacts (TAKE85, TAKE86). Thus, many sequence-independent interactions (e.g. phosphate-to-peptide NH H-bonds) are possible only in the specific configuration achieved at the binding site. In addition, some interactions change from sequence-independent to sequence-specific in the transition from the non-specific complex to the specific complex at the binding site (e.g. R41 and R43 in 434 Repressor (ANDE87)).
Mutations that alter residues involved in specific binding interactions with DNA have been identified in a number of prokaryotic DNA-binding proteins including lambda, 434, and P22 repressor and Cro proteins, P22 Arc and Mnt, and E. coli trp and lac repressors and CAP. In general, these mutations occur in residues that are exposed to solvent in the free protein but buried in the protein-DNA complex and result in relatively stable expressed proteins.
A few cases have been reported (BASS88, YOUD83, VERS85a, CARU87, WHAR87, and EBRI84) in which a change in a single residue in a DNA-binding protein not only abolishes binding by the protein to the wild-type operator but also confers strong binding to a different operator.
Youderian et al. (YOUD83) and Vershon et al. (VERS85a) have described the isolation and binding characteristics of an altered P22 Mnt repressor (Mnt-bs) which recognizes an altered operator. Mnt-bs binds tightly to a symmetrically altered operator (mA/mA operator) in which the base pairs at positions 3 and 15 are changed from G:C pairs to A:T pairs. The host cells produce dam methylase so that the adenines at these operator positions are methylated. In vitro, Mnt-bs binds as tightly to the mA/mA operator as the wild-type protein binds to the G/G operator. In addition, wild-type repressor binds to the mA/mA operator 1000-fold less well than to the wild-type operator while Mnt-bs shows an identical 1000-fold decrease in binding to the G/G operator relative to the mA/mA operator. Mnt-bs does not bind to the unmethylated A/A operator isolated from dam⁻ cells (VERS85a). Thus, the altered recognition of Mnt-bs involves the major groove N6-methyl groups of the adenines at positions 3 and 15 of the mA/mA operator.
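Fold changes in binding such as the 1000-fold figure above are often restated as differences in binding free energy. The sketch below uses the standard relation ΔΔG = RT ln(fold change). It rests on two assumptions that are not stated in the text: that the reported fold change reflects a ratio of equilibrium binding constants, and that the temperature is 25 °C (298.15 K). The function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change_to_ddg(fold_change: float, temperature_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) for a fold change in binding constant.

    Assumes the fold change is a ratio of equilibrium constants;
    the default temperature of 298.15 K is an assumption.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)


if __name__ == "__main__":
    for fold in (10, 1000):
        print(f"{fold}-fold: about {fold_change_to_ddg(fold):.2f} kcal/mol")
```

Under these assumptions a 1000-fold difference corresponds to roughly 4 kcal/mol, a magnitude comparable to that of a small number of hydrogen bonds or nonpolar contacts.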
At pH lower than 8.5, wild-type P22 Mnt shows a strong pH dependence for binding to the G/G operator while Mnt-bs binding to the mA/mA operator shows relatively little pH dependence (VERS85a). These observations are consistent with the proposal (VERS85a) that H6 acts as a hydrogen donor to O6 or N7 of the guanines at positions 3 and 15 of the wild-type operator. Thus, the change in specificity shown by Mnt-bs relative to wild-type Mnt results from the replacement of one set of contacts (H-bonds between H6 of P22 Mnt and guanine 3 or 15) with an energetically equivalent set (hydrophobic interactions between P6 of Mnt-bs and the N6 methyl groups of methylated adenines at operator positions 3 or 15).
A similar example of a single residue change producing an altered protein which recognizes a different operator has been described using lambda Cro binding to lambda O_R1 (CARU87). In the wild-type Cro-wild-type O_R1 complex, Q27 is believed to form H-bonds with N6 and N7 of adenines at positions 2 or 16 in O_R1. Based on computer modeling predictions, Caruthers et al. (CARU87) replaced the adenines at positions 2 and 16 in O_R1 with thymine (to make O_R1*) and constructed altered Cro repressors having C, L, V, I, and G in place of Q at position 27. The O_R1-to-O_R1* change results in a more than 40-fold decrease in wild-type Cro binding. The QC27, QL27, QV27, and QI27 changes all produce proteins that bind to O_R1* as well as wild-type Cro binds to O_R1. QG27 Cro does not bind well to either operator. The QC27 mutation reduces binding of the altered Cro to O_R1 8-fold relative to wild-type binding, while the QI27, QL27, and QV27 substitutions produced proteins that bind to O_R1 almost as well as the wild-type Cro. Thus, the larger hydrophobic amino acid substitutions (I, L, V) result in proteins with a loss of specificity in operator binding (A or T at positions 2 and 16 are acceptable) perhaps due to an interaction between the large side groups and thymine methyl groups across the major groove. The QC27 substitution produces an altered Cro which distinguishes between O_R1 and O_R1* and binds to O_R1* with the same affinity as the wild-type Cro repressor binds to O_R1.
Wharton and Ptashne (WHAR85b) have described the construction of an altered 434 repressor with altered operator binding properties. These authors show that any single base pair change in the outer 8 positions (1-4, 11-14) of a synthetic 434 operator reduces operator binding by wild-type 434 repressor more than 150-fold. The single QA28 change in 434 repressor produces an altered repressor which cannot bind to the wild-type operator. QA28 repressor binds to an altered operator, 1T, in which thymidine has been substituted for adenine at the symmetrically located positions 1 and 14. Binding of QA28 repressor to operator 1T is almost as strong as binding of wild-type repressor to wild-type operator. QA28 repressor does not bind to operators 1C, 1G, or 1U and binds with 50-fold lower affinity to an operator in which 5-methylcytosine replaces adenine at positions 1 and 14. Molecular modeling suggests that the change in specificity results from the substitution of a single hydrophobic interaction (van der Waals contact between the A28 side group methyl and the 5-methyl group of thymidine) in the QA28-repressor-operator-1T complex for the two hydrogen bonds (between N and O of Q28 and N7 and N6 of adenine) in the wild-type repressor-wild-type operator complex. The reduced binding of QA28 repressor to the 5-methyl-C operator may result from a slight misalignment of the protein and DNA methyl groups (WHAR87).
Ebright et al. (EBRI84, EBRI87) have described the isolation of three mutations that alter sequence-specificity in cAMP receptor protein (CRP) of Escherichia coli, also known as catabolite activator protein (CAP). The altered proteins show specificity for A:T base pairs at the symmetrically located positions 7 and 16 in the operator rather than for the G:C pairs required for binding by wild-type CRP. Model building suggests that the three missense mutations, each of which changes residue 181 (EK181, EL181, EV181), produce changes in major groove contacts between protein side groups and DNA base pairs in the operator. The H-bond between N4 of cytosine and an oxygen of E181 present in the wild-type complex is replaced in the altered complexes by hydrophobic interactions between K181, L181 or V181 methyl groups and the thymine methyl group. In addition, K181 can form an H-bond with the thymidine O4 atom.
Spiro and Guest (SPIR88) have changed the DNA-binding properties of E. coli FNR protein to be very similar to those of E. coli CRP by changing three residues in the recognition helix. FNR is a protein of known sequence but unknown 3D structure that has significant sequence similarity to CRP and is involved in turning on genes needed in anaerobic conditions. Shaw et al. (SHAW83) suggest that FNR is similar to CRP in that it can bind its cognate operator only when an effector molecule, as yet unknown, is bound to FNR. The sequence similarity between FNR and CRP suggests that the 3D structure of FNR may be similar to CRP. The residues of FNR that correspond to the cAMP-contacting residues of CRP are different from those of CRP.
In all of the examples cited above, alteration of binding specificity has been accomplished by using symmetrically-located pairs of alterations in the operator sites and repressor DNA-binding regions. Single, asymmetric changes or multiple changes asymmetrically located in either the binding protein or its operator were not considered.
The class of DNA-binding mutations that change protein recognition includes the "helix swap" constructions (WHAR84, WHAR85b, WHAR85a, SPIR88). In these altered proteins, multiple mutations are introduced into the DNA-binding recognition helix of H-T-H proteins with the goal of changing the operator specificity of one protein to that of another.
Wharton et al. (WHAR84) have described an experiment in which they introduced five site-specific changes (EQ32, QL33, LI34, NA36, and KV38) in alpha 3 (recognition helix) of 434 repressor. The resulting alpha 3 was identical in sequence to the alpha 3 of 434 Cro. In DMS methylation protection experiments, the altered repressor and wild-type Cro have effects on operator purine methylation which are identical to each other and different from wild-type repressor. The relative affinities of the altered repressor for 434 operator sites O_R1, O_R2 and O_R3 are intermediate between those of wild-type 434 repressor and 434 Cro, although the overall affinity of the hybrid molecule for operator DNA is reduced. Wharton et al. (WHAR84) also reported that the converse helix swap experiment in which the recognition helix of 434 Cro was replaced with that of 434 repressor (except for I34) produces a hybrid protein which protects host cells against infection by 434 phage.
In a second set of helix swap experiments, Wharton and Ptashne (WHAR85b) introduced changes in the solvent-exposed residues of 434 repressor alpha 3 (TR27, QN28, QV29, ES32, NR36) to produce a hybrid repressor protein having the solvent exposed alpha 3 surface of P22 repressor while the remainder of the protein (including the buried alpha 3 surface) was identical to 434 repressor. When overexpressed, the hybrid repressor protects host cells from lambda immP22 phages but not from lambda imm434 phages. In addition, the overexpressed hybrid is trans-dominant, suggesting that heterodimers of altered and wild-type 434 repressor monomers form but are non-functional.
Wild-type P22 and 434 repressors do not bind to each other's operators in vitro and the hybrid 434(P22 recognition) repressor protein does not bind to 434 operator sequences in vitro (WHAR85b). DNase I protection experiments show that the hybrid protein binds to and protects P22 O_R1, O_R2, and O_R3 with the same relative affinities as wild-type P22 repressor, although the absolute affinities are reduced about 10-fold relative to the wild-type protein.
Wharton and Ptashne (WHAR85b) have reported that P22/434 hybrid repressor proteins in which substitutions with P22 residues are limited to the six to seven N-terminal residues of alpha 3 can protect host cells against infection with lambda immP22 phages but not against lambda imm434 phages. In contrast, hybrid repressors with substitutions of P22 residues in the C-terminal 5 to 6 residues of alpha 3 protect host cells from infection with lambda imm434 phages but not from infection with lambda immP22 phages. Thus, sequence-specific recognition by 434 repressor appears to be confined to the N-terminal half of alpha 3.
Wharton (WHAR85a) has reported that recognition helix swap experiments between 434 repressor and lambda repressor, lambda Cro, and CAP produce hybrid proteins which are non-functional in vivo and in vitro. Lambda repressor, Cro and CAP recognize larger operators than 434 repressor and, in addition, may employ different binding orientations (CAP) (PABO84, WHAR85a), or additional binding interactions (e.g. lambda repressor and Cro N- and C-terminal arms) to stabilize the bound structure. The greater than 40-fold decrease in operator affinity due to alpha 2 substitutions in Cro67 (BUSH88) further emphasizes the importance of regions outside the recognition helix to repressor specificity and binding.
An extension of the "helix swap" experiments uses a mixture of 434 repressor and 434R[alpha3(P22R)] (HOLL88). This mixture recognizes and binds in vitro with high affinity to a 16 bp chimeric operator consisting of a 434 half-site and a P22 half-site, indicating that active heterodimers are formed. The authors did not extend the results to in vivo cellular repression, nor did they perform mutagenesis of the repressors and selection of cells to create novel recognition patterns.
Comparison of the results of the helix swap experiments with the results of the single residue change experiments described previously highlights the importance of protein structure outside the recognition helix to binding affinity. While the altered proteins produced from single residue changes recognize the altered operators with high affinities, the hybrid proteins produced in helix swap experiments recognize their operators with reduced affinities relative to the wild-type "helix donors". In the case of a single site-specific alteration in alpha 3, the altered protein recognizes the new operator in the context of a complex in which wild-type protein structure is conserved. In helix swap experiments in which both monomers contain altered recognition helices, the recognition helix of one protein interacts with its operator in the context of a framework provided by a different protein. The less than ideal three dimensional conformation imposed by the host protein can reduce the affinity of the hybrid for the operator or, as is the case for lambda repressor, lambda Cro or CAP recognition helices in 434 repressor, abolish binding altogether. In contrast, the heterodimeric repressor described by Hollis et al. (HOLL88) recognizes the chimeric operator nearly as well as the wild type 434 repressor protein recognizes its operator. This requires that the heterodimer is sufficiently flexible to allow the adjustments needed for optimal interactions at both half-sites.
Hollis et al. (HOLL88) have shown that heterodimers of two highly similar DBPs bind in vitro to a chimeric operator having no sequence symmetry. Hollis et al. mixed equal quantities of 434 repressor and "helix-swapped" 434 repressor bearing the alpha 3 helix of P22 repressor, to form the mixed dimer.
With the exception of Hollis et al. (HOLL88), all of the helix swap experiments described involve the creation, through direct substitutions of known binding sequences, of symmetrical homodimers of hybrid repressor monomers which interact with known operators having some degree of symmetry. None of these studies, including that of Hollis et al., discusses binding to completely novel non-symmetric operator sites via proteins containing two different recognition sites, or construction of novel DNA-binding regions by simultaneous variation of sets of residues on the protein surface.
The recently developed techniques of "reverse genetics" have been used to produce single specific mutations at precise base pair loci (OLIP86, OLIP87, and AUSU87). Mutations are generally detected by sequencing and in some cases by loss of wild-type function. These procedures allow researchers to analyze the function of each residue in a protein (MILL88) or of each base pair in a regulatory DNA sequence (CHEN88). In these analyses, the norm has been to strive for the classical goal of obtaining mutants carrying a single alteration (AUSU87).
Reverse genetics is frequently applied to coding regions to determine which residues are most important to the protein structure and function. In such studies, isolation of a single mutant at each residue of the protein and determination of the phenotype conferred gives an initial estimate of which residues play crucial roles.
Prior to the invention of Ladner and Guterman (U.S. Ser. No. 07/240,160 abandoned), two general approaches had been developed to create novel mutant proteins through reverse genetics. Both methods start with a clone of the gene of interest. In one approach, dubbed "protein surgery" (reviewed by Dill (DILL87)), a specific substitution is introduced at a single protein residue by a synthetic method using the corresponding natural or synthetic cloned gene. Craik et al. (CRAI85), Rao et al. (RAOS87), and Bash et al. (BASH87) have used this approach to determine the effects on structure and function of specific substitutions in trypsin.
The other approach has been to generate a variety of mutants at many loci within the cloned gene, the "gene-directed random mutagenesis" method. The specific location and nature of the change or changes are determined by DNA sequencing. It may be possible to screen for mutations if loss of a wild-type function confers a cellular phenotype. Using immunoprecipitation, one can then differentiate among mutant proteins that: a) fold but fail to function, b) fail to fold but persist, and c) are degraded, perhaps due to failure to fold. This approach is exemplified by the work of Pakula et al. (PAKU86) on the effect of point mutations on the structure and function of the Cro protein from bacteriophage lambda. This approach is limited by the number of colonies that can be examined. An additional important limitation is that many desirable protein alterations require multiple amino acid substitutions and thus are not accessible through single base changes or even through all possible amino acid substitutions at any one residue.
The objective in both the surgical and gene-directed random mutagenesis approaches has been, however, to analyze the effects of a variety of single substitution mutations, so that rules governing such substitutions could be developed (ULME83). Progress has been greatly hampered by the extensive efforts involved in using either method and the practical limitations on the number of colonies that can be inspected (ROBE86).
The term "saturation mutagenesis" with reference to synthetic DNA is generally taken to mean generation of a population in which: a) every possible single-base change within a fragment of a DNA coding or regulatory region is represented, and b) most mutant genes contain only one mutation. Thus a set of all possible single mutations for a 6 base-pair length of DNA comprises a population of 18 mutants. Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have demonstrated ligation and cloning of highly degenerate oligonucleotides and have applied saturation mutagenesis to the study of promoter sequence and function. They have suggested that similar methods could be used to study genetic expression of protein coding regions of genes, but they do not say how one should: a) choose protein residues to vary, or b) select or screen mutants with desirable properties.
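The figure of 18 follows from three alternative bases at each of six positions. A minimal Python sketch of that enumeration is given here; the function name and the example sequence are illustrative assumptions and are not taken from the cited work.

```python
def single_base_mutants(seq, alphabet="ACGT"):
    """Return every sequence that differs from seq at exactly one position."""
    mutants = []
    for i, base in enumerate(seq):
        for alt in alphabet:
            if alt != base:
                # Substitute one alternative base at position i only.
                mutants.append(seq[:i] + alt + seq[i + 1:])
    return mutants


# Arbitrary 6 base-pair sequence, chosen only for illustration.
example = "GAATTC"
mutants = single_base_mutants(example)
print(len(mutants))  # 6 positions x 3 alternative bases
```

In general a fragment of n base pairs has 3n single-base variants, so the population grows only linearly with fragment length as long as each mutant carries a single change.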
Ward et al. (WARD86) have engineered heterodimers from homodimers of tyrosyl-tRNA synthetase. Methods of converting homodimeric DBPs into heterodimeric DBPs are disclosed in the present invention. Creighton (CREI84, p263-264) has reviewed cases in which gene duplication and evolution have produced single-polypeptide proteins with approximate dyad symmetry despite very low internal sequence homology. Methods of deriving single-polypeptide pseudo-dimeric DBPs from homodimeric DBPs are disclosed in the examples of the present invention. An example of a naturally occurring heterodimer binding to a non-palindromic site consisting of two naturally occurring half-sites is found in the yeast MATa1-alpha2 protein. The DNA site that it recognizes consists of two unlike half-sites, and each of these is found in full palindromes at other yeast loci.
Benson et al. (BENS86) have developed a scheme to detect genes for sequence-specific DNA-binding proteins that utilizes the immI region of phage P22. They have demonstrated that five different operators can function at the same site to repress transcription when the appropriate DNA-binding protein is present. They do not consider non-symmetric target DNA sequences, nor do they suggest mutagenesis to generate novel DNA-binding properties. Their method is presented as a method to detect genes for naturally occurring DNA-binding proteins. Because the selective system is lytic growth of phage, low levels of repression cannot be detected. Selective chemicals, as disclosed in the present application, on the other hand, can be finely modulated so that low-level repression is detectable.
Adhya and Gottesman (ADHY82) describe the phenomenon of promoter occlusion in which frequent transcription from a strong promoter prevents transcription from a nearby, opposed weaker promoter. When a DBP represses the strong promoter, the occlusion is relieved. Elledge and Davis (ELLE89a) investigated the mechanism of occlusion and the effects of placement of operator relative to the strong promoter and the effect of promoter strength.
Elledge et al. (ELLE89b) have described a genetic system for selecting preexisting eukaryotic DNA-binding proteins. This system comprises a low-copy-number plasmid that carries: a) the aadA gene with its weak endogenous promoter, b) a strong promoter, P.sub.conII, that directs transcription in opposition to aadA, and c) a DNA binding site downstream from P.sub.conII. They contemplate introducing a cDNA library comprising possible DNA-binding proteins on a second compatible plasmid. Their system is presented as a tool for cloning pre-existing DBPs from cDNA libraries and there is no mention of variegation of the gene that encodes the potential DBP. They did not report screening an actual library; rather, they demonstrated feasibility of their selection by using a model library having only two components, one of which was known to encode a DNA-binding protein of interest and the other known to encode no DNA-binding protein at all. They did not contemplate the possibility that proteins would bind to P.sub.conII and use only one copy of the target DNA. There is no discussion of the symmetry of the target sequence or of the symmetry of the DBP.
Ladner and Bird, WO88/06601, published 7 Sep. 1988 and claiming priority from a U.S. application filed 2 Mar. 1987, suggest strategies for the preparation of asymmetric repressors. In one embodiment, a gene is constructed that encodes, as a single polypeptide chain, the two DNA-binding domains of a naturally-occurring dimeric repressor, joined by a polypeptide linker that holds the two binding domains in the necessary spatial relationship for binding to an operator. While they prefer to design the linker based on protein structural data (cf. Ladner, U.S. Pat. No. 4,704,692) they state that uncertainties in the design of the linker may be resolved by generating a family of synthetic genes, differing in the linker-encoding subsequence, and selecting in vivo for a gene encoding the desired pseudo-dimer. Ladner and Bird do not consider the background of false positives that would arise if the two-domain polypeptides dimerize to form pseudo-tetramers.
The binding of lambdoid repressors, Cro and CI repressor, is taken, in WO88/06601, as canonical even though other DBPs were known having operators of different lengths. WO88/06601 maintains that the 17 bp lambdoid operators can be divided into three regions: a) a left arm of five bases, b) a central region of seven bases, and c) a right arm of five bases. Several other DBPs are known for which this division is inappropriate. Further, WO88/06601 states that the sequence and composition of the central region, in which edges of bases are not contacted by the DBP, are immaterial. There is direct evidence for 434 repressor (KOUD87) that the sequence and composition of the central region strongly influence binding of 434 repressor.
Once a pseudo-dimer is obtained, they then obtain an asymmetric pseudo-dimer by the following technique. First, the user of WO88/06601 is directed to construct a family of hybrid operators in which the sequences of the left and right arms are specified; no specification is given for the central seven bases. In each member of the family, the left arm contains the same sequence as the wild-type operator left arm while the right arm 5-mer is systematically varied through all 1024 possibilities. Similarly, in the gene encoding the pseudo-dimer, the codons for one recognition helix have the wild-type sequence while the codons coding for the other recognition helix are highly varied. The variegated pseudo-dimer genes are expressed in bacterial cells, wherein the hybrid operators are positioned to repress a single highly deleterious gene. Thus, it is supposed that one can identify a recognition helix for each possible 5-mer right arm of the operator by in vivo selection; the correspondences between 5-mer right arms and sequences of recognition helices are compiled into a dictionary. The consequences of mutations or deletions in the deleterious gene are not considered. WO88/06601 suggests that successful constructions may be very rare, e.g. one in 10.sup.6, but ignores other genetic events of similar or greater frequency.
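The figure of 1024 right arms is 4.sup.5, one choice of four bases at each of five positions. The following Python sketch enumerates such a family of hybrid operators under stated assumptions: the left-arm sequence is a placeholder and is not a real wild-type arm, and the central seven bases are written as "N" because WO88/06601 gives no specification for them. All names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
ARM_LENGTH = 5      # bases in each arm
CENTER_LENGTH = 7   # bases in the central region


def all_arms(length=ARM_LENGTH):
    """Enumerate every possible arm sequence of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def hybrid_operators(left_arm, center, right_arms):
    """Hold the left arm and center fixed and vary only the right arm."""
    return [left_arm + center + right for right in right_arms]


# Placeholder values for illustration only (not real operator sequences).
left_arm = "AAAAA"
center = "N" * CENTER_LENGTH

right_arms = all_arms()
family = hybrid_operators(left_arm, center, right_arms)
print(len(right_arms), len(family[0]))
```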
To obtain a repressor for an arbitrary 17-mer operator, the user of WO88/06601:
a) finds the 5-mer sequence of the left arm in the dictionary and uses the corresponding recognition helix sequence in the first DNA-binding domain of the pseudo-dimer,
b) ignores the sequence and composition of the next seven bases, and
c) finds the 5-mer sequence of the right arm in the dictionary and uses the corresponding recognition helix sequence in the second DNA-binding domain of the pseudo-dimer.
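Steps a) through c) amount to splitting the 17-mer into 5, 7, and 5 bases and performing two table lookups. A hedged Python sketch follows; the dictionary entries, helix labels, and operator sequence are invented placeholders, not data from WO88/06601, and reading the right arm in the same orientation as the left arm is an assumption made for simplicity.

```python
def split_operator(operator):
    """Split a 17-mer operator into left arm, central region, and right arm."""
    if len(operator) != 17:
        raise ValueError("expected a 17-base operator")
    return operator[:5], operator[5:12], operator[12:]


def choose_recognition_helices(operator, dictionary):
    """Look up one recognition helix per arm; the central region is ignored."""
    left, _center, right = split_operator(operator)
    return dictionary[left], dictionary[right]


# Hypothetical dictionary: 5-mer arm -> label of a recognition helix.
helix_dictionary = {
    "ACGTA": "helix_A",
    "TTGCA": "helix_B",
}

# Hypothetical target operator built from the two placeholder arms.
target = "ACGTA" + "GGGGGGG" + "TTGCA"
helices = choose_recognition_helices(target, helix_dictionary)
print(helices)
```

The sketch makes the limitation of the procedure visible: the central seven bases are discarded without inspection, and an arm that is absent from the dictionary simply raises a `KeyError`.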
WO88/06601 also envisions means for producing a heterodimeric repressor. A plasmid is provided that carries genes encoding two different repressors. A population of such plasmids is generated in which some codons are varied in each gene. WO88/06601 instructs the user to introduce very high levels of variegation without regard to the number of independent transformants that can be produced. WO88/06601 also instructs the user to introduce variegation at widely separated sites in the gene, though there is no teaching concerning ways to simultaneously introduce high levels of variegation at widely separated sites in the gene or concerning maintenance of diversity without selective pressure, as would be needed if the variegation were introduced stepwise. WO88/06601 teaches that codons thought to be involved in the protein-protein interface should be preferentially mutated to generate heterodimers. Cells transformed with this population of plasmids will produce both the desired heterodimer and the two "wild-type" homodimers. WO88/06601 advises that one select for production of the heterodimer by providing a highly deleterious gene controlled by a hybrid operator, and beneficial genes controlled by the wild-type operators. The fastest growing cells, it is taught, will be those that produce a great deal of the heterodimer (which blocks expression of the deleterious gene) and little of the homodimers (so that the beneficial genes are more fully expressed). There is no consideration of mutations or deletions in the deleterious gene or in the wild-type operators; such mutations will produce a background of fast-growing cells that do not contain the desired heterodimers.
Anderson et al. (ANDE88) have obtained monoclonal antibodies specific to particular sequences of dsDNA. The recognition is specific to about three base pairs and is not appropriate to the uses foreseen in the present invention. Furthermore, delivery of antibodies into cells is more difficult than is delivery of the DBPs disclosed herein. In addition, intracellular expression of antibodies is unlikely to produce functional molecules because the disulfide bonds will not oxidize intracellularly.
LADNER, U.S. Pat. No. 4,704,692, "Computer Based System and Method for Determining and Displaying Possible Chemical Structures for Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides" (LADNER '692) describes a design method for converting proteins composed of two or more chains into proteins of fewer polypeptide chains, but with essentially the same 3D structure. There is no mention of variegated DNA and no genetic selection. LADNER is named as co-inventor of U.S. Pat. No. 4,853,871, "Computer-Based Methods for Designing Stabilized Proteins" (Pantoliano '871), issued to Michael W. PANTOLIANO and Robert C. LADNER on 1 Aug. 1989. Pantoliano '871 describes a computer-based method of determining residues within a protein of known 3D structure at which cysteines may be substituted in such a way that a more stable protein is likely to result. There is no mention of variegated DNA, genetic selection, or creation of novel DNA-altering enzymes. Both '692 and '871 are assigned to Genex Corporation.
LADNER, Glick and Bird, WO88/06630, published 7 Sep. 1988, and claiming priority from a U.S. application filed 2 Mar. 1987, relates to the preparation of "single chain antibodies." The present invention, on the other hand, is directed to the preparation of non-immunoglobulin proteins which bind to DNA, and particularly those which affect gene expression. A cell containing a gene encoding a "single chain antibody" expresses that gene and displays the antibody on the cell surface as a domain of a fusion protein. It is suggested that a diverse population of antibody domains may be obtained by varying the sequence of the DNA encoding the domain by mutation techniques. Cells displaying antibody domains which bind the antigen of interest are selected. There is no teaching as to where to mutate the gene, and selection is by extracellular binding of a surface-displayed domain to an immobilized extracellular antigen.
No statement of the present application is to be construed as an admission of the scope and content of the prior art or of the pertinency of that art. Where the referenced work is by another, the discussion is based solely on the published description and no admission is made that the work was performed as described. The dates given for the references are the nominal dates given in the cited work and may not correspond to the true publication dates under applicable law. All references cited in this specification are hereby incorporated by reference.