1. Field of the Invention
This invention relates to development of novel binding proteins (including mini-proteins) by an iterative process of mutagenesis, expression, chromatographic selection, and amplification. In this process, a gene encoding a potential binding domain, said gene being obtained by random mutagenesis of a limited number of predetermined codons, is fused to a genetic element which causes the resulting chimeric expression product to be displayed on the outer surface of a virus (especially a filamentous phage) or a cell. Chromatographic selection is then used to identify viruses or cells whose genome includes such a fused gene encoding a protein that bound to the chromatographic target.
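The benefit of combining selection with amplification in such an iterative process can be illustrated with a toy calculation. The sketch below is not part of the disclosed method; it assumes, hypothetically, that each chromatographic round retains a binding virus with probability p_bind and a non-binding virus with probability p_bg, and that amplification regrows the retained population without changing its composition. The function name enrich and the parameter values are illustrative only.

```python
def enrich(f0, p_bind, p_bg, rounds):
    """Binder fraction after each round of selection followed by amplification.

    Hypothetical model: a round keeps each binder with probability p_bind and
    each non-binder with probability p_bg; amplification restores the pool
    size but leaves its composition unchanged.
    """
    fractions = [f0]
    f = f0
    for _ in range(rounds):
        kept_binders = f * p_bind
        kept_background = (1.0 - f) * p_bg
        # Composition of the retained (and then amplified) population.
        f = kept_binders / (kept_binders + kept_background)
        fractions.append(f)
    return fractions


# Illustrative numbers: 1 binder per million, 5000-fold retention advantage.
for i, f in enumerate(enrich(1e-6, 0.5, 1e-4, 3)):
    print(f"round {i}: binder fraction = {f:.3g}")
```

Under these assumed numbers, binders rise from one in a million to more than 99% of the population by the third round.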
2. Information Disclosure Statement
A. Protein Structure
The amino acid sequence of a protein determines its three-dimensional (3D) structure, which in turn determines protein function (EPST63, ANFI73). Shortle (SHOR85), Sauer and colleagues (PAKU86, REID88a), and Caruthers and colleagues (EISE85) have shown that some residues on the polypeptide chain are more important than others in determining the 3D structure of a protein. The 3D structure is essentially unaffected by the identity of the amino acids at some loci; at other loci only one or a few types of amino acid are allowed. In most cases, loci where wide variety is allowed have the amino acid side group directed toward the solvent. Loci where limited variety is allowed frequently have the side group directed toward other parts of the protein. Thus substitutions of amino acids that are exposed to solvent are less likely to affect the 3D structure than are substitutions at internal loci. (See also SCHU79, p169-171 and CREI84, p239-245, 314-315).
The secondary structure (helices, sheets, turns, loops) of a protein is determined mostly by local sequence. Although certain amino acids have a propensity to appear in certain "secondary structures," they will be found from time to time in other structures, and studies of pentapeptide sequences found in different proteins have shown that their conformation varies considerably from one occurrence to the next (KABS84, ARGO87). As a result, a priori design of proteins to have a particular 3D structure is difficult.
Several researchers have designed and synthesized proteins de novo (MOSE83, MOSE87, ERIC86). These designed proteins are small, and most have been synthesized in vitro as polypeptides rather than produced genetically. Hecht et al. (HECH90) have produced a designed protein genetically. Moser et al. state that design of biologically active proteins is currently impossible.
B. Protein Binding Activity
Many proteins bind non-covalently but very tightly and specifically to some other characteristic molecules (SCHU79, CREI84). In each case the binding results from complementarity of the surfaces that come into contact: bumps fit into holes, unlike charges come together, dipoles align, and hydrophobic atoms contact other hydrophobic atoms. Although bulk water is excluded, individual water molecules are frequently found filling space in intermolecular interfaces; these waters usually form hydrogen bonds to one or more atoms of the protein or to other bound water. Thus proteins found in nature have not attained, nor do they require, perfect complementarity to bind tightly and specifically to their substrates. Only in rare cases is there essentially perfect complementarity; then the binding is extremely tight (as for example, avidin binding to biotin).
C. Protein Engineering
"Protein engineering" is the art of manipulating the sequence of a protein in order to alter its binding characteristics. The factors affecting protein binding are known (CHOT75, CHOT76, SCHU79, p98-107, and CREI84, Ch8), but designing new complementary surfaces has proved difficult. Although some rules have been developed for substituting side groups (SUTC87b), the side groups of proteins are floppy and it is difficult to predict what conformation a new side group will take. Further, the forces that bind proteins to other molecules are all relatively weak and it is difficult to predict the effects of these forces.
Recently, Quiocho and collaborators (QUIO87) elucidated the structures of several periplasmic binding proteins from Gram-negative bacteria. They found that the proteins, despite having low sequence homology and differences in structural detail, have certain important structural similarities. Based on their investigations of these binding proteins, Quiocho et al. suggest it is unlikely that, using current protein engineering methods, proteins can be constructed with binding properties superior to those of proteins that occur naturally.
Nonetheless, there have been some isolated successes. Wilkinson et al. (WILK84) reported that a mutant of the tyrosyl tRNA synthetase of Bacillus stearothermophilus with the mutation Thr51→Pro exhibits a 100-fold increase in affinity for ATP. Tan and Kaiser (TANK77) and Tschesche et al. (TSCH87) showed that changing a single amino acid in a mini-protein greatly reduces its binding to trypsin, but that some of the mutants retained the parental characteristic of binding to and inhibiting chymotrypsin, while others exhibited new binding to elastase. Caruthers and others (EISE85) have shown that changes of single amino acids on the surface of the lambda Cro repressor greatly reduce its affinity for the natural operator OR3, but greatly increase the binding of the mutant protein to a mutant operator. Changing three residues in subtilisin from Bacillus amyloliquefaciens to be the same as the corresponding residues in subtilisin from B. licheniformis produced a protease having nearly the same activity as the latter subtilisin, even though 82 amino acid sequence differences remained (WELL87a). Insertion of DNA encoding 18 amino acids (corresponding to Pro-Glu-Dynorphin-Gly) into the E. coli phoA gene, so that the additional amino acids appeared within a loop of the alkaline phosphatase protein, resulted in a chimeric protein having both phoA and dynorphin activity (FREI90). Thus, changing the surface of a binding protein may alter its specificity without abolishing binding activity.
D. Techniques Of Mutagenesis
Early techniques of mutating proteins involved manipulations at the amino acid sequence level. In the semisynthetic method (TSCH87), the protein was cleaved into two fragments, a residue removed from the new end of one fragment, the substitute residue added on in its place, and the modified fragment joined with the other, original fragment. Alternatively, the mutant protein could be synthesized in its entirety (TANK77).
Erickson et al. suggested that mixed amino acid reagents could be used to produce a family of sequence-related proteins which could then be screened by affinity chromatography (ERIC86). They envision successive rounds of mixed synthesis of variant proteins and purification by specific binding. They do not discuss how residues should be chosen for variation. Because proteins cannot be amplified, the researchers must sequence the recovered protein to learn which substitutions improve binding. The researchers must limit the level of diversity so that each variety of protein will be present in sufficient quantity for the isolated fraction to be sequenced.
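The diversity limit described for ERIC86 can be made concrete. Assuming, hypothetically, that each varied residue takes all twenty amino acids with equal frequency, that no material is lost during synthesis or purification, and that sequencing needs a fixed minimum amount of each variant, the number of residues that can be varied is bounded as in the sketch below. The function name and the 1 pmol sequencing threshold are illustrative assumptions, not values from ERIC86.

```python
def max_varied_residues(total_mol, mol_needed_per_variant, alphabet=20):
    """Largest number of fully varied residues that still leaves each variant
    with at least `mol_needed_per_variant` moles, assuming every variant is
    equally represented and nothing is lost (hypothetical idealization)."""
    if mol_needed_per_variant <= 0:
        raise ValueError("mol_needed_per_variant must be positive")
    if alphabet < 2:
        raise ValueError("alphabet must contain at least two residue types")
    n = 0
    # Each additional varied residue splits the material among `alphabet` times more variants.
    while total_mol / alphabet ** (n + 1) >= mol_needed_per_variant:
        n += 1
    return n


# Hypothetical example: 1 umol of mixed product, 1 pmol needed per variant.
print(max_varied_residues(1e-6, 1e-12))  # 4
```

With these assumed quantities, four residues can be varied (20^4 = 160,000 variants, about 6 pmol each); varying a fifth residue leaves only about 0.3 pmol of each variant.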
With the development of recombinant DNA techniques, it became possible to obtain a mutant protein by mutating the gene encoding the native protein and then expressing the mutated gene. Several mutagenesis strategies are known. One, "protein surgery" (DILL87), involves the introduction of one or more predetermined mutations within the gene of choice. A single polypeptide of completely predetermined sequence is expressed, and its binding characteristics are evaluated.
At the other extreme is random mutagenesis by means of relatively nonspecific mutagens such as radiation and various chemical agents. See Ho et al. (HOCJ85) and Lehtovaara, E.P. Appln. 285,123.
It is possible to randomly vary predetermined nucleotides using a mixture of bases in the appropriate cycles of a nucleic acid synthesis procedure. The proportion of bases in the mixture, for each position of a codon, will determine the frequency at which each amino acid will occur in the polypeptides expressed from the degenerate DNA population. Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have demonstrated ligation and cloning of highly degenerate oligonucleotides, which were used in the mutation of promoters. They suggested that similar methods could be used in the variation of protein coding regions. They do not say how one should: a) choose protein residues to vary, or b) select or screen mutants with desirable properties. Reidhaar-Olson and Sauer (REID88a) have used synthetic degenerate oligonucleotides to vary simultaneously two or three residues through all twenty amino acids. See also Vershon et al. (VERS86a; VERS86b). Reidhaar-Olson and Sauer do not discuss the limits on how many residues could be varied at once, nor do they mention the problem of unequal abundance of DNA encoding different amino acids. They looked for proteins that either had wild-type dimerization or that did not dimerize. They did not seek proteins having novel binding properties and did not find any. This approach is likewise limited by the number of colonies that can be examined (ROBE86).
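The dependence of amino acid frequency on the base mixture used at each codon position can be computed directly from the genetic code. The following is a minimal sketch, assuming the standard genetic code and independent incorporation of bases at each position; the function name aa_frequencies and the example mixtures are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def aa_frequencies(mix1, mix2, mix3):
    """Expected amino acid (and stop, '*') frequencies for a degenerate codon.

    Each mix maps a base to its proportion at that codon position; bases are
    assumed to be incorporated independently at each position.
    """
    freqs = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        p = mix1.get(b1, 0.0) * mix2.get(b2, 0.0) * mix3.get(b3, 0.0)
        if p > 0:
            aa = CODE[b1 + b2 + b3]
            freqs[aa] = freqs.get(aa, 0.0) + p
    return freqs


equimolar = {b: 0.25 for b in "ACGT"}
nnn = aa_frequencies(equimolar, equimolar, equimolar)
print(nnn["*"], nnn["L"] / nnn["W"])  # 0.046875 6.0

# Hypothetical skewed mixture: less T at the first position.
low_t = {"A": 0.3, "C": 0.3, "G": 0.3, "T": 0.1}
print(aa_frequencies(low_t, equimolar, equimolar)["*"])
```

With equimolar mixtures at all three positions, stop codons make up 3/64 of the population and leucine is six times as frequent as tryptophan; lowering the proportion of T at the first position to 10% reduces the stop frequency to 0.01875, because every stop codon begins with T.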
To the extent that this prior work assumes that it is desirable to adjust the level of mutation so that there is one mutation per protein, it should be noted that many desirable protein alterations require multiple amino acid substitutions and thus are not accessible through single base changes or even through all possible amino acid substitutions at any one residue.
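The point about single base changes can be checked against the genetic code. The sketch below assumes the standard genetic code and uses a hypothetical helper, single_base_neighbors, to list the amino acids reachable from one codon by a single nucleotide substitution.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}


def single_base_neighbors(codon):
    """Amino acids (excluding the original one and stops) one base change away."""
    start = CODE[codon]
    reachable = set()
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                aa = CODE[codon[:i] + b + codon[i + 1:]]
                if aa not in (start, "*"):
                    reachable.add(aa)
    return reachable


print(sorted(single_base_neighbors("GCT")))  # ['D', 'G', 'P', 'S', 'T', 'V']

# Even pooling all four alanine codons, only 7 of the 19 alternatives are reachable.
ala_codons = [c for c, aa in CODE.items() if aa == "A"]
pooled = set().union(*(single_base_neighbors(c) for c in ala_codons))
print(len(pooled))  # 7
```

A single base change from this alanine codon reaches only six other amino acids; substitutions such as Ala to Trp or Ala to Lys require at least two base changes.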
D. Affinity Chromatography of Cells
Ferenci and collaborators have published a series of papers on the chromatographic isolation of mutants of the maltose-transport protein LamB of E. coli (FERE82a, FERE82b, FERE83, FERE84, CLUN84, HEIN87 and papers cited therein). The mutants were either spontaneous or induced with nonspecific chemical mutagens. Levels of mutagenesis were picked to provide single point mutations or single insertions of two residues. No multiple mutations were sought or found.
While variation was seen in the degree of affinity for the conventional LamB substrates maltose and starch, there was no selection for affinity to a target molecule not bound at all by native LamB. FERE84 speculated that the affinity chromatographic selection technique could be adapted to development of similar mutants of other "important bacterial surface-located enzymes", and to selecting for mutations which result in the relocation of an intracellular bacterial protein to the cell surface. Ferenci's mutant surface proteins would not, however, have been chimeras of a bacterial surface protein and an exogenous or heterologous binding domain.
Ferenci also taught that there was no need to clone the structural gene, or to know the protein structure, active site, or sequence. The method of the present invention, however, specifically utilizes a cloned structural gene. It is not possible to construct and express a chimeric, outer surface-directed potential binding protein-encoding gene without cloning.
Ferenci did not limit the mutations to particular loci or particular substitutions. In the present invention, knowledge of the protein structure, active site and/or sequence is used as appropriate to predict which residues are most likely to affect binding activity without unduly destabilizing the protein, and the mutagenesis is focused upon those sites. Ferenci does not suggest that surface residues should be preferentially varied. In consequence, Ferenci's selection system is much less efficient than that disclosed herein.
E. Bacterial and Viral Expression of Chimeric Surface Proteins
A number of researchers have directed unmutated foreign antigenic epitopes to the surface of bacteria or phage, fused to a native bacterial or phage surface protein, and demonstrated that the epitopes were recognized by antibodies. Thus, Charbit et al. (CHAR86) genetically inserted the C3 epitope of the VP1 coat protein of poliovirus into the LamB outer membrane protein of E. coli, and determined immunologically that the C3 epitope was exposed on the bacterial cell surface. Charbit et al. (CHAR87) likewise produced chimeras of LamB and the A (or B) epitopes of the preS2 region of hepatitis B virus.
A chimeric LacZ/OmpB protein has been expressed in E. coli and, depending on the fusion, is directed to either the outer membrane or the periplasm (SILH77). A chimeric LacZ/OmpA surface protein has also been expressed and displayed on the surface of E. coli cells (Weinstock et al., WEIN83). Others have expressed and displayed on the cell surface chimeras of other bacterial surface proteins, such as E. coli type 1 fimbriae (Hedegaard and Klemm, HEDE89) and Bacteroides nodosus type 1 fimbriae (Jennings et al., JENN89). In none of the recited cases was the inserted genetic material mutagenized.
Dulbecco (DULB86) suggests a procedure for incorporating a foreign antigenic epitope into a viral surface protein so that the expressed chimeric protein is displayed on the surface of the virus in a manner such that the foreign epitope is accessible to antibody. In 1985 Smith (SMIT85) reported inserting a nonfunctional segment of the EcoRI endonuclease gene into gene III of bacteriophage f1, "in phase". The gene III protein is a minor coat protein necessary for infectivity. Smith demonstrated that the recombinant phage were adsorbed by immobilized antibody raised against the EcoRI endonuclease, and could be eluted with acid. De la Cruz et al. (DELA88) have expressed a fragment of the repeat region of the circumsporozoite protein from Plasmodium falciparum on the surface of M13 as an insert in the gene III protein. They showed that the recombinant phage were both antigenic and immunogenic in rabbits, and that such recombinant phage could be used for B epitope mapping. The researchers suggest that similar recombinant phage could be used for T epitope mapping and for vaccine development.
None of these researchers suggested mutagenesis of the inserted material, nor is the inserted material a complete binding domain conferring on the chimeric protein the ability to bind specifically to a receptor other than the antigen combining site of an antibody.
McCafferty et al. (MCCA90) expressed a fusion of an Fv fragment of an antibody to the N-terminus of the pIII protein. The Fv fragment was not mutated.
F. Epitope Libraries on Fusion Phage
Parmley and Smith (PARM88) suggested that an epitope library that exhibits all possible hexapeptides could be constructed and used to isolate epitopes that bind to antibodies. In discussing the epitope library, the authors did not suggest that it was desirable to balance the representation of different amino acids. Nor did they teach that the insert should encode a complete domain of the exogenous protein. Epitopes are considered to be unstructured peptides as opposed to structured proteins.
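The scale of such a library follows from simple arithmetic. Assuming that every hexapeptide is equally represented (a balance the authors did not seek) and that clones are sampled independently, a Poisson approximation gives the number of clones needed to include any particular hexapeptide with a chosen probability. The function name clones_for_coverage is hypothetical.

```python
import math


def clones_for_coverage(n_sequences, coverage):
    """Library size L such that one given, equally represented sequence is
    present with probability `coverage`, using the Poisson approximation
    P(present) = 1 - exp(-L / n_sequences)."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    return n_sequences * math.log(1.0 / (1.0 - coverage))


hexapeptides = 20 ** 6
print(hexapeptides)  # 64000000
print(f"{clones_for_coverage(hexapeptides, 0.95):.2e}")  # 1.92e+08
```

In this idealized case, roughly 1.9 × 10^8 independent clones are needed for 95% confidence of including any one hexapeptide; unequal representation raises the requirement for the rarest sequences.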
After the filing of the parent application whose benefit is claimed herein under 35 U.S.C. 120, certain groups reported the construction of "epitope libraries." Scott and Smith (SCOT90) and Cwirla et al. (CWIR90) prepared "epitope libraries" in which potential hexapeptide epitopes for a target antibody were randomly mutated by fusing degenerate oligonucleotides, encoding the epitopes, with gene III of fd phage, and expressing the fused gene in phage-infected cells. The cells manufactured fusion phage which displayed the epitopes on their surface; the phage which bound to immobilized antibody were eluted with acid and studied. In both cases, the fused gene featured a segment encoding a spacer region to separate the variable region from the wild type pIII sequence so that the varied amino acids would not be constrained by the nearby pIII sequence. Devlin et al. (DEVL90) similarly screened, using M13 phage, for random 15 residue epitopes recognized by streptavidin. Again, a spacer was used to move the random peptides away from the rest of the chimeric phage protein. These references therefore taught away from constraining the conformational repertoire of the mutated residues.
Another problem with the Scott and Smith, Cwirla et al., and Devlin et al., libraries was that they provided a highly biased sampling of the possible amino acids at each position. Their primary concern in designing the degenerate oligonucleotide encoding their variable region was to ensure that all twenty amino acids were encodible at each position; a secondary consideration was minimizing the frequency of occurrence of stop signals. Consequently, Scott and Smith and Cwirla et al. employed NNK (N=equal mixture of G, A, T, C; K=equal mixture of G and T) while Devlin et al. used NNS (S=equal mixture of G and C). There was no attempt to minimize the ratio of the frequency of the most favored amino acid to that of the least favored amino acid, or to equalize the rate of occurrence of acidic and basic amino acids.
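The bias described above can be quantified by enumerating the codons allowed by each degenerate pattern. The sketch below assumes the standard genetic code and counts Lys, Arg, and His as basic; the helper names codon_usage and summarize are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
# IUPAC symbols used in the text, plus the four plain bases.
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG", "A": "A", "C": "C", "G": "G", "T": "T"}


def codon_usage(pattern):
    """Count how many codons of a degenerate pattern encode each amino acid."""
    return Counter(CODE["".join(c)] for c in product(*(IUPAC[s] for s in pattern)))


def summarize(pattern):
    counts = codon_usage(pattern)
    total = sum(counts.values())
    aa_counts = [n for aa, n in counts.items() if aa != "*"]
    return {
        "codons": total,
        "amino_acids": len(aa_counts),
        "stop_fraction": counts["*"] / total,
        "max_min_ratio": max(aa_counts) / min(aa_counts),
        "acidic_fraction": (counts["D"] + counts["E"]) / total,
        "basic_fraction": (counts["K"] + counts["R"] + counts["H"]) / total,
    }


for pattern in ("NNK", "NNS"):
    print(pattern, summarize(pattern))
```

For both NNK and NNS, 1 of the 32 codons is a stop, and Leu, Arg, and Ser each receive three codons while twelve amino acids receive only one, so the ratio of most-favored to least-favored amino acid is 3, and basic residues (5/32) outnumber acidic residues (2/32).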
Devlin et al. characterized several affinity-selected streptavidin-binding peptides, but did not measure their affinity constants. Cwirla et al. did determine affinity constants for their peptides, but were disappointed to find that their best hexapeptides had affinities (350-300 nM) "orders of magnitude" weaker than that of the native Met-enkephalin epitope (7 nM) recognized by the target antibody. Cwirla et al. speculated that phage bearing peptides with higher affinities remained bound under acidic elution, possibly because of multivalent interactions between phage (carrying about 4 copies of pIII) and the divalent target IgG. Scott and Smith were able to find peptides whose affinity for the target antibody (A2) was comparable to that of the reference myohemerythrin epitope (50 nM). However, Scott and Smith likewise expressed concern that some high-affinity peptides were lost, possibly through irreversible binding of fusion phage to target.
G. Non-Commonly Owned Patents and Applications Naming Robert Ladner as an Inventor
Ladner, U.S. Pat. No. 4,704,692, "Computer Based System and Method for Determining and Displaying Possible Chemical Structures for Converting Double- or Multiple-Chain Polypeptides to Single-Chain Polypeptides" describes a design method for converting proteins composed of two or more chains into proteins of fewer polypeptide chains, but with essentially the same 3D structure. There is no mention of variegated DNA and no genetic selection. Ladner and Bird, WO88/01649 (Publ. Mar. 10, 1988) disclose the specific application of computerized design of linker peptides to the preparation of single chain antibodies.
Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sep. 1988 and having priority from U.S. application Ser. No. 07/021,046, assigned to Genex Corp.) (LGB) speculate that diverse single chain antibody domains (SCAD) may be screened for binding to a particular antigen by varying the DNA encoding the complementarity-determining regions of a single chain antibody, subcloning the SCAD gene into the gpV gene of phage lambda so that a SCAD/gpV chimera is displayed on the outer surface of phage lambda, and selecting phage which bind to the antigen through affinity chromatography. The only antigen mentioned is bovine growth hormone. No other binding molecules, targets, carrier organisms, or outer surface proteins are discussed. Nor is there any mention of the method or degree of mutagenesis. Furthermore, there is no teaching as to the exact structure of the fusion, how to identify a successful fusion, or how to proceed if the SCAD is not displayed.
Ladner and Bird, WO88/06601 (publ. 7 Sep. 1988) suggest that single chain "pseudodimeric" repressors (DNA-binding proteins) may be prepared by mutating a putative linker peptide followed by in vivo selection, and that mutation and selection may be used to create a dictionary of recognition elements for use in the design of asymmetric repressors. The repressors are not displayed on the outer surface of an organism.
Methods of identifying residues in protein which can be replaced with a cysteine in order to promote the formation of a protein-stabilizing disulfide bond are given in Pantoliano and Ladner, U.S. Pat. No. 4,903,773 (PANT90), Pantoliano and Ladner (PANT87), Pabo and Suchenek (PABO86), MATS89, and SAUE86.
No admission is made that any cited reference is prior art or pertinent prior art, and the dates given are those appearing on the reference and may not be identical to the actual publication date. All references cited in this specification are hereby incorporated by reference.