The present invention relates generally to the field of the production and selection of binding and catalytic polypeptides by the methods of molecular biology, using both combinatorial chemistry and recombinant DNA. The invention specifically relates to the generation of both nucleic acid and polypeptide libraries derived therefrom encoding the molecular scaffolding of Fibronectin Type III (Fn3) modified in one or more of its loop regions. The invention also relates to the “artificial mini-antibodies” or “monobodies,” i.e., the polypeptides comprising an Fn3 scaffold onto which loop regions capable of binding to a variety of different molecular structures (such as antibody binding sites) have been grafted.
Antibody Structure
A standard antibody (Ab) is a tetrameric structure consisting of two identical immunoglobulin (Ig) heavy chains and two identical light chains. The heavy and light chains of an Ab consist of different domains. Each light chain has one variable domain (VL) and one constant domain (CL), while each heavy chain has one variable domain (VH) and three or four constant domains (CH) (Alzari et al., 1988). Each domain, consisting of ~110 amino acid residues, is folded into a characteristic β-sandwich structure formed from two β-sheets packed against each other, the immunoglobulin fold. The VH and VL domains each have three complementarity determining regions (CDR1-3) that are loops, or turns, connecting β-strands at one end of the domains (FIG. 1: A, C). The variable regions of both the light and heavy chains generally contribute to antigen specificity, although the contribution of the individual chains to specificity is not always equal. Antibody molecules have evolved to bind to a large number of molecules by using six randomized loops (CDRs). However, the size of the antibodies and the complexity of six loops represent a major design hurdle if the end result is to be a relatively small peptide ligand.
Antibody Substructures
Functional substructures of Abs can be prepared by proteolysis and by recombinant methods. They include the Fab fragment, which comprises the VH-CH1 domains of the heavy chain and the VL-CL domains of the light chain joined by a single interchain disulfide bond, and the Fv fragment, which comprises only the VH and VL domains. In some cases, a single VH domain retains significant affinity (Ward et al., 1989). It has also been shown that a certain monomeric κ light chain will specifically bind to its cognate antigen (Masat et al., 1994). Separated light or heavy chains have sometimes been found to retain some antigen-binding activity (Ward et al., 1989). These antibody fragments are not suitable for structural analysis using NMR spectroscopy due to their size, low solubility or low conformational stability.
Another functional substructure is a single chain Fv (scFv), composed of the variable regions of the immunoglobulin heavy and light chain, covalently connected by a peptide linker (S-z Hu et al., 1996). These small (Mr 25,000) proteins generally retain specificity and affinity for antigen in a single polypeptide and can provide a convenient building block for larger, antigen-specific molecules. Several groups have reported biodistribution studies in xenografted athymic mice using scFv reactive against a variety of tumor antigens, in which specific tumor localization has been observed. However, the short persistence of scFvs in the circulation limits the exposure of tumor cells to the scFvs, placing limits on the level of uptake. As a result, tumor uptake by scFvs in animal studies has generally been only 1-5% ID/g, as opposed to intact antibodies, which can localize in tumors at 30-40% ID/g and have reached levels as high as 60-70% ID/g.
A small protein scaffold called a “minibody” was designed using a part of the Ig VH domain as the template (Pessi et al., 1993). Minibodies with high affinity (dissociation constant (Kd) ~10⁻⁷ M) to interleukin-6 were identified by randomizing loops corresponding to CDR1 and CDR2 of VH and then selecting mutants using the phage display method (Martin et al., 1994). These experiments demonstrated that the essence of the Ab function could be transferred to a smaller system. However, the minibody had inherited the limited solubility of the VH domain (Bianchi et al., 1994).
It has been reported that camels (Camelus dromedarius) often lack variable light chain domains when IgG-like material from their serum is analyzed, suggesting that sufficient antibody specificity and affinity can be derived from VH domains (three CDR loops) alone. Davies and Riechmann recently demonstrated that “camelized” VH domains with high affinity (Kd ~10⁻⁷ M) and high specificity can be generated by randomizing only the CDR3. To improve the solubility and suppress nonspecific binding, three mutations were introduced to the framework region (Davies and Riechmann, 1995). It has not been definitively shown, however, that camelization can be used, in general, to improve the solubility and stability of VHs.
An alternative to the “minibody” is the “diabody.” Diabodies are small bivalent and bispecific antibody fragments, i.e., they have two antigen-binding sites. The fragments comprise a heavy-chain variable domain (VH) connected to a light-chain variable domain (VL) on the same polypeptide chain (VH-VL). Diabodies are similar in size to an Fab fragment. By using a linker that is too short to allow pairing between the two domains on the same chain, the domains are forced to pair with the complementary domains of another chain and create two antigen-binding sites (P. Holliger et al., PNAS 90:6444-6448 (1993)).
Since the development of the monoclonal antibody technology, a large number of 3D structures of Ab fragments in the complexed and/or free states have been solved by X-ray crystallography (Webster et al., 1994; Wilson and Stanfield, 1994). Analysis of Ab structures has revealed that five out of the six CDRs have limited numbers of peptide backbone conformations, thereby permitting one to predict the backbone conformation of CDRs using the so-called canonical structures (Lesk and Tramontano, 1992; Rees et al., 1994). The analysis also has revealed that the CDR3 of the VH domain (VH-CDR3) usually has the largest contact surface and that its conformation is too diverse for canonical structures to be defined; VH-CDR3 is also known to have a large variation in length (Wu et al., 1993). Therefore, the structures of crucial regions of the Ab-antigen interface still need to be experimentally determined.
Comparison of crystal structures between the free and complexed states has revealed several types of conformational rearrangements. They include side-chain rearrangements, segmental movements, large rearrangements of VH-CDR3 and changes in the relative position of the VH and VL domains (Wilson and Stanfield, 1993). In the free state, CDRs, in particular those which undergo large conformational changes upon binding, are expected to be flexible. X-ray crystallography is not suited for characterizing flexible parts of molecules, and structural studies in the solution state have so far not been able to provide dynamic pictures of the conformation of antigen-binding sites.
Mimicking the Antibody-binding Site
CDR peptides and organic CDR mimetics have been made (Dougall et al., 1994). CDR peptides are short, typically cyclic, peptides which correspond to the amino acid sequences of CDR loops of antibodies. CDR loops are responsible for antibody-antigen interactions. Organic CDR mimetics are peptides corresponding to CDR loops which are attached to a scaffold, e.g., a small organic compound.
CDR peptides and organic CDR mimetics have been shown to retain some binding affinity (Smyth and von Itzstein, 1994). However, as expected, they are too small and too flexible to maintain full affinity and specificity. Mouse CDRs have been grafted onto the human Ig framework without the loss of affinity (Jones et al., 1986; Riechmann et al., 1988), though this “humanization” does not solve the above-mentioned problems specific to solution studies.
Mimicking Natural Selection Processes of Abs
In the immune system, specific Abs are selected and amplified from a large library (affinity maturation). The processes can be reproduced in vitro using combinatorial library technologies. The successful display of Ab fragments on the surface of bacteriophage has made it possible to generate and screen a vast number of CDR mutations (McCafferty et al., 1990; Barbas et al., 1991; Winter et al., 1994). An increasing number of Fabs and Fvs (and their derivatives) is produced by this technique, providing a rich source for structural studies. The combinatorial technique can be combined with Ab mimics.
A number of protein domains that could potentially serve as protein scaffolds have been expressed as fusions with phage capsid proteins (reviewed in Clackson and Wells, Trends Biotechnol. 12:173-184 (1994)). Indeed, several of these protein domains have already been used as scaffolds for displaying random peptide sequences, including bovine pancreatic trypsin inhibitor (Roberts et al., PNAS 89:2429-2433 (1992)), human growth hormone (Lowman et al., Biochemistry 30:10832-10838 (1991)), Venturini et al., Protein Peptide Letters 1:70-75 (1994)), and the IgG binding domain of Streptococcus (O'Neil et al., Techniques in Protein Chemistry V (Crabb, L., ed.) pp. 517-524, Academic Press, San Diego (1994)). These scaffolds have displayed a single randomized loop or region.
Researchers have used the small 74 amino acid α-amylase inhibitor Tendamistat as a presentation scaffold on the filamentous phage M13 (McConnell and Hoess, 1995). Tendamistat is a β-sheet protein from Streptomyces tendae. It has a number of features that make it an attractive scaffold for peptides, including its small size, stability, and the availability of high resolution NMR and X-ray structural data. Tendamistat's overall topology is similar to that of an immunoglobulin domain, with two β-sheets connected by a series of loops. In contrast to immunoglobulin domains, the β-sheets of Tendamistat are held together with two rather than one disulfide bond, accounting for the considerable stability of the protein. By analogy with the CDR loops found in immunoglobulins, the loops of Tendamistat may serve a similar function and can be easily randomized by in vitro mutagenesis.
Tendamistat, however, is derived from Streptomyces tendae. Thus, while Tendamistat may be antigenic in humans, its small size may reduce or inhibit its antigenicity. Also, Tendamistat's stability is uncertain. Further, the stability that is reported for Tendamistat is attributed to the presence of two disulfide bonds. Disulfide bonds, however, are a significant disadvantage to such molecules in that they can be broken under reducing conditions and must be properly formed in order to have a useful protein structure. Further, the loops in Tendamistat are relatively small, thus limiting the size of the inserts that can be accommodated in the scaffold. Moreover, it is well known that forming correct disulfide bonds in newly synthesized peptides is not straightforward. When a protein is expressed in the cytoplasmic space of E. coli, the most common host bacterium for protein overexpression, disulfide bonds are usually not formed, potentially making it difficult to prepare large quantities of engineered molecules.
Thus, there is an on-going need for small, single-chain artificial antibodies for a variety of therapeutic, diagnostic and catalytic applications.
The invention provides a fibronectin type III (Fn3) polypeptide monobody comprising a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences. One or more of the monobody loop region sequences of the Fn3 polypeptide vary by deletion, insertion or replacement of at least two amino acids from the corresponding loop region sequences in wild-type Fn3. The β-strand domains of the monobody have at least about 50% total amino acid sequence homology to the corresponding amino acid sequence of wild-type Fn3's β-strand domain sequences. Preferably, one or more of the loop regions of the monobody comprise amino acid residues:
i) from 15 to 16 inclusive in an AB loop;
ii) from 22 to 30 inclusive in a BC loop;
iii) from 39 to 45 inclusive in a CD loop;
iv) from 51 to 55 inclusive in a DE loop;
v) from 60 to 66 inclusive in an EF loop; and
vi) from 76 to 87 inclusive in an FG loop.
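The loop boundaries listed above can be captured in a small lookup table. The following is only an illustrative sketch: the names `FN3_LOOPS`, `loop_of` and `differing_loop_positions` are hypothetical, residue numbers are assumed to be 1-based, and only substitutions between two equal-length sequences are compared (insertions and deletions would first require an alignment).

```python
# Loop boundaries copied from the list above (1-based, inclusive residue numbers).
FN3_LOOPS = {
    "AB": (15, 16),
    "BC": (22, 30),
    "CD": (39, 45),
    "DE": (51, 55),
    "EF": (60, 66),
    "FG": (76, 87),
}


def loop_of(residue):
    """Return the name of the loop containing a residue number, or None."""
    for name, (start, end) in FN3_LOOPS.items():
        if start <= residue <= end:
            return name
    return None


def differing_loop_positions(wild_type, variant):
    """Count, per loop, the positions at which two equal-length sequences differ.

    Hypothetical helper: it handles replacements only, not insertions or deletions.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have equal length (substitutions only)")
    counts = {name: 0 for name in FN3_LOOPS}
    for position, (wt_res, var_res) in enumerate(zip(wild_type, variant), start=1):
        name = loop_of(position)
        if name is not None and wt_res != var_res:
            counts[name] += 1
    return counts
```

For example, `loop_of(25)` returns `"BC"`, while `loop_of(18)` returns `None` because residue 18 lies outside all six listed loop ranges.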
The invention also provides a nucleic acid molecule encoding a Fn3 polypeptide monobody of the invention, as well as an expression vector comprising said nucleic acid molecule and a host cell comprising said vector.
The invention further provides a method of preparing a Fn3 polypeptide monobody. The method comprises providing a DNA sequence encoding a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein at least one loop region of said sequence contains a unique restriction enzyme site. The DNA sequence is cleaved at the unique restriction site. Then a preselected DNA segment is inserted into the restriction site. The preselected DNA segment encodes a peptide capable of binding to a specific binding partner (SBP) or a transition state analog compound (TSAC). The insertion of the preselected DNA segment into the DNA sequence yields a DNA molecule which encodes a polypeptide monobody having an insertion. The DNA molecule is then expressed so as to yield the polypeptide monobody.
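The cleave-and-insert step can be sketched at the level of sequence strings. This is a deliberately simplified illustration, not a description of the actual cloning procedure: the function name `insert_at_unique_site` is hypothetical, only one strand is modelled, and overhangs, ligation and site regeneration are ignored. The BamHI recognition sequence (G^GATCC) is used in the usage note purely as a familiar example; the text above does not name an enzyme.

```python
def insert_at_unique_site(template, site, cut_offset, insert):
    """Insert `insert` into `template` at the cut position of a unique site.

    template   -- coding-strand DNA sequence (string of A/C/G/T)
    site       -- recognition sequence of the restriction enzyme
    cut_offset -- number of bases into the site at which the enzyme cuts
    insert     -- preselected DNA segment to place at the cut
    """
    template = template.upper()
    site = site.upper()
    insert = insert.upper()

    # The method requires the site to be unique in the sequence.
    # (str.count finds non-overlapping matches, which suffices for this sketch.)
    occurrences = template.count(site)
    if occurrences != 1:
        raise ValueError(f"site must occur exactly once, found {occurrences}")

    # An in-frame insertion needs a whole number of codons.
    if len(insert) % 3 != 0:
        raise ValueError("insert length must be a multiple of 3 to keep the reading frame")

    cut = template.index(site) + cut_offset
    return template[:cut] + insert + template[cut:]
```

With the toy template `ATGGGATCCTAA`, the site `GGATCC`, a cut offset of 1 and the insert `GCTGCT`, the function returns `ATGGGCTGCTGATCCTAA`.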
Also provided is a method of preparing a Fn3 polypeptide monobody, which method comprises providing a replicatable DNA sequence encoding a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein the nucleotide sequence of at least one loop region is known. Polymerase chain reaction (PCR) primers are provided or prepared which are sufficiently complementary to the known loop sequence so as to be hybridizable under PCR conditions, wherein at least one of the primers contains a modified nucleic acid sequence to be inserted into the DNA sequence. PCR is performed using the replicatable DNA sequence and the primers. The reaction product of the PCR is then expressed so as to yield a polypeptide monobody.
The invention further provides a method of preparing a Fn3 polypeptide monobody. The method comprises providing a replicatable DNA sequence encoding a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein the nucleotide sequence of at least one loop region is known. Site-directed mutagenesis of at least one loop region is performed so as to create an insertion mutation. The resultant DNA comprising the insertion mutation is then expressed.
Further provided is a variegated nucleic acid library encoding Fn3 polypeptide monobodies comprising a plurality of nucleic acid species encoding a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein one or more of the monobody loop region sequences vary by deletion, insertion or replacement of at least two amino acids from corresponding loop region sequences in wild-type Fn3, and wherein the β-strand domains of the monobody have at least a 50% total amino acid sequence homology to the corresponding amino acid sequence of β-strand domain sequences of the wild-type Fn3. The invention also provides a peptide display library derived from the variegated nucleic acid library of the invention. Preferably, the peptide of the peptide display library is displayed on the surface of a bacteriophage, e.g., an M13 bacteriophage or an fd bacteriophage, or virus.
The invention also provides a method of identifying the amino acid sequence of a polypeptide molecule capable of binding to a specific binding partner (SBP) so as to form a polypeptide:SBP complex, wherein the dissociation constant of the said polypeptide:SBP complex is less than 10⁻⁶ moles/liter. The method comprises the steps of:
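To give a sense of the scale of such a variegated library, the sketch below estimates its theoretical size as a function of the number of randomized loop positions. It assumes, purely for illustration, that each position is randomized with an NNK degenerate codon (N = A/C/G/T, K = G/T), a common scheme that the text above does not specify; the function name `nnk_library_sizes` is likewise hypothetical. NNK gives 32 codons per position, covering all 20 amino acids and a single stop codon (TAG).

```python
def nnk_library_sizes(n_positions):
    """Theoretical sizes of a library with `n_positions` NNK-randomized codons.

    Returns a tuple of:
      - number of distinct DNA sequences (32 codons per position),
      - number of distinct peptide sequences (20 amino acids per position),
      - fraction of DNA sequences containing no stop codon (31 of 32 codons).
    """
    if n_positions < 0:
        raise ValueError("n_positions must be non-negative")
    codons_per_position = 4 * 4 * 2  # N, N, K
    dna_variants = codons_per_position ** n_positions
    peptide_variants = 20 ** n_positions
    stop_free_fraction = (31 / 32) ** n_positions
    return dna_variants, peptide_variants, stop_free_fraction
```

For five randomized positions this gives 33,554,432 DNA sequences encoding 3,200,000 distinct peptides, which indicates how quickly the diversity grows as further loop residues are variegated.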
a) providing a peptide display library of the invention;
b) contacting the peptide display library of (a) with an immobilized or separable SBP;
c) separating the peptide:SBP complexes from the free peptides;
d) causing the replication of the separated peptides of (c) so as to result in a new peptide display library distinguished from that in (a) by having a lowered diversity and by being enriched in displayed peptides capable of binding the SBP;
e) optionally repeating steps (b), (c), and (d) with the new library of (d); and
f) determining the nucleic acid sequence of the region encoding the displayed peptide of a species from (d) and hence deducing the peptide sequence capable of binding to the SBP.
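The enrichment produced by repeating steps (b) through (d) can be illustrated with a toy calculation. The sketch assumes simple equilibrium binding with the SBP in excess, so that the fraction of a clone captured in one round is [SBP] / ([SBP] + Kd); nonspecific background binding and differences in replication rate are ignored. The function names, Kd values and starting abundances are all hypothetical.

```python
def fraction_bound(kd, sbp_conc):
    """Equilibrium fraction of a displayed peptide bound to the SBP (SBP in excess)."""
    return sbp_conc / (sbp_conc + kd)


def panning_round(population, sbp_conc):
    """One round of steps (b)-(d): bind, separate, replicate.

    population -- dict mapping each clone's Kd (mol/L) to its relative abundance
    Returns the renormalized abundances of the new, enriched library.
    """
    captured = {
        kd: abundance * fraction_bound(kd, sbp_conc)
        for kd, abundance in population.items()
    }
    total = sum(captured.values())
    return {kd: amount / total for kd, amount in captured.items()}


def pan(population, sbp_conc, rounds):
    """Repeat the panning round, as in optional step (e)."""
    for _ in range(rounds):
        population = panning_round(population, sbp_conc)
    return population


# Hypothetical starting library: one binder in a thousand (Kd = 1e-7 M)
# among weak binders (Kd = 1e-3 M), panned against 1e-7 M SBP.
library = {1e-7: 0.001, 1e-3: 0.999}
enriched = pan(library, sbp_conc=1e-7, rounds=3)
```

Under these assumptions the tight binder dominates the library after a few rounds, which is the lowered diversity and enrichment described in step (d).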
The present invention also provides a method of preparing a variegated nucleic acid library encoding Fn3 polypeptide monobodies having a plurality of nucleic acid species each comprising a plurality of loop regions, wherein the species encode a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein one or more of the loop region sequences vary by deletion, insertion or replacement of at least two amino acids from corresponding loop region sequences in wild-type Fn3, and wherein the β-strand domain sequences of the monobody have at least a 50% total amino acid sequence homology to the corresponding amino acid sequences of β-strand domain sequences of the wild-type Fn3, comprising the steps of
a) preparing an Fn3 polypeptide monobody having a predetermined sequence;
b) contacting the polypeptide with a specific binding partner (SBP) so as to form a polypeptide:SBP complex wherein the dissociation constant of the said polypeptide:SBP complex is less than 10⁻⁶ moles/liter;
c) determining the binding structure of the polypeptide:SBP complex by nuclear magnetic resonance spectroscopy or X-ray crystallography; and
d) preparing the variegated nucleic acid library, wherein the variegation is performed at positions in the nucleic acid sequence which, from the information provided in (c), result in one or more polypeptides with improved binding to the SBP.
Also provided is a method of identifying the amino acid sequence of a polypeptide molecule capable of catalyzing a chemical reaction with a catalyzed rate constant, kcat, and an uncatalyzed rate constant, kuncat, such that the ratio of kcat/kuncat is greater than 10. The method comprises the steps of:
a) providing a peptide display library of the invention;
b) contacting the peptide display library of (a) with an immobilized or separable transition state analog compound (TSAC) representing the approximate molecular transition state of the chemical reaction;
c) separating the peptide:TSAC complexes from the free peptides;
d) causing the replication of the separated peptides of (c) so as to result in a new peptide display library distinguished from that in (a) by having a lowered diversity and by being enriched in displayed peptides capable of binding the TSAC;
e) optionally repeating steps (b), (c), and (d) with the new library of (d); and
f) determining the nucleic acid sequence of the region encoding the displayed peptide of a species from (d) and hence deducing the peptide sequence.
The invention also provides a method of preparing a variegated nucleic acid library encoding Fn3 polypeptide monobodies having a plurality of nucleic acid species each comprising a plurality of loop regions, wherein the species encode a plurality of Fn3 β-strand domain sequences that are linked to a plurality of loop region sequences, wherein one or more of the loop region sequences vary by deletion, insertion or replacement of at least two amino acids from corresponding loop region sequences in wild-type Fn3, and wherein the β-strand domain sequences of the monobody have at least a 50% total amino acid sequence homology to the corresponding amino acid sequences of β-strand domain sequences of the wild-type Fn3, comprising the steps of
a) preparing an Fn3 polypeptide monobody having a predetermined sequence, wherein the polypeptide is capable of catalyzing a chemical reaction with a catalyzed rate constant, kcat, and an uncatalyzed rate constant, kuncat, such that the ratio of kcat/kuncat is greater than 10;
b) contacting the polypeptide with an immobilized or separable transition state analog compound (TSAC) representing the approximate molecular transition state of the chemical reaction;
c) determining the binding structure of the polypeptide:TSAC complex by nuclear magnetic resonance spectroscopy or X-ray crystallography; and
d) preparing the variegated nucleic acid library, wherein the variegation is performed at positions in the nucleic acid sequence which, from the information provided in (c), result in one or more polypeptides with improved binding to or stabilization of the TSAC.
The invention also provides a kit for the performance of any of the methods of the invention. The invention further provides a composition, e.g., a polypeptide, prepared by the use of the kit, or identified by any of the methods of the invention.
The following abbreviations have been used in describing amino acids, peptides, or proteins: Ala, or A, alanine; Arg, or R, arginine; Asn, or N, asparagine; Asp, or D, aspartic acid; Cys, or C, cysteine; Gln, or Q, glutamine; Glu, or E, glutamic acid; Gly, or G, glycine; His, or H, histidine; Ile, or I, isoleucine; Leu, or L, leucine; Lys, or K, lysine; Met, or M, methionine; Phe, or F, phenylalanine; Pro, or P, proline; Ser, or S, serine; Thr, or T, threonine; Trp, or W, tryptophan; Tyr, or Y, tyrosine; Val, or V, valine.
The following abbreviations have been used in describing nucleic acids, DNA, or RNA: A, adenosine; T, thymidine; G, guanosine; C, cytosine.
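For readers handling sequences programmatically, the amino acid abbreviations above can be expressed as a lookup table. This is only a convenience sketch; the names `THREE_TO_ONE` and `three_to_one` are hypothetical.

```python
# Three-letter to one-letter amino acid codes, as listed above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(residues):
    """Convert a list of three-letter codes into a one-letter sequence string."""
    try:
        return "".join(THREE_TO_ONE[res] for res in residues)
    except KeyError as err:
        raise ValueError(f"unknown amino acid abbreviation: {err.args[0]}") from None
```

For example, `three_to_one(["Ala", "Cys", "Ile"])` returns `"ACI"`.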