This invention relates to anchor libraries and to methods of using anchor libraries to identify peptide sequences that bind to a target molecule.
The identification of peptides that bind to target molecules involved in various physiological functions can have significant implications for the diagnosis and/or treatment of various abnormal or diseased conditions. For example, a binding peptide might modulate the original activity of the target molecule and therefore be useful as a drug.
The use of standard libraries to identify peptide sequences which specifically bind to target molecules is generally limited to pre-existing natural sequences from the organism which is the source of the DNA. More recently, libraries have been described which have clones containing short synthetic random coding sequences. See, e.g., Scott and Smith, Science 249:386-390 (1990); Cwirla et al., Proc. Natl. Acad. Sci. USA 87:6378-6382 (1990); Devlin et al., Science 249:404-406 (1990). These libraries are mixtures of filamentous phage clones, each displaying a random peptide sequence on the virion surface. In these types of libraries, the random amino acids are contiguous. The size of the peptides that can be screened for binding peptides in such contiguous random amino acid libraries is limited, in that as the size of the peptides increases, at some point it is not feasible to adequately search such a library since there are too many clones required to cover all possible permutations of the random amino acids in the peptides.
It is an object of the invention to identify peptide sequences that bind to specific target molecules.
It is another object of the invention to identify amino acid residues in a peptide that are important contacts between the peptide and a target molecule.
It is another object of the invention to determine where the amino acid residues in a peptide that are important contacts between the peptide and a target molecule are best positioned within the peptide.
It is another object of the invention to use an anchor library in which the random amino acid residues of the library are not continuous, for identifying amino acid residues in a peptide that are important contacts between the peptide and a target molecule.
It is another object of the invention to use an anchor library in which the random amino acid residues of the library are distributed throughout a much larger peptide domain consisting of random glycine and/or alanine residues, for identifying amino acid residues in a peptide that are important contacts between the peptide and a target molecule.
It is another object of the invention to search large peptide phage display libraries of, e.g., 16 mers, for a reduced number of essential amino acid residue contacts, e.g., four, between the peptide and a target molecule.
It is another object of the invention to identify a consensus sequence of a defined number of amino acid residues in any configuration of spacer amino acids, that are important contacts between a peptide and a target molecule.
It is yet another object of the invention to use a known core binding sequence on a peptide which binds to a target molecule, and identify surrounding amino acid residues which are additional important contacts between the peptide and the target molecule.
Still another object of the invention is to identify cysteine residues on a peptide which can form disulfide bridges and thereby increase the binding affinity of the peptide with a target molecule.
According to the invention, an anchor library is provided. The anchor library comprises a collection of recombinant vectors, e.g., viruses, phage, e.g., filamentous phage, plasmids or cosmids. Each of the vectors has a nucleic acid sequence inserted in a gene, e.g., a coat protein gene, e.g., gene III or gene VIII, thioredoxin, staphnuclease, lac repressor, gal4 or an antibody. The nucleic acid sequence encodes a displayed peptide sequence, e.g., displayed on the surface of a virion, cell, spore or gene product, which comprises:
X1(Y1)c1X2(Y2)c2X3(Y3)c3X4
wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively, c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 preferably is 0 to about 20, more preferably is 0 to about 10, even more preferably is 0 to about 6, or most preferably is 0 to about 4, wherein X1 and X4 are each attached to an amino acid residue that flanks the displayed peptide sequence. In certain embodiments, at least about 10^5 to about 10^8 permutations of all possible permutations of the displayed peptide sequence are present in the anchor library. In other embodiments, the library does not contain more than about 10%, or more than about 1%, or more than about 0.1%, of displayed peptide sequences different from the first mentioned displayed peptide sequences.
Another aspect of the invention is where each Y1, Y2 and Y3 is any specified amino acid or combination of specified amino acids, e.g., alanine or cysteine or a combination of alanine and cysteine; or glycine or cysteine or a combination of glycine and cysteine.
In certain embodiments, the displayed peptide sequence further has at least one core binding sequence which is preferably about 1 to about 20 amino acid residues in length, more preferably about 4 to about 10, and most preferably is 6. The core binding sequence can be in addition to, or a replacement for, other amino acids in the displayed peptide sequence. Variations include the presence of more than one core binding sequence in the displayed peptide sequence, where, e.g., the core binding sequences can be adjacent, or not adjacent, to each other, and where they can be, e.g., identical or not identical to each other.
In other embodiments, the displayed peptide sequence further has at least one constraint, e.g., a crosslink, e.g., a disulfide bond, e.g., from the presence of a cysteine residue; a stacking interaction; a positive or negative charge; hydrophobicity; hydrophilicity; a structural motif, e.g., a zinc finger formation, a leucine zipper, or a β-turn structure, e.g., from the presence of the sequence asp gly or pro gly; or combinations thereof. Cysteine residues can be in addition to, or a replacement for, other amino acids in the displayed peptide sequence.
Another aspect of the invention is a method of making an anchor library. A collection of nucleic acid sequences is synthesized. The nucleic acid sequences are inserted into vectors to give recombinant vectors and the recombinant vectors are introduced into a host. The host having the recombinant vectors is propagated so as to result in a collection of recombinant vectors, each of which has a nucleic acid sequence from the collection of nucleic acid sequences which encodes a displayed peptide sequence comprising:
X1(Y1)c1X2(Y2)c2X3(Y3)c3X4.
Another aspect of the invention is a method of using an anchor library to identify a peptide sequence that binds to a target. An anchor library having a collection of recombinant vectors is provided. Each of the recombinant vectors has a nucleic acid sequence which encodes a displayed peptide sequence comprising:
X1(Y1)c1X2(Y2)c2X3(Y3)c3X4.
Expression and display of the peptide sequence is permitted. The anchor library is contacted with the target, e.g., proteinaceous or non-proteinaceous molecules, e.g., ligands, receptors, hormones, cytokines, antibodies, antigens, enzymes, enzyme substrates or viruses, under conditions in which the displayed peptide sequence binds to the target, and the displayed peptide sequence which binds to the target is identified, e.g., by sequencing the nucleic acid sequence on the recombinant vector which encodes for the displayed peptide sequence. Preferably, the identified displayed peptide sequence is synthesized.
The invention also provides for a peptide which is identified by use of an anchor library, in which the peptide is useful as a diagnostic or therapeutic product in that the peptide is able to bind to a target molecule which is involved in a physiological process.
Other aspects of the invention include, e.g., a collection of recombinant DNA molecules encoding peptide sequences having a plurality of different binding domains; a recombinant filamentous phage having a displayed peptide sequence with known binding properties and which is foreign to the filamentous phage; a recombinant vector having a nucleic acid sequence inserted in a gene, the nucleic acid sequence encoding a displayed peptide sequence having known binding properties; a recombinant nucleic acid molecule having a nucleic acid sequence inserted in a gene, the nucleic acid sequence encoding a displayed peptide sequence having known binding properties; and a recombinant protein having a displayed peptide sequence having known binding properties.
The above and other objects, features and advantages of the present invention will be better understood from the following specification.
This invention provides an anchor library. The anchor library comprises a collection of recombinant vectors, each of which has a nucleic acid sequence inserted in a gene. The nucleic acid sequence encodes a displayed peptide sequence which comprises
X1(Y1)c1X2(Y2)c2X3(Y3)c3X4
wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively, c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 is 0 to about 20, wherein X1 and X4 are each attached to an amino acid residue that flanks the displayed peptide sequence. In certain embodiments, at least about 10^5 to about 10^8 permutations of all possible permutations of the displayed peptide sequence are present in the anchor library. In other embodiments, the library does not contain more than about 10%, or more than about 1%, or more than about 0.1% of displayed peptide sequences different from the first mentioned displayed peptide sequences.
By anchor library is meant a library in which the recombinant vectors have nucleic acid sequences which code for peptide sequences with random amino acids in which the random amino acids are not continuous. An anchor library is thus distinguishable from other random amino acid libraries in which all random amino acids in the peptide sequence of interest are contiguous. In anchor libraries, a given number of random amino acids are distributed throughout a larger peptide domain consisting of specifically designated amino acid residues. Anchor libraries are meant to include, e.g., external libraries, e.g., phage display libraries, and internal libraries, e.g., plasmid libraries. Chemical libraries can be anchor libraries.
Vectors are meant to include, e.g., phage, viruses, plasmids, cosmids, or any other suitable vector known to those skilled in the art. The vector has a gene, native or foreign, which is able to tolerate insertion of a foreign peptide into the gene product of the gene. By gene is meant an intact gene or fragment thereof. In the invention, the expressed gene product contains the inserted peptide.
For certain embodiments of this invention, e.g., where phage display libraries are employed, the preferred vectors are filamentous phage, though other vectors can be used. Filamentous phage are single stranded DNA phage having coat proteins. Preferably, the gene that the nucleic acid sequence is inserted into is a coat protein gene of the filamentous phage. Preferred coat proteins are gene III or gene VIII coat proteins. Insertion of a foreign peptide into a coat protein gene results in the display of the foreign peptide on the surface of the phage. Insertion into any other gene product in which the inserted peptide is displayed can also be used in this invention. Examples of filamentous phage vectors which can be used in this invention are fUSE vectors, e.g., fUSE1, fUSE2, fUSE3 and fUSE5, in which the insertion is just downstream of the pIII signal peptide. Smith and Scott, Methods in Enzymology 217:228-257 (1993).
In other embodiments, e.g., where internal libraries are employed, the preferred vectors are plasmids, though other vectors can be used. The gene that the nucleic acid is inserted into is a gene which also results in display of the inserted peptide sequence. The gene can encode for an exported or non-exported gene product. Preferred genes include, e.g., thioredoxin, staphnuclease, lac repressor, gal4 or an antibody.
By recombinant vector is meant a vector having a nucleic acid sequence which is not normally present in the vector. The nucleic acid sequence is inserted into a gene present on the vector. Insertion of a nucleic acid into a gene is meant to include insertion within the gene or immediately 5′ or 3′ to, respectively, the beginning or end of the gene, such that when expressed, a fusion gene product is made. The nucleic acid sequence that is inserted includes, e.g., a synthesized nucleic acid sequence or a fragment of another nucleic acid molecule. The nucleic acid sequence encodes a displayed peptide sequence. By displayed peptide sequence is meant a peptide sequence that is on the surface of, e.g., a virion, e.g., a phage or virus, a cell, a spore, or an expressed gene product. It is preferable to have the displayed peptide displayed such that it is able to bind to added target molecules. A displayed peptide sequence can be identical to, or not identical to, a naturally occurring peptide sequence.
The displayed peptide sequence can vary in size. As the size increases, the complexity of the anchor library increases, such that at some point a complete library is not obtainable. Complete libraries or incomplete libraries can be used in this invention. In certain embodiments, the complexity of the anchor library is at least about 10^8 to about 10^11. Preferably, the complexity is at least about 10^9. It is preferred that the total size of the displayed peptide sequence (the random amino acids plus the spacer amino acids) should not be greater than about 100 amino acids long, more preferably not greater than about 50 amino acids long, and most preferably not greater than about 25 amino acids long. A particularly preferred library is made up of displayed peptides in which the longest of the peptides is 16 amino acids, i.e., a 16 mer library.
In large standard libraries, e.g., of 16 mers or greater, it is ordinarily not possible to search a library which contains all possible combinations of the 16 random amino acids. A major advantage of the anchor libraries of this invention is that these large libraries can be searched by looking for a reduced number of essential amino acid contacts between the peptides and the target. Preferably, the number of essential amino acid contacts should be sufficient to achieve micromolar binding. Preferably, the reduced number of essential contacts is about three to about ten, and most preferably it is about four. See Example 4. Thus, e.g., the number of combinations of four amino acid residue contacts in a 16 mer library is much less than the total number of combinations of all 16 amino acids in a 16 mer library, and therefore, this invention makes it possible to determine four important contact amino acids in a peptide of 16 amino acids in length, as opposed to standard screening of standard libraries in which such determinations cannot ordinarily be made.
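The size comparison drawn above can be checked with simple arithmetic. The following is a minimal sketch, assuming that each random position is drawn from the 20 amino acids commonly found in naturally occurring proteins; it ignores spacer configurations and codon degeneracy, and the names `ALPHABET_SIZE` and `random_position_combinations` are illustrative only.

```python
# Assumption: each random position can hold any of the 20 common amino acids.
ALPHABET_SIZE = 20


def random_position_combinations(n_random_positions: int) -> int:
    """Number of distinct assignments of amino acids to the random positions."""
    return ALPHABET_SIZE ** n_random_positions


# A standard library in which all 16 positions of a 16 mer are random.
full_16mer = random_position_combinations(16)

# Four essential contact positions, as in the four-X displayed peptide sequence.
four_contacts = random_position_combinations(4)

print(f"16 contiguous random residues: {full_16mer:.3e} combinations")
print(f"4 random contact residues:     {four_contacts} combinations")
print(f"ratio:                         {full_16mer // four_contacts:.3e}")
```

Under these assumptions the four contact residues account for 160,000 combinations, whereas sixteen contiguous random residues would require on the order of 10^20 clones for full coverage.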
In one embodiment of the invention, the displayed peptide sequence comprises
X1(Y1)c1X2(Y2)c2X3(Y3)c3X4.
X1, X2, X3 and X4 are amino acid residues, each of which can be the same or different from any one of the others. Preferably, the amino acids are chosen from the 20 amino acids commonly found in naturally occurring proteins.
Y1, Y2 and Y3 can be any specified amino acid residue or combination of specified amino acid residues, and each of the Ys, if present, can be the same or different from any one of the others. Preferably, the amino acids are spacer amino acids which will not significantly interfere with the binding between the peptide sequence and a target molecule. It is preferable to use combinations of two or more amino acids for the Y amino acids in a given library so as to reduce any limitations in the conformations of the displayed peptide that might be imposed by use of only one given amino acid. Most preferably, glycine and alanine residues are used in combination in the library. Glycine and alanine are small side chain amino acids that appear to act more as blanks than interfering contacts. In other embodiments, the Y amino acids can be amino acids which are chosen because they do significantly affect in some way the binding between the peptide sequence and a target molecule. For example, glycine and cysteine residues can be used in combination, or alanine and cysteine residues can be used in combination.
Y1, Y2 and Y3, are, respectively c1, c2 and c3 amino acid residues long. c1, c2 and c3 can be the same or different from any one of the others. Preferably, each of c1, c2 and c3 is 0 to about 20, more preferably is 0 to about 10, even more preferably is 0 to about 6, and most preferably is 0 to about 4.
For example, in an anchor library where each of the c's is 0 to 4, and the Y's are a combination of glycine and alanine, the minimal structure of the peptide sequence is 4 amino acids long (where each of c1, c2 and c3 is 0):
X1X2X3X4,
and the maximal structure of the peptide sequence is 16 amino acids long (where each of c1, c2 and c3 is 4):
X1(G/A)(G/A)(G/A)(G/A)X2(G/A)(G/A)(G/A)(G/A)X3(G/A)(G/A)(G/A)(G/A)X4,
where (G/A) is a glycine or alanine residue. This anchor library also contains all other in-between permutations of c, e.g., where c1 is 0, c2 is 1 and c3 is 1; where c1 is 1, c2 is 1 and c3 is 1; where c1 is 2, c2 is 1 and c3 is 1; etc. All possible permutations of alanine and glycine for each of the designated c values are also included in this anchor library.
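The permutations of c values and of glycine/alanine spacers described in this example can be enumerated directly. The following is a minimal sketch, assuming c1, c2 and c3 each range from 0 to 4 and that every spacer position is either G or A; the letter `X` is used only as a placeholder for a random position, and the function names are hypothetical.

```python
from itertools import product

SPACER_RESIDUES = "GA"   # glycine or alanine at each spacer position
MAX_SPACER_LEN = 4       # assumed upper bound for c1, c2 and c3


def spacer_options(max_len: int = MAX_SPACER_LEN) -> list[str]:
    """All G/A spacers of length 0 through max_len (the empty spacer included)."""
    return [
        "".join(residues)
        for length in range(max_len + 1)
        for residues in product(SPACER_RESIDUES, repeat=length)
    ]


def anchor_templates(max_len: int = MAX_SPACER_LEN):
    """Yield templates such as 'XGAXXAX'; each 'X' marks a random position."""
    options = spacer_options(max_len)
    for y1, y2, y3 in product(options, repeat=3):
        yield f"X{y1}X{y2}X{y3}X"


templates = list(anchor_templates())
lengths = [len(t) for t in templates]

print("spacer options per Y:", len(spacer_options()))
print("templates:", len(templates))
print("shortest / longest:", min(lengths), "/", max(lengths))
# Pairs of (template, X assignment), assuming 20 amino acids per X position.
# This is an upper bound on distinct peptides, since an X may itself be G or A.
print("template x residue combinations:", len(templates) * 20 ** 4)
```

With these assumptions there are 31 possible spacers per Y, giving 29,791 templates that range from 4 to 16 residues in length, and roughly 4.8 x 10^9 combinations of template and random residues.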
It is preferred that all possible permutations of the displayed sequence are present, that is, all combinations of c values and all combinations of, e.g., alanine and/or glycine, for each of the c values. In other embodiments, at least about 10^5 to about 10^8 permutations of all possible permutations are present in the anchor library, or at least about 10^4 permutations of all possible permutations are present in the anchor library, or at least about 10^5 permutations of all possible permutations are present in the anchor library, or at least about 10^6 permutations of all possible permutations are present in the anchor library, or at least about 10^7 permutations of all possible permutations are present in the anchor library, or at least about 10^8 permutations of all possible permutations are present in the anchor library, or at least about 10^9 permutations of all possible permutations are present in the anchor library.
In certain embodiments, the library does not contain more than about 10% of displayed peptide sequences different from the first mentioned displayed peptide sequences. In other embodiments, the library does not contain more than about 1% of displayed peptide sequences different from the first mentioned displayed peptide sequences. And in yet other embodiments, the library does not contain more than about 0.1% of displayed peptide sequences different from the first mentioned displayed peptide sequences.
In certain embodiments of the invention, the displayed peptide can have additional units of X(Y)c. For example, it can have preferably about 1 to about 10 additional units, more preferably about 1 to about 5 additional units, and most preferably about 1 to about 3 additional units. In other embodiments, one or more additional units of X alone or (Y)c alone can be present.
In yet other embodiments of the invention, the anchor libraries described above can have at least one core binding sequence, denoted by B, of p amino acid residues in length. B can be any size, e.g., from a single amino acid to the size of a gene. Preferably, p is about 1 to about 20, more preferably p is about 4 to about 10, and most preferably p is about 6. By core binding sequence is meant a peptide sequence which is known to bind to a target molecule. In certain embodiments, the core binding sequence is additional to the amino acid residues of the displayed peptide sequences described above. In such libraries, the core binding sequence can be positioned on the NH2-terminal or COOH-terminal side of any of the X1, X2, X3 or X4 amino acid residues, or on the NH2-terminal or COOH-terminal side of any of the Y, e.g., alanine or glycine, residues. In other embodiments, at least one of the X residues is replaced with the core binding sequence. In yet other embodiments, at least one of the Y residues, e.g., one of the alanine or glycine residues, is replaced with a core binding sequence. Inclusion of a known core binding sequence in the anchor library allows identification of surrounding amino acid residues which are additional important contacts between the peptide and the target molecule. The invention thus allows identification of better binding sequences by identifying additional amino acids surrounding the core binding sequence which in combination with the known core binding sequence exhibit enhanced binding as compared to the known core binding sequence alone.
In certain embodiments, more than one known binding sequence is present in each of the displayed peptide sequences of the anchor library. These multiple known binding sequences can be adjacent to, or not adjacent to, each other, and can be identical to, or not identical to, each other.
In certain embodiments, the anchor libraries have at least one constraint imposed upon the displayed peptide sequence. A constraint includes, e.g., a crosslink, a stacking interaction, a positive or negative charge, hydrophobicity, hydrophilicity, a structural motif and combinations thereof. In certain embodiments, more than one constraint is present in each of the displayed peptide sequences of the anchor library. These multiple constraints can be adjacent to, or not adjacent to, each other, and can be identical to, or not identical to, each other.
A crosslink includes, e.g., a disulfide bond. In certain embodiments, the displayed peptide has at least one cysteine residue. The cysteine residue can be, e.g., additional to the amino acid residues of the displayed peptide sequences described above. In such libraries, the cysteine residue can be positioned on the NH2-terminal or COOH-terminal side of any of the X1, X2, X3 or X4 amino acid residues, or on the NH2-terminal or COOH-terminal side of any of the Y, e.g., alanine or glycine, residues. In other embodiments, at least one of the X residues is a cysteine residue. In yet other embodiments, at least one of the Y residues, e.g., one of the alanine or glycine residues, is replaced with a cysteine residue. Multiple cysteines can be present in each of the peptides so as to form potential disulfide bonds within a random series. Disulfide bonds can be formed within the displayed peptide sequence itself or between the displayed peptide sequence and the target molecule.
A structural motif includes, e.g., a zinc finger formation, a leucine zipper, and a β-turn structure in the peptide. The sequences asp gly or pro gly are likely to induce β-turns, either alone or in combination with, e.g., a disulfide bond.
In other embodiments, the anchor libraries can be constructed to have both a core binding sequence and a constraint, e.g., at least one cysteine residue. In one such embodiment, at least one of the X residues can be, e.g., either a cysteine or a glycine such that the displayed peptide sequence is:
(C/G)(Y1)c1(C/G)(Y2)c2B(C/G)(Y3)c3(C/G)
where (C/G) is a cysteine or glycine residue. In such a library, multiple cysteines are present so as to form potential disulfide bonds within a random series.
In yet other embodiments, the displayed peptide sequence comprises:
X1(Y1)c1X2(Y2)c2X3(Y3)c3X4
wherein each Y1, Y2 and Y3 is alanine or glycine or a core binding sequence B of p amino acid residues in length or a combination of alanine and glycine or alanine and B or glycine and B.
And in yet other embodiments, the displayed peptide sequence comprises:
Z1(Y1)c1Z2(Y2)c2Z3(Y3)c3Z4
wherein each Z1, Z2, Z3 and Z4 is an amino acid residue or a core binding sequence B of p amino acid residues in length and any of Z1, Z2, Z3 and Z4 can be the same or different from any one other, and wherein Z1 and Z4 are each attached to an amino acid residue that flanks the displayed peptide sequence.
Other embodiments include anchor libraries constructed with other configurations of combinations between X residues and/or Y residues and/or B sequences and/or cysteine residues and/or other constraints, as is obvious to those skilled in the art.
The invention also includes a method of making the anchor libraries described above. A collection of nucleic acid sequences is synthesized and inserted into vectors to give recombinant vectors. These recombinant vectors are introduced into a host. The host having the recombinant vectors is propagated so as to result in a collection of recombinant vectors, each of the recombinant vectors having a nucleic acid sequence from the collection of nucleic acid sequences which encodes a displayed peptide sequence. The peptide sequence is any of the peptide sequences discussed above, e.g., X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, with or without at least one core binding sequence, and with or without at least one constraint, e.g., a cysteine residue. In certain embodiments, at least about 10^5 to about 10^8 permutations, or about 10^4 permutations, or about 10^5 permutations, or about 10^6 permutations, or about 10^7 permutations, or about 10^8 permutations, or about 10^9 permutations, of all possible permutations of the displayed peptide sequence are present in the anchor library. In other embodiments, the library does not contain more than about 10%, or more than about 1%, or more than about 0.1%, of displayed peptide sequences different from the first mentioned displayed peptide sequences.
The nucleic acids that encode the anchor library can be obtained by any method which produces the requisite permuted nucleic acids. For example, a split synthesis procedure can be used. See, e.g., Cormack and Struhl, Science 262:244-248 (1993). Examples 1 and 3 describe examples of using split synthesis to make nucleic acid inserts for anchor libraries.
The invention further includes a method of using the anchor libraries described above to identify a peptide sequence that binds to a target. An anchor library having a collection of recombinant vectors, each of which has a nucleic acid sequence which encodes a displayed peptide sequence, is provided. The displayed peptide sequence can be any of the peptide sequences discussed above, e.g., X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, with or without at least one core binding sequence, and with or without at least one constraint, e.g., a cysteine residue. Expression and display of the peptide sequence is permitted. The anchor library is contacted with the target under conditions in which the displayed peptide sequence binds to the target, and the displayed peptide sequence which binds to the target is identified.
Target is meant to include any molecule with which the displayed peptide sequence will bind. Targets include, e.g., proteinaceous and non-proteinaceous molecules. Examples of targets are ligands, receptors, hormones, cytokines, antibodies, antigens, enzymes, enzyme substrates and viruses. In some cases, the binding peptide modulates the original activity of the target molecule, and therefore can be useful as a drug. The target includes, e.g., drug antagonists and agonists. The binding peptides can be used, e.g., for diagnostic or therapeutic applications.
The contacting step can be done by any method in which the displayed peptide sequence will bind, directly or indirectly, to the target. These methods include, e.g., screens and selections. Preferably, an affinity purification method is used. Affinity purification includes, e.g., biopanning. For example, a phage anchor library having displayed peptide sequences is mixed with biotinylated target, resulting in a phage:biotinylated target complex if a displayed peptide sequence binds to the target. The mixture is added to a streptavidin coated substance, e.g., beads or a petri plate. The resulting biotin-streptavidin bond allows isolation of the phage carrying peptide sequences that bind to the target. It is preferable to do multiple rounds of biopanning to reduce background. See Example 2.
Identification of the displayed peptide sequence includes, e.g., determining the sequence of amino acids that comprise the peptide. Identification can be accomplished, e.g., by amplifying the recombinant vector which has the nucleic acid sequence which encodes for the displayed peptide sequence which binds to the target, and sequencing the nucleic acid sequence by standard procedures known in the art to determine the displayed peptide sequence which binds to the target. If desired, the peptide thus identified can be synthesized using standard procedures known in the art and further tested for its ability to bind to the target in vitro and/or in cell-based, and/or animal models. See Example 2.
In a given anchor library, the ability to determine essential amino acid contacts between the displayed peptide and a target molecule is aided by the ability to observe conserved amino acid residues in the different displayed peptides which are able to bind to the target. Conserved amino acid residues are meant to include different DNA codons for the same amino acid or different DNA codons for functionally similar amino acids. The consensus is determined by comparing the sequence of the individual clones obtained from a library screen. It is preferable that the library have sufficient complexity in order to observe such a consensus.
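The comparison of individual clones to find a consensus can be illustrated as follows. This is a minimal sketch, not the procedure of the Examples: it assumes spacers of 0 to 4 glycine/alanine residues, uses a greedy regular expression to locate the four X positions (which is ambiguous whenever an X is itself glycine or alanine), treats only identical residues as conserved rather than functionally similar ones, and runs on invented clone sequences that are not data from this specification.

```python
import re
from collections import Counter

# Assumed template: X1 (G/A){0,4} X2 (G/A){0,4} X3 (G/A){0,4} X4, one-letter codes.
TEMPLATE = re.compile(
    r"^([A-Z])([GA]{0,4})([A-Z])([GA]{0,4})([A-Z])([GA]{0,4})([A-Z])$"
)


def anchor_residues(peptide: str):
    """Return (X1, X2, X3, X4) for a peptide, or None if it does not fit."""
    match = TEMPLATE.match(peptide)
    if match is None:
        return None
    return (match.group(1), match.group(3), match.group(5), match.group(7))


def consensus(peptides) -> str:
    """Most frequent residue at each X position; '?' if nothing was parsed."""
    counts = [Counter() for _ in range(4)]
    for peptide in peptides:
        residues = anchor_residues(peptide)
        if residues is None:
            continue  # skip clones that do not fit the assumed template
        for counter, residue in zip(counts, residues):
            counter[residue] += 1
    return "".join(
        counter.most_common(1)[0][0] if counter else "?" for counter in counts
    )


# Hypothetical clone sequences, invented purely for illustration.
clones = ["WGAHPGGY", "WHAAPY", "WGGGHPAY", "FAHGPGAGY"]
print(consensus(clones))
```

In practice the alignment of clones with different spacer lengths, and the grouping of functionally similar residues, would require more care than this sketch provides.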
Also included in the invention is a peptide identified by use of any of the anchor libraries described above in which the peptide is useful as a diagnostic or therapeutic product in that the peptide is able to bind to a target molecule which is involved in a physiological process. For example, the target molecule can be a receptor involved in inflammation, e.g., IL-1, or in prostate cancer, e.g., GnRH; or the target molecule can be an enzyme, e.g., a protease, e.g., HIV protease. By binding to these or other target molecules that are involved in various abnormal conditions or diseases, the binding peptides of this invention modulate the original activity of the target molecule and are therefore useful as diagnostic or therapeutic products.
The invention also includes a library which has a collection of nucleic acid molecules encoding peptides having random amino acids, the improvement comprising a library in which the random amino acids are not continuous so that the amino acids in the peptide that are important contacts for interaction between the peptide and a target molecule can be identified.
The invention also includes a library having a collection of nucleic acid molecules encoding peptides having random amino acids, the improvement comprising nucleic acid molecules encoding alanine or glycine or a combination of alanine and glycine residues in varying numbers acting as spacers between the random amino acids so that amino acid residues in a peptide that are important contacts for interaction between the peptide and a target molecule can be identified.
The invention further provides a collection of recombinant DNA molecules encoding peptide sequences having a plurality of different binding domains. The peptide sequences comprise: X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 is 0 to about 20, wherein X1 and X4 are each attached to an amino acid residue that flanks the peptide sequence, and wherein at least about 10^5 to about 10^8 permutations, or about 10^4 permutations, or about 10^5 permutations, or about 10^6 permutations, or about 10^7 permutations, or about 10^8 permutations, or about 10^9 permutations, of all possible permutations of the peptide sequence are present in the collection. In other embodiments, the collection does not contain more than about 10%, or more than about 1%, or more than about 0.1%, of displayed peptide sequences different from the first mentioned displayed peptide sequences. In certain embodiments, the peptide sequences are displayed on the surface of a biological material, e.g., a virus, phage, cell, spore or gene product.
The invention also includes a recombinant filamentous phage having a displayed peptide sequence with known binding properties. The displayed peptide sequence is foreign to the filamentous phage. The displayed peptide sequence comprises: X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 is 0 to about 20, wherein X1 and X4 are each attached to an amino acid residue that flanks the displayed peptide sequence, and wherein the displayed peptide sequence is able to bind to a target. In certain embodiments, at least one of Y1, Y2 and Y3 is at least about 20 amino acid residues long, preferably is at least about 10 amino acid residues long, more preferably is at least about 6 amino acid residues long, even more preferably is at least about 4 amino acid residues long, more preferably yet is at least about 3 amino acid residues long, more preferably yet is at least about 2 amino acid residues long, and most preferably is at least about 1 amino acid residue long.
The invention also includes a recombinant vector having a nucleic acid sequence inserted in a gene. The nucleic acid sequence encodes a displayed peptide sequence having known binding properties. The displayed peptide sequence comprises: X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 is 0 to about 20, wherein X1 and X4 are each attached to an amino acid residue that flanks the displayed peptide sequence, and wherein the displayed peptide sequence is able to bind to a target. In certain embodiments, at least one of Y1, Y2 and Y3 is at least about 20 amino acid residues long, preferably is at least about 10 amino acid residues long, more preferably is at least about 6 amino acid residues long, even more preferably is at least about 4 amino acid residues long, more preferably yet is at least about 3 amino acid residues long, more preferably yet is at least about 2 amino acid residues long, and most preferably is at least about 1 amino acid residue long.
The invention also includes a recombinant nucleic acid molecule having a nucleic acid sequence inserted in a gene. The nucleic acid sequence encodes a displayed peptide sequence having known binding properties. The displayed peptide sequence comprises: X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 is 0 to about 20, wherein X1 and X4 are each attached to an amino acid residue that flanks the displayed peptide sequence, and wherein the displayed peptide sequence is able to bind to a target. In certain embodiments, at least one of Y1, Y2 and Y3 is at least about 20 amino acid residues long, preferably is at least about 10 amino acid residues long, more preferably is at least about 6 amino acid residues long, even more preferably is at least about 4 amino acid residues long, more preferably yet is at least about 3 amino acid residues long, more preferably yet is at least about 2 amino acid residues long, and most preferably is at least about 1 amino acid residue long.
The invention further includes a recombinant protein having a displayed peptide sequence having known binding properties. The displayed peptide sequence comprises: X1(Y1)c1X2(Y2)c2X3(Y3)c3X4, wherein each X1, X2, X3 and X4 is an amino acid residue and any of X1, X2, X3 and X4 can be the same or different from any one other, wherein each Y1, Y2 and Y3 is alanine or glycine or a combination of alanine and glycine that is respectively c1, c2 and c3 amino acid residues long and any of Y1, Y2 and Y3 if present can be the same or different from any one other, wherein each of c1, c2 and c3 is 0 to about 20, wherein X1 and X4 are each attached to an amino acid residue that flanks the displayed peptide sequence, and wherein the displayed peptide sequence is able to bind to a target. In certain embodiments, at least one of Y1, Y2 and Y3 is at least about 20 amino acid residues long, preferably is at least about 10 amino acid residues long, more preferably is at least about 6 amino acid residues long, even more preferably is at least about 4 amino acid residues long, more preferably yet is at least about 3 amino acid residues long, more preferably yet is at least about 2 amino acid residues long, and most preferably is at least about 1 amino acid residue long.
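As a further illustration, whether a given peptide is consistent with the formula X1(Y1)c1X2(Y2)c2X3(Y3)c3X4 can be checked with a regular expression. This is a minimal sketch that assumes one-letter codes for the 20 standard amino acids and treats "about 20" as an upper limit of exactly 20 spacer residues. The names `MAX_SPACER`, `ANCHOR_PATTERN` and `fits_anchor_formula` are hypothetical.

```python
import re

# Assumption: "0 to about 20" is treated here as a hard range of 0..20.
MAX_SPACER = 20

X = "[ACDEFGHIKLMNPQRSTVWY]"       # any standard amino acid (assumed alphabet)
Y = "[AG]{0,%d}" % MAX_SPACER      # alanine/glycine spacer, 0..MAX_SPACER long

# X1 (Y1) X2 (Y2) X3 (Y3) X4, with each part captured as a group.
ANCHOR_PATTERN = re.compile(
    "(%s)(%s)(%s)(%s)(%s)(%s)(%s)" % (X, Y, X, Y, X, Y, X)
)


def fits_anchor_formula(peptide):
    """Return True if the whole peptide can be read as X1(Y1)X2(Y2)X3(Y3)X4."""
    return ANCHOR_PATTERN.fullmatch(peptide) is not None
```

For example, `fits_anchor_formula("WAAKGGRAF")` is true (X1 = W, Y1 = AA, X2 = K, Y2 = GG, X3 = R, Y3 = A, X4 = F), whereas `fits_anchor_formula("WLLKRF")` is false because it contains six residues other than alanine or glycine. Because an X position may itself be alanine or glycine, some peptides can be read against the formula in more than one way; the check only reports whether at least one reading exists.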