The present invention relates to methods for analyzing, altering, and controlling the structural basis for protein binding to target molecules. More particularly, the present invention is directed to peptide ladder libraries corresponding to a protein, protein fragment, or other bioactive peptide and to the use of peptide ladder libraries for obtaining a protein signature analysis.
One of the major strategies for determining the relationship between the chemical structure of a peptide and its biological activity is to systematically alter the covalent structure and observe the effect on function. Chemical synthesis makes a wide variety of such modifications possible. For example, N-methylation and the use of ester bonds can probe backbone interactions (Arad et al. Biopolymers 1990, 29, 1633-1649; Bramson et al. J. Biol. Chem. 1985, 260, 15452-15457; Caporale et al. In: Peptides: Structure and Function, Proceedings of the Tenth American Peptide Symposium; Marshall, G. R., Ed.; Escom: Leiden, The Netherlands, 1988, pp. 449-451), while sidechain contributions can be probed using D-amino acid or alanine/glycine substitutions (Konishi et al. In: Peptides: Structure and Function, Proceedings of the Tenth American Peptide Symposium; Marshall, G. R., Ed.; Escom: Leiden, The Netherlands, 1988, pp. 479-481; Tam et al. In: Peptides: Proceedings of the Eleventh American Peptide Symposium; Rivier, J. E.; Marshall, G. R., Eds.; Escom: Leiden, The Netherlands, 1990, pp. 75-77). As traditionally practiced, a separate analogue must be prepared and assayed for each position in the peptide sequence that is to be studied.
An alternative, currently popular method of studying peptides is combinatorial chemistry. This approach has had a major impact on the study of the molecular basis of peptide activity and has contributed to the search for new biologically active peptides (Thompson et al. Chem. Rev. 1996, 96, 555-600; Gordon et al. J. Med. Chem. 1994, 37, 1385-1401; Scott et al. Curr. Op. Biotech. 1994, 5, 40-48). ‘Multiple Peptide Synthesis’ has extended the traditional approach by allowing many peptides to be synthesized simultaneously (Geysen et al. Proc. Natl. Acad. Sci. USA 1984, 81, 3998-4001; Houghten et al. Proc. Natl. Acad. Sci. USA 1985, 82, 5131-5134). The individual peptide products are spatially separated and can be analyzed either attached to a solid support or in solution. Established ‘split synthesis’ procedures (Furka et al. Int. J. Pept. Prot. Res. 1991, 37, 487-494; Lam et al. Nature 1991, 354, 82-84) allow huge numbers of peptide sequences to be generated rapidly by repeating a simple divide, couple, and recombine process. The compositional diversity made possible by this approach is advantageous for the discovery of new ‘lead’ compounds since, in principle, all possible structural variants can be explored for the desired activity and only the few active oligomers of interest need to be individually identified (Furka et al. Int. J. Pept. Prot. Res. 1991, 37, 487-494; Lam et al. Nature 1991, 354, 82-84). However, where information about a complete set of functional and non-functional components is desired over many positions in a peptide sequence, such libraries are too complex to characterize fully and may be of limited utility.
A more systematic investigation of the molecular basis of peptide function requires a different type of molecular diversity. Instead of a peptide mixture of high compositional diversity, it would be useful to construct an array of peptides which differ from each other in a precise and defined manner. In principle, one way to access this population would be as a minor fraction of a large, fully combinatorial library. For example, such an array of analogues could consist of all peptides which differ from a target sequence by a single amino acid substitution at each position in the peptide sequence (cf. ‘Ala scans’). By removing this defined subset of analogues from the context of a complex, fully combinatorial mixture of peptides, handling and analysis would be greatly simplified, and a more useful profile of the effects of substituting the amino acid throughout the peptide chain would be obtained. Current split resin methods do not allow this type of control over the composition of a peptide library (Furka et al. Int. J. Pept. Prot. Res. 1991, 37, 487-494; Lam et al. Nature 1991, 354, 82-84).
Typically, to investigate the molecular basis of protein function, systematic modifications are made to the protein structure and their effects on the properties of the protein are evaluated. Site-directed mutagenesis (Smith et al. Angew. Chem. Int. Ed. Engl. 1994, 33, 1214-1220) has been the principal tool used to implement this approach and has given many insights into the contribution of individual sidechains to protein function. In particular, ‘alanine scanning’ (Wells et al. Methods in Enzymology 1991, 202, 390-411) has been used to identify specific amino acid sidechains involved in ligand binding interactions. This technique involves the sequential substitution of native amino acids by individual alanine residues, which are regarded as functionally and structurally neutral. To extend the repertoire of modifications beyond the twenty genetically encoded amino acids, methods have been developed to substitute non-natural groups into proteins (Noren et al. Science 1989, 244, 182-185). Although a variety of proteins with novel sidechain and backbone modifications have been generated, there are apparent limits to the modifications possible using the methods of molecular biology and ribosomal synthesis (Ellman et al. Science 1991, 255, 197-200; Cornish et al. Angew. Chem. Int. Ed. Engl. 1995, 34, 621-633).
Recent advances in the total synthesis of polypeptides have opened the world of proteins to the direct application of the tools of organic chemistry (Schnölzer et al. Science 1992, 256, 221-225; Jackson et al. Science 1994, 266, 243-247; Dawson et al. Science 1994, 266, 776-779; Canne et al. J. Am. Chem. Soc. 1995, 117, 2998-3007; Liu et al. J. Am. Chem. Soc. 1995, 118, 307-312; Englebretsen et al. Tet. Lett. 1995, 36, 8871-8874). Total chemical synthesis has been used to prepare a variety of protein analogues. Of particular note have been proteins containing β-turn mimics (Baca et al. Prot. Sci. 1993, 2, 1085-1091), N-methylated amino acids (Rajarathnam et al. Science 1994, 264, 90-92), modified backbone atoms (Baca et al. J. Am. Chem. Soc. 1995, 117, 1881-1887), and mirror-image proteins composed entirely of D-amino acids (Zawadzke et al. J. Am. Chem. Soc. 1992, 114, 4002-4003; Milton et al. Science 1992, 256, 1445-1448; Fitzgerald et al. J. Am. Chem. Soc. 1995, 117, 11075-11080; Schumacher et al. Science 1996, 271, 1854-1857). In addition, important insights into the mechanism of action of enzymes have been gained through the total chemical synthesis of unique analogues (Baca et al. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 11638-11642).
Although structure-function relationships in proteins can be studied using individual analogues prepared by either recombinant or chemical techniques, developing a profile of effects across the whole protein molecule is hindered by the time and effort required to generate and analyze multiple protein analogues (Matthews et al. Ann. Rev. Biochem. 1993, 62, 139-160). Combinatorial oligonucleotide synthesis in conjunction with protein expression in bacteria (Reidhaar-Olsen et al. Science 1988, 241, 53-57; Gregoret et al. Proc. Natl. Acad. Sci. USA 1993, 90, 4246-4250) or on phage (Scott et al. Science 1990, 249, 386-390; Lowman et al. Biochemistry 1991, 30, 10832-10838) has provided a powerful method for studying large numbers of protein analogues. These techniques allow pools of expressed proteins to be probed for a desired function. With appropriate screening procedures, a statistical sampling of numerous functional protein variants can be analyzed and identified (Gu et al. Protein Science 1995, 4, 1108-1117). This strategy has proved powerful for generating variant proteins with new or optimized functions (Lowman et al. J. Mol. Biol. 1993, 234, 564-578; Rebar et al. Science 1994, 263, 671-673). However, studies designed to elucidate the molecular basis of protein function have been complicated by the necessarily incomplete characterization of the numerous protein analogues generated, and by the restriction to the naturally encoded amino acids.
In applying molecular diversity to the study of protein function, it would be useful to combine the valuable information gained by systematic modification through chemical synthesis with the advantages of combinatorial methods.
What is needed is an integrated approach to the preparation of a defined array of peptide and protein analogues in a single synthesis, their functional separation into active and inactive pools, and a simple one-step readout of the composition of the self-encoded mixtures.
There are three aspects to the invention:
1. A combinatorial method for synthesizing a peptide ladder library corresponding to a protein, protein fragment, or other bioactive peptide.
2. A method for screening the peptide ladder library with respect to a binding function.
3. A method for identifying active or inactive components of the peptide ladder library, i.e., identification of a protein signature for the protein or protein fragment under investigation with respect to the function being probed.
A combinatorial synthetic method for making a peptide ladder library is illustrated in FIG. 1. The peptide ladder library is a one-pot collection of “n” peptides, each peptide being identical to the others in the library with respect to molarity and structure except for the substitution of a marker at position “n”. The marker introduces into the peptide backbone a labile bond, e.g., a thioester bond, which can be cleaved selectively without cleaving other bonds within the peptide backbone. The marker also serves to introduce a ladder of steric perturbations into the peptide backbone and/or a ladder of peptide side chain substitutions. The synthetic protocol employs a split synthesis method.
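The self-encoding idea behind the marker can be illustrated with a minimal Python sketch that treats each ladder member as a one-letter sequence string. It assumes, for illustration only, that cleaving the labile bond of (peptide)n simply splits the chain between positions n and n+1, so the length of the N-terminal fragment reports the marker position. The helper names `ladder_member` and `cleave_at_marker`, the alanine substitute, and the example sequence are hypothetical, and the chemistry of the bond is not modelled.

```python
from typing import Tuple


def ladder_member(native: str, n: int, substitute: str = "A") -> str:
    """Return (peptide)n: `native` with the residue at 1-based position n replaced.

    Hypothetical helper; the labile bond itself is not represented in the string.
    """
    if not 1 <= n <= len(native):
        raise ValueError("n must satisfy 1 <= n <= m")
    return native[: n - 1] + substitute + native[n:]


def cleave_at_marker(peptide: str, n: int) -> Tuple[str, str]:
    """Split (peptide)n at the assumed labile bond between positions n and n+1.

    For n == m there is no (aa)n+1, so the C-terminal fragment is empty.
    """
    return peptide[:n], peptide[n:]


native = "PEPTIDE"  # hypothetical example sequence, m = 7
for n in range(1, len(native) + 1):
    n_frag, c_frag = cleave_at_marker(ladder_member(native, n), n)
    # In this model the N-terminal fragment length equals the marker position n.
    print(n, n_frag, c_frag)
```

In this simplified model every member is cut at a different position, so the set of N-terminal fragment lengths observed in a pool lists exactly which members that pool contains.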
Conventional screening methods may be employed on the peptide ladder library to separate active components from inactive components within the library. An exemplary screening protocol is illustrated in FIG. 2.
After the screening is complete, the isolated components are analyzed as illustrated in FIG. 3 to obtain a molecular signature for the protein. Briefly, the isolated components are cleaved at their marker and analyzed. Mass spectrometry (MS) is the preferred method of analysis; alternative analytical methods include NMR (with deuterium exchange), IR, and FACS. Comparing the analysis (e.g., the mass spectrum) of the isolate with that of a control, i.e., an aliquot of the entire library, provides a molecular signature which identifies sites within the protein that are responsive or unresponsive to the screening method. For example, sites within the protein that are essential for binding or folding may be identified. The protein signature of the Crk-N/C3G interaction is illustrated in FIG. 3.
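The isolate-versus-control comparison can be sketched in Python as a set difference over positions read out of a mass spectrum. Purely for illustration, this assumes that the N-terminal cleavage product of (peptide)n is observed as a neutral free peptide acid (sum of standard monoisotopic residue masses plus one water), with alanine as the substitute and charge states already deconvoluted; real masses depend on the labile-bond chemistry and ionization, which are not modelled. The function names, the tolerance, and the peak lists are hypothetical.

```python
from typing import Dict, Iterable, Set

# Standard monoisotopic residue masses (Da) for the 20 genetically encoded amino acids.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O, added once for a free peptide acid


def n_terminal_fragment_mass(native: str, n: int, substitute: str = "A") -> float:
    """Neutral mass of residues 1..n of (peptide)n, modelled as a free peptide acid.

    Simplifying assumption: the labile-bond chemistry and its cleavage are ignored.
    """
    fragment = native[: n - 1] + substitute
    return sum(RESIDUE_MASS[r] for r in fragment) + WATER


def detected_positions(native: str, peaks: Iterable[float], tol: float = 0.05,
                       substitute: str = "A") -> Set[int]:
    """Positions n whose expected fragment mass matches some observed peak within tol."""
    peak_list = list(peaks)
    found = set()
    for n in range(1, len(native) + 1):
        expected = n_terminal_fragment_mass(native, n, substitute)
        if any(abs(expected - p) <= tol for p in peak_list):
            found.add(n)
    return found


# Hypothetical data: the control shows every member; the isolate lacks members 3 and 5.
native = "PEPTIDE"
control = [n_terminal_fragment_mass(native, n) for n in range(1, len(native) + 1)]
isolate = [mass for n, mass in enumerate(control, start=1) if n not in (3, 5)]
print(sorted(detected_positions(native, control) - detected_positions(native, isolate)))
# [3, 5]
```

With a bound-pool isolate, positions detected in the control but not in the isolate are the candidates for sites essential to binding. Consecutive fragments differ by at least one residue mass (about 57 Da for glycine), so the small tolerance cannot confuse neighbouring positions.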
Successive iterations of the method of the invention can be employed to obtain a complete deconstructive analysis of a protein, even if the structure of the protein is unknown. The invention may be employed to characterize protein interactions and can facilitate the design of new therapeutics which are dependent upon such protein interaction.
One aspect of the invention is directed to a method for obtaining a molecular signature of a protein. The protein is of a type which has an amino acid sequence with length m, each amino acid position being represented by (aa)n, where 1 ≤ n ≤ m. The protein is also of a type which has a binding affinity with respect to a target molecule under binding conditions. The molecular signature is then defined by a subsequence of the amino acid sequence of the protein. The subsequence is selected from amongst those positions (aa)n of the protein which, if individually replaced by a substitute amino acid, lead to a loss of binding affinity of the protein for the target molecule.
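The definition can be restated in set notation. The indicator b_n is notation introduced here for illustration: b_n = 1 if (peptide)n retains binding affinity for the target molecule under binding conditions, and b_n = 0 if the substitution at (aa)n leads to a loss of binding affinity. U and B denote the positions identified from the cleavage products of the unbound and bound pools, respectively, under the idealization that every member falls cleanly into one pool.

```latex
\begin{aligned}
S &= \{\, n \in \{1,\dots,m\} : b_n = 0 \,\} = \{ s_1 < s_2 < \dots < s_k \},\\
\text{signature} &= (aa)_{s_1}\,(aa)_{s_2}\cdots(aa)_{s_k},\\
S &= U \quad \text{(unbound-pool readout)}, \qquad
S = \{1,\dots,m\} \setminus B \quad \text{(bound-pool readout)}.
\end{aligned}
```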
The method employs a peptide ladder library. The peptide ladder library has m peptides. Each of the peptides is represented by (peptide)n, where 1 ≤ n ≤ m. Each peptide has the same amino acid sequence as the protein except that position (aa)n of (peptide)n is replaced by a substitute amino acid. Preferred substitute amino acids include alanine and glycine. If only one substitute amino acid is employed, then the peptide has a footprint the size of one amino acid. In alternative embodiments, the footprint may include two or three substitute amino acids. The substitute amino acid at position (aa)n is linked to the amino acid at position (aa)n+1 by means of a labile bond. Preferred labile bonds are thioester bonds and ester bonds.
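A minimal Python sketch of the library as a mapping from n to (peptide)n, including the optional larger footprint. The text does not say how a two- or three-residue footprint is placed, so this sketch assumes, hypothetically, that a footprint of k replaces the contiguous positions n..n+k-1 (giving m − k + 1 members) and that the labile bond follows the last substituted residue, drawn as `~`. The function name `ladder_library` and its parameters are illustrative only.

```python
from typing import Dict


def ladder_library(native: str, substitute: str = "A", footprint: int = 1,
                   labile: str = "~") -> Dict[int, str]:
    """Return {n: (peptide)n} for a hypothetical ladder library.

    Assumptions (not specified by the text): a footprint of k residues replaces
    positions n..n+k-1 with the substitute, and the labile bond follows the last
    substituted residue; it is drawn as `labile` in the returned strings.
    """
    if footprint not in (1, 2, 3):
        raise ValueError("footprint must be one to three substitute residues")
    m = len(native)
    library: Dict[int, str] = {}
    for n in range(1, m - footprint + 2):
        end = n + footprint - 1                      # last substituted position (1-based)
        left = native[: n - 1] + substitute * footprint
        right = native[end:]                         # residues end+1..m
        # No labile bond is drawn when nothing follows the substituted block.
        library[n] = left + (labile + right if right else "")
    return library


for n, seq in ladder_library("PEPTIDE").items():
    print(n, seq)
```

With the default footprint of one this yields exactly m members, matching the definition above; larger footprints under this placement assumption yield fewer.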
The peptide ladder library is then contacted with the target molecule under binding conditions to form bound peptides, which are bound to the target molecule, and unbound peptides, which are not. The unbound peptides are then separated from the bound peptides to obtain separated unbound peptides. Each separated unbound peptide carries its substitute amino acid at a position (aa)n belonging to the subsequence which defines the molecular signature of the protein with respect to the target molecule. The labile bonds of the separated unbound peptides are then cleaved to produce peptide cleavage products, each of which corresponds to one of the positions (aa)n in that subsequence. After the peptide cleavage products are detected and identified, their identities are used to construct the subsequence of amino acid positions that are essential for binding to the target molecule, i.e., the molecular signature of the protein.
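The unbound-pool readout reduces to a short lookup, sketched here in Python. The sketch assumes, hypothetically, that each identified cleavage product is the N-terminal fragment of (peptide)n, so that its length equals n, and that the pools are clean; the function name and the example fragments are illustrative, not part of the described method.

```python
from typing import Iterable, List, Tuple


def signature_from_unbound(native: str,
                           unbound_fragments: Iterable[str]) -> List[Tuple[int, str]]:
    """Return [(n, native residue at n)] for every position read out of the unbound pool.

    Assumes each cleavage product is the N-terminal fragment of (peptide)n, so its
    length is n; in this mode every such position belongs to the signature.
    """
    m = len(native)
    positions = sorted({len(fragment) for fragment in unbound_fragments})
    if any(not 1 <= n <= m for n in positions):
        raise ValueError("fragment length outside 1..m")
    return [(n, native[n - 1]) for n in positions]


# Hypothetical readout: members (peptide)3 and (peptide)5 failed to bind.
print(signature_from_unbound("PEPTIDE", ["PEA", "PEPTA"]))
# [(3, 'P'), (5, 'I')]
```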
Alternative substitute amino acids include the following: L-alanine, L-arginine, L-aspartic acid, L-asparagine, L-cysteine, L-cystine, L-glutamic acid, L-glutamine, L-glycine, L-histidine, L-isoleucine, L-leucine, L-lysine, L-methionine, L-phenylalanine, L-proline, L-serine, L-threonine, L-tryptophan, L-tyrosine, L-valine, D-alanine, D-arginine, D-aspartic acid, D-asparagine, D-cysteine, D-cystine, D-glutamic acid, D-glutamine, D-glycine, D-histidine, D-isoleucine, D-leucine, D-lysine, D-methionine, D-phenylalanine, D-proline, D-serine, D-threonine, D-tryptophan, D-tyrosine, D-valine, L-α-aminobutyric acid, D-α-aminobutyric acid, L-γ-aminobutyric acid, D-γ-aminobutyric acid, L-ε-aminocaproic acid, D-ε-aminocaproic acid, L-homophenylalanine, D-homophenylalanine, L-alloisoleucine, D-alloisoleucine, L-β-2-naphthylalanine, D-β-2-naphthylalanine, L-norvaline, D-norvaline, L-ornithine, D-ornithine, L-pyridyl alanine, D-pyridyl alanine, L-2-thienylalanine, D-2-thienylalanine, L-methyltyrosine, D-methyltyrosine, L-citrulline, D-citrulline, L-homocitrulline, and D-homocitrulline.
In an alternative mode, the molecular signature of the protein is determined as described above except that the bound peptides are separated from the unbound peptides and analyzed. None of the separated bound peptides carries its substitute amino acid at a position (aa)n from the subsequence which defines the molecular signature of the protein. The labile bonds of the separated bound peptides are then cleaved to form peptide cleavage products, each of which corresponds to one of the positions (aa)n not included within the subsequence which defines the molecular signature. Accordingly, in this mode of the invention, after each of the peptide cleavage products is detected and identified, the subsequence which defines the molecular signature of the protein with respect to the target molecule is constructed from the amino acid positions (aa)n that do not correspond to any of the peptide cleavage products.
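In the bound-pool mode the signature is the complement of what is observed, which the following Python sketch makes explicit. As an assumption for illustration, an N-terminal cleavage fragment of length n is taken to identify (peptide)n, and each member is treated as either fully bound or fully unbound; the function name and the example fragments are hypothetical.

```python
from typing import Iterable, List, Tuple


def signature_from_bound(native: str,
                         bound_fragments: Iterable[str]) -> List[Tuple[int, str]]:
    """Return [(n, native residue at n)] for positions NOT read out of the bound pool.

    Assumes an N-terminal fragment of length n identifies (peptide)n; positions
    with no corresponding cleavage product form the signature.
    """
    observed = {len(fragment) for fragment in bound_fragments}
    return [(n, native[n - 1]) for n in range(1, len(native) + 1) if n not in observed]


bound = ["A", "PA", "PEPA", "PEPTIA", "PEPTIDA"]  # hypothetical bound-pool fragments
print(signature_from_bound("PEPTIDE", bound))
# [(3, 'P'), (5, 'I')]
```

Under the clean-pool assumption this gives the same answer as reading the unbound pool directly; in practice the comparison against a control aliquot of the whole library guards against members that are simply missing from the spectrum.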
Another aspect of the invention is directed to a peptide ladder library corresponding to a protein. The protein is of a type which has a binding affinity with respect to a target molecule under binding conditions. The protein is also of a type which has an amino acid sequence with length m, each amino acid position within the protein being represented by (aa)n, where 1 ≤ n ≤ m. The peptide ladder library then comprises m peptides, each peptide being represented by (peptide)n, where 1 ≤ n ≤ m. Each peptide within the library has the same amino acid sequence as the protein except that position (aa)n of (peptide)n is replaced by a substitute amino acid. The substitute amino acid at position (aa)n is linked to the amino acid at position (aa)n+1 by means of a labile bond. If only one substitute amino acid is employed, then the peptide has a footprint the size of one amino acid. In alternative embodiments, the footprint may include two or three substitute amino acids. Preferred labile bonds include thioesters and esters. Preferred substitute amino acids are alanine and glycine.
Another aspect of the invention is directed to a method for constructing a peptide ladder library corresponding to a protein. The protein is of a type which has an amino acid sequence with length m, each amino acid position of the protein being represented by (aa)n, where 1 ≤ n ≤ m. The peptide ladder library includes m peptides, each represented by (peptide)n, where 1 ≤ n ≤ m. Each peptide has the same amino acid sequence as the protein except that position (aa)n of (peptide)n is replaced by a substitute amino acid, which is linked to the amino acid at position (aa)n+1 by means of a labile bond. A first reaction vessel may be provided which contains a first pool of nascent peptides, attached to a matrix material and having a length of m − n; the amino acid sequence of these nascent peptides runs from position n+1 to position m of the protein. A second reaction vessel may be provided which contains a first pool of nascent ladder peptides, also attached to a matrix material and having a length of m − n; their amino acid sequence runs from position n+1 to position m of the protein except that each (nascent ladder peptide)p has the substitute amino acid at position (aa)p, where n+1 ≤ p ≤ m. An aliquot of matrix material is then transferred from the first reaction vessel to a third reaction vessel, and elongation reactions are performed in each of the three reaction vessels.
The first pool of nascent peptides in the first reaction vessel is elongated by addition of the amino acid of position (aa)n to form a second pool of nascent peptides having a length of m − n + 1; the aliquot of nascent peptides in the third reaction vessel is elongated by addition of the substitute amino acid at position (aa)n by means of a labile bond to form a nascent ladder (peptide)n having a length of m − n + 1; and the first pool of nascent peptide ladders in the second reaction vessel is elongated by addition of the amino acid of position (aa)n to form a partial second pool of nascent peptide ladders having a length of m − n + 1. After the elongation reactions are complete, the product of the third reaction vessel is transferred to the second reaction vessel to complete the second pool of nascent peptide ladders having a length of m − n + 1. The above process is then repeated, with n decreasing by one in each cycle, until n = 1, at which point the second reaction vessel contains the sought-after peptide ladder library.
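The three-vessel protocol can be checked with a small Python simulation that tracks which sequences each vessel holds. The sketch assumes chains are assembled from the C-terminus, one position per cycle from (aa)m down to (aa)1, with alanine as the default substitute. It records sequence species only: aliquot sizes, coupling yields, equimolarity, and the labile bond itself are not modelled. The function name and the `Chain` representation are hypothetical.

```python
from typing import List, Optional, Tuple

# (sequence covering positions n..m, substituted position or None for native chains)
Chain = Tuple[str, Optional[int]]


def split_ladder_synthesis(native: str, substitute: str = "A") -> List[Chain]:
    """Simulate the three-vessel split protocol; return the contents of vessel 2."""
    m = len(native)
    vessel1: List[Chain] = [("", None)]  # bare matrix: nascent native chains of length 0
    vessel2: List[Chain] = []            # nascent ladder peptides (initially none)
    for n in range(m, 0, -1):            # one cycle per position, (aa)m down to (aa)1
        vessel3 = list(vessel1)          # aliquot of vessel 1 moved to vessel 3 before coupling
        aa = native[n - 1]
        vessel1 = [(aa + seq, sub) for seq, sub in vessel1]        # couple native (aa)n
        vessel3 = [(substitute + seq, n) for seq, _ in vessel3]    # couple substitute at (aa)n
        vessel2 = [(aa + seq, sub) for seq, sub in vessel2]        # couple native (aa)n
        vessel2.extend(vessel3)          # pool vessel 3 into vessel 2
    return vessel2


for seq, n in split_ladder_synthesis("PEPTIDE"):
    print(n, seq)
```

Each cycle adds exactly one new ladder member (from vessel 3) while every existing member grows by the native residue, so after m cycles vessel 2 holds m chains, each carrying a single substitution at a distinct position.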