All protein sequences, whether peptides, polypeptides, or proteins, are composed of a linear sequence of amino acids joined by peptide bonds. There are twenty naturally occurring amino acids, each bearing a chemically unique side chain. Determinants of polypeptide interactions, such as those between peptide segments in protein folding or between protein monomers, are encoded in the one-dimensional sequence of these twenty amino acid side chains. For purposes of this application, “peptides” are generally considered to be amino acid polymers of not more than 25 amino acids in length; “polypeptides” are generally considered to be polymers of between 25 and 50 amino acids; and “proteins” are generally considered to be polymers containing more than 50 amino acids. One of ordinary skill in the art would appreciate that some overlap among these ranges is expected, and minor deviations from these ranges do not in any way diminish the scope of the invention. The “naturally occurring amino acids” are those that are encoded by the genetic code, and are generally considered to be those found in all living species studied to date.
Net differences in the cumulative energetic contributions of several types of weak bonding mechanisms, totaling as little as ΔG=5–10 kcal/mol, determine selection and stabilization among conformations observed in protein folding, protein-protein interactions and the initial phases of substrate-enzyme and ligand-membrane receptor association. In particular, the minimization of ΔG through the formation of four general types of weak bonding mechanisms between amino acid side chains, in the range of ΔG≅2–7 kcal/mol, determines the arrangement of protein sequences in three-dimensional space, as well as the relative orientations of protein chain aggregates, in aqueous environments and at physiological temperatures. The thermal instability of the conformations supported by these low ΔG, reversible, weak-bonding mechanisms permits uncatalyzed, fast searches of configuration space for functionally optimal cooperative arrangements within and between polypeptide and protein monomers. The variety of weak bonding capacities afforded by amino acid side chains determines the range of physicochemical property transformations of amino acid sequences described in this application.
The weak bonds ordering polypeptides and proteins in three-dimensional space include hydrogen bonds, such as those between the main-chain carbonyl and imino groups, which configure the right-handed α-helices and the parallel and antiparallel β-sheets. They also include the hydrogen and ionic bonds between amino acid side chains, such as those formed by the hydroxyl groups of serine and threonine, the acidic carboxyl groups of aspartate and glutamate, and the basic groups of lysine and arginine. In addition to being specific to particular chemical groups, these weak hydrogen and ionic bonding influences are also directionally specific, with bonding angles greater than 30° reducing their influence to negligible levels.
A third, nondirectional type of weak bonding interaction, induced by fluctuating charges within a distance of 1–3 Å, is called the van der Waals force. These interactions vary with the size and the extent of mutual geometric fit of the interacting groups, but are generally in the range of 1–2 kcal/mol, barely greater than the energy of thermal molecular motion at room temperature (ΔG≅0.6–1.0 kcal/mol). However, in the specific cases of some antibody/antigen interactions and MHC protein/peptide interactions, which involve water-releasing tight fits between corresponding moieties in suitably shaped binding pockets, the ΔG values associated with van der Waals interactions have been estimated to be as high as 30 kcal/mol.
A fourth weak bonding mechanism, and the most energetically dominant force on three-dimensional polypeptide structure and protein-protein interactions, is termed the hydrophobic effect. The hydrophobic effect arises from the much stronger attraction that water molecules have for each other than for hydrocarbon groups or molecules. Each tetrahedrally coordinated water molecule participates in strong, hydrogen-bonded, dipole/dipole interactions with other water molecules that are manifested in properties of water such as its high surface tension, high latent heat and high boiling point. These physicochemical features afford a large variety of possible atomic arrangements of water (as seen in the large number of different ice types) that in turn permit maximizing the entropy and minimizing the free energy of the aqueous solution. The intrusion of nonpolar, hydrophobic solutes causes spatially distributed (nondirectional) deformations in these hydrogen-bonded arrangements of water. The introduction of such molecules into an aqueous solution results in the formation of volume-expanding hydration shells composed of hydrogen-bonded cages of multiple molecular layers of water (“clathrate structures”) around these molecules, in a process called “hydrophobic hydration”. In aqueous solutions, such deformations in water structure are energetically unfavorable. For example, the side chains of alanine, valine, leucine and isoleucine lack effective dipole moments, and therefore cannot participate in charge-mediated or hydrogen-bonding interactions with water. As a result, these side chains intrude into the aqueous solvent and disrupt its ordered structure, increasing the overall ΔG. Amino acids with polar but uncharged side chains, such as serine and threonine, may hydrogen bond with a molecule of water, but otherwise undergo the same kind of hydrophobic hydration as the nonpolar side chains.
In the case of amino acids with side chains containing charged groups, such as glutamate or lysine, the electrostatic fields associated with these side groups are screened by water molecules, such that in an aqueous solution hydrophobic hydration is still a prominent characteristic of these amino acids as well. The nonlocal, cooperative interactions of the hydrogen bonds of the aqueous solvent surrounding these amino acids drive the in-line, surface-minimizing attraction between the coherent hydrophobic-phase patches of amino acid side chains, thereby maximizing the entropy, and minimizing the free energy, of the overall aqueous solution.
The importance of the sequential arrangement of amino acid side chain hydrophobicities in determining peptide and protein secondary structures has been established in protein biology for many decades. The ready availability of water for compensatory weak bonding implies that relatively small changes in ΔG occur when internal backbone carbonyl-imino hydrogen bonds or side chain polar groups are not satisfied. This contrasts with the much greater change in ΔG associated with loss of internal hydrophobic bonding, which cannot be compensated by the hydrophobically disrupted aqueous environment. Minimization of the hydrophobic free energy, ΔGhp, by water interface-reducing aggregation of nonpolar, hydrophobic amino acid side chain groups contributes a binding ΔG that, collectively, can be orders of magnitude larger than that predicted by van der Waals theory. Mutually attractive forces mediated by hydrophobic surface minimization have been measured by atomic force spectroscopy to extend as far as 60 Å, the length scale of synaptic gaps. These attractive forces decay more slowly than exponentially with distance. The contribution of ΔGhp minimization, through aggregation of hydrophobic amino acid side chains, to the stabilization energy of protein tertiary structure has been estimated at approximately 70%.
Complete substitution of hydrophobically equivalent amino acids in peptides maintains, and sometimes increases, their peptide-receptor-mediated physiological potency. Additionally, proteins dominated by helical secondary structures of specific turn lengths can be designed using sequences of amino acids of high and low hydrophobicity, independent of the specific amino acids chosen within each hydrophobicity class. In contrast, regions of amino acids whose interactions are dominated by hydrogen bonds, ionic bonds, and van der Waals interactions are often exquisitely sensitive to any substitution, even those deemed to be conservative replacements. This difference between the effects on ΔG of hydrophobic interactions and those of hydrogen bonding, ionic bonding or van der Waals interactions, together with the more stringent geometric requirements of the latter, makes sequential patterns of ΔGhp in polypeptide sequences of primary importance in determining peptide-peptide or peptide-protein interactions.
Previously, the role of hydrophobic interactions between amino acids in peptide ligands and amino acids in their associated membrane proteins has been considered in structure-function analyses in two ways. First, the local roles of amino acids have been evaluated. In these studies, ligand-receptor binding is changed by point mutations in specifically positioned amino acids, producing alterations in the hydrophobic characteristics of “binding pockets” formed by neighboring but nonsequential residues brought together in the protein's cooperative tertiary structure. Second, the global effects of amino acids have been examined. These effects are often studied using chimeric exchanges, with respect to the number, lengths, and locations of transmembrane segments of receptors, transporters, and/or channels, and exploit the sequential juxtapositions of amino acid hydrophobicities, using n-point moving-window averages to generate what are commonly known as “hydropathy plots”. The largest, longest positive excursions in these smoothed hydrophobicity graphs along the sequence of a membrane protein are interpreted as its lipophilic, hydrophobic transmembrane segments. The best-studied example of this approach is the finding of seven sequential hydrophobic maxima of approximately 25 residues each in the hydropathy plots of bacteriorhodopsin, assumed to be the evolutionary prototype of the G-protein-coupled superfamily of transmembrane receptors. This common transmembrane receptor protein motif consists of seven transmembrane domains that snake back and forth across the membrane lipid bilayer, anchored by lipophilic transmembrane (“TM”) segments. In this motif, three separate extracellular loops (“ELs”) are defined by the TMs: the first extracellular loop, EL-I, between TM2 and TM3; the second extracellular loop, EL-II, between TM4 and TM5; and the third extracellular loop, EL-III, between TM6 and TM7.
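A minimal sketch of the n-point moving-average hydropathy plot described above. As an assumption for illustration only, it uses the published Kyte–Doolittle hydropathy scale rather than the ΔGhp transfer free energies discussed in this application; any per-residue scale can be passed in instead. The function names `to_series`, `hydropathy_plot` and `spans_above` are hypothetical.

```python
# Stand-in scale (assumption): Kyte & Doolittle (1982) hydropathy values.
# These are NOT the ΔGhp transfer free energies discussed in this document.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_series(sequence, scale=KYTE_DOOLITTLE):
    """Map a one-letter amino acid sequence to a real-valued series."""
    return [scale[residue] for residue in sequence.upper()]


def hydropathy_plot(sequence, window=19, scale=KYTE_DOOLITTLE):
    """n-point moving average; entry i is the mean of residues i .. i+window-1."""
    values = to_series(sequence, scale)
    if not 0 < window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one residue
        averages.append(running / window)
    return averages


def spans_above(averages, threshold):
    """(start, end) runs of window positions (end exclusive) above threshold."""
    spans, start = [], None
    for i, value in enumerate(averages):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(averages)))
    return spans
```

With the Kyte–Doolittle scale, a 19-residue window and a threshold of about 1.6 are the values commonly used to flag candidate transmembrane segments. The smoothing step is also what suppresses shorter-wavelength hydrophobic modes.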
Secondary structures with matching wavenumbers, such as the β-strands of interleukin-1β, have been shown to bind together and initiate protein folding in a process called the “hydrophobic zipper”. We define “wavenumbers” as the inverse spatial variational frequencies of a physicochemically transformed series. They are reported here in sequential distance units of amino acids. Two long, helical secondary structures with congruent hydrophobic wavenumbers bind to create the central “hydrophobic knot” that stabilizes the structure of phospholipase A2. Recent studies of the binding of extracellular domains of growth hormone receptor by polyclonal antibodies to ovine growth hormone have shown that functional binding occurs between the epitope sequences and the extracellular segments of the growth hormone transmembrane receptor. This binding, analogous to that between peptide ligands and their receptors, is more related to common helical, loop and/or disordered secondary structures than to specific amino acid sequences or their local three-dimensional geometry.
Estimates of the relative contributions by the ΔGhp of each of the twenty amino acids to these weak bond-mediated reactions can be approximated as the free energy of transfer of each amino acid from the aqueous to the organic phase of a binary solvent. Values for the free energy of transfer, ΔGhp, expressed in kcal/mol, are obtained from the relative equilibrium partitions of the amino acids between the phases of these aqueous-organic binary solvents, $K_{eq} = e^{-\Delta G_{hp}/RT}$. The transformation of individual amino acids into their ΔGhp values enables the conversion of polypeptide and protein sequences into real number series available for analyses with respect to matches in sequential patterns. Such matches have been predictive of differentially selective hydrophobic attraction and aggregation between peptide ligands and relevant extracellular receptor loops following their search via “snake upon snake” sliding diffusion, or “reptation”.
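The relation $K_{eq} = e^{-\Delta G_{hp}/RT}$ can be inverted to turn a measured partition into a transfer free energy, $\Delta G_{hp} = -RT \ln K_{eq}$. A minimal sketch, assuming K_eq is the organic-to-aqueous partition ratio (so residues with K_eq > 1 receive a negative ΔGhp) and R is expressed in kcal/(mol·K); the function names are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K): 8.314462618 J/(mol*K) / 4184 J/kcal.
R_KCAL = 1.987204e-3


def delta_g_from_keq(k_eq, temperature_k=298.15):
    """ΔG (kcal/mol) from a partition ratio by inverting K_eq = exp(-ΔG/RT).

    Assumes k_eq is the organic/aqueous partition ratio, so residues that
    prefer the organic phase (k_eq > 1) get a negative ΔG.
    """
    if k_eq <= 0:
        raise ValueError("K_eq must be positive")
    return -R_KCAL * temperature_k * math.log(k_eq)


def keq_from_delta_g(delta_g, temperature_k=298.15):
    """Partition ratio implied by a transfer free energy in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature_k))


print(round(delta_g_from_keq(10.0), 3))  # prints -1.364
```

At 25 °C, RT ≈ 0.59 kcal/mol, so a tenfold preference for the organic phase corresponds to about 1.36 kcal/mol of transfer free energy.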
A topologically one-dimensional polypeptide sequence manifests secondary structures, which are organized into supersecondary structures and further into tertiary structures. For example, spiral rotations of ≈3.6 amino acids are the elementary component of a helical barrel composed of 12–16 amino acids. These helical barrels may be joined by short loops into four-barrel bundles composed of 60–70 amino acids, which may in turn be part of a protein domain containing several hundred amino acids and forming sequentially segregated or alternating barrels, bundles, β-sheets, and coils and loops of varying lengths. Therefore, hydrophobic sequences of a range of lengths may underlie the conformational components of differing size and complexity that make up the compact intermediate states of proteins.
Transformations of polypeptide sequences into ΔGhp values have been found useful in predicting the polypeptide chain turns that make up secondary structures, such as α-helices and β-strands. These predictions have been confirmed by x-ray crystallographic studies. Generic α-helices have 3.6 amino acids per turn and a pitch of ≈5.4 angstroms, resulting in ≈1.5 angstroms of linear distance per residue. Generic β-strands have 2.1 amino acids per turn with ≈3.3 angstroms of linear distance per residue.
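The per-residue geometry quoted above converts residue counts into axial lengths, and the pitch follows from residues per turn times rise per residue (3.6 × 1.5 Å ≈ 5.4 Å for the α-helix). A small sketch using only the values stated in the text; the dictionary and function names are hypothetical.

```python
# Values as stated in the text (approximate, "generic" geometries).
GEOMETRY = {
    "alpha_helix": {"residues_per_turn": 3.6, "rise_per_residue": 1.5},  # Å
    "beta_strand": {"residues_per_turn": 2.1, "rise_per_residue": 3.3},  # Å
}


def axial_length(n_residues, structure):
    """Approximate length (Å) along the axis of n residues in a given structure."""
    return n_residues * GEOMETRY[structure]["rise_per_residue"]


def pitch(structure):
    """Axial distance (Å) per full turn: residues per turn x rise per residue."""
    g = GEOMETRY[structure]
    return g["residues_per_turn"] * g["rise_per_residue"]


print(round(pitch("alpha_helix"), 2), axial_length(25, "alpha_helix"))  # prints: 5.4 37.5
```

For example, a 25-residue segment, the approximate length of the hydrophobic maxima in bacteriorhodopsin hydropathy plots, would span about 37.5 Å if fully α-helical.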
Sliding-window ΔGhp averages were shown to locate the lipophilic, hydrophobic transmembrane segments of membrane proteins, and these results were confirmed by low- and high-resolution crystallographic studies of bacteriorhodopsin as a model seven-transmembrane receptor protein. It is generally accepted that representing polypeptide sequences as series of amino acid aqueous volumes, partial specific volumes or ΔGhp values, followed by n-block averaging, statistical predilection, hydrophobic moments, Fourier transformation, helical wheel plots or wavelet transformations, can predict the size and location of secondary and transmembrane structures in soluble and membrane proteins 60–80% of the time. These approaches have also been found useful in predicting supersecondary structures, such as four-helix barrels and the supercoiling of α-helices about each other in fibrous proteins such as the keratins and myosin tails. However, one drawback of these methods is that the smoothing used to generate hydropathy plots discards coexisting sequential variations in hydrophobic free energy at wavelengths (mode or modes) other than that of the transmembrane segments. Moreover, conventional Fourier transformation of a protein's hydrophobicities yields poor mode definition because of end effects and intrinsic multimodality. In addition, these conventional techniques have thus far provided no solution to what is called the “inverse problem”: even if the conventional methods could define one or more signatory and relevant modes, how does one construct a de novo peptide using these modes? The present invention overcomes these deficiencies of the prior art and describes successful solutions to the inverse problem.
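A sketch of a conventional Fourier power spectrum of a hydrophobicity series that reports the dominant period h(ω) in residues. Removing the mean and applying a Hann taper are standard mitigations for the end effects mentioned above; they are illustrative choices, not the method of this application. NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np


def power_spectrum(series, taper=True):
    """Periodogram of a mean-removed series.

    Returns (freqs, power) with freqs in cycles per residue, from 0 to 0.5.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                  # drop the average-hydrophobicity (zero-frequency) term
    if taper:
        x = x * np.hanning(len(x))    # Hann taper reduces leakage from abrupt ends
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0)
    return freqs, power


def dominant_period(series, taper=True):
    """Period h(ω), in residues, of the strongest non-zero-frequency peak."""
    freqs, power = power_spectrum(series, taper)
    k = int(np.argmax(power[1:])) + 1  # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

Because the frequency grid of an N-residue series is spaced 1/N cycles per residue apart, a 30-residue peptide can only report periods of 30/k residues (30, 15, 10, 7.5, 6, ...), which is one concrete source of poor mode definition in short sequences.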
When the amino acid sequences of neuropeptides and peptide hormones were transformed into their individual ΔGhp values, functionally related peptides demonstrated similarities in their hydrophobic free energy power spectral mode or modes. Members of functionally related peptide families share the same statistically significant dominant power spectral wavelengths (wavenumbers expressed as inverse spatial frequencies), even though their ordered amino acid content differs by as much as 60%. The power spectral wavelengths are expressed in units of amino acid residues as h(ω). For example, glucagon, vasoactive intestinal peptide, secretin, oxyntomodulin, helodermin and growth hormone releasing factor, which share several (but not all) physiological actions and which have differing relative potencies, share an h(ω)=4.0. The range of peptide hydrophobic modes found by power spectral transformation of amino acid sequences expressed as hydrophobic free energies includes the well-known h(ω)=3.6 and h(ω)=2.0 of the α-helix and the β-strand, respectively, but many others as well, ranging from the h(ω)=13.10 mode of acidic fibroblast growth factor to the h(ω)=2.18 mode that dominates the hydrophobic free energy power spectrum of corticotropin releasing factor.
The HIV coat protein manifests a waxing and waning of h(ω)=7 to 9 (observed by sliding a 50-residue windowed Fourier transform along its sequence), which appears to be conserved across many of its mutations. Fibroblast growth factor (“FGF”) was predicted, and confirmed, to have a regulatory influence on the enzyme ribonuclease A, with which it was found to share a dominant hydrophobic mode. This mode match led to experiments demonstrating an increased half-life of messenger RNA in the presence of FGF in a neuroendocrine cell line.
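The sliding windowed Fourier transform mentioned for the HIV coat protein can be sketched by computing a tapered periodogram in each 50-residue window and recording its dominant period. NumPy is assumed; the function name, the one-residue step and the Hann taper are illustrative choices, not details taken from the original study.

```python
import numpy as np


def sliding_dominant_period(series, window=50, step=1):
    """Dominant period (residues) in each window of a sliding Fourier transform.

    Returns a list of (window_start, period) pairs.
    """
    x = np.asarray(series, dtype=float)
    if not 2 < window <= len(x):
        raise ValueError("window must be between 3 and the series length")
    taper = np.hanning(window)
    freqs = np.fft.rfftfreq(window)           # k / window cycles per residue
    results = []
    for start in range(0, len(x) - window + 1, step):
        segment = x[start:start + window]
        segment = (segment - segment.mean()) * taper
        power = np.abs(np.fft.rfft(segment)) ** 2
        k = int(np.argmax(power[1:])) + 1     # ignore the zero-frequency bin
        results.append((start, 1.0 / freqs[k]))
    return results
```

With a 50-residue window the available periods are 50/k residues, so the h(ω)=7 to 9 band is sampled only at about 8.33 (k=6) and 7.14 (k=7) unless longer windows or a parametric spectrum are used.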
The specific amino acid sequences of the calcitonins, the peptide hormone family that regulates the rate of enzymatic bone catabolism, vary by approximately 60% across species, but all are dominated by an h(ω)=3.6. The most potent calcitonin (from salmon) expresses this mode with a significantly lower hydrophobicity per residue (due to the presence of a greater number of charged groups) than those of the nine other species examined. The same h(ω) can be expressed across differing average hydrophobicities of the amino acid sequences of peptides and receptors.
Using a variety of techniques involving linear decomposition and transformation of the ΔGhp sequences, we have obtained diagnostic graphical patterns of known and novel proteins with weak or unknown homology, of polyproteins which have multiple functional segments following post-translational processing, and of discriminable subtypes of membrane pore, channel and transporter proteins. These methods, which decompose ΔGhp series into their hierarchical levels of organization to yield secondary and supersecondary patterns at multiple wavelengths and/or length scales, include a variety of wavelet transformations, eigenvalue decomposition of autocovariance matrices, and all-poles, maximum entropy power spectra. Using ΔGhp sequences as input, these methods elucidated primary and secondary wavenumbers and the sequential order of these multiple hydrophobic modes which, taken together, can contribute to the preliminary classification of unknown proteins into families or provide clues to their function.
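One standard way to compute an all-poles (maximum entropy) power spectrum is Burg's method, which fits an autoregressive model to the series and evaluates that model's spectrum. The sketch below assumes NumPy and Burg's estimator; the model order and FFT length are free parameters, and the function names are hypothetical rather than the procedure actually used here.

```python
import numpy as np


def burg_ar(series, order):
    """Burg estimate of an all-poles (autoregressive) model.

    Returns (a, noise_var) with a[0] == 1, i.e. the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    if not 0 < order < n:
        raise ValueError("order must satisfy 0 < order < len(series)")
    f = x[1:].copy()   # forward prediction errors f_m(t), t = m+1 .. n-1
    b = x[:-1].copy()  # backward prediction errors b_m(t-1), aligned with f
    a = np.array([1.0])
    noise_var = np.dot(x, x) / n
    for _ in range(order):
        # Reflection coefficient minimizing forward + backward error power.
        k = -2.0 * np.dot(f, b) / (np.dot(f, f) + np.dot(b, b))
        # Levinson-Durbin update of the filter coefficients.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        noise_var *= 1.0 - k * k
        # Update and re-align the error sequences for the next order.
        f, b = f[1:] + k * b[1:], b[:-1] + k * f[:-1]
    return a, noise_var


def mem_spectrum(series, order, nfft=4096):
    """All-poles power spectrum; frequencies in cycles per residue (0 to 0.5)."""
    a, noise_var = burg_ar(series, order)
    freqs = np.fft.rfftfreq(nfft)
    response = np.fft.rfft(a, nfft)  # A(f) evaluated on the FFT grid
    return freqs, noise_var / np.abs(response) ** 2


def mem_dominant_period(series, order, nfft=4096):
    """Period (residues) of the strongest non-zero-frequency all-poles peak."""
    freqs, power = mem_spectrum(series, order, nfft)
    k = int(np.argmax(power[1:])) + 1
    return 1.0 / freqs[k]
```

Unlike the periodogram, the all-poles spectrum is a smooth function of frequency and is not tied to a 1/N grid, which is why it is attractive for short peptide series; the trade-off is sensitivity to the chosen model order.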
Using these techniques, we have located peptide-receptor mode matches in the ELs of seven-transmembrane proteins, in the vicinity of neurotransmitter and pharmacological binding domains suggested by studies of point mutations and chimeric exchanges. The ligands designed for mode-matched hydrophobic aggregation at these sites are postulated to have modulatory (e.g. allosteric and/or direct) influences on the physiological activities induced by the corresponding membrane protein's native ligands. In addition, mode matches were found between the α-estrogen receptor and a known peptide antagonist; between a nuclear membrane docking site on a nuclear factor of activated T-cells and the known ligand calcineurin; and between the protein chaperonin GroEL and β-lactamase, which is known to be bound by GroEL.
Eigenfunctions of autocovariance matrices of lagged ΔGhp sequence data matrices, maximum entropy power spectra and wavelet transformations were used as linear decompositions to remove the longer ΔGhp sequence wavelengths of various receptor TMs, leaving the shorter-wavelength hydrophobic modes for analysis. Statistical pattern matches in ΔGhp modes were found between peptide ligands and their membrane receptors, including the kappa, mu, delta and orphan opiate receptors, the corticotropin releasing factor receptor, cholecystokinin receptor, neuropeptide Y receptor, somatostatin receptor, bombesin receptor, and neurotensin receptor. Functionally significant mode matches also occur between peptides and non-peptide receptors and other proteins. For example, ΔGhp mode matches, such as those found between the dopamine co-localized neuropeptide neurotensin and the D2 dopamine membrane receptor, D2DA, and those found between the gastrointestinal and brain peptide cholecystokinin and the dopamine membrane transporter, DAT, predicted the differential binding of these pharmacologically active ligands to their respective responsive dopamine membrane proteins and, correspondingly, their lack of binding to the opposing, pharmacologically unresponsive dopamine membrane proteins.
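The eigen-decomposition of a lagged autocovariance matrix can be sketched in the style of singular spectrum analysis: build the lagged data matrix, diagonalize its covariance, and rebuild the series from a subset of eigen-components by diagonal averaging. This assumes, for illustration, that the longest wavelengths carry the largest eigenvalues, which holds only when they dominate the variance; the embedding dimension `m` and all function names are hypothetical.

```python
import numpy as np


def lagged_matrix(series, m):
    """Rows are the length-m windows x[i:i+m] (the lagged data matrix)."""
    x = np.asarray(series, dtype=float)
    return np.array([x[i:i + m] for i in range(len(x) - m + 1)])


def eigen_decompose(series, m):
    """Eigenvalues (descending) and eigenvectors of the lagged autocovariance matrix."""
    x = np.asarray(series, dtype=float)
    X = lagged_matrix(x - x.mean(), m)
    cov = X.T @ X / X.shape[0]
    vals, vecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def reconstruct(series, m, components):
    """Mean-removed series rebuilt from chosen eigen-components by diagonal averaging."""
    x = np.asarray(series, dtype=float)
    X = lagged_matrix(x - x.mean(), m)
    _, vecs = eigen_decompose(series, m)
    V = vecs[:, list(components)]
    Xs = (X @ V) @ V.T                    # project each window onto the chosen eigenvectors
    out = np.zeros(len(x))
    counts = np.zeros(len(x))
    for i in range(Xs.shape[0]):          # Xs[i, j] is an estimate of x[i + j]
        out[i:i + m] += Xs[i]
        counts[i:i + m] += 1
    return out / counts


def remove_long_modes(series, m, n_remove):
    """Drop the n_remove highest-variance components and keep the remainder."""
    return reconstruct(series, m, range(n_remove, m))
```

Dropping the leading components acts as a data-adaptive high-pass filter. In practice the eigenvectors should be inspected before choosing `n_remove`, since a single sinusoidal mode appears as a pair of eigenvalues.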
We have proposed that functional interactions of peptides and biogenic amines may occur via selective hydrophobic aggregation of these peptides with mode-matched ELs on a target membrane protein. These interactions may result in heterosteric modification of the global kinetic conformations of the target membrane protein, and thereby produce responses to native or pharmacological ligands, distant from intramembranous ion- or charge-mediated active sites. We have modeled the joint actions on a single membrane protein as the shifting of the critical hydrophilic-hydrophobic partition between extra- and intramembranous portions of the TMs of receptors by peptide-receptor loop hydrophobic weak bond binding. This would facilitate (or retard) the first-order phase transition of native ligand induced-receptor membrane internalization, where low dielectric constant, unscreened ionic and/or charge-mediated tight binding most likely occurs. This theory contrasts with another suggesting that receptor-mediated interactions between co-localized biogenic amines and neuropeptides, such as dopamine and cholecystokinin, result from convergent intramembranous signaling through two receptors, one for each ligand, via the cooperative interactions between their membrane receptor proteins which result in G-protein mediated second messenger cascades.
Peptides are known to mediate a variety of physiological responses in many organisms, including man. Among these bioactive peptides are the peptide hormones, such as glucagon and insulin, which regulate blood glucose levels; gastrin and secretin, which control digestive processes; and follicle-stimulating hormone (FSH) and luteinizing hormone (LH), which regulate reproductive processes. Other bioactive peptides act as growth factors, including somatotropin (growth hormone), erythropoietin, and NGF (nerve growth factor).
Because of the powerful and specific effects of these peptides, they have long held great interest as drug candidates. For example, insulin is widely used to combat diabetes, and erythropoietin stimulates red blood cell formation. However, peptides have numerous drawbacks as potential therapeutics. Peptides are very unstable and sensitive to changes in their environments, which can create alterations in their structures and reduce or eliminate their physiological effects. Furthermore, peptides are susceptible to proteolysis, which complicates the problem of delivery to the desired site in the body and limits the available routes of administration. The available routes of administration are further limited by the relatively large sizes of many peptides, which make transdermal or inhalation administration methods impractical. Because peptides typically interact with other peptides or proteins to produce their biological effects, and the in vivo interactions between even a simple peptide and another protein are extraordinarily difficult to understand, enormous effort is required to determine the interactions between such molecules, or even to predict if such interactions will occur. Finally, relatively few bioactive peptides are known, in comparison to the number of potential polypeptide targets that mediate biological effects. As a result, there is great interest in finding methods to predict sequences of peptides that will interact with a polypeptide/protein target, and produce a desired physiological response. The present inventors have made the revolutionary discovery that peptides, in interaction with solvent-accessible proteins, also influence the behavior of proteins (as above) that are not specific peptide receptors.
The difficulties associated with predicting the structure of peptides that would produce a given effect in the body have led to the adoption of various combinatorial approaches. These methods produce large numbers of peptides having randomly generated sequences. The peptides are then subjected to various high-throughput screening methods to detect those that may warrant further study. However, without prior knowledge of a relevant sequence pattern, often called a peptide pharmacophore, and without proven methods of pattern-conserving design, finding physiologically active lead compounds in applications involving peptide-protein interactions using purely random combinatorial searches is generally a low-probability event. Depending on the candidate peptide length, the statistical expectation of hits with at least micromolar potency, using high-throughput screening of libraries of ≥300,000–400,000 peptides generated by parallel synthesis and combinatorial strategies, can be less than 2–4 per 100,000 peptides. Detecting these candidate peptides requires costly and time-consuming high-throughput methods for both peptide synthesis and screening. As a result, there is a great need for a method that can produce peptides or peptide-like drugs having a high probability of binding, modulating the activity of, activating or inhibiting a target polypeptide and/or protein.
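The screening figures above reduce to simple arithmetic. This sketch treats the quoted hit rate of 2 to 4 per 100,000 peptides as an upper bound and assumes libraries drawn from the twenty naturally occurring amino acids without repeated sequences; the function names are hypothetical.

```python
def expected_hits(library_size, hits_per_100k):
    """Expected number of hits for a library at a given hit rate per 100,000."""
    return library_size * hits_per_100k / 100_000


def sequence_space(length, alphabet=20):
    """Number of distinct linear peptides of a given length."""
    return alphabet ** length


def coverage(library_size, length, alphabet=20):
    """Fraction of all sequences of that length sampled (assumes no repeats)."""
    return library_size / sequence_space(length, alphabet)


print(expected_hits(300_000, 2), expected_hits(400_000, 4))  # prints: 6.0 16.0
print(coverage(400_000, 6))  # prints: 0.00625
```

A 300,000 to 400,000 member library therefore yields at most about 6 to 16 hits, and even 400,000 peptides sample only 0.625% of the 64,000,000 possible hexapeptides.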