Determination of high-resolution structures is required for fundamental understanding of how particular protein modifications promote or cause disease processes, e.g., cancer. A critical shortcoming of present high-throughput (HT) crystallographic structure determination efforts is that they fail to produce crystals for more than 80% of the target proteins. Floppy, unstructured regions of proteins can play a dominant role in this problem; the energetics and kinetics of crystallization are often less favorable than for fully structured proteins, and these regions are often more susceptible to degradation during purification than are structured regions, thus promoting sample heterogeneity. There is considerable advantage in producing proteins in forms that contain the structured regions in their native conformation, but with the unstructured regions otherwise stabilized. Unfortunately, no robust technique exists to discern structured vs. unstructured regions of target proteins at the pace required for HT efforts, nor is there a general and robust method to effect the stabilization of unstructured regions within proteins that are to be subjected to crystallographic structure determination efforts.
Measurement of the exchange rates of peptide amide hydrogens within a protein can report its stability at the individual amino acid scale. Ranking and comparison of the exchange rates of a protein's amide hydrogens therefore allows direct identification and localization of structured vs. unstructured regions of the protein. Despite the utility of such exchange data, the methods used to obtain it have remained labor intensive and time consuming, with substantial limitations in throughput, comprehensiveness and resolution. A number of enhancements to amide hydrogen/deuterium exchange-mass spectrometry (DXMS technology) that have substantially overcome these limitations have recently been developed and integrated. The instant invention employs this technology to guide design of superior protein constructs for crystallographic analysis at a HT pace.
As described herein, DXMS is used to precisely define the disordered regions in 21 Thermotoga maritima proteins in a total of two weeks' time. The data demonstrate that this approach to defining unstructured regions of proteins works well, at a high-throughput pace and resolution. The data further demonstrate that hydrogen exchange analysis can identify a protein binding partner that can induce structure in previously unstructured regions of a target protein, and that co-crystallization of protein and binding partner can be accomplished.
High-resolution structures are required for a fundamental understanding of how modifications of cancer-implicated proteins can promote oncogenesis and metastasis, and can provide a dramatically effective guide to the rational design of therapeutics to clinically important targets. The pressing need for this information in a timely manner contrasts with the agonizingly slow pace of present high-resolution structure determination methods. Access to these important structures will be facilitated by novel high-throughput (HT) protein structure determination approaches and improvements to conventional crystallographic methods. Despite the availability of many enhancements that facilitate such efforts, high-throughput production of stable protein constructs that suitably crystallize continues to be a serious bottleneck. While definition of successful constructs has long been a problem for conventional crystallography, the inadequacies of current approaches are particularly acute and costly for structural genomics efforts that presently show only a 5-20% success rate in target crystallization. Bacterial genomes are currently the focus of many of the structural genomics efforts. However, the switch to higher eukaryotes, such as mouse and human, will entail even lower success rates, due in part to more complex and higher molecular weight proteins. A critical need, therefore, exists for robust techniques that can efficiently define protein domain boundaries, the location of floppy regions between domains, and disordered regions within single-domain proteins.
Need for HT Methods to Reliably Define Structured/Unstructured Regions of Crystallographic Targets
Many, if not the majority, of proteins contain unstructured regions (see, e.g., Wright and Dyson, J. Mol. Biol. 293:321-331, 1999). It is thought that such regions are induced to form specific, functionally important structures when the proteins bind to and otherwise interact with physiologic binding partners, including other proteins, carbohydrates, lipids, co-factors, etc., that are present in the protein's normal cellular milieu.
Unfortunately, while such unstructured regions may serve a function within the protein when it is interacting with binding partners in the normal cellular environment, such unstructured regions can inhibit or prevent crystallization of the entire protein when it is purified away from and therefore not bound to its structure-forming binding partners. Present structure determination efforts and methods are almost always preceded by the isolation and purification of proteins to be studied, in most cases removing from the environs of the protein these elements that can induce structure in the normal milieu. This is usually a necessity for protein crystallization, as crystallization is usually markedly inhibited within mixtures of proteins, and there is usually no a priori knowledge of the nature or identity of agents that if added to the purified protein would improve its crystallization success, nor are there facile assays for same. The approach therefore is almost always to highly purify the protein prior to crystallization trials.
This unstructured protein subregion problem has been apparent for years, but its full extent is difficult to discern from the published literature. In some instances, proteins may crystallize with some floppy regions, either at their ends or within short internal stretches. In many other instances, it is not known why a particular protein does not crystallize, even with seemingly pure protein. Using the methods of the present invention, crystallographic structure determination is facilitated by the ability to rapidly and precisely define structured and unstructured regions of a target protein, and then rapidly identify a minimal set of protein binding partner(s), small molecule co-factors, or modified chemical conditions (pH, salts, salt concentration) herein collectively termed structure-inducing agents or conditions, or abbreviated “agents”, that can be selectively admixed with the protein for subsequent co-crystallization studies to produce agent-perturbed protein-co-crystals with superior properties for diffraction analysis.
This capability to define structured and unstructured regions of a protein of interest can enhance crystallographic structure determination through several mechanisms. It can increase the homogeneity of protein preparations, since unstructured regions of proteins are particularly susceptible to inadvertent degradation by contaminating cellular proteases in the course of purification and storage. In addition, the energetics and kinetics of protein crystallization are facilitated by removal or structuring of unstructured sequences (see, e.g., Kwong et al., J. Biol. Chem. 274:4115-4123, 1999).
Shortcomings of Present Methods to Localize Structured/Unstructured Regions
A number of approaches to obtain this information, ranging from stability-dependent protein expression screens to computation of stability from primary structure (Dunker, et al. Pac. Symp. Biocomput. 3:473-484 1998, Garner, et al. Genome Inform. 9:201-214 1998, Romero, et al. Pac. Symp. Biocomput. 3:437-448 1998), have been reported and used, but each has requirements that limit its utility. With NMR spectroscopy, protein quantity, concentration, time needed, and size are limiting. Limited proteolysis coupled to mass spectrometry is presently one of the preferred approaches to refining construct definition for conventional crystallographic efforts (Cohen, et al. Protein Science 4:1088-1099 1995). Its use is nonetheless time consuming, frequently requiring that multiple proteolytic reactions be refined for optimal cleavage. Interpretation of limited proteolysis results is confounded by the possibility that proteolysis may clip internal loops, leading to destabilization and subsequent further proteolytic degradation of what was actually a structured region. Often it is known in advance that a particular protein is likely made up of several domains that are connected by flexible linkers. Examples of this are DNA binding proteins such as the lambda repressor C-terminal (Bell, et al. J. Mol. Biol. 314:1127-1136 2001) and the TRFH dimerization domain of the human telomeric protein (Fairall, et al. Mol. Cell. 8:351-361 2001). Unfortunately, the experimental definition of domain boundaries, even when they are anticipated, is often problematic, as it was for these proteins, and is usually addressed through trial and error, by making many constructs and testing each for expression, solubility and crystallization.
As provided herein, studies indicate that measurement of peptide amide hydrogen exchange rates can provide precisely the information needed to define and localize disordered regions of proteins, and to systematically identify agents that induce structure in such subregions, and that the advanced hydrogen exchange data acquisition methods herein have the throughput and robustness needed for HT crystallography.
Peptide Amide Hydrogen Exchange Measurements Precisely and Directly Report Protein Structural Stability
For more than 40 years, peptide amide hydrogen-exchange techniques have been employed to study the thermodynamics of protein conformational change and the mechanisms of protein folding (Englander, et al. Methods Enzymol. 232:26-42 1994, Bai, et al. Methods Enzymol. 259:344 1995). More recently, they have proven to be increasingly powerful methods by which protein dynamics, domain structure, regional stability and function can be studied (Englander, et al. Protein Science 6:1101-9 1997, Engen, et al. Analytical Chemistry 73:256A-265A 2001). Peptide amide hydrogens are not permanently attached to a protein, but continuously and reversibly interchange with hydrogen present in water. The chemical mechanisms of the exchange reactions are understood, and several well-defined factors can profoundly alter exchange rates. (Englander, et al. Methods Enzymol. 232:26-42 1994, Englander, et al. Anal. Biochem. 147:234-244 1985, Englander, et al. Methods Enzymol. 26:406-413 1972, Englander, et al. Methods Enzymol. 49G:24-39 1978) One of these factors is the extent to which a particular exchangeable hydrogen is exposed (accessible) to water.
The exchange reaction proceeds efficiently only when a particular peptide amide hydrogen is fully exposed to solvent. In a completely unstructured polypeptide chain, all peptide amide hydrogens are maximally accessible to water and exchange at their maximal possible rate, which is approximately (within a factor of 30) the same for all amides, with a half-life of exchange in the range of one second at 0° C. and pH 7.0. Exact exchange rates expected for particular amide hydrogens in fully unstructured segments can be reliably calculated from knowledge of the temperature, pH and primary amino acid sequence involved (Molday, et al. Biochemistry 11:150 1972, Bai, et al. Proteins: Structure, Function, and Genetics 17:75-86 1993).
In structured regions of a protein, most peptide amide hydrogens exchange much more slowly (up to 10^9 fold more slowly) than this maximal exchange rate, as they are not efficiently exposed to solvent. The ratio of exchange rates for a particular amide hydrogen, structured vs. unstructured, is referred to as the exchange protection factor, and directly reflects the free energy change in the atomic environment of that particular hydrogen in the structured state. In this sense, amide hydrogens can be treated as atomic scale sensors of highly localized free energy change throughout a protein, and the magnitude of free energy change reported from each of a protein's amides in a folded vs. unfolded state is precisely equal to -RT ln (protection factor) (Bai, et al. Methods Enzymol. 259:344 1995). In effect, each peptide amide's exchange rate in a folded protein directly and precisely reports the protein's thermodynamic stability at the individual amino acid scale. Ranking and comparison of the exchange rates of a protein's amides therefore allows direct and unambiguous identification and localization of structured/unstructured regions of the protein: unstructured regions are those where substantial contiguous stretches of primary sequence exhibit the fastest possible exchange rates, indicative of complete and continuous solvation of the amide hydrogens in such segments (Englander, et al. Methods Enzymol. 232:26-42 1994, Bai, et al. Methods Enzymol. 259:344 1995). In structured regions, the occasional peptide amide will happen to be fully solvated, and exchange rapidly. However, the typical turn in stable protein structure is accomplished in three or fewer amino acids, and therefore stretches of four or more continuous rapidly exchanging amides are likely indicative of disorder.
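The relationships in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative sketch only, not the data analysis procedure of the invention: the function names, the protection factor cutoff of 10 used to call an amide "rapidly exchanging", and the example values are all assumptions introduced here. Only the -RT ln (protection factor) expression and the four-residue criterion come from the text.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def protection_factor(k_unstructured, k_observed):
    """Ratio of the unstructured (maximal) rate to the observed rate for one amide."""
    return k_unstructured / k_observed

def free_energy_change(pf, temperature_k=273.15):
    """-RT ln(protection factor), in kcal/mol, as stated in the text."""
    return -R_KCAL * temperature_k * math.log(pf)

def disordered_stretches(protection_factors, pf_cutoff=10.0, min_length=4):
    """Return (start, end) index pairs, 0-based and inclusive, for runs of at
    least min_length consecutive amides whose protection factor is below
    pf_cutoff. The cutoff value is a hypothetical choice for illustration."""
    stretches = []
    run_start = None
    for i, pf in enumerate(protection_factors):
        if pf < pf_cutoff:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                stretches.append((run_start, i - 1))
            run_start = None
    # Close a run that extends to the end of the sequence.
    if run_start is not None and len(protection_factors) - run_start >= min_length:
        stretches.append((run_start, len(protection_factors) - 1))
    return stretches

# Hypothetical per-amide protection factors for an 11-residue segment.
example_pfs = [1e5, 1e4, 2, 1e6, 1, 3, 2, 1, 5, 1e3, 1e7]
print(disordered_stretches(example_pfs))
```

In this made-up example the isolated fast amide at index 2 is ignored, consistent with the observation that an occasional amide in a structured region may be fully solvated, while the run of five fast amides is reported as a candidate disordered stretch.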
Development of High Resolution, High Throughput Peptide Amide Hydrogen/Deuterium Exchange—Mass Spectrometry (DXMS)
Deuterium exchange methodologies coupled with Liquid Chromatography Mass Spectrometry (LCMS), developed over the past 10 years, presently provide the most effective approach to study proteins larger than 30 kDa in size (Engen, et al. Analytical Chemistry 73:256A-265A 2001). Proteolytic and/or collision-induced dissociation (CID) fragmentation methods allow exchange behavior to be mapped to subregions of the protein (Engen, et al. Analytical Chemistry 73:256A-265A 2001, Hoofnagle, et al. Proceedings, National Academy of Sciences 98:956-961 2001, Resing, et al. J. Am Soc Mass Spectrom 10:685-702 1999, Mandell, et al. Anal. Chem. 70:3987-3995 1998, Mandell, et al. Proc Natl Acad Sci USA 95:14705-10. 1998, Mandell, et al. J. Mol. Biol. 306:575-589 2001, Kim, et al. J Am Chem Soc 123:9860-6. 2001, Kim, et al. Biochemistry 40:14413-21. 2001, Zhang, et al. Protein Sci 10:2336-45. 2001, Kim, et al. Protein Sci 11:1320-9. 2002, Peterson, et al. Biochem J 362:173-81. 2002, Yan, et al. Protein Sci 11:2113-24. 2002). The present invention provides a number of improvements to traditional methodologies and experimental equipment which significantly improve throughput, comprehensiveness, and resolution. As described herein, invention methods are well suited to provide data to refine high throughput structure determination.
Considerable experimental work and time are required to precisely characterize the structure of a polypeptide of interest. In general, the techniques that are the easiest to use and which give the quickest answers, result in an inexact and only approximate idea of the nature of the critical structural features. Techniques in this category include the study of proteolytically generated fragments of the protein which retain binding function; recombinant DNA techniques, in which proteins are constructed with altered amino acid sequence (for example, by site directed mutagenesis); epitope scanning peptide studies (construction of a large number of small peptides representing subregions of the intact protein followed by study of the ability of the peptides to inhibit binding of the ligand to receptor); covalent crosslinking of the protein to its binding partner in the area of the binding site, followed by fragmentation of the protein and identification of cross-linked fragments; and affinity labeling of regions of the receptor which are located near the ligand binding site of the receptor, followed by characterization of such “nearest neighbor” peptides.
Other techniques that are capable of finely characterizing polypeptide three-dimensional structure are considerably more difficult in practice. The most definitive techniques for the characterization of polypeptide structure, and receptor binding sites in particular, have been NMR spectroscopy and X-ray crystallography. While these techniques can ideally provide a precise characterization of relevant structural features, they have major limitations, including inordinate amounts of time required for study, inability to study large proteins, and, for X-ray analysis, the need for protein and/or protein-binding partner crystals.
A critical shortcoming of present high-throughput crystallographic structure determination efforts is the failure to produce crystals for around 80% of the proteins of interest. It is clear that advances in automation and crystallography data analysis have not been matched by a similar pace of progress in methods for generating protein crystals for analysis (Chayen and Saridakis, Acta Crystal. D. Biol. Crystal. 58:921-927, 2002). The process of generating protein crystals suitable for structural analysis is commonly recognized as the most difficult and time-consuming step in the process of a crystallographic structure determination (see, e.g., Wiencek, Ann. Rev. Biomed. Eng. 1:505-534, 1999). Floppy, unstructured regions of proteins can play a dominant role in this problem; the energetics and kinetics of crystallization are often less favorable than for fully structured proteins, and additionally, these regions are often more susceptible to degradation during purification than are structured regions, thus promoting sample heterogeneity.
Measurement of the exchange rates of peptide amide hydrogens within a protein can report its stability at the individual amino acid scale. Essentially, hydrogen exchange can be used to determine a stability map of a protein, reflecting the degree of ordered conformation of all regions of the protein being analyzed. Ranking and comparison of the exchange rates of a protein's amide hydrogens therefore allows direct identification and localization of structured versus unstructured regions of the protein. In the instant invention, hydrogen exchange analysis of the protein is used to identify disordered regions within the protein, that is, stretches of 4 or more amino acids exhibiting very fast amide hydrogen exchange rates, indicating they are fully exposed to solvent at all times. The protein is then admixed with one or more potential structure-forming agents or subjected to altered conditions. Potential structure-forming or structure-inducing agents may be identified and/or produced by any of a number of means, including previously performed binding assays and two-hybrid screening analysis. Hydrogen exchange analysis may then be repeated on each agent/condition-protein combination, with assessment of the content and location of stretches of very fast exchanging amides in the protein. Agents/conditions that slow the exchange of one or more fast-exchanging segments within the protein are thereby identified as inducing structure in such segments of the protein, and are candidates for co-crystallization with the protein.
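The comparison described above, protein alone versus protein plus a candidate agent or condition, can be sketched as follows. This is a minimal illustration under stated assumptions, not the invention's analysis software: the function names, the rate cutoff used to call an amide "very fast", and the example rates are hypothetical. A segment is reported as structured by the agent if it formed a run of 4 or more fast amides in the protein alone and no such run remains within it when the agent is present.

```python
def fast_runs(rates, fast_cutoff, min_length=4):
    """Return (start, end) index pairs, 0-based and inclusive, for runs of at
    least min_length consecutive amides with rate >= fast_cutoff.
    Assumes fast_cutoff > 0 so that the 0.0 sentinel below counts as slow."""
    runs = []
    start = None
    for i, rate in enumerate(list(rates) + [0.0]):  # sentinel closes a trailing run
        if rate >= fast_cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    return runs

def segments_structured_by_agent(rates_alone, rates_with_agent,
                                 fast_cutoff, min_length=4):
    """Return the fast-exchanging segments of the protein alone that no longer
    contain a fast run of min_length amides once the agent is present."""
    induced = []
    for start, end in fast_runs(rates_alone, fast_cutoff, min_length):
        remaining = fast_runs(rates_with_agent[start:end + 1],
                              fast_cutoff, min_length)
        if not remaining:
            induced.append((start, end))
    return induced

# Hypothetical per-amide exchange rates (arbitrary units) for one protein.
alone      = [0.001, 0.9, 0.8, 1.0, 0.7, 0.002, 0.9, 0.9, 0.8, 0.9, 0.9]
with_agent = [0.001, 0.9, 0.8, 1.0, 0.7, 0.002, 0.01, 0.02, 0.9, 0.01, 0.03]
print(segments_structured_by_agent(alone, with_agent, fast_cutoff=0.5))
```

In this made-up example the first fast segment is unaffected by the agent, while the second is slowed and would mark the agent as a candidate for co-crystallization.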
Hydrogen (Proton) Exchange
When a protein in its native folded state is incubated in buffers containing an isotope of hydrogen (for example, tritium or deuterium labeled water), isotope in the buffer reversibly exchanges with normal hydrogen present in the protein at acidic positions (for example, —OH, —SH, and —NH groups) with rates of exchange which are dependent on each exchangeable hydrogen's chemical environment, temperature, and most importantly, its accessibility to the isotope of hydrogen present in the buffer (see, e.g., Englander et al., Meth. Enzymol. 49:24-39, 1978; Englander et al., Meth. Enzymol. 26:406-413, 1972). Accessibility is determined in turn by both the surface (solvent-exposed) disposition of the hydrogen, and the degree to which it is hydrogen-bonded to other regions of the folded polypeptide. Simply stated, an acidic hydrogen present on amino acid residues which are on the outside (buffer-exposed) surface of the protein and which are hydrogen-bonded to solvent water will often exchange more rapidly with heavy hydrogen in the buffer than will a similar acidic hydrogen which is buried and hydrogen-bonded within the folded polypeptide.
Hydrogen exchange reactions can be greatly accelerated by both acid and base-mediated catalysis; and the rate of exchange observed at any particular pH is the sum of both acid and base mediated mechanisms. For many acidic hydrogens, a pH of 2.2-2.7 results in an overall minimum rate of exchange (Englander et al., Anal. Biochem. 147:234-244, 1985; Englander et al., Biopolymers 7:379-393, 1969; Molday et al., Biochemistry 11:150, 1972; Kim et al., Biochemistry 21:1, 1982; Bai et al., Proteins: Struct. Funct. Genet. 17:75-86, 1993; and Connelly et al., Proteins: Struct. Funct. Genet. 17:87-92). While hydrogens in protein hydroxyl and amino groups exchange with tritium or deuterium in buffer at millisecond rates, the exchange rate of one particular acidic hydrogen, the peptide amide bond hydrogen, is considerably slower, having a half life of exchange (when freely accessible, and freely hydrogen-bonded to solvent water) of approximately 0.5 seconds at 0° C., pH 7, which is greatly slowed to a half life of exchange of 70 minutes at 0° C., pH 2.7. When a polypeptide is in a denatured, unstructured configuration (also termed a “random coil”) all of its amide hydrogens can freely exchange with solvent hydrogen. However, the precise rate of exchange varies up to 200 fold from amide to amide in such unstructured configurations, the rate of exchange at each particular amide being determined by localized primary amino acid sequence-dependent effects that can be calculated from a knowledge of the peptide's primary sequence (Bai et al., supra). When peptide amide hydrogens are buried within a folded polypeptide, or are hydrogen bonded to other parts of the polypeptide, exchange half-lives with solvent hydrogens are often considerably lengthened, at times being measured in hours to days.
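The pH dependence described above, in which the observed rate is the sum of an acid-catalyzed and a base-catalyzed term, can be illustrated with a simple two-term model. The rate constants and the pKw value below are placeholders chosen only so that the minimum falls inside the 2.2-2.7 range cited in the text; they are not measured values, and the function names are introduced here for illustration.

```python
import math

def exchange_rate(ph, log_k_acid=1.0, log_k_base=10.0, pkw=14.0):
    """Observed rate (arbitrary units) as the sum of acid- and base-catalyzed
    terms. All three constants are hypothetical placeholders."""
    acid_term = 10.0 ** (log_k_acid - ph)         # k_acid * [H+]
    base_term = 10.0 ** (log_k_base + ph - pkw)   # k_base * [OH-]
    return acid_term + base_term

def ph_of_minimum_rate(log_k_acid=1.0, log_k_base=10.0, pkw=14.0):
    """pH at which the two terms are equal, which is where their sum is smallest."""
    return 0.5 * (pkw + log_k_acid - log_k_base)

def half_life(rate):
    """Half-life of a first-order process with the given rate constant."""
    return math.log(2) / rate

ph_min = ph_of_minimum_rate()
for ph in (2.0, ph_min, 3.0, 7.0):
    print(f"pH {ph:.1f}: rate {exchange_rate(ph):.4g}, half-life {half_life(exchange_rate(ph)):.4g}")
```

With these placeholder constants the model reproduces the qualitative behavior in the text: exchange is slowest in the acidic range and several orders of magnitude faster at neutral pH, which is why quenching to low pH preserves label on amides.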
Hydrogen exchange at peptide amides is a fully reversible reaction, and rates of on-exchange (solvent deuterium replacing protein-bound normal hydrogen) are identical to rates of off-exchange (hydrogen replacing protein-bound deuterium) if the state of a particular peptide amide within a protein, including its chemical environment and accessibility to solvent hydrogens, remains identical during hydrogen exchange conditions.
Hydrogen exchange is commonly measured by performing studies with proteins and aqueous buffers that are differentially tagged with pairs of the three isotopic forms of hydrogen (1H, normal hydrogen; 2H, deuterium; 3H, tritium). If the pair of normal hydrogen and tritium are employed, it is referred to as tritium exchange; if normal hydrogen and deuterium are employed, as deuterium exchange. Different physicochemical techniques are in general used to follow the distribution of the two isotopes in deuterium versus tritium exchange. The rates of exchange of other acidic protons (—OH, —NH, and —SH) are so rapid that they cannot be followed in these techniques and all subsequent discussion refers exclusively to peptide amide proton exchange.
Tritium Exchange Techniques
Tritium exchange techniques (where the amount of the isotope is determined by radioactivity measurements) have been extensively used for the measurement of peptide amide exchange rates within an individual protein. In these studies, purified proteins are on-exchanged by incubation in buffers containing tritiated water for varying periods of time, optionally transferred to buffers free of tritium, and the rate of off-exchange of tritium determined. By analysis of the rates of tritium on- and off-exchange, estimates of the numbers of peptide amide protons in the protein whose exchange rates fall within particular exchange rate ranges can be made. These studies do not allow a determination of the identity (location within the protein's primary amino acid sequence) of the exchanging amide hydrogens measured.
Extensions of these techniques have been used to detect the presence within proteins of peptide amides which experience allosterically-induced changes in their local chemical environment and to study pathways of protein folding (Englander et al., Meth. Enzymol. 26:406-413, 1972; Englander et al., J. Biol. Chem. 248:4852-4861, 1973; Englander, Biochemistry 26:1846-1850, 1987; Louie et al., J. Mol. Biol. 201:765-772, 1988). For these studies, tritium on-exchanged proteins are often allowed to off-exchange after they have experienced either an allosteric change, or have undergone time-dependent folding upon themselves, and the number of peptide amide hydrogens which experience a change in their exchange rate subsequent to the allosteric/folding modifications determined. Changes in exchange rate indicate that alterations of the chemical environment of particular peptide amides have occurred which are relevant to proton exchange (solvent accessibility, hydrogen bonding, etc.). Peptide amide hydrogens which undergo an induced slowing in their exchange rate are referred to as “slowed amides” and if previously on-exchanged tritium is sufficiently slowed in its off-exchange from such amides there results a “functional tritium labeling” of these amides. From these measurements, inferences are made as to the structural nature of the shape changes which occurred within the isolated protein. Again, determination of the identity of the particular peptide amides experiencing changes in their environment is not possible with these techniques.
Several investigators have described technical extensions (collectively referred to as “medium resolution tritium exchange”) which allow the locations of particular slowed, tritium labeled peptide amides within the primary sequence of small proteins to be localized to a particular proteolytic fragment, though not to a particular amino acid.
Rosa and Richards were the first to describe and utilize medium resolution tritium techniques in their studies of the folding of ribonuclease S protein fragments (Rosa et al., J. Mol. Biol. 133:399-416, 1979; Rosa et al., J. Mol. Biol. 145:835-851, 1981; and Rosa et al., J. Mol. Biol. 160:517-530, 1982). However, the techniques described by Rosa and Richards were of marginal utility, primarily due to their failure to optimize certain critical experimental steps. No studies employing related techniques were published until the work of Englander and co-workers in which extensive modifications and optimizations of the Rosa and Richards technique were first described.
Englander's investigations utilizing tritium exchange focused exclusively on the study of allosteric changes which take place in tetrameric hemoglobin (α subunit and β subunit, each 16 kD in size) upon deoxygenation (Englander et al., Biophys. J. 10:577, 1979; Rogero et al., Meth. Enzymol. 131:508-517, 1986; Ray et al., Biochemistry 25:3000-3007, 1986; and Louie et al., J. Mol. Biol. 201:755-764, 1988). In the Englander procedure, native hemoglobin in the oxygenated state is on-exchanged in tritiated water. The hemoglobin is then deoxygenated (inducing allosteric change), transferred to tritium-free buffers by gel permeation column chromatography, and then allowed to off-exchange for 10-50 times the on-exchange time. On-exchanged tritium present on peptide amides which experience no change in exchange rate subsequent to the induced allosteric change in hemoglobin structure off-exchanges at rates identical to its on-exchange rates, and therefore is almost totally removed from the protein after the long off-exchange period. However, peptide amides which experience slowing of their exchange rate subsequent to the induced allosteric changes preferentially retain the tritium label during the period of off-exchange.
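The selective retention of label described above follows from simple first-order kinetics, and can be sketched as below. This is an illustrative calculation under stated assumptions, not a reproduction of the Englander procedure: the 10-minute half-life, the 30-minute on-exchange time, the 30-fold off-exchange period and the 1000-fold slowing are hypothetical values, and the function name is introduced here.

```python
import math

def tritium_remaining(k_on, k_off, t_on, t_off):
    """Fraction of one amide site still carrying label after on-exchange for
    t_on at rate k_on, followed by off-exchange for t_off at rate k_off.
    Assumes simple first-order exchange in both phases."""
    labeled = 1.0 - math.exp(-k_on * t_on)       # fraction labeled during on-exchange
    return labeled * math.exp(-k_off * t_off)    # fraction surviving off-exchange

# Hypothetical amide with a 10-minute exchange half-life (rates in 1/min).
k = math.log(2) / 10.0
t_on = 30.0            # minutes of on-exchange
t_off = 30.0 * t_on    # off-exchange for 30 times the on-exchange time

unchanged = tritium_remaining(k, k, t_on, t_off)          # rate unaffected
slowed = tritium_remaining(k, k / 1000.0, t_on, t_off)    # rate slowed 1000-fold

print(f"unchanged amide retains {unchanged:.3g} of full label")
print(f"slowed amide retains {slowed:.3g} of full label")
```

Under these assumed values the unchanged amide loses essentially all of its label during the long off-exchange period, while the slowed amide keeps most of it, which is the basis of the "functional tritium labeling" described earlier.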
To localize (in terms of hemoglobin's primary sequence) the slowed amides bearing the residual tritium label, Englander then proteolytically fragments the off-exchanged hemoglobin with the protease pepsin, separates, isolates and identifies the various peptide fragments by reverse phase high pressure liquid chromatography (RP-HPLC), and determines which fragments bear the residual tritium label by scintillation counting. However, as the fragmentation of hemoglobin proceeds, each fragment's secondary and tertiary structure is lost and the unfolded peptide amide hydrogens become freely accessible to H2O in the buffer. At physiologic pH (>6), any amide-bound tritium label would leave the unfolded fragments within seconds. Englander therefore performs the fragmentation and HPLC peptide isolation procedures under conditions which minimize peptide amide proton exchange, including cold temperature (4° C.) and use of phosphate buffers at pH 2.7. This technique has been used successfully by Englander to coarsely identify and localize the peptide regions of hemoglobin α and β chains which participate in deoxygenation-induced allosteric changes. The ability of the Englander technique to localize tritium labeled amides, while an important advance, remains low; at best, Englander reports that his technique localizes amide tritium label to hemoglobin peptides 14 amino acids or greater in size, without the ability to further sublocalize the label. Moreover, in Englander's work, there is no appreciation that a suitably adapted exchange technique might be used to identify the peptide amides which reside in the contacting surface of a protein receptor and its binding partner. Instead, these Englander disclosures are concerned with the mapping of allosteric changes in hemoglobin.
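The need for cold, acidic conditions during fragmentation and HPLC can be made concrete with a short decay calculation. The sketch below uses the two half-lives quoted in this document for a freely accessible peptide amide at 0° C. (approximately 0.5 seconds at pH 7 and 70 minutes at pH 2.7). The 20-minute analysis time and the function name are assumptions introduced for illustration, and the procedure described above runs at 4° C. rather than 0° C., so the numbers are approximate.

```python
def fraction_retained(elapsed_s, half_life_s):
    """Fraction of amide-bound label remaining after elapsed_s seconds,
    assuming first-order loss with the given half-life in seconds."""
    return 0.5 ** (elapsed_s / half_life_s)

HALF_LIFE_PH7_S = 0.5          # from the text: about 0.5 s at 0 C, pH 7
HALF_LIFE_PH2_7_S = 70 * 60    # from the text: 70 min at 0 C, pH 2.7

analysis_time_s = 20 * 60      # hypothetical 20-minute digestion plus HPLC run

print(fraction_retained(10, HALF_LIFE_PH7_S))                  # 10 s at pH 7
print(fraction_retained(analysis_time_s, HALF_LIFE_PH2_7_S))   # 20 min at pH 2.7
```

Even ten seconds at neutral pH removes nearly all label from an unfolded fragment, whereas most of the label survives a 20-minute analysis under the slowed-exchange conditions.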
Unfortunately, acid proteases are very nonspecific in their sites of cleavage, leading to considerable HPLC separation difficulties. Englander tried to work around these problems, for the localization of hemoglobin peptides experiencing allosteric changes, by taking advantage of the fact that some peptide bonds are somewhat more sensitive to pepsin than others. Even then, the fragments were “difficult to separate cleanly”. They were also, of course, longer (on average), and therefore the resolution was lower. Englander concludes, “At present the total analysis of the HX (hydrogen exchange) behavior of a given protein by these methods is an immense task. In a large sense, the best strategies for undertaking such a task remain to be formulated. Also, these efforts would benefit from further technical improvements, for example in HPLC separation capability and perhaps especially in the development of additional acid proteases with properties adapted to the needs of these experiments” (Englander et al., Anal. Biochem. 147:234-244, 1985).
In the years since this observation was made, no advances have been disclosed which address these critical limitations of the medium resolution hydrogen exchange technique. Most acid-reactive proteases are in general no more specific in their cleavage patterns than pepsin. Efforts to improve the technology by employing acid-reactive proteases other than pepsin have not significantly improved the technique.
Allewell and co-workers have disclosed studies utilizing the Englander techniques to localize induced allosteric changes in the enzyme Escherichia coli aspartate transcarbamylase (Burz et al., Biophys. J. 49:70-72, 1986; Mallikarachchi et al., Biochemistry 28:5386-5391, 1989). Burz et al. is a brief disclosure in which the isolated R2 subunit of this enzyme is on-exchanged in tritiated buffer of specific activity 100 mCi/ml, allosteric change induced by the addition of ATP, and then the conformationally altered subunit off-exchanged. The enzyme R2 subunit was then proteolytically cleaved with pepsin and analyzed for the amount of label present in certain fragments. Analysis employed techniques which rigidly adhered to the recommendations of Englander, utilizing a single RP-HPLC separation in a pH 2.8 buffer.
ATP binding to the enzyme was shown to alter the rate of exchange of hydrogens within several relatively large peptide fragments of the R2 subunit. In a subsequent more complete disclosure (Mallikarachchi, supra), the Allewell group discloses studies of the allosteric changes induced in the R2 subunit by both ATP and CTP. They disclose on-exchange of the R2 subunit in tritiated water-containing buffer of specific activity 22-45 mCi/ml, addition of ATP or CTP followed by off-exchange of the tritium in normal water-containing buffer. The analysis comprised digestion of the complex with pepsin, and separation of the peptide fragments by reverse phase HPLC in a pH 2.8 or pH 2.7 buffer, all of which rigidly adheres to the teachings of Englander. Peptides were identified by amino acid composition or by N-terminal analysis, and the radioactivity of each fragment was determined by scintillation counting. In both of these studies the localization of tritium label was limited to peptides which averaged 10-15 amino acids in size, without higher resolution being attempted.
Beasty et al. (Biochemistry 24:3547-3553, 1985) have disclosed studies employing tritium exchange techniques to study folding of the α subunit of E. coli tryptophan synthetase. The authors employed tritiated water of specific activity 20 mCi/ml, and fragmented the tritium labeled enzyme protein with trypsin at pH 5.5, conditions under which the protein and the large fragments generated retained sufficient folded structure to protect amide hydrogens from off-exchange during proteolysis and HPLC analysis. Under these conditions, the authors were able to produce only 3 protein fragments, the smallest being 70 amino acids in size. The authors made no further attempt to sublocalize the label by further digestion and/or HPLC analysis. Indeed, under the experimental conditions they employed (they performed all steps at 12° C. instead of 4° C., and performed proteolysis at pH 5.5 instead of at a pH in the range of 2-3), it would have been impossible to further sublocalize the labeled amides by tritium exchange, as label would have been immediately lost (off-exchanged) by the unfolding of subsequently generated proteolytic fragments at pH 5.5 if they were less than 10-30 amino acids in size. Additional references disclosing tritium exchange methods include Fromageot et al., U.S. Pat. No. 3,828,102, which discloses using hydrogen exchange to tritium label a protein and its binding partner, and Benson, U.S. Pat. Nos. 3,560,158 and 3,623,840, which disclose using hydrogen exchange to tritiate compounds for analytical purposes.
Deuterium Exchange Techniques
Fesik et al. (Biochem. Biophys. Res. Commun. 147:892-898, 1987) disclose measuring by NMR the hydrogen (deuterium) exchange of a peptide before and after it is bound to a protein. From this data, the interactions of various hydrogens in the peptide with the binding site of the protein are analyzed.
Paterson et al. (Science 249:755-759, 1990) and Mayne et al. (Biochemistry 31:10678-10685, 1992) disclose NMR mapping of an antibody binding site on a protein (cytochrome-C) using deuterium exchange. This relatively small protein, with a solved NMR structure, is first complexed to anti-cytochrome-C monoclonal antibody; the preformed complex is then incubated in deuterated water-containing buffers, and NMR spectra are obtained at several time intervals. The NMR spectrum of the antigen-antibody complex is examined for the peptide amides which experience slowed hydrogen exchange with solvent deuterium as compared to their rate of exchange in uncomplexed native cytochrome-C. Benjamin et al. (Biochemistry 31:9539-9545, 1992) employ an identical NMR-deuterium technique to study the interaction of hen egg lysozyme (HEL) with HEL-specific monoclonal antibodies. While both this NMR-deuterium technique and medium resolution tritium exchange rely on the phenomenon of proton exchange at peptide amides, they utilize radically different methodologies to measure and localize the exchanging amide hydrogens. Furthermore, study of proteins by the NMR technique is not possible unless the protein is small (generally less than 30 kD), large amounts of the protein are available for the study, and computationally intensive resonance assignment work is completed.
Subsequently, others have disclosed techniques in which exchange-deuterated proteins are incubated with a binding partner, off-exchanged, the complex fragmented with pepsin, and deuterium-bearing peptides identified by single stage fast atom bombardment (FAB) or electrospray mass spectrometry (MS) (Thevenon-Emeric et al., Anal. Chem. 64:2456-2458, 1992; Winger et al., J. Am. Chem. Soc. 114:5897-5989, 1992; Zhang et al., Prot. Sci. 2:522-531, 1993; Katta et al., J. Am. Chem. Soc. 115:6317-6321, 1993; and Chi et al., Org. Mass Spectrometry 7:58-62, 1993; Engen and Smith, Anal. Chem. 73:256A-265A, 2001; Englander et al., Protein Sci. 6:1101-1109, 1997; Dharmasiri and Smith, Anal. Chem. 68:2340-2344, 1996; Smith et al., J. Mass Spectrometry 32:135-146, 1997; Deng and Smith, Biochemistry 37:6256-6262, 1998). In these studies, only the enzyme pepsin is employed to effect enzymatic fragmentation under slowed exchange conditions, and no attempt is made to increase the number and quantity of useful fragments produced and studied beyond employing the methods disclosed by Englander and colleagues some decades prior. The resolution of the deuterium-exchange mass spectrometry work disclosed in these publications therefore remained at the 10-14 amino acid level, with the primary limitation of their art being the ability to generate only a small number of peptides with the endopeptidase pepsin, as they employed it. See FIG. 3 for an overview of this method of exchanged deuterium localization.
U.S. Pat. Nos. 5,658,739; 6,291,189; and 6,331,400 issued to Woods, Jr. (each of which is hereby incorporated by reference herein in its entirety), disclose improved methods of determining polypeptide structure and binding sites utilizing hydrogen-exchange-labeled peptide amides, importantly including a method of increasing the resolution of the technique to the 1-5 amino acid level. This increased ability to more precisely localize exchanged amide hydrogens was afforded by the novel use of acid-resistant carboxypeptidases to effect a subsequent progressive sub-fragmentation of the small number of relatively large-sized pepsin-generated peptides initially produced in the method (see FIG. 4 for an overview of the progressive proteolysis method). In these prior methods, finer localization of the labels is achieved by analysis of subfragments generated by controlled, stepwise, sub-degradation (“progressive degradation”) of each pepsin-generated, labeled peptide under slowed exchange conditions. According to these prior methods, the protein or a peptide fragment is said to be “progressively”, “stepwise” or “sequentially” degraded if a series of fragments is obtained which is similar to that which would be achieved with an ideal exopeptidase. Carboxypeptidase-P, carboxypeptidase Y, and several other acid-reactive (i.e., enzymatically active under acid conditions) carboxypeptidases are specified for use in said progressive degradation of peptides under acidic conditions. To date, no aminopeptidases have been reported that are acid resistant; as a practical matter, the only exopeptidases known or likely to be useful for this method are therefore carboxypeptidases.
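The arithmetic behind such progressive degradation can be illustrated with a minimal sketch. It is not taken from the cited patents: the function name `localize_label`, the example peptide, and the label values are hypothetical, and the sketch assumes that each sub-fragment shares the parent peptide's N-terminus and that label loss during analysis (back-exchange) can be ignored. Under those assumptions, the label carried by the residues removed between two successive C-terminal truncations is simply the difference in label content of the two fragments.

```python
from typing import Dict, List, Tuple


def localize_label(
    peptide: str, label_by_length: Dict[int, float]
) -> List[Tuple[int, str, float]]:
    """Assign label to the residues lost between successive C-terminal truncations.

    peptide          -- sequence of the parent (e.g. pepsin-generated) peptide
    label_by_length  -- hypothetical measurements: fragment length (counted from
                        the N-terminus) -> label content of that fragment
    Returns (1-based start position, removed residues, label difference) tuples,
    ordered from the C-terminus inward.
    """
    lengths = sorted(label_by_length, reverse=True)
    localized = []
    for longer, shorter in zip(lengths, lengths[1:]):
        # Label present in the longer fragment but absent from the shorter one
        # must reside on the residues that were removed between them.
        delta = label_by_length[longer] - label_by_length[shorter]
        removed = peptide[shorter:longer]
        localized.append((shorter + 1, removed, delta))
    return localized


# Hypothetical 7-residue peptide with four measured sub-fragments.
example = localize_label("ALKVGEF", {7: 3.0, 6: 2.5, 5: 2.5, 3: 1.0})
for start, residues, delta in example:
    print(f"position {start} ({residues}): {delta:.1f} label units")
```

Where consecutive truncations differ by a single residue, the difference localizes label to one amide; where a truncation step is missing (as between lengths 5 and 3 in the example), the label can only be assigned to the multi-residue gap, which corresponds to the 1-5 amino acid resolution described above.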
By performing such measurement of the exchange rates of peptide amide hydrogens within a protein, one can determine its stability at the individual amino acid level. Ranking and comparison of the exchange rates of a protein's amide hydrogens therefore allows direct identification and localization of structured versus unstructured regions of the protein. Despite the utility of such exchange data, the methods used to obtain it have remained labor intensive and time consuming, with substantial limitations in throughput, comprehensiveness and resolution.
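As a rough illustration of how ranked exchange rates might be used to flag candidate unstructured regions, the following sketch scans a per-residue list of exchange rates and reports contiguous runs of fast-exchanging amides. The function name `fast_exchange_regions`, the threshold, the minimum run length, and the example rates are all assumptions made for illustration; they are not values or procedures taken from this disclosure.

```python
from typing import List, Tuple


def fast_exchange_regions(
    rates: List[float], threshold: float, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of fast-exchanging residues.

    rates      -- hypothetical per-residue amide exchange rates (arbitrary units)
    threshold  -- rates at or above this value are treated as "fast" (assumed)
    min_len    -- shortest run reported as a candidate unstructured region
    """
    regions = []
    start = None  # 0-based index where the current fast run began, if any
    for i, rate in enumerate(rates):
        if rate >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminal end of the sequence.
    if start is not None and len(rates) - start >= min_len:
        regions.append((start + 1, len(rates)))
    return regions


# Hypothetical 13-residue protein segment.
rates = [0.01, 0.02, 5, 6, 7, 0.01, 8, 9, 0.02, 4, 5, 6, 7]
print(fast_exchange_regions(rates, threshold=1.0))
```

In the example, residues 3-5 and 10-13 are reported, while the two-residue fast run at positions 7-8 falls below the assumed minimum length and is ignored.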
High-resolution structures are required for a fundamental understanding of protein structure and function. It is widely anticipated that access to these important structures will be facilitated by novel high-throughput protein structure determination approaches and improvements to conventional crystallographic methods. Proteomic-scale crystallography is one avenue being vigorously pursued by several groups, involving large-scale global efforts (see, e.g., Stevens and Wilson, Science 293:519-520, 2001; and Stevens et al., Science 294:89-92, 2001).
Despite the availability of many enhancements that facilitate such efforts, high-throughput production of stable protein constructs that suitably crystallize continues to be a serious bottleneck. While definition of successful constructs for protein production has long been a problem for conventional crystallography, the inadequacies of current approaches are particularly acute and costly for structural genomics efforts that presently show only a 10-20% success rate in target crystallization. Bacterial genomes are currently the focus of many of the structural genomics efforts. However, a switch to higher eukaryotes, such as mouse and human, will entail even lower success rates, due in part to more complex and higher molecular weight proteins.
Thus, there remains a need in the art for improved simple, robust, quick and efficient methods whereby the structure of a protein of interest can be analyzed to efficiently define protein domain boundaries, the location of unstructured or floppy regions between or within domains, as well as disordered regions within single-domain proteins; and then employed to refine and optimize the processes of crystallization and crystallographic structure determination in a high-throughput manner.