This invention is concerned with determining the three-dimensional structure of biological macromolecules, especially proteins. In particular, it is concerned with methods for rapidly determining protein structures by NMR spectroscopy, by providing methods for simplifying NMR spectra using labeled proteins prepared from specifically isotopically labeled amino acids, and the means whereby these labeled proteins and amino acids may be obtained.
For many years, there has been intense interest in determining the three-dimensional structures of biological macromolecules, particularly proteins. So-called “structure-function” studies have been carried out to determine the structural features of a molecule, or class of molecules, that are important for biological activity. Since the pioneering work of Perutz and coworkers on the structure of hemoglobin (Perutz, M. F. et al., Nature, 185:416-22 (1960)) and that of Watson and Crick on DNA in the 1950s (Watson, J. D. and Crick, F. H. C., Nature, 171:737 (1953)), both of which led to the respective scientists receiving the Nobel Prize, this field has been of major importance in the biological sciences.
More recently, the concept of “rational drug design” has evolved. This strategy for the design of drugs involves determining the three-dimensional structure of an “active part” of a particular biological molecule, such as a protein. Knowing the three-dimensional structure of the active part can enable scientists to design a synthetic analogue of the active part that will block, mimic or enhance the natural biological activity of the molecule. (Appelt, K. et al., J. Med. Chem., 34:1925 (1991)). The biological molecule may, for example, be a receptor, an enzyme, a hormone, or other biologically active molecule. Determining the three-dimensional structures of biological molecules is, therefore, of great practical and commercial significance.
The first technique developed to determine three-dimensional structures was X-ray crystallography. The structures of hemoglobin and DNA were determined using this technique. In X-ray crystallography, a crystal (or fiber) of the material to be examined is bombarded with a beam of X-rays, which are diffracted by the atoms of the ordered molecules in the crystal. The scattered X-rays are captured on a photographic plate, which is then developed using standard techniques. The diffracted X-rays are thus visualized as a series of spots on the plate, and from this pattern the structure of the molecules in the crystal can be determined. For larger molecules, it is frequently necessary to crystallize the material with a heavy ion, such as ruthenium, in order to resolve the ambiguity in the phases of the diffracted X-rays, which the spot intensities alone do not reveal.
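The position of each spot is related to the spacing of the planes of atoms in the crystal by Bragg's law, nλ = 2d sin θ, a standard result that the passage above does not state. The minimal sketch below converts a measured scattering angle 2θ into a plane spacing d; the function name is hypothetical, and the default wavelength (Cu Kα radiation, 1.5418 Å) is an assumption chosen only for illustration.

```python
import math


def bragg_d_spacing(two_theta_deg: float, wavelength_angstrom: float = 1.5418, order: int = 1) -> float:
    """Lattice-plane spacing d (angstroms) from Bragg's law, n * lambda = 2 * d * sin(theta).

    two_theta_deg is the scattering angle 2*theta of a diffraction spot. The default
    wavelength (Cu K-alpha, 1.5418 angstroms) is an illustrative assumption.
    """
    theta = math.radians(two_theta_deg / 2.0)
    if not 0.0 < theta <= math.pi / 2:
        raise ValueError("2*theta must lie in (0, 180] degrees")
    return order * wavelength_angstrom / (2.0 * math.sin(theta))


# Spots at larger scattering angles report on more finely spaced planes (higher resolution).
for two_theta in (10.0, 30.0, 90.0):
    print(f"2theta = {two_theta:5.1f} deg -> d = {bragg_d_spacing(two_theta):.2f} angstrom")
```

Bragg's law gives only the spacings; recovering the structure also needs the phases, which is why heavy-atom derivatives are used.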
More recently, a second technique, nuclear magnetic resonance (NMR) spectroscopy, has been developed to determine the three-dimensional structures of biological molecules, particularly proteins. NMR was originally developed in the 1950s and has evolved into a powerful procedure for analyzing the structure of small compounds, such as those with a molecular weight of ≤1000 Daltons. Briefly, the technique involves placing the material to be examined (usually in a suitable solvent) in a powerful magnetic field and irradiating it with radio frequency (rf) electromagnetic radiation. The nuclei of the various atoms will align themselves with the magnetic field until energized by the rf radiation. They then absorb this resonant energy and re-radiate it at a frequency dependent on i) the type of nucleus and ii) its atomic environment. Moreover, resonant energy can be passed from one nucleus to another, either through bonds or through three-dimensional space, thus giving information about the environment of a particular nucleus and the nuclei in its vicinity.
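The dependence of the resonant frequency on the type of nucleus can be illustrated with the Larmor relation ν = |γ/2π|·B0, where γ is the gyromagnetic ratio of the nucleus and B0 the field strength. The sketch below uses rounded literature values of γ/2π and an assumed field of 14.1 T; it ignores the small, parts-per-million shifts caused by the atomic environment (the chemical shift), and the function name is hypothetical.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded literature values).
# 15N has a negative ratio; only the magnitude sets the resonance frequency.
GAMMA_OVER_2PI_MHZ_PER_T = {
    "1H": 42.577,
    "2H": 6.536,
    "13C": 10.708,
    "15N": -4.316,
}


def larmor_frequency_mhz(isotope: str, field_tesla: float) -> float:
    """Resonance (Larmor) frequency nu = |gamma / 2*pi| * B0, ignoring chemical shift."""
    return abs(GAMMA_OVER_2PI_MHZ_PER_T[isotope]) * field_tesla


field = 14.1  # tesla; an assumed field, roughly that of a "600 MHz" spectrometer
for iso in GAMMA_OVER_2PI_MHZ_PER_T:
    print(f"{iso:>3}: {larmor_frequency_mhz(iso, field):7.2f} MHz")
```

Because each isotope resonates in its own widely separated frequency band, 1H, 13C and 15N signals can be excited and detected independently.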
However, it is important to recognize that not all nuclei are NMR active. Indeed, not all isotopes of the same element are active, and those that are active resonate at different frequencies. For example, whereas “ordinary” hydrogen, 1H, is NMR active, heavy hydrogen (deuterium), 2H, resonates at a quite different frequency and therefore does not appear in the 1H spectrum. Thus, any material that normally contains 1H can be rendered “invisible” in the hydrogen NMR spectrum by replacing all of its 1H hydrogens with 2H. It is for this reason that NMR spectroscopic analyses of water-soluble materials are frequently performed in 2H2O, to eliminate the water signal.
Conversely, “ordinary” carbon, 12C, is NMR inactive, whereas the stable isotope 13C, which makes up about 1% of natural carbon, is active. Similarly, although “ordinary” nitrogen, 14N, is NMR active, its nuclear spin of 1 gives it a quadrupole moment that broadens its signals, making it poorly suited to NMR; it also resonates at a different frequency from the stable isotope 15N, which makes up about 0.4% of natural nitrogen. For small molecules, these low natural abundances were sufficient to generate the required experimental information, provided that the experiment was conducted with sufficient quantities of material and for a sufficient time.
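Whether an isotope is NMR active follows from its nuclear spin I: a nucleus with I = 0 gives no signal, and one with I greater than 1/2 is quadrupolar and tends to give broad lines. The sketch below tabulates rounded literature values of spin and natural abundance (approximate figures, not taken from the text) and estimates how rarely one particular C–N bond carries both 13C and 15N at natural abundance, compared with an assumed 99% enrichment of each label. All names are hypothetical.

```python
from fractions import Fraction

# (nuclear spin I, approximate natural abundance in percent); rounded literature values.
ISOTOPES = {
    "1H": (Fraction(1, 2), 99.99),
    "2H": (Fraction(1), 0.015),
    "12C": (Fraction(0), 98.9),
    "13C": (Fraction(1, 2), 1.1),
    "14N": (Fraction(1), 99.6),
    "15N": (Fraction(1, 2), 0.37),
}


def is_nmr_active(isotope: str) -> bool:
    """A nucleus gives an NMR signal only if its spin I is non-zero."""
    return ISOTOPES[isotope][0] != 0


def is_quadrupolar(isotope: str) -> bool:
    """Nuclei with I > 1/2 have an electric quadrupole moment, which typically broadens lines."""
    return ISOTOPES[isotope][0] > Fraction(1, 2)


for iso, (spin, abundance) in ISOTOPES.items():
    print(f"{iso:>3}: I = {str(spin):>3}, ~{abundance}% natural, "
          f"active={is_nmr_active(iso)}, quadrupolar={is_quadrupolar(iso)}")

# Probability that one specific C-N bond carries 13C and 15N together, as needed by
# experiments that transfer magnetization between these two nuclei.
natural = (ISOTOPES["13C"][1] / 100) * (ISOTOPES["15N"][1] / 100)
enriched = 0.99 * 0.99  # assumes 99% enrichment of each label, for illustration only
print(f"natural abundance: {natural:.1e}; enriched: {enriched:.2f}")
```

The four-orders-of-magnitude gap between the two probabilities is the reason isotopic enrichment extends NMR to larger molecules.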
As advances in hardware and software were made, the size of molecules that could be analyzed by these techniques increased to about 10 kD, the size of a small protein. Thus, the application of NMR spectroscopy to protein structure determination began only a few years ago. It was quickly realized that this size limit could be raised by replacing 12C, which is NMR inactive, and 14N, which is poorly suited to NMR, with the NMR active stable isotopes 13C and 15N.
Over the past few years, labeling proteins with 15N and 15N/13C has raised the analytical molecular size limit to approximately 15 kD and 40 kD, respectively. More recently, partial deuteration of the protein in addition to 13C- and 15N-labeling has increased the size of proteins and protein complexes still further, to approximately 60-70 kD. See Shan et al., J. Am. Chem. Soc., 118:6570-6579 (1996) and references cited therein.
Isotopic substitution is usually accomplished by growing a bacterium or yeast, transformed by genetic engineering to produce the protein of choice, in a growth medium containing 13C-, 15N- and/or 2H-labeled substrates. In practice, bacterial growth media usually consist of 13C-labeled glucose and/or 15N-labeled ammonium salts, dissolved in D2O where deuteration is required. See Kay, L. et al., Science, 249:411 (1990) and references therein; and Bax, A., J. Am. Chem. Soc., 115, 4369 (1993). More recently, isotopically labeled media especially adapted for the labeling of bacterially produced macromolecules have been described. See U.S. Pat. No. 5,324,658.
The goal of these methods has been to achieve uniform and/or random isotopic enrichment of all of the amino acids of the protein. By contrast, some workers have described methods whereby certain residues can be relatively enriched in 1H, 2H, 13C and 15N. For example, Kay et al., J. Mol. Biol., 263, 627-636 (1996) and Kay et al., J. Am. Chem. Soc., 119, 7599-7600 (1997) have described methods whereby isoleucine, alanine, valine and leucine residues in a protein may be labeled with 2H, 13C and 15N, but specifically labeled with 1H at the terminal methyl positions. In this way, study of the proton-proton interactions between some of the hydrophobic amino acids may be facilitated. Similarly, Yokoyama et al., J. Biomol. NMR, 6(2), 129-134 (1995), have described a cell-free system in which a transcription-translation system derived from E. coli was used to express human Ha-Ras protein incorporating 15N-labeled serine and/or aspartic acid.
These methods are important in that they provide additional means for interpreting the complex spectra obtained from proteins. However, it should be noted that the Kay et al. methods are limited to the aliphatic amino acids described above. The method described by Yokoyama, by contrast, permits the selective enrichment of any amino acid, but is limited to those proteins that can be expressed in a cell-free system. Glycoproteins, for example, cannot be produced in their glycosylated form in this E. coli-derived system.
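The spectral simplification offered by residue-selective labeling, such as the 15N serine and aspartic acid labeling of Yokoyama et al., can be illustrated by counting which backbone amides would appear in a 15N-1H correlation spectrum. The sketch below is hypothetical throughout: the function name and the 12-residue sequence are invented for illustration, and it assumes that proline (which has no amide proton) and the N-terminal residue (whose free amine is usually not observed) give no amide peak.

```python
def labeled_amide_positions(sequence: str, labeled: set) -> list:
    """1-based positions whose backbone amide should give a 15N-1H peak when only
    the residue types in `labeled` carry 15N.

    Assumptions (illustrative): proline is skipped because it has no amide proton,
    and residue 1 is skipped because the N-terminal amine is usually not observed.
    """
    return [
        i
        for i, aa in enumerate(sequence, start=1)
        if aa in labeled and aa != "P" and i > 1
    ]


seq = "MSDKPSGDASLE"  # hypothetical sequence, one-letter amino acid codes
uniform = labeled_amide_positions(seq, set(seq))
selective = labeled_amide_positions(seq, {"S", "D"})
print(len(uniform), "amide peaks with uniform 15N labeling")
print(selective, "positions visible with 15N only on Ser and Asp")
```

Halving the number of peaks in this toy case shows why selective labeling eases interpretation; in a real protein of several hundred residues the reduction is proportionally more valuable.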
Techniques for producing isotopically labeled proteins and macromolecules, such as glycoproteins, in mammalian or insect cells have been described. See U.S. Pat. Nos. 5,393,669 and 5,627,044; Weller, C. T. et al., Biochem., 35, 8815-23 (1996); and Lustbader, J. W., J. Biomol. NMR, 7, 295-304 (1996). Weller et al. applied these techniques to the determination of the structure of a glycoprotein, including its glycosyl side chain.
While the above techniques represent remarkable advances in this field, they each suffer from certain disadvantages. In particular, all are time-consuming. In X-ray crystallographic methods, crystals can take years to form before the experiment even starts. In NMR spectroscopy, although the protein sample may be used in the NMR experiment immediately, processing the data obtained, i.e., determining which signal arises from which atom or set of atoms (the “assignments”), may also take years. Modern drug discovery research depends heavily on knowledge of the structures of biologically active macromolecules. This research would benefit substantially from enhancements in the capabilities and speed of three-dimensional structural analyses of proteins and other macromolecules.
In the past few years, alternative, rapid methods for identifying candidate drugs have been developed. Genomic techniques, using rapid DNA sequencing methods and computer-assisted homology identification, have enabled the rapid identification of target proteins as potential drug targets. O'Brien, C., Nature, 385 (6616):472 (1997). Once identified, a target protein can be quickly produced using modern recombinant technology. Combinatorial chemistry, in which large numbers of chemical compounds are synthesized simultaneously on plastic plates, frequently by robots, has revolutionized the synthesis of drug candidates, allowing tens of thousands of compounds (“libraries”) to be synthesized in a few months. See Gordon, E. M. et al., J. Med. Chem., 37(10), 1385-1401 (1994). The library is then “screened” by allowing each member of the library to come into contact with the target protein. Those that bind are identified, and similar compounds are synthesized and screened. The process continues iteratively until a drug candidate of suitably high binding affinity has been identified. One variation of this screening strategy has recently been published by Fesik et al., Science, 274, 1531-34 (1996), in which the libraries are screened by NMR against an isotopically labeled protein and binding is detected from perturbations in the NMR spectrum.
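Detecting binding from perturbations in the NMR spectrum amounts to comparing the position of each amide peak with and without the compound. One common convention, assumed here and not necessarily the procedure of Fesik et al., combines the 1H and 15N shift changes as Δδ = √(ΔδH² + (α·ΔδN)²), with a weighting factor α of about 0.14 and an arbitrary cutoff. The residue labels, shift values, threshold and function names below are all hypothetical.

```python
import math


def combined_shift_perturbation(d_h_ppm: float, d_n_ppm: float, n_weight: float = 0.14) -> float:
    """Weighted combination of 1H and 15N chemical-shift changes for one amide peak.

    n_weight scales the 15N change to be comparable with the 1H change; 0.14 is one
    commonly used value and is an assumption here.
    """
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)


def perturbed_residues(free, bound, threshold_ppm=0.1, n_weight=0.14):
    """Residues whose amide peak moves by more than threshold_ppm when a compound is added.

    free and bound map a residue label to its (1H, 15N) shifts in ppm.
    """
    hits = {}
    for residue, (h_free, n_free) in free.items():
        if residue not in bound:
            continue  # peak vanished or was not tracked; would need separate handling
        h_bound, n_bound = bound[residue]
        delta = combined_shift_perturbation(h_bound - h_free, n_bound - n_free, n_weight)
        if delta > threshold_ppm:
            hits[residue] = round(delta, 3)
    return hits


# Hypothetical peak lists for a labeled protein without and with one library compound.
free = {"G12": (8.30, 108.5), "S17": (8.05, 115.2), "D33": (8.41, 121.0)}
bound = {"G12": (8.31, 108.6), "S17": (8.25, 116.0), "D33": (8.42, 121.1)}
print(perturbed_residues(free, bound))
```

Only peaks that move beyond the cutoff flag a compound as a binder, and the affected residues indicate roughly where on the protein it binds, provided those peaks have already been assigned.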
Prior knowledge of the three-dimensional structure of a target protein can enable the design of a “focused” combinatorial library, thereby increasing the likelihood of finding potential drug candidates that interact with the biological molecule of interest. However, whereas genomic and combinatorial chemistry methods can each be carried out in months, known methods for determining protein structures usually take much longer. Therefore, there is a need for methods that increase the speed with which high-resolution structures of proteins, including those that are the targets of potential drug candidates, can be determined.
The present invention provides novel labeled proteins that are isotopically labeled in the backbone structure, but not in the amino acid side chains. The invention also provides novel cell culture media that contain one or more amino acids isotopically labeled in the backbone structure but not in the side chain, and methods for making a labeled protein by cultivating a protein-producing cell culture on such a culture medium.
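The distinction between backbone and side-chain labeling can be written down explicitly. The minimal sketch below assumes that the labeled backbone positions of a residue are the amide nitrogen (15N), the alpha carbon and the carbonyl carbon (13C), with the alpha hydrogen as an optional 2H site; side-chain atoms are simply absent and stay at natural abundance. It then computes the resulting mass increase per residue from standard isotope mass differences. The atom names and function names are hypothetical, and glycine, which has two alpha hydrogens, is not treated separately.

```python
# Mass differences between heavy and light stable isotopes, in daltons
# (from standard atomic masses: 13C - 12C, 15N - 14N, 2H - 1H).
SHIFT_DA = {"13C": 1.003355, "15N": 0.997035, "2H": 1.006277}


def backbone_label_pattern(deuterate_alpha: bool = False) -> dict:
    """Isotope assigned to each labeled backbone atom of one residue.

    N = amide nitrogen, CA = alpha carbon, C = carbonyl carbon, HA = alpha hydrogen.
    Side-chain atoms (CB onward) are deliberately absent: they remain unlabeled.
    """
    pattern = {"N": "15N", "CA": "13C", "C": "13C"}
    if deuterate_alpha:
        pattern["HA"] = "2H"  # illustrative choice of deuteration site
    return pattern


def mass_shift(pattern: dict) -> float:
    """Mass increase (Da) of one labeled residue relative to its unlabeled form."""
    return sum(SHIFT_DA[iso] for iso in pattern.values())


for deut in (False, True):
    pattern = backbone_label_pattern(deut)
    print(sorted(pattern.items()), f"+{mass_shift(pattern):.4f} Da")
```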
In another aspect, the invention provides a method for determining the three-dimensional structure of a protein wherein at least one of the amino acids in the protein is specifically labeled in its backbone but not its side chain with any combination of the NMR isotopes 2H, 13C and 15N.
In yet another aspect of the present invention, a method is provided for rapidly assigning the signals in the NMR spectrum of a protein wherein at least one of the amino acids in the protein is specifically labeled in its backbone, but not its side chain with any combination of the NMR isotopes 2H, 13C and 15N.
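One well-known way in which backbone-specific labeling can narrow assignments, offered here only as an illustration and not necessarily the procedure of the invention, relies on an HNCO-type experiment, which correlates the amide 1H and 15N of residue i+1 with the carbonyl 13C of residue i. If one residue type carries 13C at its carbonyl and another (or the same) type carries 15N at its amide nitrogen, a correlation appears only where the two types are adjacent in that order, so a junction that occurs once in the sequence assigns its peak directly. The function name and sequence below are hypothetical.

```python
def unique_pair_positions(sequence: str, carbonyl_labeled: str, amide_labeled: str) -> list:
    """1-based positions of residue i+1 where residue i is of type carbonyl_labeled
    (13C carbonyl) and residue i+1 is of type amide_labeled (15N amide).

    Only these dipeptide junctions can give an HNCO-type correlation when the labels
    are confined to those two residue types.
    """
    if amide_labeled == "P":
        return []  # proline has no amide proton, so it gives no HNCO correlation
    return [
        i + 2  # sequence index i + 1 converted to a 1-based position
        for i in range(len(sequence) - 1)
        if sequence[i] == carbonyl_labeled and sequence[i + 1] == amide_labeled
    ]


seq = "MSDKPSGDASLE"  # hypothetical sequence
for pair in (("S", "D"), ("G", "D"), ("A", "S")):
    positions = unique_pair_positions(seq, *pair)
    status = "unique: assigns directly" if len(positions) == 1 else "ambiguous or absent"
    print(f"{pair[0]}-{pair[1]} junctions at {positions} ({status})")
```

Choosing residue-type pairs whose junctions occur only once in the sequence lets each labeled sample contribute an unambiguous assignment without a full sequential walk through the spectrum.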
In preferred embodiments of these various aspects of the invention, the amino acids contained in the culture media and incorporated into the protein structure are labeled in the backbone with 13C and 15N and optionally with 2H.