1. Field of the Invention
This invention pertains generally to methods for elucidating the three-dimensional structure of complex molecules, and more particularly to methods for determining, evaluating and analyzing the secondary and tertiary structure of macromolecule-ligand complexes and binding mechanisms through powder diffraction.
2. Description of the Background Art
Recent drug design efforts have been greatly influenced by the idea that drugs or peptides may target macromolecules with specific receptors to affect their biological activity. Genomic sequencing and other developments in molecular biology in the last decade have identified greater numbers of enzymes, receptors, signaling proteins, hormones, oligonucleotides and the like that may be the target of molecular therapies. Understanding the relationship between the structure and function of various molecules is fundamental to the study of biological and other chemistry-based systems. Structure-function relationships are also important in understanding the function of enzymes, cellular communication, cellular control and feedback mechanisms, and pharmaceutical agents.
Certain macromolecules in nature are known to interact with other molecules having a specific 3-dimensional spatial and electronic distribution. Any macromolecule having such specificity may be generally referred to as a receptor, whether the macromolecule is an enzyme, a protein, a glycoprotein, an antibody, or an oligonucleotide sequence of DNA, RNA, or the like. The various molecules that associate with such receptors are referred to as ligands.
Various prior art procedures have been used in an effort to identify and characterize ligands that bind to receptors. Such procedures typically involve methods of searching for and evaluating novel agents such as pharmacological or therapeutic agents (i.e., drug discovery) that are useful in human or animal health care or management, agriculturally useful chemicals, selective biocides for insects, weeds, or other pests, and catalytic and other entities that may be useful in industrial processes. It is understood in many fields, including the drug discovery field, that the details of how a ligand, for example a small molecule such as a drug molecule, interacts with a macromolecule, such as a protein, are at the heart of the commercial use of such ligands. For example, the vast majority of small molecule drugs act by binding to a more or less specific site in one or more protein targets. The inhibitory or promotional efficacy of the drug is related to the manner in which the molecule interacts with the target site, and accurate information on the details of this interaction at the atomic and molecular level is highly desired. If such data are available, it may be possible to identify modifications to the ligand (or, in some cases, to the protein) that will improve properties such as efficacy, side effects, or the cost of manufacture of the drug.
Traditionally, drug discovery and optimization have involved the expensive and time-consuming process of synthesis and evaluation of single compounds bearing incremental structural changes. Further, such compounds were often carefully chemically analyzed and characterized prior to in vitro evaluation. These methods typically included evaluation of candidate ligand compounds for binding affinity to their target macromolecules, competition for the ligand binding site, or efficacy at the target as determined via inhibition, cell proliferation, activation or antagonism end points.
The process of drug discovery in particular has changed, in part, because of the progress and evolution of a number of technologies that impact this process. Drug discovery has evolved from what was, several decades ago, essentially random screening of natural or other products into a scientific process that includes not only the rational and combinatorial design of large numbers of synthetic molecules as potential bioactive agents, such as agonists, antagonists, and inhibitors, but also the identification and mechanistic and structural characterization of their biological targets, which may be, for example, polypeptides, proteins, or nucleic acids. These key areas of drug design and structural biology are of tremendous importance to the understanding and treatment of disease. However, significant hurdles still need to be overcome when trying to identify or design high affinity ligands for a particular biological target molecule. These hurdles include the difficulty of elucidating the structures of targets, both alone and with other molecules bound or associated; the large numbers of compounds that need to be generated in order to identify and evaluate new leads or to optimize existing leads; the need to dissect structural similarities and dissimilarities among these large numbers of compounds; the difficulty of correlating structural features with activity and binding affinity; and the fact that small structural changes can lead to large effects on the biological activities of compounds.
One way in which the drug discovery process has been accelerated is by the generation of large collections, libraries, or arrays of compounds. The strategy of discovery has moved from the selection of drug leads from among compounds that are individually synthesized and tested to the screening of large collections of compounds. These collections may be from natural sources (Stemberg et al., Proc. Natl. Acad. Sci. USA, 1995, 92, 1609-1613) or generated by synthetic methods such as combinatorial chemistry (Ecker and Crooke, BioTechnology, 1995, 13, 351-360 and U.S. Pat. No. 5,571,902). These collections may be generated as libraries of individual, well-characterized compounds synthesized via high-throughput parallel synthesis, or as a mixture or pool of up to several hundred or even several thousand molecules synthesized by split-mix or other combinatorial methods.
Screening of such combinatorial libraries has usually involved a binding assay to determine the extent of ligand-receptor interaction (Chu et al., J. Am. Chem. Soc., 1996, 118, 7827-35). Often the ligand or the target receptor is immobilized onto a surface such as a polymer bead or plate. If individual characterized ligands have been applied at different spatial positions, the identity of the ligand or ligands that bind to the receptor is known from their position. Where mixtures of ligands or uncharacterized ligands are used, they may be released and identified following detection of a binding event. However, solid phase screening assays can be complicated by non-specific interactions. Whether screening of combinatorial libraries is performed via solid-phase methods, solution methods or otherwise, it can be a challenge to identify rapidly and effectively those components of the library that bind to the target and are, hence, of greatest interest. This process needs to be improved to make combinatorial and other drug discovery processes easier and more effective.
Several techniques have been used in the characterization of receptor-ligand interactions, including enzyme-linked immunosorbent assay (ELISA) (Kemeny and Challacombe, in ELISA and other Solid Phase Immunoassays: Theoretical and Practical Aspects; Wiley, New York, 1988), radioligand binding assays (Berson and Yalow, Clin. Chim. Acta, 1968, 22, 51-60; Chard, in An Introduction to Radioimmunoassay and Related Techniques, Elsevier, Amsterdam/New York, 1982), surface-plasmon resonance (Karlsson, Michaelsson and Mattson, J. Immunol. Methods, 1991, 145, 229; Jonsson et al., Biotechniques, 1991, 11, 620), and scintillation proximity assays (Udenfriend, Gerber and Nelson, Anal. Biochem., 1987, 161, 494-500). Radioligand binding assays are typically useful only when assessing competition between an unknown compound and a radioligand for a binding site, and they also require the use of radioactive materials. The surface-plasmon resonance technique is more straightforward to use, but it is also quite costly. Conventional biochemical assays of binding kinetics and of dissociation and association constants are also helpful in elucidating the nature of target-ligand interactions. These approaches are generally helpful in detecting receptor-ligand binding events but do not yield detailed structural information.
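For simple 1:1 binding, the kinetic and equilibrium constants mentioned above are related in a straightforward way. The sketch below is a minimal illustration, assuming a single-site model (R + L ⇌ RL) and that the free ligand concentration is approximately the added ligand concentration (ligand in large excess); the rate constants in the usage example are hypothetical values, not data.

```python
def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant Kd (M) from k_on (1/(M*s)) and k_off (1/s).

    Assumes simple single-site 1:1 binding; the association constant is 1/Kd.
    """
    return k_off / k_on


def fractional_occupancy(free_ligand: float, kd: float) -> float:
    """Fraction of receptor sites occupied at a given free ligand concentration (M)."""
    return free_ligand / (free_ligand + kd)


if __name__ == "__main__":
    kd = dissociation_constant(k_on=1e6, k_off=1e-3)  # hypothetical rate constants
    print(f"Kd = {kd:.1e} M")
    for conc in (1e-10, 1e-9, 1e-8):
        print(f"[L] = {conc:.0e} M -> occupancy {fractional_occupancy(conc, kd):.2f}")
```

At a free ligand concentration equal to Kd, half of the sites are occupied, which is why Kd is a convenient single number for comparing ligands.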
Several approaches to facilitating the understanding of the structure of therapeutic targets have also been developed so as to accelerate the process of drug discovery and development. These include developments in the sequencing of proteins and nucleic acids (Smith, in Protein Sequencing Protocols, Humana Press, Totowa, N.J., 1997; Findlay and Geisow, in Protein Sequencing: A Practical Approach, IRL Press, Oxford, 1989; Brown, in DNA Sequencing, IRL Oxford University Press, Oxford, 1994; Adams, Fields and Venter, in Automated DNA Sequencing and Analysis, Academic Press, San Diego, 1994). A drawback of present sequencing techniques, however, is their inability to reveal anything more than the primary structure, or sequence, of the target macromolecule.
Other techniques have been employed in an effort to elucidate secondary and tertiary structures of macromolecules, for example, Nuclear Magnetic Resonance (NMR) (Jefson, Ann. Rep. in Med. Chem., 1988, 23, 275; Erikson and Fesik, Ann. Rep. in Med. Chem., 1992, 27, 271-289), single crystal X-ray crystallography (Erikson and Fesik, Ann. Rep. in Med. Chem., 1992, 27, 271-289) and the use of computer algorithms to attempt the prediction of protein folding (Copeland, in Methods of Protein Analysis: A Practical Guide to Laboratory Protocols, Chapman and Hall, New York, 1994; Creighton, in Protein Folding, W. H. Freeman and Co., 1992).
Likewise, advances have occurred in the chemical synthesis of compounds for high-throughput biological screening. In certain drug discovery efforts, collections of molecules or “libraries”, natural or synthetic, are prepared and screened for molecules having a specified bioactivity, as indicated initially by detection of binding between one or more species or ligands in the library and a “target” molecule with which it binds to influence some biological process. More specifically, libraries consist of a complex assortment of molecules containing one or more ligands that may bind to a target of interest. The identification of ligands that bind may provide “hits” that have a desired biological activity, e.g., as a potential drug candidate. As methods have become available to screen these libraries more effectively, interest in exploiting “rational design” or the “directed molecular evolution” approach has increased. The construction and screening of small molecule libraries, including non-peptide libraries, has also been reviewed. See, Special Issue on Combinatorial Libraries, Accounts of Chemical Research, 29:111-170, 1996.
Combinatorial chemistry, computational chemistry, and the synthesis of large collections of mixtures of compounds or of individual compounds have all facilitated the rapid synthesis of large numbers of compounds for in vitro screening. Despite these advances, the process of drug discovery and optimization entails a sequence of difficult steps. This process can also be an expensive one because of the costs involved at each stage and the need to screen large numbers of individual compounds. Moreover, the structural features of target receptors can be elusive. Thus, current techniques and protocols for the study of combinatorial libraries against a variety of biologically relevant targets have many shortcomings. The tedious nature, high cost, multi-step character, and low sensitivity of many of the above-mentioned screening technologies are shortcomings of the currently available tools. Further, available techniques do not always afford the most relevant structural information. Also, the need for customized reagents and experiments for specific tasks is a challenge for the practice of current drug discovery and screening technologies.
As noted above, two of the most commonly applied methods for determining the structures of macromolecule-ligand complexes involve either gathering and interpreting high-resolution solution NMR spectra, or gathering and analyzing single crystal diffraction spectra on the complex.
Solution NMR is performed on an aqueous solution of macromolecules while the molecules tumble and vibrate with thermal motion. NMR detects chemical shifts of atomic nuclei with nonzero spin. The shifts depend on the electronic environments of the nuclei, namely, the identities and distances of nearby atoms. 1H is the only isotope occurring in sufficient abundance in natural macromolecules to be usefully observed by NMR. Structures of small macromolecules (less than 15 kD) can sometimes be resolved without special isotopic substitution of non-hydrogen atoms in the protein. Better data may be obtained from larger proteins if they are uniformly labeled by substituting the naturally abundant nuclear spin zero 12C and nuclear spin one 14N atoms with nuclear spin one-half 13C and 15N. In order to obtain NMR resonances sufficiently sharp for adequate resolution, the molecule must tumble rapidly. This typically limits the size of the molecule that can be analyzed with this method to about 30 kD. Also, the macromolecule must be soluble at high concentration (0.2-1 mM, 6-30 mg/ml) and stable for days without aggregation under the experimental conditions. This molecular size limitation precludes the use of NMR with comparatively larger macromolecules. In addition, the use of NMR on macromolecule-ligand complexes has often proven unreliable because of the inherent difficulty in distinguishing between the protons of the ligand and those of the protein. Accordingly, current NMR techniques are generally unsatisfactory for providing detailed atomic level information on the active sites of a macromolecule and the interaction of the ligand with those sites.
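The two concentration ranges quoted above agree for a protein of about 30 kD, because a concentration in mM multiplied by a molecular mass in kD gives mg/ml. A minimal check of that arithmetic, where the 30 kD mass is an assumption taken from the size limit stated above:

```python
def mg_per_ml(conc_mm: float, mass_kd: float) -> float:
    """Convert a molar concentration (mM) to a mass concentration (mg/ml).

    1 mM = 1e-3 mol/l and 1 kD = 1e3 g/mol, so mM x kD = g/l = mg/ml.
    """
    return conc_mm * mass_kd


if __name__ == "__main__":
    # 30 kD is assumed here, matching the approximate NMR size limit.
    for conc in (0.2, 1.0):
        print(f"{conc} mM of a 30 kD protein = {mg_per_ml(conc, 30.0):.0f} mg/ml")
```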
In contrast to solution NMR, single crystal diffraction experiments are used in an attempt to determine the structure of a macromolecule in a solid-state environment where the macromolecules have crystallized into a periodic three-dimensional array to form a macroscopic single crystal. The crystal is irradiated with a beam of radiation (often X-rays) and its diffraction properties measured as a function of sample orientation. Analysis of the resulting diffraction peaks is used to provide high-resolution structural data of macromolecules and macromolecule-ligand complexes.
Single crystal X-ray crystallography can be a powerful technique that can allow the secondary and tertiary structure of certain macromolecular targets to be determined. See, Erikson and Fesik, Ann. Rep. in Med. Chem., 1992, 27, 271-289. It can be an expensive procedure and is often difficult to accomplish because of the need to grow large crystals of the macromolecule. Crystallization of most macromolecules is challenging and time consuming and often requires specialized conditions that are quite different from those under which the molecule functions in vivo. It is often considered to be as much an art as a science, and it can frequently end in failure. See, T. M. Bergfors, Protein Crystallization Techniques, Strategies and Tips, a Laboratory Manual, International University Line, La Jolla, Calif. 1999; N. E. Chayen, Recent Advances in Methodology for the Crystallization of Biological Macromolecules, J. Crystal Growth 198/199, 649-655 (1999). Furthermore, substantial quantities of the macromolecule may be consumed in the search for a set of conditions that allow the macromolecule to crystallize.
Another fundamental limitation on the use of single crystal methods is the difficulty of producing suitably diffracting single crystals of the macromolecule-ligand complex in sufficient quantity for analysis. One approach for making a single crystal of the macromolecule-ligand complex when a single crystal of the unbound macromolecule is already available is to soak the single crystal of the unbound macromolecule in a solution of the ligand. A significant complication of this approach is that the crystal may fracture when a ligand is introduced into the crystal to form the intended macromolecule-ligand complex of interest. Another complication of this method is that it may be difficult to achieve a uniform occupation of the ligand across the unit cells of the crystal. Moreover, binding may be inhibited by steric interference from intermolecular contacts within the crystal structure of the macromolecule.
An alternative method for making single crystals of the macromolecule-ligand complex is co-crystallization where the macromolecule-ligand complex is formed in advance of growing the single crystal. Although this approach has had some success in single crystal settings, co-crystallization typically requires a search for new optimal crystallization conditions for each ligand due to shifts in the solubility of the macromolecule-ligand complex. In addition, reactions between the macromolecule and ligand over the length of time required for single crystal growth may also interfere with crystallization. See, R. A. Palmer, X-Ray Crystallographic Studies of Protein-Ligand Interactions in Chapter 1 of Protein-Ligand Interactions: Structure and Spectroscopy, edited by S. E. Harding and B. Z. Chowdhry, Oxford University Press (2001). Accordingly, single crystal X-Ray crystallography often has limited application to the investigation of macromolecule-ligand complexes. Alternative methods for determining macromolecule-ligand structures that can circumvent the disadvantages of the solution NMR and single crystal diffraction methods are therefore of critical interest and substantial commercial value. Methods are greatly needed that allow one or more of the sites of interaction between the macromolecule and ligand and the structure of the macromolecule-ligand complex to be determined.
Another method for investigating the structure of simple crystals with small unit cells is powder diffraction crystallography. Powder diffraction crystallography differs fundamentally from the single crystal diffraction method because a polycrystalline sample of material rather than a single crystal is employed. In the powder diffraction method the sample is irradiated with a suitable beam of radiation such as X-rays, electrons or neutrons. The atoms in each crystallite of the irradiated sample form a three-dimensional periodic array, and consequently each crystallite behaves as a tiny three-dimensional diffraction grating for the incoming radiation. If the beam of radiation used is monochromatic, the diffracted beams from the aggregate of crystallites will form a series of concentric cones of radiation whose axes are centered on the direction of the incident beam. The very large number of crystallites that comprise the polycrystalline sample ensures that these cones are of uniform intensity that is proportional to the scattered intensity from an individual crystallite. The diffracted beams of radiation are measured as a function of the angle, 2θ, between the incident and diffracted beams. It can be shown, with certain simplifying assumptions, that diffraction peaks may occur when the condition

$$\lambda = 2 d_{hkl} \sin\theta$$

is satisfied, where λ is the wavelength of the radiation, d_hkl is the inter-planar spacing between the Miller planes with indices h, k and l, and θ is the Bragg angle, which measures the angle of reflection between the incident radiation and the Miller plane with indices h, k, l. This condition is known as Bragg's law. The spacings between Miller planes in the crystal can be computed given a list of Bragg angles at which diffraction peaks occur and the wavelength of radiation used in the experiment. Such spacings between Miller planes are sometimes referred to as “d-spacings” in the literature.
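Bragg's law can be inverted to convert measured peak positions into d-spacings, as described above. The sketch below assumes a Cu Kα1 wavelength of 1.5406 Å, a common laboratory source chosen here only for illustration; the peak positions in the usage example are hypothetical.

```python
import math

CU_K_ALPHA1 = 1.5406  # Angstrom; assumed laboratory X-ray wavelength


def d_spacing(two_theta_deg: float, wavelength: float = CU_K_ALPHA1) -> float:
    """d_hkl from Bragg's law, lambda = 2 d sin(theta), given the measured angle 2-theta."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def two_theta(d: float, wavelength: float = CU_K_ALPHA1) -> float:
    """Inverse relation: the 2-theta angle (degrees) at which spacing d diffracts."""
    ratio = wavelength / (2.0 * d)
    if ratio > 1.0:
        raise ValueError("spacing too small to diffract at this wavelength")
    return math.degrees(2.0 * math.asin(ratio))


if __name__ == "__main__":
    # Hypothetical peak positions in degrees 2-theta, not measured data.
    for tt in (5.0, 15.0, 30.0):
        print(f"2theta = {tt:5.1f} deg -> d = {d_spacing(tt):.3f} A")
```

Because d varies inversely with sin θ, the large unit cells typical of macromolecules place many of their diffraction peaks at low 2θ.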
In practice the diffraction peaks measured in a diffraction experiment are not infinitely sharp, as suggested by Bragg's law, but are broadened by factors such as the finite resolution of the diffractometer, the finite size of the crystallites in the sample, defects in the crystallites and the strains in the crystallites.
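One of the broadening contributions listed above, finite crystallite size, is often estimated with the Scherrer equation, τ = Kλ/(β cos θ). The text does not itself discuss this equation; it is introduced here only as an illustration. The sketch assumes that instrumental and strain broadening have already been removed from the measured peak width β (full width at half maximum), and uses the commonly quoted shape factor K = 0.9.

```python
import math


def scherrer_size(fwhm_deg: float, two_theta_deg: float, wavelength: float, k: float = 0.9) -> float:
    """Mean crystallite dimension (same units as wavelength) from size broadening.

    tau = K * lambda / (beta * cos(theta)), with beta the size-only FWHM in radians.
    Assumes instrumental and strain contributions were already subtracted.
    """
    beta = math.radians(fwhm_deg)
    theta = math.radians(two_theta_deg / 2.0)
    return k * wavelength / (beta * math.cos(theta))


if __name__ == "__main__":
    # Hypothetical peak: 0.5 deg FWHM at 2-theta = 30 deg, Cu K-alpha1 (1.5406 A).
    print(f"{scherrer_size(0.5, 30.0, 1.5406):.1f} A")
```

Smaller crystallites give broader peaks, so the broadening itself carries information about the sample as well as limiting resolution.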
An important step in the interpretation of powder diffraction data is to identify possible space groups and lattice parameters of the sample. This may be achieved, for example, by recognizing that the diffraction profile corresponds to a material whose diffraction profile has been measured previously, or by the identification of an isostructural material of known structure. Alternatively the pattern may be indexed, by assigning h, k and l values to prominent peaks in the diffraction pattern, through the use of auto-indexing software, and likely values of the lattice parameters deduced. Several software packages for indexing of powder diffraction data are available including ITO, TREOR and DICVOL. Possible space groups of the crystal may be inferred by examining the diffraction pattern for the systematic absence of peaks in the powder diffraction pattern. See for example, A. K. Cheetham, Ab Initio Structure Solution with Powder Data, Chapter 15 of The Rietveld Method, Edited by R. A. Young, International Union of Crystallography, Oxford University Press (1993).
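Systematic absences arise because lattice centering forbids certain hkl combinations. The sketch below lists allowed reflections for a hypothetical cubic cell, using 1/d² = (h² + k² + l²)/a², for primitive (P), body-centred (I) and face-centred (F) lattices. It shows only the forward calculation; auto-indexing programs such as ITO, TREOR and DICVOL solve the much harder inverse problem for general cells, and this sketch is not how those programs work internally.

```python
import itertools
import math


def allowed(h: int, k: int, l: int, centering: str) -> bool:
    """General reflection conditions for a few lattice centerings."""
    if centering == "P":
        return True
    if centering == "I":  # body-centred: h + k + l even
        return (h + k + l) % 2 == 0
    if centering == "F":  # face-centred: h, k, l all even or all odd
        return len({h % 2, k % 2, l % 2}) == 1
    raise ValueError(f"unsupported centering {centering!r}")


def cubic_reflections(a: float, centering: str, max_index: int = 4):
    """Return (N, hkl, d) tuples for a cubic cell, with N = h^2+k^2+l^2 and d = a/sqrt(N)."""
    first_hkl = {}
    for h, k, l in itertools.product(range(max_index + 1), repeat=3):
        if not (h >= k >= l) or (h, k, l) == (0, 0, 0):
            continue  # keep one representative per permutation class
        if not allowed(h, k, l, centering):
            continue  # systematically absent
        n = h * h + k * k + l * l
        first_hkl.setdefault(n, (h, k, l))
    return [(n, hkl, a / math.sqrt(n)) for n, hkl in sorted(first_hkl.items())]


if __name__ == "__main__":
    for lattice in ("P", "I", "F"):
        ns = [n for n, _, _ in cubic_reflections(10.0, lattice)][:6]
        print(lattice, ns)
```

For a body-centred cubic cell the sequence of N values begins 2, 4, 6, 8, ..., whereas a primitive cell gives 1, 2, 3, 4, 5, 6, 8, ...; recognizing such patterns in the observed 1/d² ratios is one way to infer the lattice type.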
The periodic array of atoms in a crystal defines a periodic scattering density, ρ(r), which is probed by diffraction experiments. Because ρ(r) is periodic, it may be written as a Fourier sum over structure factors according to
$$\rho(\mathbf{r}) = \frac{1}{V}\sum_{hkl} F(\mathbf{h})\exp\left(-2\pi i\,\mathbf{h}\cdot\mathbf{r}\right),$$
where:
1. F(h) is a complex structure factor;
2. h is a reciprocal lattice vector equivalent to hkl;
3. V is the volume of the unit cell of the crystal; and
4. r is the location of interest within the unit cell of the crystal.
The complex structure factor F(h) may alternatively be written in terms of a real amplitude, |F(h)|, and a real phase, φ(h), according to
$$F(\mathbf{h}) = |F(\mathbf{h})|\exp\left(i\varphi(\mathbf{h})\right).$$
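The Fourier relationship above can be checked numerically in one dimension. This is a toy sketch under stated assumptions: a 1-D cell of unit length (so V = 1), Gaussian "atoms", and a discrete sum in place of the integral that defines F(h). The forward transform uses exp(+2πihx), the sign convention that pairs with exp(-2πi h·r) in the synthesis.

```python
import cmath
import math

N = 64  # sample points across a 1-D toy unit cell of length 1 (so V = 1)


def gaussian_density(centres, width=0.03):
    """Periodic toy density: one Gaussian 'atom' at each fractional position."""
    rho = []
    for idx in range(N):
        x = idx / N
        value = 0.0
        for c in centres:
            dx = (x - c + 0.5) % 1.0 - 0.5  # distance to the nearest periodic image
            value += math.exp(-0.5 * (dx / width) ** 2)
        rho.append(value)
    return rho


def structure_factors(rho, h_max):
    """F(h) = integral over the cell of rho(x) exp(+2*pi*i*h*x) dx, as a discrete sum."""
    return {
        h: sum(r * cmath.exp(2j * math.pi * h * idx / N) for idx, r in enumerate(rho)) / N
        for h in range(-h_max, h_max + 1)
    }


def fourier_synthesis(F, volume=1.0):
    """rho(x) = (1/V) sum_h F(h) exp(-2*pi*i*h*x); imaginary parts cancel for real rho."""
    return [
        sum(Fh * cmath.exp(-2j * math.pi * h * idx / N) for h, Fh in F.items()).real / volume
        for idx in range(N)
    ]


if __name__ == "__main__":
    rho = gaussian_density([0.2, 0.45, 0.7])
    F = structure_factors(rho, h_max=31)
    rebuilt = fourier_synthesis(F)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(rho, rebuilt)))
```

Because the density is real, F(-h) is the complex conjugate of F(h) (Friedel's law), which is why the synthesized density comes out real.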
The intensities of peaks in both single crystal and powder diffraction experiments yield information on the amplitudes of the structure factors |F(h)|, but neither type of diffraction experiment directly yields the phase factors φ(h). Unfortunately, reconstruction of the full periodic scattering density requires knowledge of the full structure factor including both phase and amplitude parts. The fact that phase factors are not directly measured in diffraction experiments creates a significant obstacle in the interpretation of diffraction data and is the origin of the phase problem in crystallography.
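The phase problem can be made concrete with the same kind of 1-D toy model: keeping the exact amplitudes |F(h)| but discarding the phases produces a very different density. The model (Gaussian atoms, unit cell of length 1, all phases replaced by zero) is an illustrative assumption, not a phasing method.

```python
import cmath
import math

N = 64  # sample points across a 1-D toy unit cell of length 1


def gaussian_density(centres, width=0.03):
    """Periodic toy density with a Gaussian 'atom' at each fractional position."""
    return [
        sum(math.exp(-0.5 * (((idx / N - c + 0.5) % 1.0 - 0.5) / width) ** 2) for c in centres)
        for idx in range(N)
    ]


def structure_factors(rho, h_max=31):
    """Discrete F(h) with the exp(+2*pi*i*h*x) convention."""
    return {
        h: sum(r * cmath.exp(2j * math.pi * h * idx / N) for idx, r in enumerate(rho)) / N
        for h in range(-h_max, h_max + 1)
    }


def fourier_synthesis(F):
    """rho(x) = sum_h F(h) exp(-2*pi*i*h*x) for a cell of volume 1."""
    return [
        sum(Fh * cmath.exp(-2j * math.pi * h * idx / N) for h, Fh in F.items()).real
        for idx in range(N)
    ]


rho_true = gaussian_density([0.2, 0.45, 0.7])
F_true = structure_factors(rho_true)

# An experiment yields only |F(h)|; here every phase is (wrongly) set to zero.
F_amplitudes_only = {h: complex(abs(Fh), 0.0) for h, Fh in F_true.items()}

rho_with_phases = fourier_synthesis(F_true)
rho_without_phases = fourier_synthesis(F_amplitudes_only)

if __name__ == "__main__":
    print("error with true phases:", max(abs(a - b) for a, b in zip(rho_true, rho_with_phases)))
    print("error with zero phases:", max(abs(a - b) for a, b in zip(rho_true, rho_without_phases)))
```

With zero phases every term adds constructively at x = 0, so the synthesis piles density at the origin, where the true structure has none.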
Computational techniques used in the analysis of powder diffraction profiles include the Rietveld method. See for example, R. A. Young, The Rietveld Method, International Union of Crystallography, Oxford University Press (1993). In Rietveld analysis the measured diffraction profile is simulated using an approximate starting model for the diffraction instrument and the sample, and it is in essence a curve fitting procedure. An objective function, M, that measures the difference between observed and calculated diffraction profiles is usually defined through
$$M = \sum_i w_i \left(Y_{oi} - Y_{ci}\right)^2,$$
where:
1. Y_oi is the observed intensity for the ith point in the diffraction pattern;
2. Y_ci is the calculated intensity for the ith point in the diffraction pattern; and
3. w_i is a weight often defined by
$$w_i = \frac{1}{Y_{oi}}.$$
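The objective function M with the weights w_i = 1/Y_oi translates directly into code. This is a minimal sketch; skipping points with non-positive observed intensity is an assumption of this example (the weight is undefined there), and the intensities in the usage example are hypothetical.

```python
def rietveld_objective(y_obs, y_calc):
    """M = sum_i w_i * (Yo_i - Yc_i)**2 with the common weighting w_i = 1 / Yo_i.

    Points with non-positive observed intensity are skipped because the
    weight 1/Yo_i is undefined for them (an assumption of this sketch).
    """
    if len(y_obs) != len(y_calc):
        raise ValueError("observed and calculated patterns must have the same length")
    total = 0.0
    for yo, yc in zip(y_obs, y_calc):
        if yo > 0:
            total += (yo - yc) ** 2 / yo
    return total


if __name__ == "__main__":
    # Hypothetical intensities for five points of a pattern.
    observed = [100.0, 400.0, 900.0, 400.0, 100.0]
    calculated = [110.0, 380.0, 900.0, 420.0, 100.0]
    print(rietveld_objective(observed, calculated))  # -> 3.0
```

With w_i = 1/Y_oi, each point contributes its squared residual relative to its expected counting variance, so strong and weak regions of the pattern are weighted comparably.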
Subsequently, physical parameters of a model, such as atom positions, site occupancies and thermal motion parameters, the lattice parameters, phase compositions, crystallite sizes, profile parameters, etc. are adjusted so as to minimize the function M using a least squares optimization method. Additional parameters that may be adjusted include a scale factor, background terms to model diffuse scattering of radiation, and solvent terms to account for the effects of solvent molecules in the crystal. Rietveld programs usually offer flexibility in defining which parameters of the model should be optimized and enable the user to specify that refinements should be performed subject to constraints and restraints. For example, it may be possible to vary the site occupancies of a group of atoms subject to the constraint that the occupancies of all atoms in the group must remain equal. As another example, Rietveld programs may contain facilities for performing rigid body refinement where two or more atoms in the unit cell of the crystal are defined to form a rigid body and refinements are performed by allowing some or all of the translational and rotational degrees of freedom of the rigid body to vary. In another example, the objective function, M, may combine contributions from the powder pattern as described above and previously known stereochemical information such as bond lengths, bond angles, group planarities, volumes of chiral centers, torsion angle distributions and non-bonded contact distances. The extent to which the calculated profile is successful in reproducing the experimental pattern is measured by a number of numerical criteria of fit such as R-structure factor (RF), R-pattern (Rp) and R-weighted pattern (Rwp). These numerical criteria of fit, which are generally referred to as R values, are described in more detail by R. A. Young, supra, and are well known to those skilled in the art of Rietveld refinement. 
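For the simplest refinable parameter, an overall scale factor s with Y_ci = s·Y_mi for a fixed model pattern Y_mi, least-squares minimization of M has a closed-form solution: setting dM/ds = 0 gives s = Σ w_i Y_oi Y_mi / Σ w_i Y_mi². The sketch below uses that, together with the usual definitions of R_p and R_wp; the R-structure factor R_F needs extracted reflection intensities and is omitted. The model pattern and data are hypothetical, and a real Rietveld program refines many parameters iteratively rather than this one alone.

```python
import math


def refine_scale(y_obs, y_model):
    """Least-squares scale s minimising sum_i w_i (Yo_i - s*Ym_i)^2 with w_i = 1/Yo_i."""
    pairs = [(yo, ym) for yo, ym in zip(y_obs, y_model) if yo > 0]
    num = sum(ym for _, ym in pairs)            # sum of w*Yo*Ym, which equals Ym when w = 1/Yo
    den = sum(ym * ym / yo for yo, ym in pairs)  # sum of w*Ym^2
    return num / den


def r_pattern(y_obs, y_calc):
    """R_p = sum |Yo - Yc| / sum Yo."""
    return sum(abs(yo - yc) for yo, yc in zip(y_obs, y_calc)) / sum(y_obs)


def r_weighted_pattern(y_obs, y_calc):
    """R_wp = sqrt(sum w (Yo - Yc)^2 / sum w Yo^2), with w = 1/Yo."""
    num = sum((yo - yc) ** 2 / yo for yo, yc in zip(y_obs, y_calc) if yo > 0)
    den = sum(yo for yo in y_obs if yo > 0)  # w*Yo^2 reduces to Yo
    return math.sqrt(num / den)


if __name__ == "__main__":
    model = [1.0, 2.0, 4.0, 2.0, 1.0]      # hypothetical unit-scale calculated pattern
    observed = [3.2, 5.9, 12.1, 6.0, 2.9]  # hypothetical observed intensities
    s = refine_scale(observed, model)
    calc = [s * ym for ym in model]
    print(f"s = {s:.3f}, Rp = {r_pattern(observed, calc):.4f}, Rwp = {r_weighted_pattern(observed, calc):.4f}")
```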
Practical information on the application of the Rietveld method can be found in the article by L. B. McCusker, R. B. Von Dreele, D. E. Cox, D. Louër, and P. Scardi, Rietveld Refinement Guidelines, J. Appl. Cryst. 32, 36-50 (1999).
Diffraction in X-ray diffraction experiments takes place as a result of scattering of the X-rays by the electrons in the material under study. In neutron diffraction experiments, neutrons are scattered by the atomic nuclei of the material. Consequently, the intensity of each diffraction peak in either experiment may be used to infer information about the distribution of the respective scattering density in the unit cell, and scattering density maps, constructed by Fourier methods for example, may be used to view this density. Fourier analysis methods are widely used in the interpretation of diffraction data. For example, when a partial model of a crystal structure is available, for example from the molecular replacement method, it is often possible to obtain additional insights into the structure using a difference Fourier map. The difference Fourier map is constructed by combining structure factor amplitude and phase information from the model with observed structure factor amplitudes derived from the diffraction experiment. The difference Fourier map provides approximate information on the scattering density in the unit cell that is unaccounted for or placed in error by the partial structure. The difference Fourier map may therefore enable the unknown positions of certain atoms to be determined. Other types of Fourier maps are also known in the art. A second type of Fourier map is an OMIT map, which enables tests to be performed to confirm the correct positioning of an atom or a set of atoms in a structural model. A third type of Fourier map, which does not require approximate phases and which can be useful in determining the positions of strongly scattering atoms in the unit cell, is the Patterson map. Fourier maps are extensively discussed in G. N. Ramachandran and R. Srinivasan, Fourier Methods in Crystallography, Wiley-Interscience, New York (1970).
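A difference Fourier synthesis can be sketched in one dimension. In this toy example, all positions, weights and the cell are hypothetical: the "observed" amplitudes come from a structure containing an extra, lighter "ligand" atom, while the partial model omits it. The map uses coefficients (|Fo| - |Fc|)·exp(iφc), that is, observed amplitudes combined with model phases, since observed phases are not available. Positive difference density is expected near the omitted atom, although with poor model phases this is not guaranteed.

```python
import cmath
import math

N = 128     # sample points across a 1-D toy unit cell of length 1
H_MAX = 20  # highest |h| treated as "measured"


def density(atoms, width=0.02):
    """Periodic toy density; atoms is a list of (fractional position, weight) pairs."""
    return [
        sum(w * math.exp(-0.5 * (((idx / N - x + 0.5) % 1.0 - 0.5) / width) ** 2) for x, w in atoms)
        for idx in range(N)
    ]


def structure_factors(rho):
    """Discrete F(h) with the exp(+2*pi*i*h*x) convention, for |h| <= H_MAX."""
    return {
        h: sum(r * cmath.exp(2j * math.pi * h * idx / N) for idx, r in enumerate(rho)) / N
        for h in range(-H_MAX, H_MAX + 1)
    }


def difference_map(F_obs, F_calc):
    """Synthesis with coefficients (|Fo| - |Fc|) * exp(i * phase(Fc)).

    Only observed amplitudes and model phases are used.
    """
    coeffs = {
        h: (abs(F_obs[h]) - abs(Fc)) * cmath.exp(1j * cmath.phase(Fc))
        for h, Fc in F_calc.items()
    }
    return [
        sum(c * cmath.exp(-2j * math.pi * h * idx / N) for h, c in coeffs.items()).real
        for idx in range(N)
    ]


# Hypothetical toy structure: three "macromolecule" atoms plus a lighter "ligand" atom.
model_atoms = [(0.15, 1.0), (0.40, 1.0), (0.55, 1.0)]
ligand_atom = (0.80, 0.5)

F_obs = structure_factors(density(model_atoms + [ligand_atom]))  # stands in for measured data
F_calc = structure_factors(density(model_atoms))                 # partial model without the ligand
delta_rho = difference_map(F_obs, F_calc)

if __name__ == "__main__":
    peak = max(range(N), key=lambda idx: delta_rho[idx])
    print(f"largest positive difference density at x = {peak / N:.3f}")
```

When the model is complete, |Fo| and |Fc| agree and the difference map is flat, which is a useful sanity check on any implementation.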
Implementations of various Fourier map calculations for powder diffraction data are available in public domain software programs such as the GSAS package (A. C. Larson and R. B. Von Dreele, (2001) General Structure Analysis System (GSAS), Los Alamos National Laboratory Report LAUR 86-748).
Powder diffraction is frequently applied to identify a crystalline material in a search-match procedure, by comparing its diffraction profile against a database of diffraction profiles of known materials. It is also often used to determine quantities such as the lattice parameters of the unit cell, the space group of the crystal and the relative abundance of phases within a sample, and to obtain crystal structure information. A major disadvantage of the powder diffraction method, however, is that crystal structure analysis is most readily applicable to systems with small and simple unit cells. Protein structures were considered far too complex for any serious attempt to be made to extract detailed structural information, such as atom positions, with this approach.
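A search-match step can be sketched as comparing observed d-spacings against reference lists. This is a simplified illustration, assuming a fixed matching tolerance and ignoring relative peak intensities, which practical search-match programs also take into account; the database entries and observed values are hypothetical.

```python
def match_score(observed, reference, tol=0.02):
    """Fraction of reference d-spacings (Angstrom) matched by an observed peak within tol."""
    if not reference:
        return 0.0
    hits = sum(1 for d_ref in reference if any(abs(d_ref - d_obs) <= tol for d_obs in observed))
    return hits / len(reference)


def search_match(observed, database, tol=0.02):
    """Rank database entries by match score, best first."""
    scores = {name: match_score(observed, refs, tol) for name, refs in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical reference d-spacings, not real materials.
    database = {
        "phase_A": [4.00, 2.83, 2.31],
        "phase_B": [5.10, 3.70, 2.55],
    }
    observed_peaks = [5.11, 3.70, 2.54, 2.05]
    for name, score in search_match(observed_peaks, database):
        print(f"{name}: {score:.2f}")
```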
Accordingly, a need exists for a speedy and efficient method of determining the three dimensional structure of macromolecules and macromolecule-ligand complexes without the limitation of growing large single crystals and that will allow ligand design or modifications that will enhance the biological or pharmacological properties of the ligand. The present invention satisfies that need, as well as others, and overcomes many of the deficiencies of previously attempted solutions.