The present invention relates to the fields of crystallographic methods and apparatus for determining the three-dimensional structure of macromolecules by crystallography or electron microscopy.
Under special conditions, molecules condense from solution into a highly ordered crystalline lattice, which is defined by a unit cell, the smallest repeating volume of the crystalline array. The contents of such a cell can interact with and diffract certain electromagnetic and particle waves (e.g., X-rays, neutron beams, electron beams, etc.). Due to the symmetry of the lattice, the diffracted waves interact to create a diffraction pattern. By measuring the diffraction pattern, crystallographers attempt to reconstruct the three-dimensional structure of the atoms in the crystal.
A crystal lattice is defined by the symmetry of its unit cell and any structural motifs the unit cell contains. For example, there are 230 possible symmetry groups for an arbitrary crystal lattice, while the unit cell of the crystal lattice group may have an arbitrary dimension that depends on the molecules making up the lattice. Biological macromolecules, however, have asymmetric centers and are limited to 65 of the 230 symmetry groups. See Cantor et al., Biophysical Chemistry, Vol. III, W. H. Freeman & Company (1980), which is incorporated herein by reference for all purposes.
A crystal lattice interacts with electromagnetic or particle waves, such as X-rays or electron beams respectively, that have a wavelength with the same order of magnitude as the spacing between atoms in the unit cell. The diffracted waves are measured as an array of spots on a detection surface positioned adjacent to the crystal. Each spot has a three-dimensional position, hkl, and an intensity, I(hkl), both of which are used to reconstruct the three-dimensional electron density of the crystal with the so-called Electron Density Equation:

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F(h,k,l)\, e^{-2\pi i (hx + ky + lz)}$$

where $\rho(x,y,z)$ is the electron density at the position (x,y,z) in the unit cell of the crystal, V is the volume of the unit cell, and F(h,k,l) is the structure factor of the detected spot located at point (h,k,l) on the detector surface. As expressed above, the Electron Density Equation states that the three-dimensional electron density of the unit cell is the Fourier transform of the structure factors. Thus, in theory, if the structure factors are known for a sufficient number of spots in the detection space, then the three-dimensional electron density of the unit cell could be calculated using the Electron Density Equation.
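By way of illustration only, the summation in the Electron Density Equation can be sketched as a direct Fourier sum in Python. The sketch assumes the conventional form of the equation, in which each structure factor F(h,k,l) is multiplied by exp[-2πi(hx + ky + lz)] and the sum is divided by V. The function name `electron_density`, the dictionary layout, and the toy data (a single point scatterer of unit strength in a cell of volume 1000) are hypothetical and are not taken from any particular crystallographic package.

```python
import numpy as np

def electron_density(structure_factors, volume, x, y, z):
    """Evaluate rho(x, y, z) by direct summation over all supplied (h, k, l).

    structure_factors: dict mapping (h, k, l) -> complex F(h, k, l)
    volume:            unit-cell volume V
    x, y, z:           fractional coordinates of the point of interest
    """
    total = 0.0 + 0.0j
    for (h, k, l), F in structure_factors.items():
        total += F * np.exp(-2j * np.pi * (h * x + k * y + l * z))
    return total / volume

# Toy data (assumption): one point scatterer of unit strength.
atom = (0.25, 0.25, 0.25)
V = 1000.0
indices = range(-2, 3)  # h, k, l each run from -2 to 2 (125 terms)
F_toy = {
    (h, k, l): np.exp(2j * np.pi * (h * atom[0] + k * atom[1] + l * atom[2]))
    for h in indices for k in indices for l in indices
}

rho_at_atom = electron_density(F_toy, V, *atom)
rho_elsewhere = electron_density(F_toy, V, 0.75, 0.75, 0.75)
```

In this toy case all 125 terms add in phase at the scatterer position, so the density there is far larger than at the second test point. Both amplitudes and phases of F(h,k,l) enter the sum, which is why the amplitudes alone do not suffice.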
In actual practice, however, a number of problems exist. The Electron Density Equation requires knowledge of the structure factors, F(h,k,l), which are generally complex numbers that consist of both an amplitude and a phase. The amplitude of a structure factor, |F(h,k,l)|, is simply the square root of the experimentally measured intensity, I(h,k,l). The phase of each structure factor, on the other hand, is not known and cannot be measured directly in a diffraction experiment. Nor can it be derived directly for macromolecules. Without the phase of each structure factor, determination of the three-dimensional structure of most large structures by the use of the Electron Density Equation is impossible except for special cases.
Theoretical methods for determining the phases are exemplified by the Direct Method and the Patterson Method or their extensions, as well as the maximum entropy method or the use of simulated annealing in both reciprocal and Patterson space. These methods calculate the phases directly from the measured intensities of the diffracted waves and allow routine computer solutions for molecules typically having fewer than approximately 100 non-hydrogen atoms. (As is known in the art of crystallography, hydrogen atoms contribute little to the diffraction process.) For structures having more than 100 non-hydrogen atoms, such as proteins, peptides, DNA, RNA, virus particles, etc., such direct methods become impractical and, in most cases, impossible. Fortunately, experimental methods, such as Multiple Isomorphous Replacement and Anomalous Scattering, exist to aid in the determination of these phases.
Multiple Isomorphous Replacement is based on the observation that the absolute position and, therefore, the phase of the structure factor of a heavy atom incorporated into an otherwise unmodified crystal lattice can be determined. With this knowledge, the phase of each structure factor in the derivative is determined relative to that of the heavy atom. Except for centrosymmetric crystals, at least two heavy metal derivatives are required to unambiguously determine the phase of a structure factor. Furthermore, Multiple Isomorphous Replacement requires that each heavy metal derivative does not otherwise change the structure of the molecule or distort the unit cell of the crystal.
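The need for at least two derivatives can be illustrated with a short Python sketch. It assumes perfect isomorphism and error-free data, so that the derivative structure factor is the complex sum of the native and heavy-atom contributions, F_PH = F_P + F_H. Under that assumption the measured amplitudes satisfy |F_PH|² = |F_P|² + |F_H|² + 2|F_P||F_H|cos(φ_P − α_H), which leaves two candidate values of the native phase φ_P for each derivative. The function names and the numerical values below are hypothetical.

```python
import cmath
import math

def candidate_phases(amp_native, amp_derivative, F_heavy):
    """Return the two native phases (radians) consistent with one derivative."""
    amp_h = abs(F_heavy)
    alpha_h = cmath.phase(F_heavy)
    cos_delta = (amp_derivative**2 - amp_native**2 - amp_h**2) / (
        2.0 * amp_native * amp_h
    )
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding error
    delta = math.acos(cos_delta)
    return (alpha_h + delta, alpha_h - delta)

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs(cmath.phase(cmath.rect(1.0, a - b)))

def resolve_phase(candidates_1, candidates_2):
    """Pick the candidate from derivative 1 that best agrees with derivative 2."""
    pairs = [
        (angular_difference(a, b), a) for a in candidates_1 for b in candidates_2
    ]
    return min(pairs, key=lambda pair: pair[0])[1]

# Toy data (assumption): a native phase of 50 degrees that the code must recover
# from amplitudes and the known heavy-atom structure factors alone.
F_P = cmath.rect(10.0, math.radians(50.0))
F_H1 = cmath.rect(3.0, math.radians(120.0))
F_H2 = cmath.rect(2.5, math.radians(-30.0))

cands_1 = candidate_phases(abs(F_P), abs(F_P + F_H1), F_H1)
cands_2 = candidate_phases(abs(F_P), abs(F_P + F_H2), F_H2)
phi_P = resolve_phase(cands_1, cands_2)
```

With one derivative, both members of `cands_1` fit the data equally well. Only the candidate shared with `cands_2` survives when the second derivative is added.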
Other experimental techniques, used in conjunction with Multiple Isomorphous Replacement, allow the crystallographer to forgo analysis of some heavy metal derivatives. One such technique, Anomalous Scattering, is based on the observation that particular heavy atoms scatter radiation of different wavelengths significantly differently. With this technique, one heavy metal derivative studied at two wavelengths yields data equivalent to two heavy-atom derivatives studied at one wavelength.
Other techniques completely circumvent the preparation and study of heavy metal derivatives. For example, molecular replacement, as the name suggests, uses a molecule having a known structure as a starting point to model the structure of the unknown crystalline sample. This technique is based on the principle that two molecules that have similar structures and similar orientations and positions in the unit cell diffract similarly. Effective use of this technique requires that the structures of the known and unknown molecules be highly homologous.
Molecular replacement involves positioning the known structure in the unit cell in the same location and orientation as the unknown structure. Difficulty in using this technique arises because the result is critically dependent on the exact positioning of the known structure. Slight variations in either the location or orientation of the known structure often result in complete failure. Once positioned, the atoms of the known structure in the unit cell are used in the so-called Structure Factor Equation to calculate the structure factors that would result from a hypothetical diffraction experiment. The Structure Factor Equation takes the form:

$$F(h,k,l) = \sum_{j=1}^{N} f_j\, e^{2\pi i (h x_j + k y_j + l z_j)}$$

where F(h,k,l) is the structure factor of the molecule at the point (h,k,l) on the detector surface, $f_j$ is the atomic scattering factor (that is, it represents the scattering properties of the individual atom), N is the number of non-hydrogen atoms, and $x_j$, $y_j$, $z_j$ are the fractional coordinates of atom j in the unit cell. The structure factor calculated is generally a complex number containing both the amplitude and phase data for the molecular replacement model at each point (h,k,l) on the detector surface. These calculated phases are used, in turn, with the experimental amplitudes measured for the unknown structure to calculate an approximate electron distribution. By refinement techniques, this approximate structure can be fine-tuned to yield a more accurate and often higher resolution structure.
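As a minimal sketch, the calculation of a structure factor from a positioned model can be written in Python as follows. It assumes the conventional form of the Structure Factor Equation, a sum over atoms of f_j exp[2πi(hx_j + ky_j + lz_j)], and treats each scattering factor as a constant, whereas in practice it varies with scattering angle. The function name `structure_factor` and the two-atom toy model are hypothetical.

```python
import numpy as np

def structure_factor(hkl, scattering_factors, fractional_coords):
    """Return the complex structure factor F(h, k, l) for a set of atoms.

    hkl:                sequence (h, k, l)
    scattering_factors: one f_j per atom (treated as constants here)
    fractional_coords:  array of shape (N, 3) holding (x_j, y_j, z_j)
    """
    h = np.asarray(hkl, dtype=float)
    f = np.asarray(scattering_factors, dtype=float)
    xyz = np.asarray(fractional_coords, dtype=float)
    phase_angles = 2.0 * np.pi * (xyz @ h)  # 2*pi*(h*x_j + k*y_j + l*z_j)
    return np.sum(f * np.exp(1j * phase_angles))

# Toy model (assumption): two atoms with scattering factors 6 and 8.
f_toy = [6.0, 8.0]
xyz_toy = [[0.0, 0.0, 0.0],
           [0.5, 0.5, 0.5]]

F_100 = structure_factor((1, 0, 0), f_toy, xyz_toy)
F_110 = structure_factor((1, 1, 0), f_toy, xyz_toy)

amplitude_100 = abs(F_100)       # would be compared with a measured amplitude
phase_100 = np.angle(F_100)      # calculated phase borrowed from the model
```

For the (1,0,0) spot the two atoms scatter exactly out of phase, so the amplitude is the difference of the two scattering factors. For (1,1,0) they scatter in phase and the amplitude is their sum.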
The molecular replacement technique requires knowledge of the number of molecules, and the orientation and position of each molecule within the unit cell. Initially, the electron density calculated from the phases of the molecular replacement model and the experimental amplitudes closely resembles the electron density of the model. Only after refinement of the initial structure will the success or failure of the method be apparent. For instance, failure occurs if the initial structure fails to converge (as represented by a correlation value) or if the refined structure diverges from the structure of the model during the refinement process. In cases where the unknown structure is a substrate or intermediate bound to a protein, the success of molecular replacement is evident when the result is a structure whose only difference from the model is added electron density that represents the protein-bound molecule. The determination of such structures is important in the area of pharmaceutical drug testing, where the structure of protein-bound drugs and intermediates yields important information about binding and mechanism. Similarly, new mutants of a protein or variations of protein-bound inhibitors are well suited for molecular replacement, as are structures of the same molecule that have crystallized in different symmetry groups.
Molecular Replacement is not always effective, however. Determination of the number of copies of the model in the asymmetric unit and the correct location and orientation of each copy is critical and time-consuming, since ideally one samples all rotational and translational degrees of freedom in the asymmetric unit to determine the correct set of parameters.
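The cost of such a search can be appreciated from a deliberately simplified Python sketch. It is a toy, not a practical rotation or translation function: the cell is one-dimensional, has an inversion centre at the origin, contains unit point scatterers, and only a single translational parameter is sampled. Each trial position is scored by the correlation between calculated and observed amplitudes. All names and values are hypothetical.

```python
import numpy as np

def amplitudes_1d(model_x, shift, h_values):
    """|F(h)| for a 1-D cell in which every atom at x has a mate at -x."""
    x = (np.asarray(model_x, dtype=float) + shift) % 1.0
    angles = 2.0 * np.pi * np.outer(h_values, x)
    F = np.sum(np.exp(1j * angles) + np.exp(-1j * angles), axis=1)
    return np.abs(F)

def translation_search(model_x, observed, h_values, n_steps=200):
    """Try n_steps evenly spaced shifts; return the best shift and its score."""
    shifts = np.arange(n_steps) / n_steps
    scores = np.array([
        np.corrcoef(amplitudes_1d(model_x, t, h_values), observed)[0, 1]
        for t in shifts
    ])
    best = int(np.argmax(scores))
    return shifts[best], scores[best]

# Toy data (assumption): a three-atom model whose true position is a shift of
# 0.13; the "observed" amplitudes are simulated and noise-free.
model_x = [0.00, 0.07, 0.11]
true_shift = 0.13
h_values = np.arange(1, 21)
observed = amplitudes_1d(model_x, true_shift, h_values)

best_shift, best_score = translation_search(model_x, observed, h_values)
```

In this toy cell a shift of 0.13 and a shift of 0.63 give identical amplitudes, because they differ only in the choice of origin, so the search may return either. A realistic search multiplies the number of trials by three rotational and up to three translational parameters for every copy of the model, which is the source of the expense noted above.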
Recently, S. Subbiah has reported an ab initio approach for obtaining the low-resolution envelopes of certain macromolecules (S. Subbiah, Science (1991) Vol. 252, pp. 128-133, which is incorporated herein by reference for all purposes). In this method, a collection of hard-sphere point scatterers is permitted to move randomly and ultimately find an arrangement corresponding to the image of the macromolecule or the surrounding solvent. The details of this technique are discussed in U.S. application Ser. No. 831,258 (attorney docket number 5490A-80-1, filed Jan. 30, 1992), which is incorporated herein by reference for all purposes. Unfortunately, this technique has not been extended to high-resolution results, and is typically limited to results having a resolution in the range of 10 to 15 Å.
Because Multiple Isomorphous Replacement, Molecular Replacement, and their related techniques do not work for all cases, there exists a need for simplified, efficient methods to determine the high-resolution structure of crystalline molecules. The present invention fulfills these and other needs.