The determination of macromolecular structures, e.g., proteins, by X-ray crystallography is a powerful tool for understanding the arrangement and function of such macromolecules. Powerful experimental methods exist for determining crystallographic features, e.g., structure factor amplitudes and phases. While the structure factor amplitudes can be determined quite well, it is frequently necessary to improve or extend the phases before an electron density map can be obtained that is good enough for a realistic atomic model of the macromolecule to be built.
Many methods have been developed for improving the phases by modifying initial experimental electron density maps with prior knowledge of characteristics expected in these maps. The fundamental basis of density modification methods is that there are many possible sets of structure factors (amplitudes and phases) that are all reasonably probable based on the limited experimental data obtained from a particular experiment, and those structure factors that lead to maps that are most consistent with both the experimental data and the prior knowledge are the most likely overall. In these methods, the choice of prior information to be used and the procedure for combining prior information about electron density with experimentally derived phase information are important features.
Until recently, electron density modification has generally been carried out in a two-step procedure that is iterated until convergence. In the first step, an electron density map obtained experimentally is modified in real space in order to make it consistent with expectations. The modification can consist of, e.g., flattening solvent regions, averaging non-crystallographic symmetry-related regions, or histogram-matching. In the second step, phases are calculated from the modified map and are combined with the experimental phases to form a new phase set.
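The two-step cycle described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not any published program: the function name `density_modification_cycle`, the fixed mixing weight `mix`, the one-dimensional test density, and the fixed number of cycles are all hypothetical choices made for the example. The only modification applied is solvent flattening, and the phase combination is a simple weighted sum of unit phase vectors.

```python
import numpy as np

def density_modification_cycle(f_obs, phases_exp, phases_cur, solvent_mask, mix=0.5):
    """One real-space cycle: flatten the solvent, then recombine phases.

    f_obs        : observed amplitudes on the full FFT grid
    phases_exp   : experimental phases (radians), same shape as f_obs
    phases_cur   : current phase estimates (radians), same shape as f_obs
    solvent_mask : boolean array, True where the map is assumed to be solvent
    mix          : HYPOTHETICAL fixed weight given to the modified-map phases
    """
    # Step 1: compute the map and modify it in real space (solvent flattening).
    rho = np.fft.ifftn(f_obs * np.exp(1j * phases_cur)).real
    rho_mod = rho.copy()
    rho_mod[solvent_mask] = rho[solvent_mask].mean()

    # Step 2: take phases from the modified map and combine them with the
    # experimental phases as a weighted sum of unit vectors.
    phases_mod = np.angle(np.fft.fftn(rho_mod))
    combined = (1.0 - mix) * np.exp(1j * phases_exp) + mix * np.exp(1j * phases_mod)
    return np.angle(combined), rho_mod


# Toy 1D "crystal" (assumption): two Gaussian peaks in the macromolecule
# region and essentially zero density in the solvent region.
rng = np.random.default_rng(0)
x = np.arange(64)
rho_true = np.exp(-0.5 * ((x - 24) / 2.0) ** 2) + np.exp(-0.5 * ((x - 36) / 2.0) ** 2)
solvent_mask = (x < 16) | (x >= 48)

f_obs = np.abs(np.fft.fftn(rho_true))
# Noisy starting phases taken from a perturbed real-valued map, so that the
# phase set keeps the symmetry required for a real-valued density.
phases_exp = np.angle(np.fft.fftn(rho_true + 0.3 * rng.standard_normal(64)))

phases = phases_exp.copy()
for _ in range(10):  # fixed cycle count used here instead of a convergence test
    phases, rho_mod = density_modification_cycle(f_obs, phases_exp, phases, solvent_mask)
```

The choice of `mix` is arbitrary in this sketch; how the two phase sets should actually be weighted is not settled by the procedure itself.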
The disadvantage of this real-space modification approach is that it is not at all clear how to weight the observed phases relative to those obtained from the modified map. This is because the modified map contains some of the same information as the original map and some new information. This difficulty has long been recognized, and a number of approaches have been designed to improve the relative weighting of these two sources, including the use of maximum-entropy methods, the use of weighting optimized using cross-validation, and “solvent-flipping.”
A comprehensive theory of the phase problem in X-ray crystallography, and a formalism for solving it based on maximum entropy and maximum likelihood methods, have been presented by Bricogne, Acta Cryst. A40, pp. 410-445 (1984) and Bricogne, Acta Cryst. A44, pp. 517-545 (1988). This formalism describes the contents of a crystal in terms of a collection of point atoms along with probabilities for their positions. From the positions of these atoms, crystallographic structure factors can be calculated, with a certainty depending on the certainties of the positions of the atoms. Extensions of the formalism are described in Bricogne (1988). The extended formalism specifically addresses the situation encountered in crystals of macromolecules in which defined solvent and macromolecule regions exist in the crystallographic unit cell, and formulas for calculating probabilities of structure factors based on the presence of “flat” solvent regions are presented (Bricogne, 1988). The implementation of this formalism is not straightforward according to Xiang et al., Acta Cryst. D49, pp. 193-212 (1993), who point out that a full-fledged implementation of this approach would be highly desirable and would provide a statistical technique for enforcing solvent flatness in advance. Xiang et al. (1993) report that they settled for an approximation in which solvent flatness outside the envelope is imposed after the calculation of a model for the distribution of atoms, which corresponds to the existing procedure of flattening the solvent in an electron density map (Wang, Methods Enzymol. 115, pp. 90-112 (1985)).
The present invention solves the same problem that earlier procedures proposed by Bricogne (1988) address, and also includes the use of likelihood as a basis for choosing optimal crystallographic structure factors. The assumptions used in the present procedure differ substantially from those used by Bricogne (1988). For treatment of solvent and macromolecule (protein) regions in a crystal, Bricogne develops statistical relationships among structure factors based on a model of the contents of the crystal in which point atoms are randomly located, but in which atoms in the protein region are sharply defined with low thermal parameters and atoms in the solvent region are diffuse, with high thermal parameters. In the present approach, no assumptions about the presence of atoms or possible values of thermal parameters are used. Instead, it is assumed that values of electron density in the protein and solvent regions, respectively, are distributed in the same way in the crystal as in a model calculation of a crystal that may or may not be composed of discrete atoms.
The methods used to find likely solutions to the phase problem are also very different in the present approach compared to that of Bricogne (1988) because the assumptions used require the problem to be set up in different ways. Bricogne (1988) applies a maximum-entropy formalism developed by Bricogne (1984) to find likely arrangements of atoms in the crystal, which in turn can be used to calculate the arrangement of electron density in the crystal. In the present method, likely values of the structure factors are found by applying a likelihood-based approach based on a combination of experimental information and the likelihood of resulting electron density maps. These structure factors can be used to calculate an electron density map that is then, in turn, a likely arrangement of electron density in the crystal.
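The idea of combining experimental information with a map-based likelihood can be illustrated, for a single reflection, by adding two log-likelihood curves over a grid of possible phases. The sketch below is a toy illustration only and is not the procedure of the invention: the function name `combine_phase_information`, the cosine-shaped log-likelihoods, and their weights and preferred phases are all hypothetical, and the two sources are simply treated as independent.

```python
import numpy as np

def combine_phase_information(phi, ll_exp, ll_map):
    """Combine two log-likelihood curves sampled on the phase grid `phi`.

    Returns the centroid phase, its figure of merit (length of the mean
    phase vector), and the normalized combined probability on the grid.
    """
    ll_total = ll_exp + ll_map             # sources treated as independent here
    p = np.exp(ll_total - ll_total.max())  # subtract the maximum for stability
    p /= p.sum()
    centroid = np.sum(p * np.exp(1j * phi))
    return np.angle(centroid), np.abs(centroid), p


# HYPOTHETICAL inputs for one reflection: a weaker experimental preference
# near 40 degrees and a stronger map-based preference near 100 degrees.
phi = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
ll_exp = 1.0 * np.cos(phi - np.radians(40.0))
ll_map = 2.0 * np.cos(phi - np.radians(100.0))

best_phase, fom, p = combine_phase_information(phi, ll_exp, ll_map)
print(f"centroid phase = {np.degrees(best_phase):.1f} deg, figure of merit = {fom:.2f}")
```

Because both terms are expressed as log-likelihoods of the same quantity, their relative influence follows from their magnitudes and no separate mixing weight has to be chosen by hand.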
Various objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.