The determination of macromolecular crystal structures, e.g., those of proteins, by x-ray diffraction crystallography is a powerful tool for understanding the arrangement and function of such macromolecules. Powerful experimental methods exist for determining crystallographic quantities such as structure factor amplitudes and phases. The structure factor amplitudes can usually be measured quite accurately. The phases, however, frequently have to be improved or extended before an interpretable electron density map can be calculated and a realistic atomic model of the macromolecule can be built from it.
Many methods have been developed for improving x-ray diffraction phases by modifying initial experimental electron density maps using prior knowledge of the characteristics expected in such maps. Density modification methods rest on two observations. First, many possible sets of structure factors (amplitudes and phases) are reasonably probable given the limited experimental data obtained in a particular experiment. Second, the sets of structure factors that lead to maps most consistent with both the experimental data and the prior knowledge are the most likely overall. Atomic models are also commonly used to calculate phases in macromolecular crystallography. When combined with measured amplitudes, model-based phases yield electron density maps that show features of the correct crystal structure, but with a significant bias towards features embodied in the model.
Density modification techniques are a firmly established and important tool for macromolecular structure determination. They include such powerful approaches as solvent flattening, non-crystallographic symmetry averaging, histogram matching, phase extension, molecular replacement, entropy maximization, and iterative model building. The central idea of prior-art approaches is that prior knowledge about the expected values of the electron density in part or all of the unit cell can be a very strong constraint on the crystallographic structure factors. For example, such prior knowledge often consists of identifying a region where the electron density is flat because it is occupied by disordered solvent. Real-space information of this kind has generally been used to improve the quality of crystallographic phases obtained by other means, such as multiple isomorphous replacement or multiwavelength experiments. The phase information from such real-space constraints can sometimes be powerful enough to be useful in ab initio phase determination.
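A real-space constraint such as solvent flattening can be illustrated on a toy one-dimensional unit cell. The sketch below is a minimal illustration under several assumptions, not the procedure of any particular program. It assumes a 1-D grid with NumPy FFT ordering of reflections and a known boolean solvent mask. It assumes the map is computed by a plain inverse FFT with no unit-cell volume or scale factors. It also assumes the flattened map is used only to produce new phases, while the observed amplitudes are kept. The function name `solvent_flatten_cycle` is hypothetical.

```python
import numpy as np


def solvent_flatten_cycle(f_obs_amp, phases, solvent_mask):
    """One cycle of solvent flattening on a hypothetical 1-D unit cell.

    f_obs_amp    : observed amplitudes |F_h|, NumPy FFT ordering
    phases       : current phase estimates in radians, same ordering
    solvent_mask : boolean array over grid points, True where solvent
    Returns (flattened_map, new_phases).
    """
    # Fourier synthesis: map from observed amplitudes and current phases
    # (only the real part is kept for this toy real-valued map).
    rho = np.fft.ifft(f_obs_amp * np.exp(1j * phases)).real

    # Real-space prior knowledge: disordered solvent is flat, so every
    # solvent grid point is replaced by the mean solvent density.
    rho_mod = rho.copy()
    rho_mod[solvent_mask] = rho[solvent_mask].mean()

    # Back-transform and keep only the phases; the amplitudes remain
    # the observed ones when the next cycle builds its map.
    new_phases = np.angle(np.fft.fft(rho_mod))
    return rho_mod, new_phases
```

In use, the function would be called repeatedly, feeding `new_phases` back in together with the same observed amplitudes. If the starting phases are already correct and the true solvent region is flat, a cycle leaves the map unchanged. This is a handy sanity check for the sketch.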
U.S. patent applications Ser. No. 09/512,962 and Ser. No. 09/769,612, related cases herein, teach maximum-likelihood density modification. This is a method for carrying out electron density modification in which the phasing information coming from various sources is explicitly kept separate from the experimental structure factor amplitudes. This separation of phasing information allowed a statistical formulation for electron density modification that was very straightforward and avoided major existing difficulties with density modification. In maximum-likelihood density modification, the total likelihood of a set of structure factors $\{F_h\}$ is defined in terms of three quantities:

(1) any prior knowledge from other sources about these structure factors;
(2) the likelihood of measuring the observed set of structure factors $\{F_h^{OBS}\}$ if this set of structure factors were correct;
(3) the likelihood that the map resulting from this set of structure factors $\{F_h\}$ is consistent with prior knowledge about this and other macromolecular crystal structures.

This can be written as

$$LL(\{F_h\}) = LL_0(\{F_h\}) + LL_{OBS}(\{F_h\}) + LL_{MAP}(\{F_h\}) \qquad \text{(Eq. 1)}$$

where $LL(\{F_h\})$ is the log-likelihood of a possible set of crystallographic structure factors $F_h$; $LL_0(\{F_h\})$ is the log-likelihood of these structure factors based on any information that is known in advance, such as the distribution of intensities of structure factors; $LL_{OBS}(\{F_h\})$ is the log-likelihood of these structure factors given the experimental data alone; and $LL_{MAP}(\{F_h\})$ is the log-likelihood of the electron density map resulting from these structure factors. In this formulation, electron density modification consists of maximizing the total likelihood $LL(\{F_h\})$ given by Equation 1.
The total likelihood in Equation 1 can be maximized efficiently by an iterative procedure in which a probability distribution for each phase is calculated independently of those for all other phases in each cycle of the iteration. In one cycle of optimization, an electron density map is calculated using current estimates of the structure factors. Then each structure factor is considered separately from the others, and a phase probability distribution for that structure factor is calculated from the variation of the total likelihood in Equation 1 with the phase (or phase and amplitude) of that structure factor.
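The per-reflection step described for Equation 1 can be sketched in a deliberately brute-force way. One reflection's phase is scanned over a grid while all other structure factors are held at their current values. The total log-likelihood at each trial phase is turned into a normalized probability distribution. All reflections are then updated at once from their distributions. This is only an illustration under stated assumptions. `total_ll` is a hypothetical caller-supplied function that is assumed to already combine $LL_0 + LL_{OBS} + LL_{MAP}$. Only phases are varied, not amplitudes. Each new phase is taken as the centroid of its distribution. The efficient map-based calculation used in practice is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def phase_probabilities(total_ll, f_current, h, n_phase=72):
    """Phase probability distribution for reflection h (brute force).

    total_ll  : hypothetical callable returning LL({F_h}) for a full
                array of complex structure factors
    f_current : current complex structure-factor estimates
    h         : index of the reflection whose phase is varied
    """
    phis = np.linspace(0.0, 2.0 * np.pi, n_phase, endpoint=False)
    amp = np.abs(f_current[h])
    ll = np.empty(n_phase)
    for k, phi in enumerate(phis):
        trial = f_current.copy()
        trial[h] = amp * np.exp(1j * phi)  # vary only this one phase
        ll[k] = total_ll(trial)
    # Exponentiate relative to the maximum for numerical stability,
    # then normalize so the probabilities sum to one.
    p = np.exp(ll - ll.max())
    return phis, p / p.sum()


def one_cycle(total_ll, f_current):
    """Update every phase independently, using the same current estimates."""
    new_f = f_current.copy()
    for h in range(len(f_current)):
        phis, p = phase_probabilities(total_ll, f_current, h)
        centroid = np.sum(p * np.exp(1j * phis))
        # A flat distribution has no defined centroid phase; keep the old one.
        if abs(centroid) < 1e-12:
            continue
        new_f[h] = np.abs(f_current[h]) * np.exp(1j * np.angle(centroid))
    return new_f
```

Because `one_cycle` reads every distribution from `f_current` rather than from partially updated values, each phase probability is computed independently of the others within a cycle, as the text describes. The magnitude of `centroid` is a figure of merit for that reflection and could be returned as well.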
In the '612 application, the map log-likelihood $LL_{MAP}(\{F_h\})$, and with it the log-likelihood of the electron density, is further modified to include information arising from structural motifs identified at particular locations in the unit cell. The log-likelihood of the electron density at a point x can then be expressed as
$$LL\big(\rho(x,\{F_h\})\big) = \ln\Big[\, p\big(\rho(x)\mid PROT\big)\, p_{PROT}(x) + p\big(\rho(x)\mid SOLV\big)\, p_{SOLV}(x) + p\big(\rho(x)\mid H\big)\, p_H(x) \Big] \qquad \text{(Eq. 2)}$$

where $p_H(x)$ is the probability that there is a structural motif at a known location, with a known orientation, somewhere near the point x, and $p(\rho(x)\mid H)$ is the probability distribution for the electron density at this point given that this motif actually is present.
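Equation 2 is a log of a weighted mixture, so it can be evaluated stably as a log-sum-exp. The sketch below makes several assumptions that are not part of the text. Each conditional density $p(\rho(x)\mid PROT)$, $p(\rho(x)\mid SOLV)$ and $p(\rho(x)\mid H)$ is modeled as a Gaussian with a caller-supplied mean and standard deviation, whereas the actual distributions may have other shapes. The region probabilities $p_{PROT}(x)$, $p_{SOLV}(x)$ and $p_H(x)$ are supplied directly. The function names and the `params` dictionary layout are hypothetical.

```python
import numpy as np


def gaussian_log_pdf(rho, mean, sd):
    """Log of a normal density; an assumed stand-in for p(rho | region)."""
    return -0.5 * ((rho - mean) / sd) ** 2 - np.log(sd * np.sqrt(2.0 * np.pi))


def map_log_likelihood(rho, p_prot, p_solv, p_h, params):
    """Pointwise Eq. 2: ln[sum over regions of p(rho|region) * p_region(x)].

    params maps "PROT", "SOLV" and "H" to assumed (mean, sd) pairs.
    """
    # A zero region probability gives log(0) = -inf, which simply drops
    # that term from the mixture; silence the divide-by-zero warning.
    with np.errstate(divide="ignore"):
        log_w = [np.log(p_prot), np.log(p_solv), np.log(p_h)]
    terms = [
        gaussian_log_pdf(rho, *params["PROT"]) + log_w[0],
        gaussian_log_pdf(rho, *params["SOLV"]) + log_w[1],
        gaussian_log_pdf(rho, *params["H"]) + log_w[2],
    ]
    # log(sum(exp(terms))) computed without overflow or underflow.
    return np.logaddexp.reduce(np.stack(terms), axis=0)
```

Working in log space matters here because the conditional densities can differ by many orders of magnitude across the three regions. Summing probabilities directly would then risk underflow.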
Model bias is a very serious problem in macromolecular crystallography. A bias in the phases that produces electron density patterns that are incorrect, yet look like features of a protein, is very difficult to detect. Such bias is much more serious than an equivalent amount of noise distributed randomly over the unit cell. Bias of this kind commonly arises when crystallographic phases are calculated from a model that contains incorrectly placed atoms. Maps based on these phases tend to show peaks at the positions of these atoms even where the correct electron density would not.
Many methods for reducing model bias in electron density maps have been developed. One of the most widely used approaches is the σA method of Read, Acta Cryst. A42, pp. 140–149 (1986), in which the weighting and amplitudes of structure factors (but not the phases) are optimized to minimize the effects of model bias. Because the phases remain model-based, σA weighting retains some model bias. Another important method is the use of omit maps, in which all atoms in a region of the unit cell are removed from the model before it is used to calculate phases. This method reduces model bias, but it leads to electron density maps that are intrinsically much noisier than those calculated with all atoms present. Omit maps can still contain some model bias despite the omission of atoms in a region of space. This is because refinement can adjust the parameters describing all the other atoms in such a way as to leave a "memory" of the coordinates of the omitted atoms. This memory in omit maps corresponds to the model bias described above that can occur in the first few cycles of map-likelihood phasing. The residual bias in omit maps can be reduced by simulated annealing if the resolution of the data and the accuracy of the starting model allow atomic refinement. Maximum-likelihood refinement of the model structure can also be used to reduce model bias, even in cases where σA-weighted electron density maps are not interpretable.
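The σA weighting mentioned above changes amplitudes and weights while keeping the model phases. The sketch below follows the commonly quoted expressions from Read (1986) for acentric reflections only. It assumes that normalized amplitudes $|E_o|$ and $|E_c|$ and a value of σA are already available, and it does not show how σA is estimated. The figure of merit is $m = I_1(X)/I_0(X)$ with $X = 2\sigma_A |E_o||E_c| / (1 - \sigma_A^2)$, and the map coefficients are $(2m|E_o| - \sigma_A|E_c|)\exp(i\varphi_c)$. Centric reflections use different expressions and are not covered. The Bessel-function ratio is computed here by numerical averaging rather than with a special-function library, and the function names are hypothetical.

```python
import numpy as np


def figure_of_merit_acentric(x, n_grid=3600):
    """m = I1(x) / I0(x), computed as the mean of cos(theta) under a
    phase-error distribution proportional to exp(x * cos(theta))."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    x = np.asarray(x, dtype=float)[..., None]
    # Subtract x in the exponent (the maximum of x*cos) to avoid overflow.
    w = np.exp(x * (np.cos(theta) - 1.0))
    return (w * np.cos(theta)).sum(axis=-1) / w.sum(axis=-1)


def sigmaa_map_coefficients(e_obs, e_calc, phi_calc, sigma_a):
    """Acentric sigmaA-weighted coefficients (2m|Eo| - sigmaA|Ec|) exp(i phi_c).

    The phase is always the model phase phi_calc, which is why some model
    bias remains in sigmaA-weighted maps.
    """
    x = 2.0 * sigma_a * e_obs * e_calc / (1.0 - sigma_a ** 2)
    m = figure_of_merit_acentric(x)
    return (2.0 * m * e_obs - sigma_a * e_calc) * np.exp(1j * phi_calc)
```

The sketch makes the limitation noted in the text concrete. Only the weight $m$ and the amplitude term depend on the data and σA, while the phase of every coefficient is still $\varphi_c$.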
Various objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.