The invention relates to molecular modelling for drug discovery, more especially to molecular modelling using field point representations of the molecular field.
In pharmaceutical research, the aim is often to find a small molecule which interacts with a larger molecule, referred to as a target, in a specific manner. In most cases this larger molecule is a protein. Often, the process of drug discovery is an attempt to find a small organic molecule which will bind strongly to a specific region of a specific protein, and which also possesses good pharmacokinetic qualities.
The drug discovery process has traditionally been a fairly hit-and-miss affair. Initially a compound is found that binds to the target, this initial compound of interest being referred to as a lead compound, or lead for short. Leads are usually either natural products or are identified by screening large sets of compounds against the target in the hope of a chance match. Once one or more leads have been identified, a process of optimisation is carried out by medicinal chemists who make incremental changes to the lead molecule in the hope of improving its pharmaceutical properties.
In recent years theoretical chemistry and molecular modelling have become increasingly important in both lead finding and lead optimisation. Modellers attempt to generate new leads by examination of the common features of existing active compounds and by examination of the structure of the target protein if it is known. They also assist in the process of lead optimisation by predicting which changes to the lead structure are likely to be beneficial.
A molecule's affinity for a target of known or unknown structure can be estimated by reference to its similarity to other compounds, both active and inactive. To do this, the modeller is required to calculate intermolecular interactions.
It is possible to predict the binding properties of an untested molecule by representing the physical properties of a molecule which are important in its binding to other molecules, and then assessing the similarity between two such sets of physical properties, one for the untested molecule and one for a well characterised molecule.
Accurate molecular modelling is possible using advanced quantum mechanics. However, the computational effort needed for quantum mechanics is prohibitive for most biologically relevant molecules.
An alternative approach is called molecular mechanics. Molecular mechanics represents the molecule in a simple Newtonian fashion as a collection of balls and springs. The principles of molecular mechanics are simple and empirical. Moreover, molecular mechanics is computationally fast enough to cope with large proteins and other biopolymers associated with drug design.
In traditional molecular mechanics the electrostatic properties of a molecule are defined by placing a point charge at the centre of each atom (atom-centred charges or ACCs). Many different methods for calculating or estimating the value of such point charges are described in the literature. The aim of ACC methods is to distribute the point charges in such a way that the resulting electrostatic field is as similar as possible to the true electrostatic field (as determined by quantum mechanics methods). The electrostatic field as approximated by ACCs is usually quite accurate at a distance from the molecule (>5 Å), but can be quite inaccurate at the molecular surface.
To improve the quality of molecular mechanics models at the molecular surface, extended electron distributions (XEDs) have been developed. The XED method involves replacing the point charge at the centre of some atoms with a set of point charges, one at the centre of the atom and one or more others distributed around that atom a short distance away. The XED method is described in Vinter (1994) [1] and Vinter and Trollope (1995) [2]. In the XED method, the XEDs themselves are treated simply as extra atoms which have charge but no volume. XED methods can therefore calculate electrostatic interactions more accurately than ACC methods, while retaining the speed advantages of the molecular mechanics framework.
Quantum mechanical models and molecular mechanical models, such as ACC or XED models, can use the concept of field points to represent the molecular field. In this approach, the conformation of a molecule, i.e. its equilibrium arrangement either in isolation or when bound to another specific molecule or surface, is represented by a set of field points which measure field strength at a relatively small number of field maxima and minima around the molecule which are relevant to how the molecule is likely to interact with other molecules.
In order to calculate field points, a field definition must be adopted. One known field definition for molecular mechanical models uses positive and negative electrostatic interaction fields in combination with a surface interaction field. The two electrostatic interaction fields are defined by the interaction energy of a specific charged ‘probe’ molecule with the molecule of interest. For example, a probe the size of an oxygen atom, with either a +1 or a −1 elemental charge, can be used. The field value at a given point is the interaction energy of the molecule with the probe atom sited with its centre at that point. The surface interaction field is defined by the van der Waals interaction energy of a neutral ‘probe’ with the molecule, for example an uncharged oxygen atom.
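By way of illustration only, the electrostatic part of such a field definition can be sketched as follows. The sketch assumes a plain Coulomb interaction between a point-charge probe and a set of atom-centred charges in vacuum; the function name, the data layout and the omission of probe size, dielectric screening and XED charges are assumptions made for this example and are not taken from the cited references.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); approximate value.
COULOMB_KCAL = 332.06


def electrostatic_field_value(point, charges, probe_charge):
    """Interaction energy of a point-charge probe centred at `point`.

    `charges` is a list of ((x, y, z), q) tuples, with positions in
    Angstrom and q in elementary charges (hypothetical layout).
    """
    energy = 0.0
    for position, q in charges:
        r = math.dist(point, position)  # probe-to-charge distance
        energy += COULOMB_KCAL * probe_charge * q / r
    return energy


# A single charge of -0.5 e probed 2 Angstrom away with a +1 and a -1 probe.
molecule = [((0.0, 0.0, 0.0), -0.5)]
positive_probe = electrostatic_field_value((2.0, 0.0, 0.0), molecule, +1.0)
negative_probe = electrostatic_field_value((2.0, 0.0, 0.0), molecule, -1.0)
```

Evaluating the function with a +1 probe and with a −1 probe gives the two electrostatic interaction fields; a surface interaction field would need a separate van der Waals term, which is not shown here.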
Other field definitions have been used, for example ones that include electrostatic fields calculated from quantum molecular methods, and ones that include hydrophobic fields calculated from the electrostatic field and its partial derivatives. In principle, any field definition can be used provided that its value can be defined at any point in space around the molecule.
Once the field definition has been made, the field points of the molecule need to be calculated. With the molecular modelling approach, the field points are subdivided into a number of subsets, one for each field type, with each subset being calculated separately. The field points for a molecule are the values and locations of the extrema of its field, i.e. maxima and minima. The final set of field points from each field type can be filtered to remove duplicate extrema and small extrema if desired.
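The optional filtering step can be sketched as below. This is a minimal example, assuming that each field point is held as a position and an energy value, that "small" means an absolute value below a chosen threshold, and that "duplicate" means lying within a chosen distance of a stronger extremum that has already been kept. The function name and both threshold values are hypothetical.

```python
import math


def filter_field_points(points, min_abs_value=0.5, min_separation=0.3):
    """Drop small extrema and near-duplicate extrema.

    `points` is a list of ((x, y, z), value) tuples (hypothetical layout).
    """
    kept = []
    # Visit the strongest extrema first so that, within a cluster of
    # near-duplicates, the strongest one is the one that survives.
    for position, value in sorted(points, key=lambda p: abs(p[1]), reverse=True):
        if abs(value) < min_abs_value:
            continue  # extremum too small to keep
        if any(math.dist(position, kept_pos) < min_separation for kept_pos, _ in kept):
            continue  # duplicate of a stronger extremum already kept
        kept.append((position, value))
    return kept


raw_points = [
    ((0.0, 0.0, 0.0), -3.0),
    ((0.1, 0.0, 0.0), -2.5),  # near-duplicate of the first point
    ((5.0, 0.0, 0.0), 0.2),   # below the size threshold
    ((2.0, 0.0, 0.0), 1.0),
]
filtered = filter_field_points(raw_points)
```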
The field point set encodes a large amount of information about the properties of the molecule, especially regarding its interaction with other molecules. The electrostatic field points encode information about the preferred hydrogen-bonding environment of the molecule, while the surface interaction field points encode the molecule's steric bulk.
The basic assumption underlying the field point approach is that two molecules which have similar sets of field points should have similar interactions with other molecules and hence should have similar biological activities. In other words, if molecule A has a certain biological activity, and molecule B is calculated to be similar to molecule A in a relevant conformation, then it is concluded that molecule B potentially has the same biological activity.
With the field point approach, the similarity between conformations of two molecules is calculated according to a scoring formula which is sensitive to differences between the field point positions and energy values of the field points in the two field point sets. The result of the formula, i.e. the score, is a scalar quantity referred to as the field similarity value. The act of comparing fields from two molecules is sometimes referred to as field overlay or a field overlay process by virtue of the fact that the calculation of the field overlay score requires an alignment of the two molecules.
By way of example, suppose that molecules A and B are to be compared for similarity. Molecule A is known to bind to a particular protein. The conformation of A when bound to that protein is also known. Molecule B is a new candidate molecule for potentially binding to the same protein. To carry out the comparison calculation, the bound conformation of A is compared to multiple conformations of B. Multiple conformations of B are tried, since, if B is able to bind to the protein, the conformation of B which allows such binding is not yet known.
In another example, the bound conformation of molecule A may not be known, even though it is known that molecule A binds to a particular protein. In that case, the comparison process will compare multiple conformations of A successively with multiple conformations of B.
The comparison process comprises two stages. The first stage is an alignment step of determining an alignment between the conformers of A and B. The second stage is a scoring step of calculating the field similarity for the aligned position.
In practice, the two stages are often carried out iteratively. After an initial approximate alignment, fine alignment may be an automated process of maximising the score, i.e. the field similarity value, through incremental changes in the alignment. It is noted that the initial alignment may be a completely random one (in a Monte-Carlo type process). The comparison process can be carried out independently for each field type in a molecular mechanics model. A field similarity value is calculated independently for each field, referred to as a field similarity subvalue in the following, and a weighted sum is taken to be the overall field similarity value.
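The combination of field similarity subvalues into an overall value can be sketched as a simple weighted sum. The field-type names and the weights used here are purely illustrative assumptions; no particular weighting is given in the text above.

```python
def overall_field_similarity(subvalues, weights):
    """Weighted sum of per-field similarity subvalues.

    Both arguments are dicts keyed by field type (hypothetical layout).
    """
    return sum(weights[field] * subvalues[field] for field in subvalues)


# Illustrative subvalues and weights for three field types.
subvalues = {"positive": 2.0, "negative": 4.0, "surface": 1.0}
weights = {"positive": 1.0, "negative": 1.0, "surface": 0.5}
overall = overall_field_similarity(subvalues, weights)
```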
The scoring step, i.e. the field similarity calculation, is critically important, since the field similarity value is the ultimate measure of the potential of candidate molecule B to have the same biological activity as molecule A.
The method used in the XED model of Vinter and Trollope (1995) [2] to calculate the field similarity value for a given alignment of two conformers A and B is now described. It is recalled that Vinter and Trollope use a field definition having three field types, namely positive and negative electrostatic fields and a surface interaction field.
A pseudo-Coulombic potential is defined between the field points on molecule A and the field points on molecule B and the value of this potential function is calculated. The pseudo-Coulombic potential treats each field point as if it were a point charge in space with its charge being the energy value of the field point. A pseudo-potential energy is then calculated between these sets of point pseudo-charges. The +ve electrostatic field and −ve electrostatic field points are allowed to interact (being assigned positive and negative charges respectively), but the pseudo-Coulombic potential is calculated separately for the surface interaction field points. The higher the potential calculated with this method, the more similar the two conformers are taken to be.
In the XED model of Vinter and Trollope (1995) [2], although not directly described in the paper, each of the field similarity subvalues was determined according to the following pseudo-Coulombic potential formula:
$$E_{AB} = -\sum_{i,j} \frac{q_{iA}\, q_{jB}}{k + d_{iA,jB}^{\,l}}$$

where $q_{iA}$ is the energy value of the ith field point on molecule A (labelled q in view of the Coulomb analogy being used), $q_{jB}$ is the energy value of the jth field point on molecule B, $d_{iA,jB}$ is the distance between the ith field point on molecule A and the jth field point on molecule B, the sum is over all field points i on molecule A and j on molecule B, k is a constant with a value of 1, and l is a constant with a value of 1. The constant k was added into the usual Coulomb formula to avoid the pseudo-Coulombic energy value becoming too large for field point pairs that are very close (i.e. when distance d is very small) and thereby distorting the results.
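A minimal sketch of this pseudo-Coulombic sum for one field type is given below. It assumes that each field point is held as a position and an energy value and that the constant l acts as an exponent on the distance; with l equal to 1 that assumption does not affect the numerical result. The function name and data layout are hypothetical and are not taken from reference [2].

```python
import math


def pseudo_coulombic_score(points_a, points_b, k=1.0, l=1.0):
    """Return E_AB = -sum over i, j of q_iA * q_jB / (k + d_iA,jB ** l).

    `points_a` and `points_b` are lists of ((x, y, z), q) tuples, where q
    is the energy value of the field point (hypothetical layout).
    """
    total = 0.0
    for pos_a, q_a in points_a:
        for pos_b, q_b in points_b:
            d = math.dist(pos_a, pos_b)  # distance between the two field points
            total += q_a * q_b / (k + d ** l)
    return -total


# One field point on each molecule, 5 distance units apart.
score = pseudo_coulombic_score([((0.0, 0.0, 0.0), -2.0)], [((3.0, 4.0, 0.0), 1.5)])

# Coincident field points: the constant k keeps the term finite.
coincident = pseudo_coulombic_score([((1.0, 1.0, 1.0), 1.0)], [((1.0, 1.0, 1.0), 1.0)])
```

The second call shows the role of k: for two field points at the same position the denominator is k rather than zero, so the contribution stays bounded.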
Other prior art is described in references [3]-[6].