The expression “behaviour” denotes an “activity” in the biological or pharmacological sense, where the molecules relate to pharmaceutical application areas, or a “property” in the physicochemical sense, where the molecules relate to non-pharmaceutical application areas, for example materials such as polymers.
Examples of “behaviours” include anti-bacterial activity, anti-fungal activity, anti-viral activity, antibiotic activity, and permeability (e.g. of polymeric membranes for dialysis). Such “behaviours” are discussed elsewhere herein.
In the case of a behaviour for which any given molecule can be represented by a numerical parameter (i.e. a numerical parameter which characterises the degree of activity of that molecule), we may speak of “active molecules” defined as those which have a value of said parameter above a predetermined level, and “inactive molecules” defined as those which have a value of said parameter below that predetermined level.
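By way of illustration only, the two-class definition above may be sketched as follows. The function name, the example values and the treatment of a value exactly equal to the predetermined level (here counted as inactive) are assumptions made for the sketch; they are not specified in the text.

```python
def classify(value: float, level: float) -> str:
    """Label a molecule from the numerical parameter characterising its activity.

    A value above the predetermined level gives "active".
    Assumption: a value exactly equal to the level is treated as "inactive",
    since the text does not define that case.
    """
    return "active" if value > level else "inactive"


# Hypothetical parameter values and a hypothetical predetermined level.
level = 5.0
for value in (7.2, 3.1):
    print(value, classify(value, level))
```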
In fact, there may be more than one level of activity, e.g. within the class of active molecules. For example, we may define classes of “very active”, “active” and “inactive” molecules. Furthermore, we may choose to classify both agonists and antagonists as being in the active class. The activity may also be a toxicity, in which case the classes will be “toxic” and “non-toxic”, etc.
It is known that research into novel active molecules, notably in the pharmaceutical sector, requires the synthesis of a very large number of molecules which it is then necessary to test in vitro or in vivo. In a best-case scenario only a very small number of these molecules will prove to be active.
In an attempt to rationalise the search for novel active molecules, the idea arose of turning to molecular modelling using computerised databases.
One technique conventionally employed is Quantitative Structure-Activity Relationships (QSAR). This is based on the hypothesis that if a molecule exhibits a given biological behaviour, all the information required to describe that behaviour resides in the structure of the molecule, i.e. in its atoms, bonds and shapes.
In QSAR, a number of lead compounds which are known to be active are collected. The values of several numerical “descriptors” are derived for each lead compound; as discussed below, a descriptor is a numerical parameter characterising the molecule (e.g. dipole moment). The lead compounds (and their descriptors) are referred to as a learning set. QSAR then seeks candidate molecules (i.e. new molecules) for which the descriptor values resemble the descriptor values of the learning set.
Specifically, in conventional “classical” QSAR a linear combination of descriptors is considered. In this linear combination, each descriptor is multiplied by a respective weighting factor, to derive a single numerical parameter f. The values of the weighting factors are set using the active lead compounds, so that the value of f is high for all lead compounds. A candidate molecule is then tested to see whether its value of f is high or low.
A candidate molecule for which the value of f is high (i.e. its descriptor values do resemble the descriptor values of the lead compounds) is predicted to be active, or at least likely to be active. Such a candidate molecule may then be subjected to a (usually more expensive and/or time-consuming) test of whether it does indeed exhibit the activity.
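The linear combination used in classical QSAR may be sketched as below. This is a minimal illustration under stated assumptions, not a definitive implementation: the weighting factors are taken as already set (the text does not say how they are derived from the lead compounds), and the descriptor values, the weights and the cut-off separating “high” from “low” values of f are all hypothetical.

```python
from typing import Sequence


def linear_score(descriptors: Sequence[float], weights: Sequence[float]) -> float:
    """Return f, the sum of each descriptor multiplied by its weighting factor."""
    if len(descriptors) != len(weights):
        raise ValueError("one weighting factor is required per descriptor")
    return sum(w * d for w, d in zip(weights, descriptors))


def predicted_active(descriptors: Sequence[float],
                     weights: Sequence[float],
                     cutoff: float) -> bool:
    """Predict activity when f exceeds the cut-off (the cut-off is an assumption)."""
    return linear_score(descriptors, weights) > cutoff


# Hypothetical weighting factors, assumed to have been set beforehand
# so that f is high for the lead compounds of the learning set.
weights = [0.5, -1.0, 2.0]

# Hypothetical descriptor values for one candidate molecule.
candidate = [2.0, 1.0, 1.5]

f = linear_score(candidate, weights)  # 0.5*2.0 - 1.0*1.0 + 2.0*1.5 = 3.0
print(f, predicted_active(candidate, weights, cutoff=2.0))
```

Only a candidate molecule for which the prediction is positive would then go forward to the more expensive test.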
Results obtained to date with known techniques of this type have not been satisfactory, in particular owing to inadequate definition of the parameters and the inadequacy of linear models.
Furthermore, to obtain a reasonable accuracy in predicting the activity of candidate molecules, it is generally necessary to employ at least one descriptor which can only be measured experimentally. Therefore, in order to predict the activity of a candidate molecule, the candidate molecule must first be chemically synthesised so that this descriptor can be measured.