The present invention relates to a computer based technique for the global investigation of property spaces. More particularly, but not exclusively, the present invention relates to a computer based technique for the global investigation of chemical space, with reference to drugs, using a reference set of molecules and descriptors that allows the systematic mapping of the global chemical space. The technique may be used, for example, to generate a global map of the multidimensional chemical space that allows one to examine, in a consistent manner, the inner relationships between various molecules.
Drug discovery is a time- and resource-consuming exercise. Current computer based tools allow the description of chemical spaces as local models. Many chemical fields have been targeted by approaches used, or proposed, to discover new chemicals. For example, pharmaceuticals, agrochemicals, cosmetics and perfumes, photographic materials and others have benefited from the methodology developed to assist chemical synthesis. In all these fields, central to the goal of discovery is the novelty of chemical structures and of chemical properties. With the advent of parallel synthesis and combinatorial chemistry, large numbers of chemical compounds are now within reach for synthesis and evaluation. For the practising chemist, the crucial goal remains prioritising, out of thousands or millions of possibilities, which compound to make next.
For pharmaceutical or agrochemical compounds, there are two main types of relevant information in a molecule, i.e. chemical and biological. Medicinal chemistry handles chemical information by identifying classes of “active molecules”, then zooms in on biologically relevant information by performing various bioassays. However, in the initial stages of a research project, where little or no information is available concerning the biological target, chemical information is the only type of information that can be handled appropriately. Increasing the chemical information known about each compound therefore becomes a goal of such early-phase projects, especially in the absence of active compounds.
“What's the best way to describe a molecule numerically and uniquely? What's the best way to categorise clusters of molecules? Is all this work producing results that are any better than plain old random selection?” These questions are quoted from an article by Elizabeth K. Wilson, “Computers customise combinatorial libraries”, published Apr. 27, 1998 in Chemical and Engineering News (pp. 31-37). This article summarises the issues discussed at the recent American Chemical Society “Diversity Symposium”, organised by Robert S. Pearlman in Dallas, Tex. According to this article, the issues of molecular diversity, and of describing chemicals in a unique and relevant manner, have not been resolved, and there is no general consensus as to which approach should be taken.
Molecular similarity is a ubiquitous concept that dates back to the nineteenth century. Attempts to rigorously define molecular similarity can be found in the book “Concepts and applications of molecular similarity”, edited by Mark A. Johnson and Gerald M. Maggiora, J. Wiley and Sons, ISBN 0-471-62175-7, 1990. The impact of molecular similarity in the field of drug design, and a survey of recent advances in the use of molecular similarity in the pharmaceutical industry, are the subject of the book “Molecular similarity in drug design”, edited by Philip M. Dean, Chapman and Hall, ISBN 0-7514-0221-4, 1995.
Molecular diversity has been the target of recent molecular similarity-based methods, in an effort to maximise the structural diversity of combinatorial and/or HTS libraries so as to ensure the largest possible coverage of the chemical space. Molecular diversity analysis methods are surveyed in volume 7/8 of Perspectives in Drug Discovery and Design, ISSN 0928-2866, “Computational methods for the analysis of molecular diversity”, edited by Peter Willett (1997).
Presently, tools to describe chemical space are used to generate local models. For example, Sergio Clementi and co-workers have described the principal properties space for a set of 40 heteroaromatic compounds in Quant. Struct.-Act. Relat., vol. 15, pp. 108-120 (1996). In their work, Clementi et al. calculated various properties for 45 compounds using the GRID program, which is based on three-dimensional descriptors. From the resulting calculations, they derived a set of principal properties and classified these compounds into ten clusters. However, they classified guanine, a biologically important heteroaromatic ring, as an “outlier” that falls outside the property space of the aforementioned GRID descriptors. One of the inherent limitations of local models is that the analysis is only as valid as the composition of the dataset, and unique features show up as “outliers”, which often tend to skew the statistical results and are therefore excluded from the analysis.
Furthermore, local models tend to become outdated as new data are generated. This is illustrated by work performed by Svante Wold and co-workers, in which an initial three-dimensional property space for the 20 natural amino acids, J. Med. Chem., vol. 30, pp. 1126-1135 (1987), was extended to a five-dimensional set for 55 amino acids, Quant. Struct.-Act. Relat., vol. 8, pp. 204-209 (1989). Recently, this was further extended to a set of 87 amino acids, still using a five-dimensional property space, and published in J. Med. Chem., vol. 41, pp. 2481-2491 (1998). The five principal properties derived for amino acids are similar to the Hammett and Taft parameters, widely used in physical organic chemistry to correlate physico-chemical properties with molecular structure. These properties, termed “Z-scales”, have been tentatively interpreted as measures of lipophilicity (z1), size/polarizability (z2) and polarity (z3), while the fourth and fifth scales (z4 and z5) were more difficult to interpret. This work extended the principal property space represented by the twenty natural amino acids with an additional set of 67 non-coded amino acids, some of them explicitly synthesised to cover unique properties. However, these Z-scales remain valid only for amino acids, and further synthesis of novel structures would lead to re-evaluation of the principal properties, and of the “Z-scores” for individual amino acids.
Current computer based technology allows the end-user to generate extremely large numbers of compounds in silico. For example, Tripos Inc. and Silicon Graphics have announced that, in a joint project, they have created a virtual library of 100 billion molecules using “SpaceCrunch” technology.
Tripos' software ChemSpace™ yields “all possible molecular products resulting from given reactions, allowing chemists to start travelling with confidence over large expanses of the chemical universe”. This software promises a structural description of the chemical universe/space, based on single compounds and within certain limits. ChemSpace™ is a searchable database consisting of billions of compounds synthesizable from known reactions and available reagents, and it includes tools to navigate in the database. However, the database represents only a subset of the chemical space, limited by the types of chemical reactions and reactants provided in “SpaceCrunch”. This stepwise manner of mapping chemical space has so far been the only alternative to true chemical space navigation.
From the above, it can be seen that there is a considerable need for a way to navigate in the chemical space.
The present invention addresses the disadvantages discussed above and allows one to generate a global model that includes, and can specifically analyse, particular classes of compounds (e.g. heteroaromatic compounds, vide infra) without the risk of extrapolation or outlying behaviour, provided that the raw data are correct.
It is an object of the present invention to provide a computer based method to investigate any property space, e.g. a chemical and/or biological space, based on a set of objects (structures), e.g. chemical compounds and/or biologically relevant observations, and a set of variables, e.g. chemical descriptors and/or biologically relevant parameters, that allow a global systematic description of that given property space.
It is a further object of the present invention to provide a computer based method to investigate the chemical space in a global manner, thus avoiding redundancy and providing ways to explore novel regions of the chemical space, without the need for extrapolation.
Viewed from one aspect the present invention provides a method of mapping a target object of a target type into a target hyper-volume within a model in N-dimensional space containing a plurality of objects, each object in said model having an associated set of variables defining its position within said N-dimensional space, each variable having a maximum and minimum value within said model, said method comprising the steps of:
storing core object data representing a plurality of core objects of said target type within said target hyper-volume, said target hyper-volume being positioned spaced away from said maximum and minimum values of said variables;
storing satellite object data representing a plurality of satellite objects not of said target type positioned outside of said target hyper-volume;
determining from characteristics of said target object a position of said target object within said hyper-volume using the same evaluation criteria as used for said core objects and said satellite objects;
positioning said target object within said model relative to said core objects and said satellite objects in accordance with said determined position; and
generating a user output indicative of said relative position of said target object.
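The steps above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the "evaluation criteria" amount to scaling each raw variable onto [0, 1] using the model's minimum and maximum values, and that the "user output" is a report of the target's position and its nearest core and satellite objects. The names ModelObject, evaluate and map_target, and all numbers, are hypothetical and not taken from the invention.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelObject:
    name: str
    kind: str      # "core" (inside the target hyper-volume) or "satellite"
    coords: tuple  # position in the N-dimensional model space


def evaluate(descriptors, lows, highs):
    """Hypothetical evaluation criterion: scale each raw variable onto [0, 1]
    using the model's minimum and maximum. The same function is applied to
    core, satellite and target objects."""
    return tuple((d - lo) / (hi - lo) for d, lo, hi in zip(descriptors, lows, highs))


def map_target(target_descriptors, model, lows, highs):
    """Position the target and report its nearest core and satellite objects."""
    pos = evaluate(target_descriptors, lows, highs)
    # Rank every model object by Euclidean distance to the target position.
    ranked = sorted(model, key=lambda obj: math.dist(pos, obj.coords))
    nearest_core = next(o for o in ranked if o.kind == "core")
    nearest_sat = next(o for o in ranked if o.kind == "satellite")
    return {"position": pos,
            "nearest_core": nearest_core.name,
            "nearest_satellite": nearest_sat.name}


# Hypothetical two-variable model: minimum and maximum values per variable.
lows, highs = (0.0, -5.0), (1000.0, 10.0)
raw = {
    "core_A": ("core", (300.0, 2.0)),
    "core_B": ("core", (400.0, 3.0)),
    "sat_X": ("satellite", (950.0, -4.0)),
    "sat_Y": ("satellite", (50.0, 9.0)),
}
model = [ModelObject(n, k, evaluate(d, lows, highs)) for n, (k, d) in raw.items()]
report = map_target((340.0, 2.5), model, lows, highs)
print(report)
```

Because the target is evaluated with exactly the same function as the stored objects, its position is directly comparable with both the core objects and the satellite objects.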
The invention recognises that when seeking to map a target object into a target hyper-volume, improved results can be achieved if the model being used includes not only core objects within the target hyper-volume but also satellite objects positioned outside of the target hyper-volume. Whilst the satellite objects may be very different from the target objects of interest, the presence of the satellite objects within the model provides the model with a much higher degree of generality and the ability to cope with target objects that are relatively different from the core objects. In contrast to the global model allowed by the invention, a local model of the type discussed above can cope with target objects that are of a similar nature to the known objects within the model, but is ill-equipped to provide meaningful results when the target object becomes relatively different from the objects already within the local model. For this reason, local models are limited by the type of input data and are not suitable for the mapping of different types of target object. Furthermore, a relationship between different objects that may be identified with a global model may not be found when those objects are separately modelled within their own local models.
The present invention seeks to avoid the problems of the prior art by explicitly including molecules with extreme properties in the dataset. These molecules with extreme properties play the role of satellites and allow the principal property values to remain fixed during the analysis. Thus, the present invention provides a consistent method to map the chemical space, not only for amino acids or heteroaromatic compounds, but for any type of chemical compounds considered within the set of conventions described below.
From a mathematical perspective, the use of satellite objects within the model but outside of the target hyper-volume has the advantage of providing a more flexible and globally representative set of unit vectors defining the N-dimensional space against which a particular target object may be mapped. It is surprising that mapping of a target object into a target hyper-volume, e.g. a potential pharmaceutical into the hyper-volume of known pharmaceuticals, is improved by deliberately incorporating into the model satellite objects whose character is very different from that of the target objects of interest within the target hyper-volume. One way of understanding this improvement is to view the satellite objects as allowing the position of the target object within the model to be interpolated, whereas a local model may have to resort to much less accurate extrapolation of the position of a target object if that target object is not very similar to the objects already within the local model.
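The interpolation argument can be made concrete with a small sketch. As a deliberate simplification (an assumption, not the invention's criterion), a target counts as "interpolated" when each of its coordinates lies within the range spanned by the model objects on that axis; a convex-hull test would be stricter. The coordinates and the helper names axis_ranges and is_interpolation are hypothetical.

```python
def axis_ranges(points):
    """Per-axis (min, max) of the coordinates spanned by the model objects."""
    return [(min(c), max(c)) for c in zip(*points)]


def is_interpolation(target, points):
    """True if every coordinate of the target lies inside the range spanned by
    the model objects on that axis; False means extrapolation would be needed."""
    return all(lo <= t <= hi for t, (lo, hi) in zip(target, axis_ranges(points)))


# Hypothetical coordinates in a two-dimensional model space.
core = [(0.40, 0.50), (0.45, 0.55), (0.50, 0.48)]
satellites = [(0.05, 0.95), (0.95, 0.05)]
target = (0.70, 0.30)

local_ok = is_interpolation(target, core)                # local model: core only
global_ok = is_interpolation(target, core + satellites)  # global model
print(local_ok, global_ok)
```

With the core objects alone the target falls outside the spanned range, so its position would have to be extrapolated; once the satellites are included, the same target lies within the range and can be interpolated.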
It will be appreciated that the modelling technique of the invention could be applicable to many different fields. However, the invention is particularly well suited to models in which the objects are chemical structures and the variables are chemical variables. More particularly, the technique is highly beneficial when the target type is pharmaceutically active chemical structures and the core objects include known pharmaceuticals whilst the satellite objects are not pharmaceutically active.
In order to derive the unit vectors representing the component axes within the N-dimensional space, it has been found beneficial to use principal component analysis to determine eigenvectors that serve as these component unit vectors. Principal component analysis provides a way of identifying the best vectors for representing an N-dimensional space without the redundancy that would introduce undesirable complexity.
A target object to be mapped will also be subject to principal component analysis in the sense that it will be positioned within the model using the value of its co-ordinates in the N-dimensional space whose component unit vectors are determined using principal component analysis.
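A minimal sketch of these two paragraphs, assuming autoscaled variables (mean-centred, unit variance) so that the eigenvectors come from the correlation matrix of the model data. The functions fit_pca and project and the data values are hypothetical illustrations, not the invention's actual descriptors or preprocessing. The key point is that the target is projected with the model's own mean, scaling and eigenvectors, rather than being used to refit them.

```python
import numpy as np


def fit_pca(X, n_components):
    """Autoscale the model data and return (mean, std, loadings, eigenvalues).
    The loadings are eigenvectors of the correlation matrix, sorted by
    decreasing eigenvalue; they serve as the component unit vectors."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    corr = np.cov(Z, rowvar=False)            # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, std, eigvecs[:, order], eigvals[order]


def project(x, mean, std, loadings):
    """Position an object (or rows of objects) in the model's component space
    using the scaling and axes fitted on the model objects."""
    return ((np.asarray(x, dtype=float) - mean) / std) @ loadings


# Hypothetical model data: rows are objects, columns are three variables.
X = [[300.0, 2.0, 1.0],
     [350.0, 2.5, 2.0],
     [420.0, 3.1, 2.0],
     [150.0, -1.0, 0.0],
     [900.0, 6.0, 5.0]]
mean, std, loadings, eigvals = fit_pca(X, n_components=2)
model_scores = project(X, mean, std, loadings)
target_scores = project([380.0, 2.8, 1.0], mean, std, loadings)
print(target_scores)
```

Note that the sign of each eigenvector is arbitrary, so the scores along an axis may come out mirrored between implementations without changing the relative positions of the objects.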
If a target object is found to lie outside of the target hyper-volume, it is sometimes useful to add that target object to the model to serve as a satellite object. Whilst the position of the target object outside of the target hyper-volume makes it more difficult to interpret its relationship with other objects within the model, its addition to the model can have the advantage of improving the global applicability of the model, and may also serve to indicate a relationship with some future target object to be mapped to the model.
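The decision described above might be sketched as follows. For illustration only, the target hyper-volume is assumed to be the axis-aligned box spanned by the core objects; the invention does not prescribe this shape. The helper names are hypothetical, and the sketch only updates the list of satellites; a fuller implementation would presumably also re-derive the model's component axes after adding a satellite.

```python
def core_box(core_points):
    """Hypothetical target hyper-volume: the axis-aligned box spanned by the core objects."""
    return [(min(c), max(c)) for c in zip(*core_points)]


def inside(point, box):
    """True if the point lies within the box on every axis."""
    return all(lo <= p <= hi for p, (lo, hi) in zip(point, box))


def map_or_add_satellite(target, core_points, satellites):
    """Return True if the target falls within the target hyper-volume;
    otherwise append it to the satellites (extending the model) and return False."""
    if inside(target, core_box(core_points)):
        return True
    satellites.append(target)
    return False


core = [(0.40, 0.50), (0.50, 0.60)]
satellites = [(0.05, 0.95)]
first = map_or_add_satellite((0.45, 0.55), core, satellites)   # inside: mapped
second = map_or_add_satellite((0.90, 0.10), core, satellites)  # outside: new satellite
print(first, second, satellites)
```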
Many different variables and maximum and minimum values can be chosen for the model. However, in the case of a model seeking to identify pharmaceuticals, particularly useful properties are molecular weight, molecular size, molecular flexibility, molecular rigidity, formal negative charges, formal positive charges, the ability to accept hydrogen bonds, the ability to donate hydrogen bonds, lipophilicity and atomic polarisabilities, as described by variables related to the aforementioned properties (vide infra). It will be appreciated that these variables can be used in different combinations, and together with other variables, if so desired. Calculated molecular refractivity and molecular volume may also be used as alternatives to, or in addition to, molecular size.
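One way the chosen maximum and minimum values might be applied is sketched below: each raw variable is scaled onto [0, 1] against its model limits, and any variable outside those limits is flagged. The variable names and every numeric bound are hypothetical placeholders for a subset of the properties listed above, not values disclosed for the invention.

```python
# Hypothetical bounds (minimum, maximum) for a subset of the variables named
# in the text; the numbers are placeholders, not values from the invention.
BOUNDS = {
    "molecular_weight": (0.0, 2000.0),
    "hb_acceptors": (0, 30),
    "hb_donors": (0, 20),
    "positive_charges": (0, 10),
    "negative_charges": (0, 10),
    "clogp": (-10.0, 15.0),
}


def scale_descriptors(desc, bounds=BOUNDS):
    """Map each raw variable onto [0, 1] using the model's minimum and maximum.
    Returns the variable names (in a fixed order), the scaled vector, and the
    names of any variables lying outside the model's limits."""
    names = sorted(bounds)
    scaled, out_of_range = [], []
    for name in names:
        lo, hi = bounds[name]
        value = desc[name]
        if not lo <= value <= hi:
            out_of_range.append(name)
        scaled.append((value - lo) / (hi - lo))
    return names, scaled, out_of_range


# Hypothetical compound descriptors.
compound = {"molecular_weight": 400.0, "hb_acceptors": 6, "hb_donors": 2,
            "positive_charges": 0, "negative_charges": 1, "clogp": 2.5}
names, vector, flagged = scale_descriptors(compound)
print(dict(zip(names, vector)), flagged)
```

Fixing the variable order (here alphabetical) matters because every object in the model must present its variables along the same axes.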
Viewed from another aspect the present invention provides an apparatus for mapping a target object of a target type into a target hyper-volume within a model in N-dimensional space containing a plurality of objects, each object in said model having an associated set of variables defining its position within said N-dimensional space, each variable having a maximum and minimum value within said model, said apparatus comprising:
a memory for storing core object data representing a plurality of core objects of said target type within said target hyper-volume, said target hyper-volume being positioned spaced away from said maximum and minimum values of said variables and for storing satellite object data representing a plurality of satellite objects not of said target type positioned outside of said target hyper-volume;
determination logic for determining from characteristics of said target object a position of said target object within said hyper-volume using the same evaluation criteria as used for said core objects and said satellite objects;
positioning logic for positioning said target object within said model relative to said core objects and said satellite objects in accordance with said determined position; and
a user output device for generating a user output indicative of said relative position of said target object.
Viewed from a further aspect the invention provides a method of forming a model in N-dimensional space containing a plurality of objects and a target hyper-volume into which target objects are to be mapped, said method comprising the steps of:
selecting a set of variables defining said N-dimensional space;
selecting maximum and minimum values for said variables;
selecting a representative set of core objects within said target volume;
selecting a representative set of satellite objects outside of said target volume; and
iteratively testing and altering said model to obtain a set of variables, maximum and minimum values, core objects and satellite objects that span said N-dimensional space and allow target objects to be mapped to within said target volume.
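The iterative step might be sketched as a greedy loop. This covers only one possible form of "altering" the model, namely adding satellite objects from a candidate pool; the claimed method also allows the variables and their limits to be changed. The greedy criterion (prefer the candidate that lets the most test targets be interpolated, then the one that widens the per-axis ranges most) and all names and coordinates are hypothetical.

```python
def axis_ranges(points):
    """Per-axis (min, max) spanned by the model objects."""
    return [(min(c), max(c)) for c in zip(*points)]


def coverage(points, targets):
    """Number of test targets lying within the model's per-axis ranges."""
    ranges = axis_ranges(points)
    return sum(all(lo <= t <= hi for t, (lo, hi) in zip(tgt, ranges))
               for tgt in targets)


def total_span(points):
    """Sum of the per-axis ranges; a crude measure of how much space is spanned."""
    return sum(hi - lo for lo, hi in axis_ranges(points))


def form_model(core, candidates, test_targets, max_rounds=50):
    """Greedy sketch: test the model and, while some test targets would still
    need extrapolation, add the most useful candidate satellite."""
    model, pool = list(core), list(candidates)
    for _ in range(max_rounds):
        if coverage(model, test_targets) == len(test_targets) or not pool:
            break
        best = max(pool, key=lambda c: (coverage(model + [c], test_targets),
                                        total_span(model + [c])))
        model.append(best)
        pool.remove(best)
    return model


core = [(0.40, 0.50), (0.50, 0.60)]
candidates = [(0.10, 0.90), (0.90, 0.10), (0.45, 0.55)]
targets = [(0.20, 0.30), (0.80, 0.70)]
model = form_model(core, candidates, targets)
print(model)
```

In this example two satellites are added and the candidate lying inside the core region is never needed, since it does not extend the space spanned by the model.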
Viewed from a still further aspect the invention provides a carrier medium carrying a computer program product for mapping a target object of a target type into a target hyper-volume within a model in N-dimensional space containing a plurality of objects, each object in said model having an associated set of variables defining its position within said N-dimensional space, each variable having a maximum and minimum value within said model, said computer program product providing the processing steps of:
storing core object data representing a plurality of core objects of said target type within said target hyper-volume, said target hyper-volume being positioned spaced away from said maximum and minimum values of said variables;
storing satellite object data representing a plurality of satellite objects not of said target type positioned outside of said target hyper-volume;
determining from characteristics of said target object a position of said target object within said hyper-volume using the same evaluation criteria as used for said core objects and said satellite objects;
positioning said target object within said model relative to said core objects and said satellite objects in accordance with said determined position; and
generating a user output indicative of said relative position of said target object.
It will be appreciated that the carrier medium for carrying the computer program could take many different forms. Examples of carrier media include magnetic discs, optical discs, memory integrated circuits and the like, as well as distribution media such as a telecommunications system, e.g. for downloading the computer software from a telecommunications medium such as the internet.
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings.