Medicinal chemistry is an iterative design process in which the biological properties of an analogous set of compounds are modified and assessed until a compound is discovered that meets the required criteria for subsequent development. On average, over one thousand compounds (at a cost of up to £2,000 each) may be synthesised and tested during the course of a drug discovery project, as it proceeds from an initial screening ‘hit’ to drug candidates for pre-clinical assessment. Therefore, the cost of developing new drug candidates can be extremely high even before clinical trials can be undertaken.
High-speed analogue chemistry library methods can be useful for producing a large number of low-cost compounds with relatively simple chemistries, to quickly explore the chemical space around an initial starting compound. However, manual synthesis of individual compounds is usually unavoidable when a chemical series is to be optimised from a ‘lead’ to a clinical drug candidate, because specific, perhaps complicated, chemical design changes might be required.
Another problem in drug design is that during lead optimisation a number of independent, non-correlated (often divergent) properties may need to be optimised, such as: potency against a desired target; selectivity against non-desired targets; low probability of toxicity; and good drug metabolism and pharmacokinetic properties (ADME). To add to the complications, often the chemical changes that might most benefit one of these properties, such as target specificity, may be detrimental to another property, such as bioavailability. Therefore, the lead optimisation process can be considered as a complex multi-objective process.
Typically, the process of lead optimisation is based on a cycle of hypothesis generation and experimentation. Each compound design can be considered a ‘hypothesis’, which may be falsified by experimentation. The experimental results may be represented as structure-activity relationships, which generate a landscape of hypotheses as to which chemical structure is likely to possess the desired characteristics. The process of drug design is also an optimisation problem, as each project starts out with a product profile of desired attributes, i.e. a target function. The medicinal chemistry solution (i.e. a desired drug candidate profile) can be accurately described by its desired properties, for example: as a drug for administration by a preferred route (e.g. oral), having particular drug properties (e.g. solubility and bioavailability profiles), and a minimum degree of selectivity for the target molecule (e.g. at least 100-fold). However, even though the problem can be described, it is a difficult challenge to find an optimal solution from the vast space of hypothetical feasible solutions. In an attempt to overcome some of these commercial and technical problems, researchers are increasingly turning to artificial intelligence, e.g. software-based solutions and databases, which have been developed to automate parts of the iterative scientific discovery process.
Existing artificial intelligence-based approaches for drug design/evolution and lead optimisation can be generally categorised into three areas: (1) databases of chemical transformations for use in drug lead optimisation; (2) user-guided evolutionary drug design; and (3) automated evolutionary drug design. However, to date, none of the known systems offer the power, versatility and strategic intelligence that are required to achieve the desired levels of process simplification and cost savings.
(1) Chemical Transformations. In the process of drug design, the creative process of the chemist often utilises knowledge of ‘tactics’ or ‘transformations’, which can be applied in many situations to help design novel compounds. A creative transformation is often not just a simple chemical reaction leading to a new product, since many transformations may be required to duplicate several steps in a proposed synthetic route. Common tactics employed by the experienced medicinal chemist include, amongst others, ‘methylene shuffle’, adding lipophilicity, adding chirality, searching for hydrogen-bond interactions, and introducing or breaking conformational constraints. Previous attempts to catalogue lists of chemical transformations for use by medicinal chemists in the drug design process have been reported by Stewart K. D. et al. (2006) “Drug Guru: a computer software program for drug design using medicinal chemistry rules”, Bioorg. Med. Chem., 14(20), 7011-22; and Raymond J. W. et al., (2009) “Rationalizing lead optimization by associating quantitative relevance with molecular structure modification”, J. Chem. Inf. Model, 49, 1952-62. Stewart K. D. et al. (2006) describes a database of chemical transformations that were derived from interviews with a number of medicinal chemists. The program takes an input compound, applies one generation of chemical transformations, and displays the resulting output structures for consideration by the medicinal chemist. Thus, the system acts as an ‘ideas prompt’, simply automating the thought process that would normally be undertaken by a medicinal chemist. Raymond J. W. et al., (2009) describes an alternative method of cataloguing chemical transformations. In this case, rather than rely on knowledge inputted from the real-life experiences of medicinal chemists, the authors attempted to systematically mine their own databases of medicinal chemistry structures. 
Raymond et al. treated two structures as ‘similar’ where the ratio of the number of bonds in their maximal common substructure to the maximum number of bonds in the compound is at least 0.7, and automatically identified chemical transformations by comparing those similar structures. However, no biological information was used in the selection of the transformations. Notably, neither of these systems of chemical transformations goes beyond the stage of simply presenting a host of potential transformed structures to the chemist for further consideration. Neither system scores compounds against a particular design goal, nor does either provide an automated, iterative closed-loop system of hypothesis generation, assessment and redesign, which might alleviate the burden on the chemist to mentally and chemically assess the many possibilities.
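By way of illustration only, the 0.7 similarity criterion described above can be sketched as follows. This is a minimal sketch and not the published implementation: it assumes that ‘the maximum number of bonds in the compound’ means the bond count of the larger of the two compounds being compared, and it assumes that the bond count of the maximal common substructure has already been obtained from a separate substructure-matching tool. The function names `mcs_bond_ratio` and `is_similar_pair` are hypothetical.

```python
def mcs_bond_ratio(mcs_bonds: int, bonds_a: int, bonds_b: int) -> float:
    """Ratio of bonds in the maximal common substructure (MCS) to the
    bond count of the larger compound (an assumed reading of the text)."""
    largest = max(bonds_a, bonds_b)
    if largest <= 0:
        raise ValueError("each compound must contain at least one bond")
    if not 0 <= mcs_bonds <= min(bonds_a, bonds_b):
        raise ValueError("MCS cannot have more bonds than the smaller compound")
    return mcs_bonds / largest


def is_similar_pair(mcs_bonds: int, bonds_a: int, bonds_b: int,
                    threshold: float = 0.7) -> bool:
    """True when the pair meets the similarity threshold (0.7 in the text)."""
    return mcs_bond_ratio(mcs_bonds, bonds_a, bonds_b) >= threshold


if __name__ == "__main__":
    # Two hypothetical compounds with 18 and 20 bonds sharing a 15-bond MCS.
    print(mcs_bond_ratio(15, 18, 20))   # 0.75
    print(is_similar_pair(15, 18, 20))  # True
    print(is_similar_pair(12, 18, 20))  # False (ratio 0.6)
```

Under this reading, a pair that passes the threshold would then be compared to extract the structural difference as a candidate transformation; that extraction step is not shown here.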
(2) User-Guided Evolution. An alternative to generating compounds on the basis of chemical knowledge of drug designs is to use a ‘genetic algorithm’. Unlike the knowledge-based methods above, chemical transformations in a genetic algorithm are not defined by the creative or historical knowledge of medicinal chemistry, but are instead based on a simple set of programmed transformations that generate new compounds from a starting structure. These transformations can be categorised as ‘crossover’ (where substructures from two different molecules are selected and swapped) and ‘mutation’ (where a single atom or bond is changed using a small number of defined steps, e.g. add, insert, delete, replace etc.). Lameijer et al., (2006), “The Molecule Evaluator: An Interactive Evolutionary Algorithm for the Design of Drug-Like Molecules”, J. Chem. Inf. Model, 46, 545-552, describes such an approach for proposing new chemical structures. As in the above knowledge-based systems, Lameijer et al., (2006) does not iteratively and automatically optimise compounds or score them in such a way as to identify best solutions; instead, it is left to the user to visually, mentally and chemically select and assess the possible molecules.
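The ‘crossover’ and ‘mutation’ operations described above can be illustrated with a deliberately simplified sketch. It is an assumption of this example that a molecule is represented as a flat list of fragment labels; real systems such as that of Lameijer et al. operate on molecular graphs and enforce chemical validity, which is not attempted here. The function names and the fragment alphabet are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def crossover(parent_a: Sequence[str], parent_b: Sequence[str],
              rng: random.Random) -> Tuple[List[str], List[str]]:
    """Swap the tails of two parents at independently chosen cut points."""
    if len(parent_a) < 2 or len(parent_b) < 2:
        raise ValueError("each parent needs at least two fragments")
    cut_a = rng.randint(1, len(parent_a) - 1)
    cut_b = rng.randint(1, len(parent_b) - 1)
    child_a = list(parent_a[:cut_a]) + list(parent_b[cut_b:])
    child_b = list(parent_b[:cut_b]) + list(parent_a[cut_a:])
    return child_a, child_b


def mutate(molecule: Sequence[str], alphabet: Sequence[str],
           rng: random.Random) -> List[str]:
    """Apply one randomly chosen edit: add, insert, delete or replace."""
    if not molecule:
        raise ValueError("molecule must contain at least one fragment")
    child = list(molecule)  # copy so the parent is left unchanged
    op = rng.choice(["add", "insert", "delete", "replace"])
    if op == "delete" and len(child) == 1:
        op = "replace"  # never delete the last remaining fragment
    if op == "add":
        child.append(rng.choice(alphabet))
    elif op == "insert":
        child.insert(rng.randrange(len(child)), rng.choice(alphabet))
    elif op == "delete":
        del child[rng.randrange(len(child))]
    else:
        child[rng.randrange(len(child))] = rng.choice(alphabet)
    return child


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for repeatable runs
    a = ["C", "C", "N", "O"]
    b = ["c1ccccc1", "F", "Cl"]
    print(crossover(a, b, rng))
    print(mutate(a, ["C", "N", "O", "F"], rng))
```

In a user-guided system of the kind described, the children produced by such operators would simply be displayed to the chemist; no scoring or selection step follows automatically.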
(3) Automated Evolution. This approach attempts to develop multi-objective methods using evolutionary approaches for drug design. Examples are reported in Brown et al., (2004), “A Graph-Based Genetic Algorithm and Its Application to the Multiobjective Evolution of Median Molecules”, J. Chem. Inf. Comput. Sci., 44, 1079-1087; and Nicolaou et al., (2009), “De Novo Drug Design Using Multiobjective Evolutionary Graphs”, J. Chem. Inf. Model, 49, 295-307. Brown et al., (2004; see also Brown et al., 2004, “The de novo design of median molecules within a property range of interest”, J. Computer-Aided Mol. Design, 18, 761-771; and Brown et al., 2006, “A novel workflow for the inverse QSPR problem using multiobjective optimization”, J. Computer-Aided Mol. Design, 20, 333-341) describes a genetic algorithm that generates novel compounds from compound ‘fragments’. This system uses as its input a library of molecular fragments from which it builds new molecules via a genetic algorithm. The genetic algorithm builds compounds by randomly flipping segments in a population of graph-based ‘chromosomes’. The mutations are atom/node based, such as append, prune, insert, or delete; or bond/edge based, such as add, delete, or substitute. A key feature of the method appears to be the definition of objectives as a chemical structure (or structures) towards which the system seeks to evolve compounds, thereby maximising the structural similarity to an objective molecule (i.e. a defined ‘median molecule’). The multi-objective genetic algorithm is defined by applying Pareto ranking to the designed compounds, and compounds that sit on the Pareto frontier (in terms of similarity with objective chemical structures) are prioritised. In the reported examples, none of the evolved compounds were validated as having any biological activity or particular utility. Furthermore, the system was not demonstrated in a method for drug discovery.
Nicolaou et al. (2009) describes a de novo drug design algorithm using multi-objective evolutionary graphs. In this system, chemical structures are generated from a library of molecular building blocks, based on graph-based ‘chromosomes’, using mutation and crossover operations as previously described. An objective-encoded scoring method is applied, which includes binding affinity predictions (mainly 3D protein structure docking), molecular similarity and chemical structure scores; and selection of molecules is based on Pareto ranking. Although an example of the method is applied to the in silico design of estrogen receptor alpha ligands, no experimental activity is reported for the designed compounds. Again, the authors indicate that this system is an ‘ideas generator’, rather than a validated, complete design system.
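For readers unfamiliar with the Pareto ranking referred to above, the following minimal sketch shows one common form of it, non-dominated sorting. It is not taken from Brown et al. or Nicolaou et al., whose exact ranking schemes may differ; it assumes that every objective is a score to be maximised (for example, similarity to each objective structure), and the names `dominates` and `pareto_rank` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (all objectives are maximised)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_rank(scores: Sequence[Sequence[float]]) -> List[int]:
    """Rank 0 is the Pareto frontier; rank 1 is the frontier of what is
    left once rank 0 is removed, and so on."""
    ranks: List[int] = [-1] * len(scores)
    remaining = set(range(len(scores)))
    rank = 0
    while remaining:
        front = {
            i for i in remaining
            if not any(dominates(scores[j], scores[i])
                       for j in remaining if j != i)
        }
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


if __name__ == "__main__":
    # Hypothetical compounds scored on two objectives, each in [0, 1].
    scores = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.1, 0.1)]
    print(pareto_rank(scores))  # [0, 0, 0, 1, 2]
```

The first three compounds each represent a different trade-off between the two objectives and none is beaten on both, so all three sit on the frontier and would be prioritised together.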
Hence, to date, none of the prior art has demonstrated utility in the design and prediction of active molecules that are valid drug candidates having desired biological activity. Therefore, there remains a need in the art for a truly automated drug design system, which goes beyond the level of simply automating the normal thought processes of a skilled person in the field, thereby aiding the medicinal chemist's decision making in compound design. There is a further need in the art for an in silico (or computer-based/software) drug design system that is capable of true lead optimisation, by assessing large numbers of hypothetical molecules and converging on a more limited number of potentially active molecules, thereby reducing the number of compounds that must be synthesised and tested during the course of a project. Moreover, there is an important unmet need in the art for an in silico system with demonstrable power and utility in the actual design and prediction of drug candidates having desired biological properties and activities. By satisfying one or more of these needs, the de novo drug design process may be simplified, and the costs of lead optimisation reduced, thereby having a direct impact on productivity in the field.
Another challenge in the art of drug design and evolution is in the area of ‘polypharmacology’. While the dominant paradigm in drug discovery has traditionally been to design drugs (or ligands) with maximum selectivity against a specific target molecule, more recently, in fields such as oncology, psychiatry and antimicrobials, it has been shown that effective drugs may act via modulation of multiple rather than single targets. In fact, advances in systems biology and network structures are now revealing that, in some cases, multi-target drugs may have greater clinical efficacy than exquisitely selective compounds. However, this recognition reveals further problems in the current systems and methods for rational drug design, because of the potential need for optimising multiple structure-activity relationships at the same time. None of the prior art systems provides a robust method for the design and evolution of drug candidates having desired polypharmacological profiles. Accordingly, there is a further need in the art for a computational (or in silico) system for the design and automatic optimisation of compounds having multiple activities.
The present invention addresses one or more of the above-mentioned problems in the prior art, by providing a computational system for ‘intelligent’ drug design and evolution.