Drug discovery project teams face many difficult decisions, ranging from choosing the best target for a potential therapeutic indication to selecting appropriate compounds during hit finding, hit-to-lead and lead optimization, and nominating a preclinical candidate. Poor decisions can result in failed drug discovery projects. A poor choice of target or compound can result in financial loss and wasted effort due to unnecessary synthesis and screening, or in late-stage failure of research projects. Conversely, over-aggressive filtering of drug pipelines can lead to missed opportunities to find new therapies. Experimental techniques, predictive modeling and informatics have not solved the enormous challenges facing drug discovery research. The average cost per new molecular entity (NME) launched on the market has risen from an estimated $805 million in 2003 (DiMasi, J. A., R. W. Hansen, and H. G. Grabowski. “The price of innovation: new estimates of drug development costs.” J. Health Econ. 22 (2003): 151-85) to $1.7 billion (Paul, S. M., et al. “How to improve R&D productivity: the pharmaceutical industry's grand challenge.” Nat. Rev. Drug Discov. 9 (2010): 203-14), while the success rate of compounds entering preclinical development remains poor and unchanged at approximately 8% (Id.).
Making good decisions in drug discovery is an enormous challenge, and the known approaches often fail to meet drug discovery objectives. Historically, throughout the drug discovery process, compounds have been selected for progression to further study on the basis of the data generated up to that point. The criteria by which compounds are selected are generally based on the opinions of experts in the field. In some research, ‘Lipinski's Rule of Five’ (Lipinski, C. A., F. Lombardo, B. W. Dominy, and P. J. Feeney. “Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings.” Adv. Drug Deliv. Rev. 23 (1997): 3-25) has been applied. This rule sets criteria for four basic characteristics of compounds, namely: no more than 5 hydrogen bond donors; no more than 10 hydrogen bond acceptors; a molecular weight of no more than 500; and a log P of no more than 5.
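As an illustration of how such fixed, expert-derived criteria operate in practice, the following minimal Python sketch applies a Rule of Five style filter to a small set of compounds. The `Compound` class, the function name, and the two example compounds with their property values are hypothetical and are not taken from any cited work. The default thresholds follow the commonly cited form of the rule (a value counts as a violation only when it exceeds the cutoff) and are exposed as parameters so that stricter cutoffs can be substituted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical record holding the four properties the rule inspects."""
    name: str
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # in daltons
    log_p: float


def rule_of_five_violations(compound, max_donors=5, max_acceptors=10,
                            max_weight=500.0, max_log_p=5.0):
    """Return the names of the properties that exceed their cutoff."""
    violations = []
    if compound.h_bond_donors > max_donors:
        violations.append("h_bond_donors")
    if compound.h_bond_acceptors > max_acceptors:
        violations.append("h_bond_acceptors")
    if compound.molecular_weight > max_weight:
        violations.append("molecular_weight")
    if compound.log_p > max_log_p:
        violations.append("log_p")
    return violations


# Illustrative values only; these are not measured data.
compounds = [
    Compound("example_small", 2, 5, 320.4, 2.1),
    Compound("example_large", 7, 12, 712.9, 6.3),
]

# Keep only the compounds with no violations.
selected = [c.name for c in compounds if not rule_of_five_violations(c)]
```

A filter of this kind is simple to apply, but every threshold is fixed in advance and identical for every project, which is the limitation the discussion of ‘manual’ selection criteria addresses.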
This ‘manual’ approach to determining the selection criteria for compounds is unsatisfactory and has a number of disadvantages. Criteria set on the basis of experts' opinions are colored by individual biases and limited to the experience of those experts, so they cannot take large amounts of historical data into consideration. This makes it difficult to apply criteria based on broad experience gained across many drug discovery projects. Further, the increasing complexity of the data available in drug discovery makes a ‘manual’ analysis of the data to elucidate selection criteria intractable. This problem is readily apparent in fields such as toxicology, where many early in vitro assays have been developed in an attempt to identify safe compounds and to eliminate from consideration compounds that can be toxic to animals or humans. Given the wealth of assays that can be applied, it is difficult, or not currently possible, to identify which assays are most important (or even critical) to identifying safe compounds, and which selection criteria should be applied to the results of those assays.
Certain computational approaches to predicting outcomes for compounds have been employed in drug discovery. Quantitative Structure Activity Relationship (QSAR) models have been applied to predicting individual properties of compounds, such as solubility, lipophilicity, absorption, metabolism and activity against drug targets. QSAR models have also been developed for predicting certain in vivo outcomes. However, these models are insufficient to solve the problems outlined above.
Different methods have been used to generate QSAR models, e.g., Partial Least Squares (PLS), Classification and Regression Trees (CART), Random Forest (RF), Support Vector Machines (SVM), Artificial Neural Networks (ANN) and Gaussian Processes (GP). None has solved the challenges that drug researchers face. For example: PLS generates linear models, which have low accuracy where there are complex relationships between the descriptors and the outcome being modelled; CART generates models that lack predictive power for complex outcomes, and with which it is difficult to determine the importance of each descriptor in determining the outcome; and non-linear techniques such as SVM, ANN and GP have been tried but fail because the relationships between the descriptors used by the techniques and the outcomes cannot be usefully or reasonably determined.
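The limitation of linear models noted above can be illustrated with a toy example. The sketch below is not PLS; it uses ordinary least squares on a single descriptor as a simplified stand-in for any purely linear model, and the data are synthetic and hypothetical. The outcome depends on the square of the descriptor, so a straight line fitted to the raw descriptor explains none of the variance, whereas the same fit applied to a transformed descriptor recovers the relationship exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(ys, predictions):
    """Fraction of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data: the outcome is a non-linear function of the descriptor.
descriptor = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
outcome = [x * x for x in descriptor]

# Linear model on the raw descriptor.
slope, intercept = fit_line(descriptor, outcome)
linear_r2 = r_squared(outcome, [slope * x + intercept for x in descriptor])

# Same linear fit, but on a squared (transformed) descriptor.
squared = [x * x for x in descriptor]
slope_sq, intercept_sq = fit_line(squared, outcome)
transformed_r2 = r_squared(outcome, [slope_sq * x + intercept_sq for x in squared])

print(f"R^2 on raw descriptor:     {linear_r2:.2f}")
print(f"R^2 on squared descriptor: {transformed_r2:.2f}")
```

In real projects the appropriate transformation is not known in advance, which is why non-linear techniques are tried, at the cost of the interpretability problem described above.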