1. Field of the Invention
This invention pertains generally to bioinformatics and to knowledge-based expert systems, and more particularly to the prediction of enzyme-catalyzed chemical reactions.
2. Description of Related Art
Although our knowledge of metabolism is extensive, it is still incomplete. Many metabolites present in cells, and their reactions and associated enzymes, remain to be discovered. Systematic and rapid approaches for determining how a newly discovered metabolite is produced and consumed are lacking. Thus, a present challenge for bioinformatics in support of metabolomics is the development of computational methods that will help place novel metabolites in the context of biochemical pathways efficiently and accurately.
The prediction of metabolic reactions has been a topic of interest for some time, in part because the metabolic breakdown products of a drug can sometimes be toxic. Bioremediation is another area in which predictive understanding of metabolic breakdown products is important. Because enzyme-catalyzed reactions are generally too complex to be predicted from first principles, a variety of knowledge-based expert systems have been developed for predicting metabolic transformations, and several software tools are available for the prediction of biodegradation pathways, such as MetabolExpert, METEOR, and META. These systems are based on expert-defined rules for biotransformations that generalize the transformations of known reactions. For example, a recently developed system, BNICE, is based on around 250 enzyme-reaction rules derived from the Enzyme Commission (EC) classification system. The rules are essentially generators of chemical reactions, and they are applied to enumerate the reactions in which a molecule of interest may potentially participate. The number of rules varies from system to system (cf. META, which is based on 1118 rules) and reflects the level of abstraction judged to be appropriate, the intended scope of application, and the data considered when formulating the rules.
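The idea of a rule acting as a generator of reactions can be illustrated with a minimal sketch. It assumes a deliberately simplified representation in which a molecule is just a set of functional-group labels and each rule swaps one label for another. The rule names, the `Rule` type, and the `predict_reactions` function are hypothetical and are not taken from BNICE, META, or any other system named above, which operate on full molecular structures.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    name: str
    consumes: str  # functional group that must be present on the substrate
    produces: str  # functional group that replaces it in the product


# Hypothetical toy rules, for illustration only.
RULES = [
    Rule("alcohol oxidation", "alcohol", "aldehyde"),
    Rule("aldehyde oxidation", "aldehyde", "carboxylate"),
    Rule("ester hydrolysis", "ester", "carboxylate"),
]


def predict_reactions(
    molecule: FrozenSet[str], rules: List[Rule] = RULES
) -> List[Tuple[str, FrozenSet[str]]]:
    """Enumerate (rule name, product) for every rule whose group is present."""
    reactions = []
    for rule in rules:
        if rule.consumes in molecule:
            # Replace the consumed group with the produced group.
            product = (molecule - {rule.consumes}) | {rule.produces}
            reactions.append((rule.name, frozenset(product)))
    return reactions
```

Calling `predict_reactions(frozenset({"alcohol", "ester"}))` returns one candidate reaction for each matching rule, whether or not any enzyme actually catalyzes it, which is why such enumeration tends to over-predict.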
A common problem with systems for rule-based reaction prediction is the large number of false positives. Given a molecule, rule application tends to generate a large number of possible reactions, and a combinatorial explosion occurs if rules are applied iteratively to the products of predicted reactions. To mitigate this problem, researchers have attempted to use expert knowledge to limit the generality of rules. Some systems provide priority indices or other filters based on expert knowledge to aid a user in assessing the likelihood of a reaction.
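The growth caused by iterative rule application, and the effect of a filter, can be sketched as follows. This is a toy model under stated assumptions: a molecule is a string of site labels, each hypothetical rule rewrites one label into another, and the optional `accept` predicate stands in for an expert-knowledge filter. None of these names come from an existing system.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical toy rules: each rewrites one site label into another.
RULES: List[Tuple[str, str]] = [("A", "B"), ("A", "C")]


def apply_rules(molecule: str, rules: List[Tuple[str, str]] = RULES) -> Set[str]:
    """Return all products of applying one rule at one matching site."""
    products = set()
    for old, new in rules:
        for i, site in enumerate(molecule):
            if site == old:
                products.add(molecule[:i] + new + molecule[i + 1:])
    return products


def expand(
    seed: str,
    depth: int,
    accept: Callable[[str], bool] = lambda m: True,
) -> List[int]:
    """Apply the rules iteratively; return the count of new compounds per generation."""
    seen = {seed}
    frontier = {seed}
    counts = []
    for _ in range(depth):
        next_frontier = set()
        for molecule in frontier:
            for product in apply_rules(molecule):
                # Keep only unseen products that pass the (assumed) filter.
                if product not in seen and accept(product):
                    next_frontier.add(product)
        seen |= next_frontier
        frontier = next_frontier
        counts.append(len(next_frontier))
    return counts
```

In this model, a seed with four reactive sites and two rules yields 8 new compounds in the first generation and 24 in the second, while a filter such as `accept=lambda m: "C" not in m` prunes the candidate set at every generation. A real filter would rank or reject reactions on chemical grounds rather than on a label.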
Our knowledge of metabolism is far from complete, and the gaps in our knowledge are being revealed by metabolomic detection of small molecules not previously known to exist in cells. An important challenge is to determine the reactions in which these compounds participate, which can lead to the identification of gene products responsible for novel metabolic pathways. To address this challenge, the present invention uses machine learning to predict potential substrates and products of enzyme-catalyzed reactions.
To find new therapeutics, drug developers screen many compounds. Failure of a drug candidate in a late testing phase is very costly, so a “fail-fast, fail-cheap” approach has been widely adopted, in which various properties, including metabolism, are studied at an early stage of drug development. In silico tools enable fast virtual screening of large numbers of compounds, so that poor candidates can be eliminated up front. Actual assays are then performed on the short-listed pool of compounds, increasing the success rate of finding a good candidate with desirable characteristics.