1. Field of the Invention
The invention generally relates to the automated design of chemical synthesis routes. More specifically, the invention relates to designing chemical synthesis routes using computer-implemented algorithms.
2. Background of the Related Art
Chemical synthesis is the process by which complex chemical compounds are created from simpler ones. Many important drugs and advanced materials are produced utilizing chemical synthesis.
Chemical compounds are made up of atoms of different elements, held together by chemical bonds. Synthesis usually involves breaking existing bonds and forming new bonds using chemical reactions. Synthesis of a complex molecule involves a sequence of reactions leading from the available starting materials to the desired end product. Such a reaction sequence is called a synthesis route.
The design of synthesis routes must consider many factors, such as the availability and cost of starting materials, the energy and time requirements of reactions, and the cost of purifying the end products. Creating synthesis routes is a difficult task for which there is no single design protocol. Chemists who successfully design synthesis routes require experience, intuition, and years of effort.
Genetic algorithms (GAs) are problem-solving algorithms based on the mechanics of natural selection and genetics. The motivation for GAs is the success of biological evolution in solving difficult problems in nature.
GAs operate on populations of individuals representing potential solutions to a problem. Individuals in a GA usually encode solutions as fixed-length bit strings (i.e., strings of 0's and 1's). GAs solve problems by evolving successively better populations using a survival-of-the-fittest process. The fitness of an individual solution is determined by a problem-specific fitness function.
The initial population typically contains a plurality of randomly-generated bit strings. Subsequent generations of the population are produced by genetic operations that mimic recombination (crossover), mutation, and other biological operations.
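The genetic algorithm loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method of the invention: the fitness function (counting 1 bits), the tournament selection scheme, the elitism step, and all parameter values are assumptions chosen to keep the example small.

```python
import random

def fitness(bits):
    # Toy problem-specific fitness function (an assumption): number of 1 bits.
    return sum(bits)

def crossover(a, b, rng):
    # One-point crossover: prefix of one parent joined to suffix of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]

def tournament(population, rng, k=3):
    # Survival of the fittest: the best of k randomly chosen individuals.
    return max(rng.sample(population, k), key=fitness)

def evolve(length=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initial population: randomly generated fixed-length bit strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = [max(fitness(ind) for ind in population)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # carry the best individual over
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rate, rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
        history.append(max(fitness(ind) for ind in population))
    return max(population, key=fitness), history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness recorded in `history` never decreases from one generation to the next.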
Genetic programming (GP) is an extension of the Genetic Algorithm. In GP, individuals are computer programs of varying shapes and sizes. The programs are usually LISP expressions or hierarchical program trees. The fitness of a GP program is determined by first executing it, then evaluating its results using a problem-specific fitness function.
The initial population usually contains randomly-generated but syntactically-correct programs. Subsequent generations of the population are produced by biologically-inspired operations that act on subprograms and preserve syntactic correctness.
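As an illustration of the genetic programming concepts above, the following Python sketch represents programs as nested tuples, generates random but syntactically correct trees, and performs a crossover that swaps in a subprogram while preserving syntactic correctness. The function set, terminal set, target function, and helper names are all assumptions made for this example.

```python
import operator
import random

FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 1, 2]

def random_tree(rng, depth):
    # Build a random, syntactically correct program tree.
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    # Execute the program for a given input value.
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree

def fitness(tree, target=lambda x: x * x + 1, samples=range(-3, 4)):
    # Run the program first, then score its results (0 is a perfect score).
    return -sum(abs(evaluate(tree, x) - target(x)) for x in samples)

def paths(tree, prefix=()):
    # Yield the index path to every subprogram, including the root.
    yield prefix
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    # Return a copy of `tree` with the subprogram at `path` replaced by `new`.
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, b, rng):
    # Replace a random subprogram of `a` with a random subprogram of `b`.
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return replace(a, pa, get(b, pb))

def is_valid(tree):
    if isinstance(tree, tuple):
        return (len(tree) == 3 and tree[0] in FUNCTIONS
                and is_valid(tree[1]) and is_valid(tree[2]))
    return tree in TERMINALS

rng = random.Random(0)
parent_a, parent_b = random_tree(rng, 3), random_tree(rng, 3)
child = crossover(parent_a, parent_b, rng)
print(child, is_valid(child), fitness(child))
```

Because crossover only ever substitutes one complete subprogram for another, every child is itself a well-formed program that can be executed and scored.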
GP is widely applicable because many problems have solutions that can be easily encoded as computer programs. GP has already been used to produce human-competitive solutions to difficult problems such as electronic circuit design.
Computer-aided chemical synthesis programs help chemists design synthesis routes. Such programs are often consulted by practicing chemists when planning syntheses of complex molecules. The general field concerning computer-aided chemical synthesis programs is typically known as Computational Chemistry and includes the field of Computer-Aided Organic Synthesis (CAOS).
The presently available solutions and most other computer-aided synthesis programs operate retrosynthetically (i.e., backwards from the target molecule to the starting materials). The user supplies the target molecule, and the program outputs a set of possible precursor molecules for forming the target molecule. Repeating this process grows a tree of possible routes, leading from the target back to more accessible starting materials.
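The growth of a retrosynthetic tree can be illustrated with a short Python sketch. The molecule labels, the `PRECURSORS` table, and the `STARTING` set below are purely hypothetical placeholders, not real chemistry and not the data of any actual program; they only show how repeated precursor lookup expands a target into alternative routes.

```python
# Hypothetical lookup table: each molecule maps to alternative precursor sets.
PRECURSORS = {
    "target": [("int1", "sm1"), ("int2",)],
    "int1": [("sm2", "sm3")],
    "int2": [("sm1", "sm3")],
}
# Hypothetical set of readily available starting materials.
STARTING = {"sm1", "sm2", "sm3"}

def expand(molecule, depth=0, max_depth=5):
    # Return (molecule, alternatives); each alternative is a list of subtrees,
    # one per precursor needed for that backwards step.
    if molecule in STARTING or depth >= max_depth:
        return (molecule, [])
    alternatives = [
        [expand(p, depth + 1, max_depth) for p in precursors]
        for precursors in PRECURSORS.get(molecule, [])
    ]
    return (molecule, alternatives)

def count_routes(tree):
    # Count complete routes, i.e. routes in which every leaf is a starting material.
    molecule, alternatives = tree
    if not alternatives:
        return 1 if molecule in STARTING else 0
    total = 0
    for branch in alternatives:
        ways = 1
        for subtree in branch:
            ways *= count_routes(subtree)
        total += ways
    return total

tree = expand("target")
print(count_routes(tree))
```

In this toy table the target has two alternative backwards steps, each of which resolves to starting materials, so the tree contains two complete routes.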
Some of the presently available synthesis techniques do not automatically generate synthesis routes. Rather, they are interactive with the user, only helping guide the selection of promising routes. Other techniques can be used to generate retrosynthetic routes without human interaction, but often produce backwards transformations that do not correspond to real chemical reactions. Further, all of the previously developed programs depend on empirical databases, data tables, reaction matrices, etc., listing all possible synthetic transformations. This limits their predictions to known transformations stored in their databases.
In one example of the use of genetic algorithms in chemistry, U.S. Pat. No. 5,434,796 describes an encoding technique that allows cyclical chemical graphs to be represented by bit strings in a genetic algorithm.
All of the above methods evolve molecules only, and they do not address the problem of how to create synthesis routes for evolved molecules. Presently, there is no successful application of genetic algorithms or genetic programming to the problem of inventing chemical synthesis routes. Therefore, there is a need for a method and apparatus that use genetic algorithms and/or genetic programming to automatically design chemical synthesis routes that satisfy prespecified design goals.
The present invention provides a method and apparatus for the automated design of chemical synthesis routes that satisfy prespecified design goals. More specifically, the present invention includes a method and apparatus for running an iterative process applied to a population of individuals that encode chemical synthesis routes.
The present invention includes a method for determining the outcome of a chemical reaction, a method for determining the structural similarity of two molecules, and a method for evaluating the properties of a chemical synthesis route.
The method for designing a synthesis route for a target molecule comprises: generating a plurality of individuals, wherein each individual encodes a synthesis route; decoding each individual to produce a synthesis route comprising at least one reactant molecule and at least one reaction; and determining whether the synthesis route satisfies a design goal.
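The three steps of this method (generating individuals, decoding each into a synthesis route, and testing the route against a design goal) can be sketched as follows. Everything concrete in this Python sketch is an assumption made for illustration: the integer-list encoding, the reactant and reaction labels, and especially `run`, which is a string-manipulating stand-in for determining the outcome of a reaction and does not model real chemistry.

```python
import random
from dataclasses import dataclass

REACTANTS = ["A", "B", "C"]   # hypothetical reactant labels
REACTIONS = ["join", "swap"]  # hypothetical reaction labels

@dataclass
class Route:
    reactants: list
    reactions: list

def generate(rng, n_steps=2):
    # An individual is a flat list of integers: n_steps + 1 reactant genes
    # followed by n_steps reaction genes (an assumed encoding).
    return ([rng.randrange(len(REACTANTS)) for _ in range(n_steps + 1)]
            + [rng.randrange(len(REACTIONS)) for _ in range(n_steps)])

def decode(individual):
    # Decode an individual into a route with at least one reactant
    # and at least one reaction.
    n_steps = len(individual) // 2
    reactants = [REACTANTS[g] for g in individual[:n_steps + 1]]
    reactions = [REACTIONS[g] for g in individual[n_steps + 1:]]
    return Route(reactants, reactions)

def run(route):
    # Placeholder for reaction-outcome prediction: "join" appends the next
    # reactant to the current product, "swap" prepends it.
    product = route.reactants[0]
    for reactant, reaction in zip(route.reactants[1:], route.reactions):
        product = product + reactant if reaction == "join" else reactant + product
    return product

def satisfies_goal(route, target):
    # Assumed design goal: the final product matches the target label.
    return run(route) == target

rng = random.Random(0)
population = [generate(rng) for _ in range(200)]
hits = [ind for ind in population if satisfies_goal(decode(ind), "CAB")]
print(len(hits))
```

In a full evolutionary search, individuals that fail the goal would be scored by a fitness function and recombined, rather than simply filtered as they are here.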
The invention also provides a computer-readable medium containing instructions for a computer program executable by a computer to perform a method for designing a synthesis route for a target molecule.
Another aspect of the invention provides an apparatus comprising a parallel computer system for executing instructions of a computer program to perform a method for designing a synthesis route for a target molecule.