1. Field of the Invention
The present invention relates generally to searching of virtual combinatorial libraries. More particularly, the present invention relates to the selection of compounds, based on fitness functions, from large virtual combinatorial libraries.
2. Related Art
The explosive growth of combinatorial chemistry in recent years has been greeted as both a blessing and a curse. While it has solved the problem of throughput and has allowed the traditionally slow drug discovery process to be conducted in a massively parallelized fashion, it has created the need to deal with compound collections of truly staggering size. These include both physical collections of compounds that are synthesized using automated parallel synthesis, as well as virtual collections containing molecules that could potentially be synthesized by systematic application of established synthetic principles. The initial ambition to ‘make and test them all’ has given way to a more pragmatic approach once it became evident that ‘all’ was a number of immense proportions. For example, a simple diamine-based combinatorial library built from only commercially available reagents can include up to 10^12 compounds, which is equivalent to approximately 300 years of synthesis and testing at a rate of 10 million compounds per day (Cramer et al., “Virtual Compound Libraries: A New Approach to Decision Making in Molecular Discovery Research,” J. Chem. Inf. Comput. Sci. 1998, 38, 1010-1023). The recognition of these practical limitations and the desire to use the available synthetic and screening resources in an efficient way has generated interest in virtual chemistry and, in particular, systems and methods for handling and analyzing large chemical libraries. More specifically, for example, there is an interest in efficiently selecting compounds that are similar to a particular query structure (e.g., drug lead) or selecting compounds that have desired properties.
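The throughput estimate quoted above can be checked with a few lines of arithmetic. The sketch below reads the library size as 10^12 compounds and assumes a 365.25-day year; the constant and function names are illustrative only and do not come from the cited reference.

```python
# Illustrative check of the "approximately 300 years" estimate.
LIBRARY_SIZE = 10**12        # compounds in the diamine-based example
RATE_PER_DAY = 10_000_000    # compounds synthesized and tested per day
DAYS_PER_YEAR = 365.25       # assumption: average calendar year


def years_to_exhaust(library_size, rate_per_day):
    """Return the years needed to make and test every compound once."""
    days = library_size / rate_per_day
    return days / DAYS_PER_YEAR


years = years_to_exhaust(LIBRARY_SIZE, RATE_PER_DAY)
print(f"{years:.0f} years")  # about 274 years, i.e. on the order of 300
```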
Searching a virtual combinatorial library for compounds that are similar to a particular query structure (or query structures) or that have a set of desired properties typically involves three steps for each compound: enumeration, calculation of descriptors, and evaluation of similarity or estimation of the property of interest. Due to the large number of possible products in many virtual combinatorial libraries (particularly three- and four-component ones), enumeration alone can take a few weeks of computational time. Additionally, the storage requirements for a fully enumerated virtual combinatorial library can be prohibitive. Since in these cases neither the generation nor the storage of fully enumerated libraries and their associated descriptors is feasible, there is a need for systems and methods that can identify the desired compounds without enumerating the entire library.
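The three per-compound steps described above can be sketched as a brute-force loop, which also makes the scaling problem visible: the number of products is the product of the reagent counts at each variation site. This is a minimal sketch under stated assumptions, not a method from the source. Reagents are represented by toy numbers, `describe` is a hypothetical stand-in for a real descriptor calculation, and Euclidean distance stands in for whatever similarity measure is actually used.

```python
from itertools import product
from math import prod


def library_size(reagent_counts):
    """Number of products: reagent counts multiply across variation sites."""
    return prod(reagent_counts)


def exhaustive_search(reagent_lists, describe, query, top_k):
    """Score every product against the query and return the closest top_k."""
    scored = []
    for combo in product(*reagent_lists):            # step 1: enumeration
        descriptor = describe(combo)                  # step 2: descriptors
        distance = sum((a - b) ** 2                   # step 3: similarity
                       for a, b in zip(descriptor, query)) ** 0.5
        scored.append((distance, combo))
    scored.sort()
    return scored[:top_k]


def toy_describe(combo):
    """Hypothetical descriptor: (sum of values, spread of values)."""
    return (sum(combo), max(combo) - min(combo))


# Toy three-component library with two reagents per site (8 products).
toy_reagents = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]]
best = exhaustive_search(toy_reagents, toy_describe, (212.0, 198.0), top_k=1)
print(best)

# Three sites with 1,000 reagents each already give a billion products.
print(library_size([1000, 1000, 1000]))
```

Every product passes through all three steps, so both the running time and the memory held in `scored` grow with the full library size, which is the cost the paragraph above describes.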
One possible solution is to look at the far less numerous reagents instead of the products. The reagent-based approach is frequently used to maximize molecular diversity, and is based on the assumption that diverse reagents will lead to diverse products. However, it was recently shown that a selection based on the products themselves can be substantially more diverse, perhaps by as much as 35-50% (Gillet et al., “The Effectiveness of Reactant Pools for Generating Structurally-Diverse Combinatorial Libraries,” J. Chem. Inf. Comput. Sci. 1997, 37, 731-740). When the selection criterion is similarity, the final products themselves must be considered, and the only proposed solution has been to use additive or otherwise “decomposable” descriptors. These are descriptors which, for combinatorial products, can be computed from the values of the corresponding descriptors of their constituent reagents (Cramer et al., “Virtual Compound Libraries: A New Approach to Decision Making in Molecular Discovery Research,” J. Chem. Inf. Comput. Sci. 1998, 38, 1010-1023).
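The idea of an additive descriptor can be illustrated with a short sketch. It assumes the simplest possible case, in which a product's descriptor vector is the plain sum of its reagents' vectors; the reagent names and values are invented for illustration, and a realistic scheme would also have to correct for atoms gained or lost in the reaction.

```python
def product_descriptor(reagent_descriptors, combo):
    """Sum per-reagent descriptor vectors to obtain the product's vector.

    Assumption: the descriptor is strictly additive, so no product
    structure has to be built in order to evaluate it.
    """
    vectors = [reagent_descriptors[name] for name in combo]
    return tuple(sum(values) for values in zip(*vectors))


# Hypothetical reagent table: name -> (mass-like value, count-like value).
reagent_descriptors = {
    "A1": (50.0, 1.0),
    "A2": (72.0, 0.0),
    "B1": (30.0, 2.0),
    "B2": (44.0, 1.0),
}

print(product_descriptor(reagent_descriptors, ("A1", "B1")))  # (80.0, 3.0)
```

Only one descriptor vector per reagent is stored, so the table grows with the number of reagents rather than with the number of products.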
Thus, the need remains for a system and method for efficiently and effectively generating product-based selections from large virtual combinatorial libraries. More generally, there is a need for a system and method for efficiently and effectively searching large virtual combinatorial libraries based on a fitness function.
Some virtual combinatorial libraries have already been fully or partially enumerated. However, satisfactory systems and methods for efficiently and effectively searching these enumerated virtual combinatorial libraries based on a fitness function are currently lacking. Accordingly, there is also a need for a system and method for efficiently and effectively searching large enumerated virtual combinatorial libraries based on a fitness function.