1. Field of the Invention
This invention relates generally to the field of pharmaceutical research and to the three dimensional searching of structures of chemical compounds to identify compounds which may share a biological activity with a known compound. In particular the invention concerns a method for searching databases of commercially available compounds which may or may not share any common synthetic lineage.
2. Description of Related Art
The advent of high throughput screening of chemical compounds for biological activity has dramatically changed the paradigm of pharmaceutical research in recent years. Coupled with combinatorial synthesis, high throughput screening now makes it possible to test millions of compounds efficiently. However, the cost per hit of such screening remains extremely high given the enormous number of compounds which can be tested and the typically low "hit" rates which are achieved. As a result, greater emphasis has been placed on testing compound libraries which are believed to contain a higher percentage of potentially relevant molecules, and computational chemists have been enlisted to design such libraries.
Two types of libraries were considered possible: first, a library which explored the diversity of structures in chemical space across the range of compounds which could be synthesized without oversampling the same area of diversity space (redundant testing); and second, a library in which the compounds would be likely to have the same biological activity as a known molecule or drug. The major problem confronting computational chemists in the selection of compounds for such libraries was how to characterize the compounds in a manner which would permit the desired selections. Bioscientists have long known that the three dimensional shape of a compound which acts as a ligand to a larger biomolecule must be complementary to the shape of the binding site of the larger biomolecule. In studying the relationships between the chemical structure of a molecule and its biological activity (structure activity relationships [SAR]), many techniques to characterize the three dimensional shape of molecules were devised. One of the most successful techniques for generating a quantitative structure activity relationship (QSAR) characterized the shape of molecules by defining an interaction energy field between a probe molecule and each part of the studied molecule in a three dimensional grid surrounding the molecule. The shape data thus generated for a series of molecules could be correlated with the biological activity of the molecules to produce the QSAR. This technique by Cramer and Wold (Comparative Molecular Field Analysis [CoMFA]) is described in detail in U.S. Pat. No. 5,025,388 and U.S. Pat. No. 5,307,287.
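To make the grid-based field idea concrete, the following Python sketch computes a CoMFA-style steric field: the probe-atom interaction energy, summed over all atoms, at each point of a cubic grid surrounding the molecule. It is a minimal illustration, not the CoMFA implementation described in the cited patents. The 6-12 Lennard-Jones form, the single pair of probe parameters (`EPSILON`, `R_MIN`), the energy ceiling `CUTOFF`, and the function names are all assumptions chosen for readability.

```python
import math

# Hypothetical probe parameters (kcal/mol, angstroms); a real implementation
# would take per-atom-type values from a force field.
EPSILON = 0.1
R_MIN = 3.4
CUTOFF = 30.0  # assumed ceiling applied to large repulsive energies


def steric_energy(probe, atom):
    """6-12 Lennard-Jones energy between a probe at `probe` and an atom at `atom`."""
    r = math.dist(probe, atom)
    if r < 1e-6:
        return CUTOFF  # probe sits on the atom: treat as maximally repulsive
    ratio6 = (R_MIN / r) ** 6
    return min(EPSILON * (ratio6 ** 2 - 2.0 * ratio6), CUTOFF)


def steric_field(atoms, lo, hi, spacing):
    """Summed probe-atom energy at every point of a cubic grid from lo to hi.

    Values are returned in x-major, then y, then z order, so fields computed
    on the same grid can be compared index by index.
    """
    n = int(round((hi - lo) / spacing)) + 1
    axis = [lo + i * spacing for i in range(n)]
    field = []
    for x in axis:
        for y in axis:
            for z in axis:
                total = sum(steric_energy((x, y, z), a) for a in atoms)
                field.append(min(total, CUTOFF))
    return field


if __name__ == "__main__":
    # Two hypothetical atom positions 1.5 angstroms apart on the x axis.
    demo = steric_field([(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], -4.0, 4.0, 2.0)
    print(len(demo), max(demo), min(demo))
```

Each molecule yields a vector of grid energies in a fixed order; in a QSAR setting, the vectors for a series of aligned molecules would form the rows of the data matrix correlated with measured activity.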
Use of the CoMFA approach required careful consideration of two major factors: 1) the proper alignment of the test molecules; and 2) the conformation or conformations of the molecules which had to be taken into account. In addition, the technique worked only with molecules sharing the same biological activity. Nevertheless, the technique clearly demonstrated the power of utilizing three dimensional shape descriptors in molecular analysis.
Over time many three dimensional shape descriptors and methods of library selection were attempted by computational chemists. U.S. Pat. No. 5,703,792 to Chapman describes one such approach. Two major problems confronted the field and cast doubt on the generality or accuracy of all the methods which had been devised. The first problem was that no one could show that the molecular structural descriptors which had been used were generally valid; that is, that the descriptors described molecules in a manner which correlated with biological activity across a range of biological systems. Any descriptor which would be used to select compounds for libraries would have to be valid irrespective of the biological activity which might be tested against the library. The second problem was that there was likewise no way to demonstrate that the methods of handling multiple conformations in the prior art methods were either accurate or applicable across all types of molecules.
The solution to these problems by Cramer, Patterson, Clark, and Ferguson is taught in U.S. Pat. No. 6,185,506. The validity of a molecular structural descriptor can be demonstrated across multiple biological activities by employing the Patterson plot methodology described in the patent. Both two and three dimensional descriptors can be evaluated by the methodology, and, in principle, there is no limitation on the dimensionality of the descriptors which can be evaluated. Using the validation technique, valid descriptors were identified which could be used with assurance to design libraries having desired properties. By this method the two dimensional prior art fingerprint Tanimoto descriptor was shown to be valid, as was a new three dimensional descriptor described below. The validation methodology also identified a neighborhood distance characteristic of the descriptors which could be used in the design of the libraries. In addition, the neighborhood distance led directly to methods for searching the libraries, and, once a molecule had shown activity in a screen, for expanding the search for other molecules having the same activity.
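The fingerprint Tanimoto descriptor and the use of a neighborhood distance for searching can be sketched as follows, assuming fingerprints are represented as sets of "on" bit indices. The similarity threshold `NEIGHBORHOOD_SIMILARITY = 0.85` is a hypothetical placeholder (the validated neighborhood distances referred to above are not given in this section), and `neighbors` is an illustrative helper name.

```python
# Hypothetical neighborhood radius, expressed as a minimum Tanimoto similarity
# (equivalently, a maximum Tanimoto distance of 1 - 0.85 = 0.15).
NEIGHBORHOOD_SIMILARITY = 0.85


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def neighbors(query: set[int], library: dict[str, set[int]],
              threshold: float = NEIGHBORHOOD_SIMILARITY) -> list[str]:
    """Names of library compounds lying inside the query's neighborhood."""
    return [name for name, fp in library.items()
            if tanimoto(query, fp) >= threshold]


if __name__ == "__main__":
    query = {1, 2, 3, 4, 5, 6, 7}
    library = {
        "A": {1, 2, 3, 4, 5, 6, 7},     # identical: similarity 1.0
        "B": {1, 2, 3, 4, 5, 6, 7, 8},  # 7 shared / 8 in union = 0.875
        "C": {1, 2, 9},                 # 2 shared / 8 in union = 0.25
    }
    print(neighbors(query, library))
```

With these toy fingerprints, compounds A and B fall inside the assumed neighborhood and C does not.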
Further, a solution to the problem of identifying a generally appropriate molecular conformation or conformations to take into account was taught. An alignment rule for molecular parts (topomeric alignment) is demonstrated which generates a uniform orientation. The shape of the molecular part is characterized, as in CoMFA, by a field of interaction energies calculated between a probe and the atoms in the aligned molecular part at each point in a three dimensional grid surrounding the molecular part. The steric interaction energies are principally used although, in the appropriate circumstances, electrostatic interaction energies may be added. Although the alignment may be arbitrary and unlikely for any particular molecule, the field shape descriptor of the topomeric alignments was shown to be a valid molecular structural descriptor by means of the Patterson plot method.
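Because topomeric alignment places every molecular part in the same orientation, two field descriptors sampled on the same grid can be compared point by point. The sketch below assumes a Euclidean (root-sum-square) distance over grid values, with an optional, weighted electrostatic term; the section does not specify the comparison metric, so `field_distance` and its `elec_weight` parameter are assumptions.

```python
import math


def field_distance(steric_a, steric_b, elec_a=None, elec_b=None,
                   elec_weight=1.0):
    """Euclidean distance between two field descriptors on the same grid.

    Index i must refer to the same grid point in both fields, which holds
    only because both parts share the same (topomeric) alignment. The
    electrostatic contribution is used only when both fields are supplied.
    """
    if len(steric_a) != len(steric_b):
        raise ValueError("steric fields must be sampled on the same grid")
    total = sum((a - b) ** 2 for a, b in zip(steric_a, steric_b))
    if elec_a is not None and elec_b is not None:
        if len(elec_a) != len(elec_b):
            raise ValueError("electrostatic fields must share a grid")
        total += elec_weight * sum((a - b) ** 2 for a, b in zip(elec_a, elec_b))
    return math.sqrt(total)


if __name__ == "__main__":
    # Toy two-point steric fields: sqrt(4**2 + 3**2) = 5.0
    print(field_distance([0.0, 3.0], [4.0, 0.0]))
```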
Using descriptors having an associated neighborhood distance, molecules could be identified which shared shape characteristics in a way which was meaningfully related to their biological activity. The problems of efficient library design and selection of combinatorially accessible molecules could be further addressed. In U.S. patent application Ser. No. 08/903,217, presently allowed, the construction and searching of a virtual library is described. The virtual library contains validated molecular structural descriptions of each component part which could be used in a specified combinatorial synthesis. All possible product molecules which could be combinatorially derived from the component parts can be searched, without the necessity of generating the product structures during the search, for product molecules having desired properties by searching through only a combination of the descriptors of the component parts of the product molecules. In the preferred embodiment the Tanimoto and the three dimensional topomeric CoMFA descriptors are employed.
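The central idea of the virtual library search, comparing only component-part descriptors rather than building product structures, might look like the sketch below. It assumes that each product has exactly one component part per attachment position, that a product's distance to the query is the sum of its per-part distances, and that `part_distance` is any non-negative metric (for example, a field distance). These are illustrative assumptions, not the method of the cited application, and `search_virtual_library` is a hypothetical name.

```python
from itertools import product


def search_virtual_library(query_parts, fragment_sets, threshold, part_distance):
    """Return (part names, distance) for combinations within `threshold`.

    query_parts:   descriptor of the query's part at each attachment position.
    fragment_sets: for each position, a dict {fragment_name: descriptor}.
    part_distance: non-negative distance between two part descriptors.
    No product structure is ever built; only part descriptors are compared.
    """
    per_position = []
    for q, fragments in zip(query_parts, fragment_sets):
        scored = [(name, part_distance(q, d)) for name, d in fragments.items()]
        # Under the assumed sum of non-negative terms, a part whose own
        # distance exceeds the threshold cannot appear in any hit.
        per_position.append([(n, dist) for n, dist in scored if dist <= threshold])

    hits = []
    for combo in product(*per_position):
        total = sum(dist for _, dist in combo)
        if total <= threshold:
            hits.append((tuple(n for n, _ in combo), total))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy one-dimensional "descriptors" compared by absolute difference.
    hits = search_virtual_library(
        query_parts=[1.0, 5.0],
        fragment_sets=[{"r1a": 1.0, "r1b": 1.5, "r1c": 4.0},
                       {"r2a": 5.0, "r2b": 5.4, "r2c": 9.0}],
        threshold=0.6,
        part_distance=lambda a, b: abs(a - b),
    )
    for names, dist in hits:
        print(names, round(dist, 3))
```

Pruning parts whose individual distance already exceeds the threshold is valid only under the summed, non-negative distance assumed here; a different combination rule would need a different bound.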
Due to the combinatorial nature of the number of product molecules whose characteristics can be determined, a relatively small number of structural variations (tens of thousands), cores, and synthetic schemes employing only two attachment points can yield a searchable library of billions of possible molecules according to the method of the patent. Indeed, the number of searchable molecules exceeds the number of molecules ever reported by several orders of magnitude. By the techniques disclosed in the patent, this virtual library can be searched very quickly to construct diverse libraries of molecules likely to share the same biological activity or to find molecules which share the same biological activity as a combinatorially derived query molecule. Further, query molecules which derive from unknown synthetic routes can be fragmented, and the molecular descriptor characterization of the fragments can be used to search the virtual library for similarly shaped fragments and thus for potential molecules likely to have similar biological activity. In practice the topomeric field molecular structural descriptor has proven to be very valuable in searching the virtual library. The powerful and fast searching capabilities of the virtual library method have yielded significant advances.
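A back-of-envelope count shows how a modest set of parts reaches billions of products. For each synthetic scheme $s$ and each core $c$ in that scheme's set of cores $C_s$, with two attachment points drawing on $n_1^{(c)}$ and $n_2^{(c)}$ available parts, the number of products is the sum of per-core products. The numeric values in the second line are hypothetical and chosen only to show the order of magnitude.

```latex
N_{\text{products}} = \sum_{s} \sum_{c \in C_s} n_1^{(c)} \, n_2^{(c)}

% Illustrative values only: one scheme, 5 cores, n_1 = n_2 = 2 \times 10^4
N_{\text{products}} = 5 \times (2 \times 10^4) \times (2 \times 10^4) = 2 \times 10^9
```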
However, the molecules in the virtual library which can be searched by definition derive from a combinatorial assembly of a relatively small number of constituent parts and can be said to be homogeneous in that sense. By virtue of the exceedingly large size of the virtual library, molecules may be identified which are not readily available. Also, although the possible product molecules which can be searched are the result of known combinatorial synthetic schemes, the actual synthesis may not be easily achieved. In the day to day world of pharmaceutical research, large assemblages of available molecules can be commercially obtained. These assemblages are not the result of any particular combinatorial synthesis but rather represent the assembly of a wide range of molecules from many different sources and syntheses, some known, some unknown. Therefore, these assemblages of molecules can be characterized as heterogeneous.
It would be useful if heterogeneous assemblages of available molecules could be searched for molecules which are likely to have a biological activity similar to a known compound before synthesis of new compounds is undertaken with the concomitant additional time and expense.