The present invention relates to a method of searching three-dimensional structure databases, which can be utilized for finding novel lead structures of medicaments, pesticides, and other biologically active compounds.
Medicaments generally exhibit their biological activities through strong interactions with target biopolymers. In recent years, the three-dimensional structures of biopolymers that play important roles in living bodies have been revealed successively, owing to progress in X-ray crystallographic analysis and NMR technology. With the advance of this research, theoretical approaches based on information about the three-dimensional structures of biopolymers, together with their successful results, have been reported for the lead generation of new drugs, which was conventionally achieved mostly by random screening or accidental discovery. The importance of such approaches is increasingly recognized, and research from various viewpoints is being conducted throughout the world.
One object of this research is to provide a process for creating ligand candidate structures by means of a computer. Another object is to provide a process for searching for ligand structures in a three-dimensional structure database containing known compounds. The former (automatic structure construction approach) is advantageous in that the process can suggest a wide variety of possible ligand structures, irrespective of whether the structures are known or unknown. The latter (database approach), on the other hand, is used to search for novel biologically active compounds in databases of accumulated structures of known compounds. When a search is applied to a database consisting of compounds stocked by one's own company or of commercially available compounds, this method is particularly advantageous in that compounds found to meet the hit criteria (“hit compounds”) can be obtained without any synthetic effort, and their association constants to target biopolymers and their biological activities can be measured.
Synthesis of a compound with a novel molecular skeleton generally takes several months even for an expert, and accordingly, years of time and a heavy burden of work are required to synthesize many promising structures and measure their activities. However, if hit compounds are already available, association constants and biological activities can easily be measured for dozens of promising ligand candidate compounds. In addition, starting from the structure of a compound found to have a reasonably high association constant or activity, efficient lead generation is achievable by designing and synthesizing compounds with higher association constants and biological activities. For these reasons, approaches comprising the steps of searching three-dimensional structure databases of known compounds and discovering novel pharmacologically active compounds have recently attracted interest.
However, one of the problems of this method is the choice of query by which the search is carried out. Generally, a three-dimensional structure database is searched with queries asking whether given atomic groups or functional groups, presumed to be essential for the desired activity in typical biologically active compounds used as templates, are present, or alternatively, whether their relative positions are similar to those in the template compounds. Where the structure of a target biopolymer is unknown, basically no other approach can be relied on. However, since these queries are based on assumptions and hypotheses, hit compounds may often fail to have the same activity as the template compounds. Only where two or more molecules with quite different structural features exhibit similar activities does it become possible, by means of structural information about these molecules, to create more probable working hypotheses concerning the functional groups essential for the activity and their relative positions.
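The kind of query described above, which checks for the presence of functional groups and the similarity of their relative positions to a template, can be illustrated with a minimal sketch. The function name `matches_query`, the group labels, the coordinates, and the 0.5 angstrom tolerance are all hypothetical choices made for illustration and are not taken from any particular search system.

```python
from itertools import combinations, product
from math import dist


def matches_query(template, candidate, tol=0.5):
    """Return True if the candidate contains the template's functional groups
    with matching pairwise distances.

    template, candidate: lists of (group_type, (x, y, z)) in angstroms.
    tol: allowed deviation of each pairwise distance (hypothetical default).
    """
    # For each template group, list the candidate groups of the same type.
    options = [
        [i for i, (ctype, _) in enumerate(candidate) if ctype == ttype]
        for ttype, _ in template
    ]
    for assignment in product(*options):
        if len(set(assignment)) != len(assignment):
            continue  # one candidate group cannot play two roles
        if all(
            abs(
                dist(template[i][1], template[j][1])
                - dist(candidate[assignment[i]][1], candidate[assignment[j]][1])
            ) <= tol
            for i, j in combinations(range(len(template)), 2)
        ):
            return True
    return False


# Hypothetical template: a donor, an acceptor, and a ring center.
template = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("ring", (0, 4, 0))]
shifted = [("ring", (1, 5, 1)), ("donor", (1, 1, 1)),
           ("acceptor", (4.2, 1, 1)), ("donor", (9, 9, 9))]
stretched = [("donor", (0, 0, 0)), ("acceptor", (6, 0, 0)), ("ring", (0, 4, 0))]
print(matches_query(template, shifted), matches_query(template, stretched))
```

Because only the stored coordinates are compared, a flexible compound whose database conformation happens to place the groups differently would be rejected by such a check, and a compound that passes may still have an unsuitable overall shape.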
The most difficult problem in searching three-dimensional structure databases lies in the handling of the conformational flexibility of compounds. It is known that the conformation of a ligand when it binds to a biopolymer (i.e., when the most stable complex is formed) does not necessarily coincide with any of the stable structures of the molecule itself in a crystal or in solution, or with the lowest-energy structure obtained by energy calculation, and that one ligand molecule can form stable complexes in different conformations depending on the target biopolymer. Generally, a database contains, for each compound, a single set of atomic coordinates representing the three-dimensional structure of one conformation among the multiple possible conformations. Furthermore, except for databases derived from crystal structures, the three-dimensional structure of each compound is obtained by calculation from a two-dimensionally inputted structure. These three-dimensional structures often represent only one of the local minimum structures that the molecule itself can adopt.
Therefore, if a search is carried out merely on the basis of the conformations contained in a database to judge whether or not compounds meet the hit criteria, most compounds that would be hits if other conformations were considered fail to be selected. Although the number of conformations to be considered varies depending on the number of rotatable bonds and on factors such as how finely conformations are sampled, from several tens to hundreds of thousands of conformations should be taken into account for a moderately flexible molecule containing 3 to 6 rotatable bonds. To consider these possible conformations, the available processes are limited to either selecting promising conformations and inputting them in a database beforehand, or generating and examining conformations at the time of the search. In either case, enormous computer resources and calculation time are required.
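The growth in the number of conformations can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each rotatable bond is sampled independently on a regular torsion-angle grid; the function name `conformer_count` and the step sizes are hypothetical, and ring closure, symmetry, and steric clashes are ignored.

```python
def conformer_count(rotatable_bonds: int, step_degrees: int) -> int:
    """Number of torsion-angle combinations on a regular grid.

    Assumes every rotatable bond is sampled independently at the same
    angular step; ring closure, symmetry, and steric clashes are ignored.
    """
    if step_degrees <= 0 or 360 % step_degrees != 0:
        raise ValueError("step_degrees must be a positive divisor of 360")
    return (360 // step_degrees) ** rotatable_bonds


for bonds in (3, 6):
    for step in (120, 45):
        print(f"{bonds} bonds, {step} degree step: {conformer_count(bonds, step)}")
```

Under these assumptions, 3 rotatable bonds sampled at 120 degree steps give 27 combinations, while 6 rotatable bonds sampled at 45 degree steps give 262,144, which spans the range of several tens to hundreds of thousands mentioned above.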
Recently, Kearsley et al. of Merck prepared a database comprising at most 20 conformations per compound and reported “FLOG,” a searching system in which a search is carried out using search criteria consisting mainly of the relative positions of functional groups (M. D. Miller, S. K. Kearsley, D. J. Underwood, and R. P. Sheridan: Journal of Computer-Aided Molecular Design, 8, 1994, pp.153-174, FLOG: A system to select ‘quasi-flexible’ ligands complementary to a receptor of known three-dimensional structure). It was reported that FLOG took approximately one week to search a database containing 2,000,000 conformations for 100,000 compounds by means of a CRAY supercomputer. The twenty conformations on average for each compound are stored by selecting energetically stable local minimum structures through prior conformational analysis of each compound, which requires enormous time and labor. Nevertheless, twenty conformations per compound are insufficient.
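As a rough check on the figures quoted above, the average time spent per conformation and per compound can be computed directly. The sketch takes "approximately one week" as exactly seven days, which is an assumption; the helper name `seconds_per_item` is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 s, taking one week as exactly 7 days


def seconds_per_item(total_seconds: float, items: int) -> float:
    """Average wall-clock time per item for a search of the given length."""
    return total_seconds / items


per_conformation = seconds_per_item(SECONDS_PER_WEEK, 2_000_000)
per_compound = seconds_per_item(SECONDS_PER_WEEK, 100_000)
print(f"{per_conformation:.2f} s per conformation")
print(f"{per_compound:.1f} s per compound")
```

This works out to roughly 0.3 seconds per stored conformation and about 6 seconds per compound on average.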
In addition, the method adopts queries comprising the presence or absence of functional groups presumed to be essential for the activity of a template compound, as well as the distances, angles, or directions between the groups. These are the simplest of the possible queries, and although they have the advantage of shortening calculation time through simple algorithms, a high probability that hit compounds actually have the desired activity cannot be expected. The reason is that a molecule with an inappropriate overall shape and size fails to exhibit activity even if its functional groups have the desired relative positions. If the three-dimensional structure of the target biopolymer is known, the most effective search can be achieved by using that information, which provides a high probability that hit compounds have the desired activity.
For example, Eyermann et al. of Dupont Merck performed a database search based on the relative positions of functional groups of ligand molecules that had been analyzed by X-ray crystallography as complexes with a biopolymer (P. Y. S. Lam, P. K. Jadhav, C. J. Eyermann, C. N. Hodge, Y. Ru, L. T. Bacheler, J. L. Meek, M. J. Otto, M. M. Rayner, Y. N. Wong, C. H. Chang, P. C. Weber, D. A. Jackson, T. R. Sharp, and S. Erickson-Viitanen: Science, 263, 1994, pp.380-384, Rational Design of Potent, Bioavailable, Nonpeptide Cyclic Ureas as HIV Protease Inhibitors). In the HIV protease system, compounds were searched using the criterion that three positions have similar relative arrangements: the oxygen atom of the water molecule that connects the protease to the ligand through hydrogen bonds, and the centers of the two benzene rings in the peptide ligand molecule. Among the hit compounds, one compound was selected that fitted well into the ligand-binding region and formed hydrogen bonds in addition to those formed by the oxygen atom used in the query. They reported that a compound with high activity was developed from the structure of this compound by repeatedly synthesizing compounds with appropriate modifications while measuring their inhibitory activities against the enzyme. However, the publication lacks a detailed description of the search process, and the technique used for handling conformations cannot be deduced.
The search query that provides the highest probability that hit compounds have the desired activity is whether or not a compound binds stably to the ligand-binding region of the target biopolymer. Although in some cases the desired activity fails to be exhibited because of physicochemical factors such as water solubility even if a compound satisfies this criterion, satisfying it is an essential requirement for the expression of activity. On the other hand, the presence or absence of specific functional groups and the relative positions of functional groups are not necessarily essential for the expression of activity; even if a compound belongs to a different family and has a skeletal structure dissimilar to the known active structures, it may exhibit the same activity if it can form a complex of similar stability with the target biopolymer. Therefore, compounds satisfying the criterion that they can bind stably to the ligand-binding region of the target biopolymer have a high probability of being actual ligand candidates. In other words, by using the criterion of fitting to the ligand-binding region of the target biopolymer, it becomes possible to identify, from databases, compounds that exhibit the desired biological activity and have a wide variety of structures.
In order to determine whether or not a given ligand molecule can bind to the target biopolymer, it is necessary to find the most stable docking structure among the possible docking structures and to know its degree of stability. Also, in order to find the most stable docking structure between a given biopolymer and a ligand molecule, it is necessary to estimate the stabilities of all docking structures by considering all possible binding modes (corresponding to rotation or translation of one molecule while the other is held fixed) and all possible ligand conformations. However, since this process requires an enormous amount of calculation, it cannot be achieved by interactive methods using a graphic display. Therefore, the development of an automatic docking method that can perform the above process automatically and efficiently has been desired.
As an automatic docking method, Kuntz et al. developed a method for estimating possible binding modes by representing the shape of a ligand-binding region with several to dozens of inscribed spheres and comparing a set of vectors formed by the centers of the spheres with a set of intramolecular vectors of a ligand compound (R. L. Desjarlais, R. P. Sheridan, G. L. Seibel, J. S. Dixon, I. D. Kuntz and R. Venkataraghavan: J. Med. Chem. 31 (1989) 722-729, Using Shape Complementarity as an Initial Screen in Designing Ligands for a Receptor-Binding Site of Known 3-Dimensional Structure). In a recent improvement, the process was modified so that intermolecular energy can be calculated in addition to the score representing the coincidence between the vectors.
However, this method is disadvantageous because it requires excessive time even just to cover all binding modes. Some improved processes include alterations so that conformations can be varied in the docking of a single compound. However, these processes are not designed to provide an automatic procedure covering the whole docking process. Although energy can be evaluated for hit compounds, only simple scores are calculated during the docking process, and hydrogen bonds and the like are disregarded. Furthermore, the accuracy is insufficient: for example, the true docking structure determined by crystal analysis fails to be ranked highest and is sometimes ranked rather low. Moreover, an unavoidable defect is that the process is fundamentally carried out with a fixed conformation for each compound, so that conformational flexibility cannot be handled. Because manual operation cannot compensate for this defect, the practical utility of the method is especially low in a three-dimensional database search, which requires fully automatic operation for numerous compounds.
The inventors of the present invention developed “ADAM,” a method for estimating the most stable docking structures of a biopolymer and a ligand molecule automatically and without any preconception, and thereby successfully solved all of the aforementioned problems (PCT/JP93/0365, PCT International Publication WO 93/20525; M. Yamada and A. Itai, Chem. Pharm. Bull., 41, p.1200, 1993; M. Yamada and A. Itai, Chem. Pharm. Bull., 41, p.1203, 1993; and M. Yamada Mizutani, N. Tomioka and A. Itai, J. Mol. Biol., 243, pp.310-326, 1994).
In applications of “ADAM” to several enzyme-inhibitor systems whose structures had been elucidated by crystallographic analysis, all of the resulting minimum-energy docking models perfectly reproduced the structure in the crystal, and the torsion angles of rotatable bonds in the ligand molecule and the hydrogen-bonding schemes were also accurately reproduced. The high accuracy of “ADAM” is based on structure optimization (energy minimization) calculations performed three times in total. Although the time required varies depending on the performance of the computer, the number of dummy atoms, the number of heteroatoms, the number of rotatable bonds, and the like, and cannot be generalized, it takes from several minutes to approximately 1 hour to obtain an initial docking model for an ordinary drug molecule on a commonly used workstation (R4400). This speed is dozens of times faster than the fastest previously reported method that handles possible conformations.
However, although ADAM is suitable for finding the most stable docking structure between the target biopolymer and a single ligand molecule, the method as it stands is not suitable for searching for novel ligand compounds among the wide variety of numerous compounds contained in a three-dimensional structure database. Therefore, further improvement has been desired.
Accordingly, an object of the present invention is to provide a method suitable for searching for novel ligand compounds among a wide variety of numerous compounds, thereby solving the problems of the state of the art.
The inventors of the present invention conducted various studies. As a result, they succeeded in developing a process for selecting one or more promising ligand candidates from an enormous number of trial compounds, wherein the process comprises the steps of estimating the most stable docking structure for each trial compound, and selecting one or more ligand candidate compounds from all the trial compounds by given criteria including the interaction energy between the target biopolymer and the trial compound in the most stable docking structure; and wherein the aforementioned step of estimating the most stable docking structure for each trial compound comprises evaluating all possible docking structures generated by docking the trial compound to the biopolymer, while varying the conformation of the trial compound and repeating structural optimizations, based on the interaction energies between the biopolymer and the trial compound, e.g., hydrogen bonds, electrostatic interactions, and van der Waals forces. The present invention was achieved on the basis of these findings.
The present invention thus provides a method of searching a three-dimensional structure database for one or more ligand compounds for a target biopolymer, which comprises the steps of:
(i) the first step of assigning hydrogen-bonding category numbers, information for calculating force-field energies, and information for generating conformations, to two or more trial compounds, in addition to three-dimensional atomic coordinates thereof;
(ii) the second step of preparing physicochemical information about a ligand-binding region and one or more dummy atoms based on the three-dimensional atomic coordinates of the target biopolymer;
(iii) the third step of estimating the most stable docking structure, wherein said step further comprises the steps of generating possible docking structures by docking a trial compound to the biopolymer while varying conformations of the trial compound, and evaluating interaction energy between the target biopolymer and the trial compound, based on the hydrogen-bonding category number, the information for calculating force-field energies, and the information for generating conformations assigned to the trial compound in addition to the three-dimensional atomic coordinates according to the step (i) and based on the physicochemical information about the ligand-binding region prepared in the step (ii); and then
(iv) the fourth step of deciding whether or not the trial compound should be adopted as a ligand-candidate compound based on given criteria including the interaction energy values between the target biopolymer and the trial compound in the most stable docking structure found according to the step (iii); and
(v) the fifth step of repeating the step (iii) and the step (iv) for all of the trial compounds.
The method of the present invention may comprise a sixth step of further selecting among the ligand-candidate compounds selected in the step (iv) based on different criteria, including the number of hydrogen bonds formed with the target biopolymer and/or interaction energy values derived therefrom.
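The control flow of steps (iii) to (v) and the optional sixth step can be sketched as follows. This is only an illustration under stated assumptions: the docking and energy evaluation of step (iii) are not reproduced, and each trial compound is instead given a precomputed list of (interaction energy, number of hydrogen bonds) pairs standing in for its possible docking structures. Steps (i) and (ii) are not modelled. All names (`DockingResult`, `most_stable_docking`, `search_database`), the energy values, and the cutoffs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    name: str
    energy: float        # interaction energy of the most stable docking structure
    hydrogen_bonds: int  # hydrogen bonds formed in that structure


def most_stable_docking(name, docking_structures):
    """Stand-in for step (iii): pick the lowest-energy docking structure.

    docking_structures: non-empty list of (energy, hydrogen_bonds) pairs,
    assumed here to be precomputed by some docking procedure.
    """
    energy, hbonds = min(docking_structures, key=lambda s: s[0])
    return DockingResult(name, energy, hbonds)


def search_database(database, energy_cutoff, min_hbonds=None):
    """Return ligand candidates sorted from most to least stable."""
    hits = []
    for name, structures in database.items():         # step (v): all compounds
        best = most_stable_docking(name, structures)  # step (iii)
        if best.energy <= energy_cutoff:               # step (iv): energy criterion
            hits.append(best)
    if min_hbonds is not None:                         # optional sixth step
        hits = [h for h in hits if h.hydrogen_bonds >= min_hbonds]
    return sorted(hits, key=lambda h: h.energy)


# Hypothetical database: compound name -> docking structures (energy, H-bonds).
database = {
    "A": [(-10.0, 1), (-25.0, 3)],
    "B": [(-5.0, 0)],
    "C": [(-30.0, 1), (-12.0, 4)],
}
print([h.name for h in search_database(database, energy_cutoff=-20.0)])
print([h.name for h in search_database(database, energy_cutoff=-20.0, min_hbonds=2)])
```

In this toy data, compounds A and C pass the energy criterion, and only A remains once a minimum of two hydrogen bonds is additionally required.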