The present invention relates to a system that predicts the binding of polymer molecules, and that is applicable to a drug discovery simulation system as well as to mass analysis that measures the interaction between biopolymers and to a protein chip analysis system. A parallel and distributed environment using a parallel and distributed computer system is effective for increasing the computing speed of such simulation and analysis, and the present invention is also useful on a single computer.
In the post-genome age, compound molecular structures have been developed as drugs that promote or inhibit the vital functions of polymer molecules such as protein, DNA, and RNA, which are obtained from genome information. For example, the molecular species, the reactive group, or the skeleton structure of a compound is changed to design the compound molecular structure so as to enhance its binding strength with respect to the target protein. In the drug discovery simulation system, a binding energy is computed for a large number of configuration structures of protein and compound. The binding structure having the minimum binding energy is searched for in order to optimize the design of the compound molecular structure. If the binding structure having the minimum binding energy of protein and compound can be found at high speed, it is possible to shorten the development period of the compound molecular structure for the target protein. It is also possible to reduce the cost of experimentally producing the compound molecule and measuring the binding strength of the compound molecule and protein.
FIGS. 19 to 21 are diagrams showing a general structural example of a parallel and distributed computer system in which drug discovery simulation is performed under the parallel and distributed environments.
FIG. 19 shows a state in which each computing unit 1903, which is a unit of computing, is made up of a computing processor unit 1901 that executes computing and a memory unit 1902 that stores input data required to execute the computing or output data obtained by executing it. The computing units 1903 are connected in a grid pattern by data transfer networks 1904 and 1905 that execute data transfer between the respective computing units 1903. One of the data transfer networks is connected with a personal computer 100 for control. The personal computer 100 executes the input of data to be subjected to simulation, the distribution of computation to the computing units 1903, and the tabulation of the computation results within and across the respective computing units 1903. The system is also provided with, for example, a management unit 1906 of the parallel and distributed computer system at a node between the personal computer 100 and the network. The management unit 1906 collects information such as the number NPE of processors that can be operated by the entire system, the type of processor of the connectable computing units, and the speed of the network, and supplies the information to the personal computer 100.
FIG. 20 is a diagram showing a structural example in which the computing unit is formed of a cluster of personal computers or a cluster 2001 of workstations, connected in a ring configuration by a data transfer network of insernets 2002 that execute data transfer between the respective machines. Similarly, in this example, the personal computer 100 for control is connected to the insernet 2002 of the data transfer network, and a management unit 2003 of the parallel and distributed computer system is provided.
FIG. 21 is a diagram showing a structural example of a wide-area distributed environment in which the computing unit is made up of plural grid machines 2101, each having a parallel computer 2103, a cluster of personal computers, and a cluster 2001 of workstations connected to each other on a high-speed network 2002 that executes high-speed data transfer, and in which the respective grid machines 2101 are connected to each other in a ring configuration on a high-speed network 2102. Similarly, in this case, the personal computer 100 for control is connected to the high-speed network 2102 of the data transfer network, and a management unit 2003 of the parallel and distributed computer system is provided.
FIG. 22 is a conceptual diagram showing an example of the transacting functions of a related system that performs a simulation predicting the binding structure of polymer molecules on a parallel and distributed computer system of the above type. Reference numeral 100 corresponds to the personal computer of the parallel and distributed computer system which is described with reference to FIGS. 19 to 21. The personal computer 100 includes an input system 101 through which a user inputs simulation data; a transaction and control unit 102 that executes a transaction for distributing the computation of the binding energy to the respective computing units 104 on the basis of the inputted data, and a transaction for collecting the results of the computation of the binding energy from the respective computing units and integrating them; and an output system 103 for displaying the operation status of the transaction and control unit 102 or the integrated data. Reference numeral 104 denotes the respective computing units of the parallel and distributed computer system described with reference to FIGS. 19 to 21, which perform the calculation according to information given by the personal computer 100 for control and report the computation results to the personal computer 100. Reference numeral 105 denotes the management unit of the parallel and distributed computer system which is described with reference to FIGS. 19 to 21. The personal computer 100, the computing units 104, and the management unit 105 of the parallel and distributed computer system are indicated by dashed lines, and are connected by heavy lines to indicate that they are capable of interchanging necessary data with each other. The association between the transaction steps of the transaction and control unit 102 and the transaction steps of the computing units 104 is indicated by thin solid lines.
The user inputs the search region for the binding structure of the polymer molecules to be subjected to simulation, such as protein, DNA, or RNA obtained from genome information (for example, protein and compound in water molecule), as well as the number of decomposed regions into which the search region is divided (Step 2211). The user also inputs the number NPE of operable computing units among which the computation of the decomposed regions is distributed (Step 2212). In this example, the number NPE of operable computing units can be obtained from the data supplied by the management unit 105 of the parallel and distributed computer system, or can be set by the user designating, through the input system 101, a number smaller than that value. The output data of the respective computing units are the minimum binding energy of the binding structure of protein and compound in water molecule, and the atomic coordinate data of the protein and compound in that binding structure. The output system 103 outputs the data as an image or as numeric data that is readily visible to the user, and displays the data on a display system.
A description will be given of a procedure of computing the binding energy and tabulating the computation results by means of the computing units of the parallel and distributed computer system. Step 2241 determines the number of decomposed regions that are distributed to each computing unit, on the basis of the number of decomposed regions into which the search region is divided and the number NPE of computing units that share the decomposed regions. Step 2242 determines the search points at which the binding energy is computed within the decomposed regions that are allocated to the respective computing units. Step 2243 is a communication control step, which transmits the data of the search points allocated for calculating the binding energy to the respective computing units determined in Step 2241 and Step 2242, and conversely receives the binding energies at the respective search points that have been calculated in the respective computing units. Step 2244 determines, from the local minimum values of the binding energies computed in the respective computing units 104 and received in Step 2243, the minimum value over all of the computing units. Step 2245 determines whether or not the iterative calculation is executed, on the basis of the convergence of the local minimum value of the binding energy within the decomposed region.
In the case of executing the iterative calculation, control is returned to Step 2242. In the case of completing the iterative calculation, the minimum value of the binding energy of protein and compound in water molecule, and the atomic coordinate data with respect to the binding structure are outputted to the output system 103.
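The flow of Steps 2241 to 2245 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the related system's implementation: the function names, the one-dimensional search region [0, 1), the placeholder `toy_energy` function, the uniform random choice of search points, and the convergence tolerance `tol` are all hypothetical, and the computing units are emulated by a serial loop rather than by real parallel hardware.

```python
import random

def split_regions(n_regions, n_units):
    """Step 2241: give each computing unit a near-equal block of region indices."""
    base, extra = divmod(n_regions, n_units)
    blocks, start = [], 0
    for unit in range(n_units):
        size = base + (1 if unit < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

def toy_energy(x):
    """Hypothetical stand-in for the binding energy; its minimum is -1.0 at x = 0.3."""
    return (x - 0.3) ** 2 - 1.0

def local_minimum(region_ids, n_regions, n_points, rng):
    """Work of one computing unit: evaluate search points in its regions (Step 2242)
    and return the lowest energy found together with its coordinate."""
    best_e, best_x = float("inf"), None
    width = 1.0 / n_regions
    for r in region_ids:
        for _ in range(n_points):
            x = (r + rng.random()) * width  # random search point inside region r
            e = toy_energy(x)
            if e < best_e:
                best_e, best_x = e, x
    return best_e, best_x

def search(n_regions=8, n_units=3, n_points=50, tol=1e-4, max_iter=100, seed=0):
    rng = random.Random(seed)
    blocks = split_regions(n_regions, n_units)          # Step 2241
    best = (float("inf"), None)
    for _ in range(max_iter):
        # Steps 2242-2243: each unit computes and "reports" its local minimum.
        results = [local_minimum(b, n_regions, n_points, rng) for b in blocks if b]
        candidate = min(results, key=lambda r: r[0])    # Step 2244: global minimum
        improvement = best[0] - candidate[0]
        if improvement > 0:
            best = candidate
        if improvement < tol:                           # Step 2245: assumed convergence test
            break
    return best
```

For example, `split_regions(8, 3)` assigns regions `[0, 1, 2]`, `[3, 4, 5]`, and `[6, 7]` to three units, and `search()` returns an (energy, coordinate) pair close to the minimum of the placeholder energy function.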
A description will be given, with reference to FIG. 23, of a computing procedure using a Monte Carlo method in Step 2242, which computes the binding energy of protein and compound in the decomposed regions that have been allocated to the respective computing units. FIG. 23 is a conceptual diagram showing the binding energy of compound and protein over a search region defined by translational and rotational operations. The binding energy has a complicated configuration of peaks and troughs over the translational and rotational search regions. In the Monte Carlo method, a new configuration structure is formed by applying translational and rotational operations, chosen by using random numbers, on the basis of the current arrangement position of the compound and protein and its binding energy, and the binding energy of the new configuration structure is computed. When the binding energy difference of the new arrangement position with respect to the original arrangement position is ΔE, the probability P(ΔE) of transiting to the new configuration structure is obtained on the basis of Expression (1).

$$P(\Delta E) = \exp\left(-\frac{\Delta E}{k_B T}\right) \qquad (1)$$

where kB is the Boltzmann constant, and T is an absolute temperature. Also, the number of searches until the configuration structure transits to the new configuration structure with the probability of Pth is given by Expression (2).
$$F(\Delta E) = \frac{\log(1 - P_{th})}{\log\{1 - P(\Delta E)\}} \qquad (2)$$
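Expressions (1) and (2) can be evaluated numerically as in the following sketch. One way to read Expression (2), offered here as an interpretation rather than as a statement from the source: if each search independently transits with probability P(ΔE), the probability of at least one transition in F searches is 1 − (1 − P(ΔE))^F, and setting this equal to Pth and solving for F gives Expression (2). The function names and the use of joules for ΔE are assumptions made for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (SI value); energy unit is an assumption

def transition_probability(delta_e, temperature):
    """Expression (1): P(dE) = exp(-dE / (kB * T)), with delta_e in joules."""
    return math.exp(-delta_e / (K_B * temperature))

def searches_needed(delta_e, temperature, p_th):
    """Expression (2): F(dE) = log(1 - P_th) / log(1 - P(dE)).

    Only defined for delta_e > 0 (so that 0 < P < 1) and 0 < p_th < 1.
    """
    if delta_e <= 0:
        raise ValueError("delta_e must be positive for Expression (2)")
    if not 0.0 < p_th < 1.0:
        raise ValueError("p_th must lie strictly between 0 and 1")
    p = transition_probability(delta_e, temperature)
    # log1p(-p) computes log(1 - p) accurately even when p is very small.
    return math.log(1.0 - p_th) / math.log1p(-p)

# Worked example: choose dE = kB * T * ln 2, so that P(dE) = 1/2.
delta_e = K_B * 300.0 * math.log(2.0)
p = transition_probability(delta_e, 300.0)
f = searches_needed(delta_e, 300.0, 0.75)
```

In this worked example P(ΔE) = 1/2, so reaching Pth = 0.75 takes F = log(0.25)/log(0.5) = 2 searches. As ΔE grows, P(ΔE) falls exponentially and F grows correspondingly.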