Nowadays, drug discovery proceeds through the stages of searching for a target, searching for hit compounds, searching for a lead compound, and synthesizing drug candidates, and then to a drug development process of preclinical trials. In the search for a target, a causative gene (protein or the like) of an illness is identified by sequence information analysis (database searches) or by analysis of gene expression information with microarrays or the like. In the search for hit compounds, docking and molecular dynamics (MD) simulations are performed on a computer to narrow down candidates, and structures and functions are predicted based on similar proteins, in order to find compounds that presumably act on the causative gene. In the search for a lead compound, compounds that are similar to the hit compounds are searched to find a lead compound that is more effective. When drug candidates are synthesized, verification experiments are performed on compounds in the vicinity of the lead compound.
In searching for a lead compound, it is necessary to search a database of known compounds for a compound that is similar to a hit compound. The owners of databases of known compounds include public institutions such as the National Center for Biotechnology Information (NCBI) and supplier enterprises that synthesize and sell compounds.
In current drug discovery, a researcher does not want to disclose data on a hit compound that the researcher has discovered to outside parties. This tendency is particularly noticeable in business enterprises. When performing a search in a public compound database, a researcher therefore downloads all the data onto a computer at the researcher's company. However, since the search space for organic compounds has a size on the order of 10 to the power of 60, the amount of data is likely to increase in the future, and it will become difficult to download and hold all the data. Further, when a researcher wants to search for compounds in the compound database of a business enterprise, the researcher purchases the data after entering into a confidentiality agreement with the enterprise. However, because the researcher cannot know in advance whether or not the purchased database includes compounds that are similar to a hit compound, the purchase of the database may turn out to be a wasted investment.
On the other hand, enterprises that provide compound databases adopt a selling model in which databases that have a low level of confidentiality are made publicly available for free, and a so-called “focused library” that has a high level of confidentiality is sold. A focused library is a database of useful compounds that the enterprise has spent resources to prepare, and in which hits are expected to be especially likely to occur. Generally, a focused library accumulates compounds that are considered to be highly effective with respect to a specific drug discovery target (for example, a GPCR, a kinase, or the like). However, if the enterprise does not disclose any of the information in a focused library, researchers who are concerned about a wasteful investment will be reluctant to purchase the data, and the enterprise will risk losing business opportunities.
Therefore, if a researcher can know whether or not compounds similar to a compound that the researcher has in hand are present in a database, while the researcher keeps the discovered compound secret and the database provider keeps the contents of the database secret, this will be advantageous to both the researcher and the database provider.
Heretofore, almost no research has been conducted on methods for determining the similarity between compounds while maintaining confidentiality. As far as the present inventors are aware, the only prior research in this respect is disclosed in Non Patent Literature 1, which relates to securely calculating the similarity (Tanimoto coefficient) between compounds. According to the method proposed in Non Patent Literature 1, a process is repeated in which a researcher and a database owner each encrypt elements of a vector that represents a compound and send the encrypted elements to a third party, and the third party determines whether the respective elements from the researcher and the database owner match.
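For reference, the Tanimoto coefficient mentioned above can be illustrated with a minimal plaintext sketch. This is not the secure method of Non Patent Literature 1 and involves no encryption; it only shows the quantity that such a method computes. The sketch assumes that each compound is represented as a binary fingerprint given by the set of indices of its "on" bits, and the function name, the compound names, and the fingerprints are hypothetical values chosen for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices.

    Computed as |A intersect B| / |A union B|.
    """
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        # Assumed convention for two empty fingerprints (not taken from the source).
        return 0.0
    return len(a & b) / union


# Hypothetical hit compound and a hypothetical three-entry compound database.
hit = {1, 4, 7, 9}
database = {
    "cmpd_A": {1, 4, 7, 8},
    "cmpd_B": {2, 3},
    "cmpd_C": {1, 4, 7, 9},
}

# Score every database entry against the hit compound, most similar first.
scores = {name: tanimoto(hit, fp) for name, fp in database.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this plaintext form, whoever runs the comparison sees both the hit compound and every database entry, which is precisely the disclosure that the researcher and the database provider each wish to avoid.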