In the body of a living creature, i.e., in vivo, many chemical substances such as genes, proteins, lipids, and acids are present. These chemical substances are each present as a molecule and affect one another. The mutual influence between molecules is referred to as “molecular interaction”.
Because countless molecules are present in vivo, many molecular interactions naturally occur. Molecular interactions do not occur independently; rather, a sequence of molecular interactions often occurs. For example, “molecule A affects molecule B and as a result, molecule B forms molecule C”, i.e., the molecular interactions are linked to one another like a string of beads, from molecule A to molecule B and then to molecule C. A group of molecular interactions linked in such a manner is referred to as a “pathway”.
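As a minimal sketch of the “string of beads” idea, a pathway may be represented as an ordered list of interactions in which the target of one interaction is the source of the next. The record type `Interaction`, the function `chain_pathway`, and the assumption that each molecule has at most one outgoing interaction are all hypothetical simplifications introduced for illustration only.

```python
from collections import namedtuple

# Hypothetical record type: one directed molecular interaction.
Interaction = namedtuple("Interaction", ["source", "action", "target"])


def chain_pathway(interactions, start):
    """Link interactions into a chain by following targets from `start`.

    Simplifying assumption: each molecule is the source of at most one
    interaction, so the chain never branches.
    """
    by_source = {i.source: i for i in interactions}
    pathway, current, seen = [], start, set()
    while current in by_source and current not in seen:
        seen.add(current)              # guard against cycles
        step = by_source[current]
        pathway.append(step)
        current = step.target          # the target becomes the next source
    return pathway


# The example from the text, given in arbitrary order.
interactions = [
    Interaction("B", "forms", "C"),
    Interaction("A", "affects", "B"),
]
pathway = chain_pathway(interactions, "A")
molecules = [pathway[0].source] + [step.target for step in pathway]
print(" -> ".join(molecules))
```

Real pathways branch and merge, so a general graph structure would be needed in practice; the linear chain is used here only because it mirrors the example in the text.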
FIG. 29 is a diagram for explaining pathways. A pathway is useful in understanding biological processes. For example, a pathway may express that “molecule C is deformed by a molecular interaction of molecule A and molecule B and as a result, the deformation of molecule C causes a particular disorder”, or that “the structure of molecule C is maintained by a molecular interaction of molecule A and molecule B and as a result, normal organ function is continued”.
As described in the above examples, an overview of biological processes, whether of normal or abnormal function, becomes understandable through pathways, i.e., sequences of molecular interactions. Therefore, constructing a pathway is important in fields related to biological science, such as medical services and pharmaceutical development. There are a number of pathway construction methods.
“Curation” is one method. “Curation” is a method of constructing a pathway, where a specialist called a “curator” reads published literature, extracts portions that describe molecular interactions, and combines the molecular interactions to construct the pathway.
Because curation relies on the human labor of curators, the amount of published literature to be read translates directly into workload. “PubMed” (see, e.g., PubMed, on the Internet) is a website that discloses a database of published literature.
For reference, “KEGG” (see, e.g., KEGG: Kyoto Encyclopedia of Genes and Genomes, on the Internet), “BioCarta” (see, e.g., BioCarta, on the Internet), etc., are websites that disclose databases of pathways constructed by curation.
Data mining and text mining by mechanical processing are other methods of pathway construction. “Data mining” is a generic name for knowledge discovery approaches that find hidden relations and meanings by analyzing a large amount of data using various statistical analysis approaches. In particular, obtaining specific findings and ideas by dividing text data (ordinary, natural sentences) into words, etc., and analyzing the appearance frequencies of the words and the correlations therebetween is referred to as “text mining”.
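The text mining step described above, dividing sentences into words and analyzing appearance frequencies and correlations, may be sketched as follows. This is an illustration under stated assumptions, not the procedure of any tool named in this document: the function names `tokenize` and `word_statistics` are hypothetical, the tokenizer is a bare regular expression, and “correlation” is approximated by counting how often two words appear in the same sentence.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(sentence):
    """Split a sentence into lowercase alphanumeric words (naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def word_statistics(sentences):
    """Return (word frequencies, per-sentence co-occurrence counts)."""
    freq = Counter()
    cooc = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        freq.update(words)
        # Count each unordered word pair once per sentence.
        for pair in combinations(sorted(set(words)), 2):
            cooc[pair] += 1
    return freq, cooc


# Made-up sentences for illustration only.
sentences = [
    "Protein A binds protein B.",
    "Protein B activates gene C.",
]
freq, cooc = word_statistics(sentences)
print(freq.most_common(3))
print(cooc[("b", "protein")])
```

Pairs are stored with their words in sorted order, so a lookup such as `cooc[("b", "protein")]` must use the same ordering.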
The methods employed for text mining specialized for biotechnology include a method of constructing a pathway in which mechanical syntax analysis is executed on published literature to identify the “molecules” that cause a molecular interaction, the “action” that each of the molecules exerts, etc., and the interactions thereby extracted are combined to construct a pathway. By combining text mining and data mining, pathways that are meaningful in terms of life sciences, e.g., “a pathway related to colon cancer”, may be constructed by a computer.
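The extraction of “molecules” and “actions” from text may be illustrated with a deliberately crude stand-in for syntax analysis: a regular expression that matches the pattern “molecule, action verb, molecule”. The verb list `ACTIONS`, the function `extract_interactions`, and the molecule names are hypothetical; genuine syntax analysis handles passive voice, negation, and multi-word names, none of which this sketch attempts.

```python
import re

# Hypothetical, non-exhaustive list of action verbs.
ACTIONS = ("activates", "inhibits", "binds")

# One word, an action verb, then one word: "<molecule> <action> <molecule>".
PATTERN = re.compile(r"\b(\w+)\s+(" + "|".join(ACTIONS) + r")\s+(\w+)\b")


def extract_interactions(text):
    """Return (source, action, target) triples found by the naive pattern."""
    return [match.groups() for match in PATTERN.finditer(text)]


# Made-up text with placeholder molecule names.
text = "MolA activates MolB. MolB inhibits MolC."
triples = extract_interactions(text)
for source, action, target in triples:
    print(source, action, target)
```

Because the target of the first triple equals the source of the second, the two extracted interactions can be linked into a single pathway from MolA to MolC.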
“MeSH terms” (see, e.g., MeSH, on the Internet) are biological and medical terms used in biological data mining. “MeSH” stands for Medical Subject Headings and refers to a group of biological and medical terms. MeSH terms are already assigned to published literature, and by tallying the MeSH terms, it becomes possible to analyze the biological and medical significance of a particular group of published literature.
Further, some websites disclose databases formed by correlating biological and medical significance with the molecules constituting each pathway. OMIM (see, e.g., OMIM, on the Internet) and H-invDB (see, e.g., H-inv DB, on the Internet) are examples of such websites. Both databases are formed by correlating genetic significance with the molecules. The biological and medical significance of a molecule, a gene, etc. may be identified by using the data in each of these databases in data mining.
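The tallying of MeSH terms over a group of published literature may be sketched as below. The document identifiers and their term assignments are invented for illustration, and the function `mesh_term_totals` is hypothetical; in practice the terms would be read from the annotations supplied with each reference.

```python
from collections import Counter


def mesh_term_totals(articles):
    """Count, for each MeSH term, how many articles carry that term."""
    totals = Counter()
    for terms in articles.values():
        totals.update(set(terms))  # count each term once per article
    return totals


# Hypothetical article IDs and term assignments (not real annotations).
articles = {
    "doc1": ["Colonic Neoplasms", "Apoptosis"],
    "doc2": ["Colonic Neoplasms", "Signal Transduction"],
    "doc3": ["Apoptosis", "Colonic Neoplasms"],
}
totals = mesh_term_totals(articles)
top_term, top_count = totals.most_common(1)[0]
print(top_term, top_count)
```

A term that dominates the tally suggests the theme of the literature group; whether raw counts suffice or normalization against a background frequency is needed is a design choice left open here.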
BMC Bioinformatics 2005, 6 Suppl 1 S4. Epub 2005 May 24 is a reference concerning text mining specialized for biology and medicine. “BioCreAtIvE” (see, e.g., BioCreAtIvE, on the Internet) is a research organization. Websites that disclose a database having interaction information preliminarily stored therein include “HPRD” (see, e.g., Human Protein Reference Database, on the Internet) and “BIND” (see, e.g., BIND, on the Internet). These websites have registered therein direct interactions between proteins such as “bonding”. Information on the molecular interactions registered therein may be collectively obtained and may be used for data mining, etc.
“ResNet” from Ariadne Genomics, Inc., is a commercial database formed by correlating “types” and “functions” of molecules as the significance of a molecular interaction and the molecules that cause the molecular interaction. Such a database may be purchased and data mining may be executed using the database.
“MedTAK”, from Celestare Lexico-Sciences, Inc., is software that has added functions of text mining and data mining. This software analyzes the appearance frequencies, etc., of “a group of molecules”, “MeSH terms”, etc., described in a group of published literature and thereby, supports the extraction of biological and medical meanings thereof. However, the text mining technique of the software has no function of extracting molecular interactions.
When a molecular network is constructed using a pathway, “a method of selecting a route that has biological and medical significance” is necessary. Because “biological and medical significance” must be found, the circumstances of route selection for a pathway are different from those of ordinary route selection in a network. There are a number of route selecting approaches for a pathway.
For example, Japanese Laid-Open Patent Publication No. 2006-146380 introduces a conventional technique of giving “biological and medical” information concerning, for example, a disorder, to a molecular interaction. However, as to selection of a route between molecules, the shortest route is always selected according to this approach.
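For reference, shortest-route selection between two molecules may be sketched with an ordinary breadth-first search over an unweighted network. This is a generic textbook technique offered for illustration and is not asserted to be the procedure of the cited publication; the network, the molecule names, and the function `shortest_route` are hypothetical.

```python
from collections import deque


def shortest_route(graph, start, goal):
    """Return one route with the fewest interactions, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        route = queue.popleft()
        node = route[-1]
        if node == goal:
            return route
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(route + [neighbor])
    return None


# Hypothetical network: two routes lead from A to E.
graph = {"A": ["B", "D"], "B": ["C"], "C": ["E"], "D": ["E"]}
route = shortest_route(graph, "A", "E")
print(route)
```

The search returns the two-step route through D and never examines whether the longer route through B and C carries more relevant biological or medical information.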
International Publication Pamphlet No. WO2003-077159 introduces, as an approach of selecting a route taking into account the degree of relation to a disorder, a method of using a set of routes among two or more molecules called a “subnet”. Subnets each concern a disorder, etc., and are constructed in advance. When a route is sought, if a subnet concerning a disorder is hit, a route related to the disorder can be selected.
Japanese Laid-Open Patent Publication No. 2005-122231 discloses a method of displaying a screen to construct a network of terms such as compounds concerning a gene, names of disorders, and proteins. This method is an approach where a user designates a term group 1 and a term group 2 to depict, as a network, information from published literature that suggests relations among the terms.
However, due to the shape characteristics of the network, a problem arises for a pathway in terms of “route selection”. That is, because the number of references registered in “PubMed” is tremendous, the number of molecular interactions extracted is also tremendous. Because molecular interactions are the components of a pathway, the pathway naturally forms an extensive and complicated network, and the selection of a route becomes difficult. An extensive network for which selection of a route is difficult is depicted in FIG. 30.
A pathway may be constructed by reducing the number of molecular interactions using curation. However, as of November 2006, the number of references in “PubMed” was at least 16 million and this number increases by 50 to 60 thousand per month. Therefore, a problem arises in that the number of references to be read is already tremendous and construction consumes a tremendous amount of time.
The volume of published literature continues to increase and therefore, a problem arises in that “responses to new theories”, etc., that rely on updated information are limited. Hence, a problem has arisen in curation whereby a biased pathway may be constructed based on the subjectivity of the curator.
In Japanese Laid-Open Patent Publication No. 2006-146380, when multiple routes are present that each have a long route length and multiple biological and medical meanings, examination is impossible and the technique is insufficient for clarification of the mechanism of a disorder, etc. Multiple routes often occur on a pathway and no correlation between the shortness of the route length and the biological and medical meanings has been confirmed.
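The situation in which multiple routes exist between the same pair of molecules may be illustrated by enumerating every route that does not revisit a molecule. The network and the function `all_routes` are hypothetical, and the depth-first enumeration is a generic technique chosen for illustration; it is not taken from any cited reference.

```python
def all_routes(graph, start, goal, route=None):
    """Return every route from start to goal that visits no molecule twice."""
    route = (route or []) + [start]
    if start == goal:
        return [route]
    found = []
    for neighbor in graph.get(start, []):
        if neighbor not in route:          # avoid revisiting a molecule
            found.extend(all_routes(graph, neighbor, goal, route))
    return found


# Hypothetical network with two routes of different length from A to E.
graph = {"A": ["B", "D"], "B": ["C"], "C": ["E"], "D": ["E"]}
routes = all_routes(graph, "A", "E")
for r in routes:
    print(len(r) - 1, "interactions:", " -> ".join(r))
```

An approach that always keeps only the shortest of these routes discards the longer one without examining its meaning. Note that the number of routes can grow very rapidly with network size, so exhaustive enumeration is practical only for small networks.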
Research on the mechanism of an in vivo molecule is ongoing and therefore, omissions concerning new biological and medical information occur in a subnet constructed in advance. However, in International Publication Pamphlet No. WO2003-077159, no alternative means is presented. International Publication Pamphlet No. WO2003-077159 includes no description on any approach of using a computer, etc. to construct a subnet that is related to disorders and that may be updated continuously. Therefore, the approach therein is insufficient.
In Japanese Laid-Open Patent Publication No. 2005-122231, no approach is presented of displaying, with respect to each molecule and each sequence of molecules, the relation with the biological and medical meanings that indicate a specific disorder. Therefore, the method described in Japanese Laid-Open Patent Publication No. 2005-122231 is insufficient. Databases such as OMIM and H-invDB, which give medical and biological meanings to each of the genes stored therein, and databases such as HPRD and BIND, which store information on interactions, are effective as materials for data mining but themselves have no pathway construction function. Therefore, examination of a biological phenomenon by way of a pathway is not possible and naturally, these databases do not contribute to the selection of a route.
MedTAK software from Celestare Lexico-Sciences, Inc., extracts medical and biological meanings from a group of published literature using a data mining technique. However, MedTAK extracts no molecular interaction. Therefore, the software is unable to construct a pathway and does not contribute to the selection of a route therefor.
Hence, a problem arises in that the techniques of Japanese Laid-Open Patent Publication Nos. 2005-122231 and 2006-146380 and International Publication Pamphlet No. WO2003-077159, the databases (OMIM, H-invDB, HPRD, and BIND), and the software (MedTAK) are all insufficient in terms of “a method for selecting a route that has biological and medical meanings” for a pathway.