1. Technical Field of the Invention
The present invention relates to an information retrieval apparatus for retrieving information from a hyper-document system composed of links between nodes, and a medium having an information retrieval program for constructing the information retrieval apparatus in a computer recorded therein, and more particularly to an information retrieval apparatus for retrieving information with node groups having definite meaningful consistence as a retrieval object, and a computer-readable recording medium having the information retrieval program recorded therein.
2. Related Art
A hyper-document system (a system described in, for example, HTML (Hyper Text Markup Language)) which places no restraint on the meaning of the links between nodes has the advantage that the document author can determine the contents and link structure at will. Also, the document reader can obtain access to a multiplicity of documents prepared by a multiplicity of document authors through the use of a computer network (for example, the World Wide Web).
As related art for helping the document reader retrieve his/her desired information from such a hyper-document system, there are the following two techniques:
A first related art is a technique in which retrieval indexes for each node are prepared in advance by scanning as many nodes as possible (at random), and an index which matches a query (a combination of key words) from the document reader is presented (for example, AltaVista, http://altavista.digital.com/). As a constituent technology for implementing this technique, a vector space model (G. Salton and J. Allan, Text Retrieval Using the Vector Processing Model, in Proc. of SDAIR94), which is a statistical language processing technique, has been devised for the creation of the retrieval indexes and for matching with queries.
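The matching of a query against per-node indexes in a vector space model can be illustrated roughly as follows. This is a minimal sketch only, assuming plain term-frequency vectors and cosine similarity; the function names (term_vector, cosine, rank_nodes) and the whitespace tokenization are hypothetical choices for illustration and are not taken from the cited reference or from any particular search service.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words term-frequency vector (assumed tokenization: whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_nodes(query, nodes):
    """Return (score, node_id) pairs, best match first.

    `nodes` maps a node identifier to the text of that single node, so each
    node is one retrieval unit, as in the first related art.
    """
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), node_id) for node_id, text in nodes.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))
```

Note that each node is scored in isolation here; nothing in this sketch looks at the links between nodes.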
A second related art is a technique in which as many nodes as possible are scanned (at random) in advance and allocated to a directory having a tree structure classified by topics. The document reader looks in the directory for a topic in which the desired information is considered to be contained, and thereby obtains access to the information aimed at (for example, Yahoo, http://www.yahoo.com/). As constituent technologies to implement this technique, an automatic document classification technique to which natural language processing has been applied has been proposed (for example, P. Jacobs, Joining Statistics with NLP for Text Categorization, in Proc. of Applied-ACL92). Further, an automatic document classification technique in which the media have been expanded into images has also been devised (U.S. Pat. No. 5,526,443, T. Nakayama (Fuji Xerox), Method and apparatus for highlighting and categorizing documents using coded word tokens, issue date: 1996.6.11).
Problems to be solved by the Invention
In these two related arts, however, since one node is regarded as one retrieval object unit, the essence of the hyper-document system in which a concept is expressed with a structure consisting of nodes and links cannot be grasped, and the following problems have been pointed out.
The first problem is that retrieval in which each node is regarded as one unit cannot grasp node groups built up on a hyper network as a whole, that is, as information having meaningful consistence, because into how many nodes a certain piece of information is divided, and into what structure they are built up, depends upon the taste of the document author. In other words, in the retrieval based on the related art, only pieces of information which are incomplete in terms of meaning are retrieved, and the context cannot be reflected in the retrieval.
The second problem is that a concept representing a retrieval request cannot be expressed in the structure on the hyper network.
In order to solve these problems, it is necessary to change the retrieval in which nodes are regarded as one unit, and to perform the retrieval in which information having meaningful consistence is regarded as one unit. Such retrieval could be implemented if a feature of a certain starting point node is compared with a feature of an N-order node (N=2, 3, . . .) linked from the starting point node to determine their similarity, and N-order nodes which are determined to be similar are merged with the starting point node. The present applicant has applied for patent (Japanese Published Unexamined Patent Application No. Hei 09-153387) for the invention concerning such an information retrieval apparatus.
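The merging of similar N-order nodes with a starting point node can be sketched as below. This is an illustrative sketch under stated assumptions, not the method of the cited application: cosine similarity over term-frequency features, the threshold value, the max_order limit, and the choice to continue the search only through nodes that were merged are all assumptions, and the name build_node_group is hypothetical.

```python
import math
from collections import Counter, deque


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_node_group(start, links, features, threshold=0.3, max_order=3):
    """Merge nodes reachable from `start` whose features are similar to it.

    The starting point node is treated as order 1; nodes linked from it are
    order 2, and so on up to `max_order` (an assumed cut-off).
    `links` maps a node to the nodes it links to; `features` maps a node to
    its feature vector.
    """
    group = [start]
    seen = {start}
    queue = deque([(start, 1)])
    while queue:
        node, order = queue.popleft()
        if order >= max_order:
            continue
        for nxt in links.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            # Compare the candidate with the starting point node, not with its parent.
            if cosine(features[start], features[nxt]) >= threshold:
                group.append(nxt)
                queue.append((nxt, order + 1))
    return group
```

The returned list is one node group, which can then be treated as a single retrieval unit in place of the individual nodes.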
This technique enables the document reader to retrieve the desired hyper-structure. In other words, the document reader can acquire useful information by browsing the hyper-structure presented by the retrieval apparatus.
Generally, however, a browsing path has a plurality of branches, and it is not known which links should be followed in order to acquire useful information effectively. For this reason, the document reader must actually rely on trial and error in selecting those branches while understanding the contents of the node which he/she is currently reading. Perusal by such trial and error is not efficient, and it takes more time than necessary to acquire the desired information.
The present invention has been achieved in the light of the above-described points, and aims to provide an information retrieval apparatus which allows useful information to be perused effectively within the hyper-text structure obtained by retrieval in which information having meaningful consistence is regarded as one unit.
Also, it is another object of the present invention to provide a computer-readable recording medium having an information retrieval program recorded therein, the information retrieval program being capable of causing a computer to execute such a process as to perform the retrieval in which information having meaningful consistence is regarded as one unit, and to allow useful information within the hyper-text structure retrieved to be effectively perused.
As a first information retrieval apparatus according to the present invention for solving the above-described problems, there is provided an information retrieval apparatus for retrieving a hyper-document system composed of links between nodes, which are units of information, comprising: a node group constituting part for constituting node groups consisting of nodes, which are combined through links and are similar in contents, aiming at the nodes in the hyper-document system; a component node storing part for storing component nodes which constitute the node groups; an information retrieval part for retrieving, when a retrieval request is inputted, similar node groups having a high degree of similarity which meet the retrieval request in a plurality of the node groups; a similarity calculation part for calculating degrees of similarity between the component nodes stored in the component node storing part and the retrieval request concerning the similar node groups returned as a candidate as a result of the retrieval by the information retrieval part; and a similarity retrieval result displaying part for displaying paths for accessing each component node in the similar node groups in such a manner that component nodes having a high degree of similarity to the retrieval request can be distinguished.
According to such an information retrieval apparatus, node groups consisting of nodes, which are combined through links and are similar in contents among the nodes in the hyper-document system are constituted by the node group constituting part. Then, component nodes, which constitute node groups, are stored by the component node storing part. Thereafter, when a retrieval request is inputted, similar node groups having high degrees of similarity to the retrieval request among a plurality of node groups are retrieved by the information retrieval part. Next, concerning the similar node groups returned as a candidate as a result of the retrieval by the information retrieval part, degrees of similarity between the component nodes stored in the component node storing part and the retrieval request are calculated by the similarity calculation part. Thus, paths for accessing each component node in the similar node groups are displayed by the similarity retrieval result displaying part in such a manner that component nodes having high degrees of similarity to the retrieval request can be distinguished.
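The last two steps of this flow, scoring each component node of a retrieved node group against the retrieval request and displaying the access paths so that highly similar nodes stand out, might look like the following. It is a sketch under assumptions: cosine similarity, the highlight_at cut-off, the "*" marker, and the representation of a path as a list of node identifiers are all hypothetical choices for illustration; the text does not prescribe a display format.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (assumed similarity measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mark_component_nodes(query_vec, group_paths, node_vecs, highlight_at=0.5):
    """Render one line per component node, flagging nodes similar to the request.

    `group_paths` maps each component node to its access path (a list of node
    identifiers from the entry node of the group); `node_vecs` holds the
    stored term vector of each component node.
    """
    lines = []
    for node_id, path in group_paths.items():
        score = cosine(query_vec, node_vecs[node_id])
        marker = "*" if score >= highlight_at else " "
        lines.append(f"{marker} {' > '.join(path)}  ({score:.2f})")
    return lines
```

With such a display, the reader could follow only the paths that lead to marked nodes instead of exploring every branch.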
Also, as a second information retrieval apparatus according to the present invention, there is provided an information retrieval apparatus for retrieving a hyper-document system composed of links between nodes, which are units of information, comprising: an inter-node similarity calculation part for calculating inter-node degrees of similarity between nodes which are directly referred to by the links; a retrieval request similarity calculation part for calculating, when a retrieval request is inputted, degrees of similarity in retrieval request between the retrieval request and the nodes contained in the hyper-document system; a link weight calculation part for calculating the link weight between the nodes on the basis of the degree of similarity in retrieval request and the inter-node degrees of similarity; and a dynamic node group constituting part for constituting the node groups connected with one another by the link weight equal to or higher than a threshold set in advance on the basis of the link weight.
According to such an information retrieval apparatus, the inter-node degrees of similarity between the nodes which are directly referred to by the links are first calculated by the inter-node similarity calculation part. Thereafter, when a retrieval request is inputted, the degrees of similarity in retrieval request between the retrieval request and the nodes contained in the hyper-document system are calculated by the retrieval request similarity calculation part. Then, the link weight between the nodes is calculated by the link weight calculation part on the basis of the degrees of similarity in retrieval request and the inter-node degrees of similarity. Thus, on the basis of the link weight, the node groups are constituted by the dynamic node group constituting part from the nodes connected with one another by links whose link weight is equal to or higher than a threshold set in advance.
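The dynamic constitution of node groups from link weights can be sketched as follows. The text states only that the link weight is calculated on the basis of the two kinds of similarity; the particular formula used here (inter-node similarity multiplied by the mean of the two nodes' similarities to the retrieval request) is an assumption made purely for illustration, as are the treatment of links as undirected when grouping and the names link_weight and dynamic_node_groups.

```python
def link_weight(inter_sim, query_sim_src, query_sim_dst):
    """Assumed combination rule: inter-node similarity scaled by the mean
    similarity of the two end nodes to the retrieval request."""
    return inter_sim * (query_sim_src + query_sim_dst) / 2


def dynamic_node_groups(links, inter_sim, query_sim, threshold):
    """Group nodes joined by links whose weight is at or above `threshold`.

    `links` is a list of (source, destination) pairs, `inter_sim` maps each
    pair to its precomputed inter-node similarity, and `query_sim` maps each
    node to its similarity to the current retrieval request.
    """
    # Keep only the links that are heavy enough (treated as undirected here).
    adjacency = {}
    for src, dst in links:
        weight = link_weight(inter_sim[(src, dst)], query_sim[src], query_sim[dst])
        if weight >= threshold:
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set()).add(src)

    # Each connected component of the remaining graph is one node group.
    groups, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups
```

Because the similarities to the retrieval request enter the weight, the same hyper-document system can yield different node groups for different requests, while the inter-node similarities can be computed once in advance.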
Also, as a computer-readable recording medium having the first information retrieval program according to the present invention recorded therein, there is provided a computer-readable recording medium having an information retrieval program recorded therein, the information retrieval program for retrieving a hyper-document system constituted by links between nodes, which are units of information, comprising: a node group constituting part for constituting node groups consisting of nodes, which are combined through links and are similar in contents, aiming at the nodes in the hyper-document system; a component node storing part for storing component nodes which constitute the node groups; an information retrieval part for retrieving, when a retrieval request is inputted, similar node groups having high degrees of similarity which meet the retrieval request among a plurality of the node groups; a similarity calculation part for calculating degrees of similarity between the component nodes stored in the component node storing part and the retrieval request concerning the similar node groups returned by the information retrieval part as a candidate as a result of the retrieval; and a similarity retrieval result displaying part for displaying paths for accessing each component node in the similar node groups in such a manner that component nodes having high degrees of similarity to the retrieval request can be distinguished.
If an information retrieval program recorded in such a recording medium is caused to be executed by a computer, necessary functions for the first information retrieval apparatus according to the present invention will be constructed on the computer.
Also, as a computer-readable recording medium having the second information retrieval program according to the present invention recorded therein, there is provided a computer-readable recording medium having an information retrieval program recorded therein, the information retrieval program for retrieving a hyper-document system constituted by links between nodes, which are units of information, comprising: an inter-node similarity calculation part for calculating degrees of similarity between nodes, which are directly referred to by the links; a retrieval request similarity calculation part for calculating, when a retrieval request is inputted, degrees of similarity in retrieval request between the retrieval request and the nodes contained in the hyper-document system; a link weight calculation part for calculating the link weight between the nodes on the basis of the degree of similarity in retrieval request and the inter-node degrees of similarity; and a dynamic node group constituting part for constituting the node groups by nodes connected with one another by the link weight equal to or higher than a threshold set in advance on the basis of the link weight.
If an information retrieval program recorded in such a recording medium is caused to be executed by a computer, necessary functions for the second information retrieval apparatus according to the present invention will be constructed on the computer.