1. Field of the Invention
The present invention relates to an information retrieval system, and more particularly to an information retrieval system that can narrow down a retrieval result by use of a hierarchical index of the subject to be retrieved, to a method therefor, and to a storage medium with an information retrieval program stored therein.
2. Description of the Related Art
When retrieving desired information from a database, a user enters a retrieval condition to obtain a retrieval result. When the retrieval returns many results, the user enters additional retrieval conditions as necessary to narrow the search.
In a conventional information retrieval system, every retrieval condition is entered by the user, so the user must proceed by trial and error, adding and deleting retrieval conditions until the retrieval result is properly narrowed.
Accordingly, several techniques for supporting the decision of a retrieval condition in information retrieval have been proposed. Examples include a technique of assigning a keyword index to each piece of information, disclosed in Japanese Patent Publication Laid-Open (Kokai) No. Heisei 7-65020, and an information retrieval system that presents a user with retrieval conditions for narrowing a search by means of a hierarchical index, disclosed in Japanese Patent Publication Laid-Open (Kokai) No. Heisei 4-114277.
These conventional techniques for supporting the decision of a retrieval condition, however, merely present an index established from the information stored in the database to be retrieved. They have no means for presenting retrieval conditions in order, starting from the condition that narrows the retrieval result most efficiently, according to the retrieval probability of the information and a hierarchical index.
As mentioned above, a conventional information retrieval system either has no means for supporting the decision of a retrieval condition to be added for narrowing a retrieval result or, if it has such means, cannot present retrieval conditions in order from the most efficient one based on the retrieval probability of the information.
Therefore, without such supporting means, a user has to select a proper retrieval condition by trial and error, and even with such supporting means, the conventional system cannot decide a proper retrieval condition efficiently.
In order to solve the above-mentioned problem, an object of the present invention is to provide an information retrieval system capable of deciding a proper retrieval condition efficiently and narrowing a search down to a proper result by presenting the nodes of a hierarchical index in the order most effective for narrowing down the retrieval result, and to provide a method therefor and a storage medium with such an information retrieval program stored therein.
According to the first aspect of the invention, an information retrieval system for retrieving desired data from a retrieved subject database with data to be retrieved stored therein, comprises
retrieval condition expression creating means for creating a retrieval condition expression by use of an entered retrieval condition,
retrieval executing means for executing retrieval processing through access to the retrieved subject database based on a retrieval condition expression created by said retrieval condition expression creating means,
index storing means for storing each hierarchical index as for retrieved subject data stored in the retrieved subject database,
retrieved subject retrieval probability storing means for storing record of retrieval times of each retrieved subject data stored in the retrieved subject database, and
narrowing means of obtaining the index corresponding to the retrieved subject data obtained by said retrieval executing means as a retrieval result, from said index storing means, computing expected value of acquired information based on the retrieval times of the retrieval result data stored in said retrieved subject retrieval probability storing means, and presenting to a user the index information corresponding to the retrieval results, in decreasing order of the expected value of the computed acquired information as for the retrieved subject data that is the retrieval result, for narrowing down a retrieval result obtained by said retrieval executing means, in reply to a user's narrowing request.
In the preferred construction, the information retrieval system further comprises retrieval condition receiving means for receiving a user's input of a retrieval condition, and retrieval result display means for displaying a retrieval result obtained from the retrieval processing by said retrieval executing means.
In another preferred construction, said narrowing means obtains, from the hierarchical index stored in said index storing means, a question node that is a node positioned at the lowest hierarchy, of the common nodes in all the paths, and an answer node that is a node positioned at a lower hierarchy right beneath the question node, assuming a path starting from a leaf indexed to the retrieved subject data obtained by said retrieval executing means as a retrieval result and tracing the nodes of hierarchical index up to the upper hierarchy, computes the expected value of the acquired information as for the respective obtained question nodes, and presents to a user the question nodes in decreasing order of the expected value of the computed acquired information and the answer nodes corresponding to the respective question nodes, and further
said narrowing means computes information amount M(C) of collection C of the retrieved subject data by the following expression, assuming that the collection of the retrieved subject data that is a retrieval result by said retrieval executing means is defined as C, by use of the number k of the retrieved subject data that is the retrieval result and the existence ratio Pj of the retrieved subject data j of the collection C of the retrieved subject data,
$$M(C) = -\sum_{j=1}^{k} P_j \log_2 P_j$$
computes the expected information amount B(C, a) when the collection C is divided into partial collections C1, . . . , Cn based on the answer nodes a1, . . . , an as for the given question node a, by the following expression,
$$B(C, a) = \sum_{i=1}^{n} \frac{|C_i|}{|C|} M(C_i)$$
and computes the expected value gain(C, a) of the acquired information of the respective question nodes, by the following expression.
$$\mathrm{gain}(C, a) = M(C) - B(C, a)$$
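By way of illustration only, the three expressions above may be sketched in Python as follows. The function names and the representation of each partial collection as a list of per-data weights are hypothetical and do not appear in this specification. The sketch further assumes that the existence ratio Pj is obtained by normalizing the weights within whichever collection is being measured, so that equal weights correspond to a uniform existence ratio.

```python
import math
from typing import List, Sequence


def information_amount(weights: Sequence[float]) -> float:
    """M(C) = -sum_j P_j * log2(P_j), with P_j = weight_j / sum(weights).

    Normalizing the weights inside each collection is an assumption of this
    sketch; equal weights give the uniform existence ratio P_j = 1/|C|.
    """
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)


def expected_information(partition: Sequence[Sequence[float]]) -> float:
    """B(C, a) = sum_i (|C_i| / |C|) * M(C_i) over the partial collections C_i."""
    size = sum(len(part) for part in partition)
    if size == 0:
        return 0.0
    return sum(len(part) / size * information_amount(part) for part in partition)


def gain(partition: Sequence[Sequence[float]]) -> float:
    """gain(C, a) = M(C) - B(C, a), where C is the union of the C_i."""
    collection: List[float] = [w for part in partition for w in part]
    return information_amount(collection) - expected_information(partition)


# Four results of equal weight under two hypothetical question nodes.
even_split = gain([[1, 1], [1, 1]])    # two answer nodes, two results each
uneven_split = gain([[1], [1, 1, 1]])  # one result versus three
```

In this sketch the even split yields the larger gain, so the corresponding question node would be presented first.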
In another preferred construction, said narrowing means makes the existence ratio Pj of the retrieved subject data j in the collection C, as the retrieval probability pj of the retrieved subject data rj that is computed by the following expression, by use of the number k of the retrieved subject data that is a retrieval result by said retrieval executing means, the retrieved subject data r1, r2, . . . , rm that is the retrieval result, the retrieval times hj of the given retrieved subject data rj (1≤j≤m), and the number of times vj the retrieval result appears at a lower hierarchy below a different answer node.
$$p_j = \frac{h_j}{v_j \sum_{i=1}^{k} h_i}$$
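The retrieval probability may be sketched, purely as an illustration, as follows. The function and parameter names are hypothetical. The sketch reads the denominator as vj multiplied by the total of the retrieval times over all k retrieval results, and treats vj as a count of at least 1; both readings are assumptions of the sketch.

```python
from typing import List, Sequence


def retrieval_probabilities(hits: Sequence[int],
                            multiplicity: Sequence[int]) -> List[float]:
    """p_j = h_j / (v_j * sum of all h), computed for every result j.

    hits[j] is the retrieval count h_j; multiplicity[j] is v_j, assumed >= 1.
    """
    if len(hits) != len(multiplicity):
        raise ValueError("hits and multiplicity must have the same length")
    total = sum(hits)
    if total == 0:
        raise ValueError("at least one result needs a non-zero retrieval count")
    return [h / (v * total) for h, v in zip(hits, multiplicity)]


# Three results retrieved 6, 3 and 1 times; the third has v_j = 2.
probabilities = retrieval_probabilities([6, 3, 1], [1, 1, 2])
```

A frequently retrieved item thus carries a larger share of the information amount, while an item counted under several answer nodes has its share divided accordingly.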
In another preferred construction, said narrowing means obtains, from the hierarchical index stored in said index storing means, a question node that is a node positioned at the lowest hierarchy, of the common nodes in all the paths, and an answer node that is a node positioned at a lower hierarchy right beneath the question node, assuming a path starting from a leaf indexed to the retrieved subject data obtained by said retrieval executing means as a retrieval result and tracing the nodes of hierarchical index up to the upper hierarchy, computes the expected value of the acquired information as for the respective obtained question nodes, presents to a user the question nodes in decreasing order of the expected value of the computed acquired information and the answer nodes corresponding to the respective question nodes, and according to a narrowing instruction entered by a user, creates a partial retrieval condition for retrieving only the lower hierarchy below the answer node selected by the narrowing instruction, in the hierarchical index, so to hand the same condition to said retrieval condition expression creating means, while
said retrieval condition expression creating means converts the retrieval condition into an expression processable by said retrieval executing means by use of the partial retrieval condition expression when receiving the partial retrieval condition expression from said narrowing means, in addition to creation of a retrieval condition expression based on the retrieval condition entered by said retrieval condition receiving means.
In another preferred construction, said narrowing means, according to a narrowing instruction entered by a user, creates a partial retrieval condition for retrieving only the lower hierarchy below the answer node selected by the narrowing instruction, in the hierarchical index, so to hand the same condition to said retrieval condition expression creating means, while
said retrieval condition expression creating means converts the retrieval condition into an expression processable by said retrieval executing means by use of the partial retrieval condition expression when receiving the partial retrieval condition expression from said narrowing means, in addition to creation of a retrieval condition expression based on the retrieval condition entered by said retrieval condition receiving means.
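The hand-off between the narrowing means and the retrieval condition expression creating means may be pictured, as a non-limiting sketch, in the following way. The dictionary representation of the hierarchical index, the `leaf IN (...)` syntax, and all function names are hypothetical; the specification does not prescribe a concrete expression language.

```python
from typing import Dict, List


def leaves_under(index: Dict[str, List[str]], node: str) -> List[str]:
    """Collect the leaves of the hierarchical index at or below `node`.

    `index` maps a node to its child nodes; a node without children is a leaf.
    """
    children = index.get(node, [])
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaves_under(index, child))
    return leaves


def partial_condition(index: Dict[str, List[str]], answer_node: str) -> str:
    """Partial retrieval condition limited to the hierarchy below the answer node."""
    leaves = leaves_under(index, answer_node)
    return "leaf IN (" + ", ".join(repr(leaf) for leaf in leaves) + ")"


def combine(base_condition: str, partial: str) -> str:
    """Join the user's condition and the partial condition (hypothetical syntax)."""
    return f"({base_condition}) AND ({partial})"


index = {"printer": ["laser", "inkjet"], "laser": ["L100", "L200"], "inkjet": ["J10"]}
expression = combine("keyword = 'jam'", partial_condition(index, "laser"))
```

Here selecting the answer node `laser` restricts the next retrieval to the leaves `L100` and `L200` while keeping the condition originally entered by the user.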
In another preferred construction, said narrowing means makes the existence ratio Pj of the retrieved subject data j in the collection C as the retrieval probability pj of the retrieved subject data rj that is computed by the following expression, by use of the number k of the retrieved subject data that is a retrieval result by said retrieval executing means, the retrieved subject data r1, r2, . . . , rm that is the retrieval result, the retrieval times hj of the given retrieved subject data rj (1≤j≤m), and the number of times vj the retrieval result appears at a lower hierarchy below a different answer node,
$$p_j = \frac{h_j}{v_j \sum_{i=1}^{k} h_i}$$
and, according to a narrowing instruction entered by a user, creates a partial retrieval condition for retrieving only the lower hierarchy below the answer node selected by the narrowing instruction, in the hierarchical index, so to hand the same condition to said retrieval condition expression creating means, while
said retrieval condition expression creating means converts the retrieval condition into an expression processable by said retrieval executing means by use of the partial retrieval condition expression when receiving the partial retrieval condition expression from said narrowing means, in addition to creation of a retrieval condition expression based on the retrieval condition entered by said retrieval condition receiving means.
According to the second aspect of the invention, an information retrieval method for retrieving desired data from a retrieved subject database with data to be retrieved stored therein, comprising the following steps of:
performing retrieval processing in an arbitrary retrieval method,
obtaining the index information corresponding to the retrieved subject data obtained as a retrieval result, from the hierarchical indexes as for the retrieved subject data stored in said retrieved subject database,
obtaining the number of retrieval times of the retrieved subject data obtained as the retrieval result, of the record of the retrieval times of each retrieved subject data stored in said retrieved subject database,
computing expected value of acquired information based on the obtained retrieval times,
presenting to a user the index information corresponding to the retrieval results, in decreasing order of the expected value of the computed acquired information as for the retrieved subject data that is the retrieval result, and
narrowing down the retrieval result in reply to a user's narrowing request.
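The computing and presenting steps listed above may be sketched, for illustration only, as a ranking of question nodes. The data layout (a mapping from each question node to its answer nodes and the result identifiers below them) and all names are hypothetical. The sketch assumes that each retrieval result falls under exactly one answer node per question node and that retrieval counts are normalized within each collection.

```python
import math
from typing import Dict, List, Tuple


def entropy(weights: List[float]) -> float:
    """Information amount of a collection, weights normalized to probabilities."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    return -sum(w / total * math.log2(w / total) for w in weights if w > 0)


def rank_question_nodes(
    hits: Dict[str, int],
    questions: Dict[str, Dict[str, List[str]]],
) -> List[Tuple[str, float]]:
    """Return (question node, gain) pairs in decreasing order of gain.

    hits:      result id -> number of past retrievals
    questions: question node -> answer node -> result ids below that answer
    """
    base = entropy([float(h) for h in hits.values()])
    ranked = []
    for question, answers in questions.items():
        expected = sum(
            len(ids) / len(hits) * entropy([float(hits[r]) for r in ids])
            for ids in answers.values()
        )
        ranked.append((question, base - expected))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


hits = {"r1": 1, "r2": 1, "r3": 1, "r4": 1}
questions = {
    "maker": {"A": ["r1", "r2", "r3", "r4"]},                 # does not split
    "type": {"laser": ["r1", "r2"], "inkjet": ["r3", "r4"]},  # splits evenly
}
ranking = rank_question_nodes(hits, questions)
```

In this sketch the question node `type`, which divides the results, is listed ahead of `maker`, which leaves them undivided.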
In the preferred construction, said narrowing step includes a step of obtaining, from the hierarchical index, a question node that is a node positioned at the lowest hierarchy, of the common nodes in all the paths, and an answer node that is a node positioned at a lower hierarchy right beneath the question node, assuming a path starting from a leaf indexed to the retrieved subject data obtained as a retrieval result and tracing the nodes of hierarchical index up to the upper hierarchy, a step of computing the expected value of the acquired information as for the respective obtained question nodes, and a step of presenting to a user the question nodes in decreasing order of the expected value of the computed acquired information and the answer nodes corresponding to the respective question nodes.
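The selection of a question node and its answer nodes in one hierarchical index may be sketched as follows, purely for illustration. The child-to-parent mapping and the function names are hypothetical, and the sketch assumes the index is a tree with a single root.

```python
from typing import Dict, List, Tuple


def path_from_root(parent: Dict[str, str], leaf: str) -> List[str]:
    """Trace the index upward from a leaf and return the path root-first."""
    path = [leaf]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def question_and_answers(parent: Dict[str, str],
                         leaves: List[str]) -> Tuple[str, List[str]]:
    """Lowest node common to all paths, plus the nodes right beneath it."""
    paths = [path_from_root(parent, leaf) for leaf in leaves]
    depth = 0
    # Descend while every path still has a node at this depth and they agree.
    while (all(len(p) > depth for p in paths)
           and len({p[depth] for p in paths}) == 1):
        depth += 1
    if depth == 0:
        raise ValueError("the given leaves share no common node")
    question = paths[0][depth - 1]
    answers = sorted({p[depth] for p in paths if len(p) > depth})
    return question, answers


parent = {"laser": "printer", "inkjet": "printer",
          "L100": "laser", "L200": "laser", "J10": "inkjet"}
narrow = question_and_answers(parent, ["L100", "L200"])
broad = question_and_answers(parent, ["L100", "J10"])
```

When all results lie under `laser`, that node becomes the question node; when the results span both branches, the question node moves up to `printer`.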
In another preferred construction, said narrowing step includes a step of obtaining, from the hierarchical index, a question node that is a node positioned at the lowest hierarchy, of the common nodes in all the paths, and an answer node that is a node positioned at a lower hierarchy right beneath the question node, assuming a path starting from a leaf indexed to the retrieved subject data obtained as a retrieval result and tracing the nodes of hierarchical index up to the upper hierarchy, a step of computing the expected value of the acquired information as for the respective obtained question nodes, and a step of presenting to a user the question nodes in decreasing order of the expected value of the computed acquired information and the answer nodes corresponding to the respective question nodes, and
said step of computing the expected value of the acquired information includes a step of computing information amount M(C) of collection C of the retrieved subject data by the following expression, assuming that the collection of the retrieved subject data that is a retrieval result is defined as C, by use of the number k of the retrieved subject data that is the retrieval result and the existence ratio Pj of the retrieved subject data j of the collection C of the retrieved subject data,
$$M(C) = -\sum_{j=1}^{k} P_j \log_2 P_j$$
a step of computing the expected information amount B(C, a) when the collection C is divided into partial collections C1, . . . , Cn based on the answer nodes a1, . . . , an as for the given question node a, by the following expression,
$$B(C, a) = \sum_{i=1}^{n} \frac{|C_i|}{|C|} M(C_i)$$
and a step of computing the expected value gain(C, a) of the acquired information of the respective question nodes, by the following expression.
$$\mathrm{gain}(C, a) = M(C) - B(C, a)$$
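As a worked illustration with hypothetical numbers not taken from any embodiment, suppose the collection C holds four retrieved subject data with equal existence ratio, and assume the existence ratio within each partial collection is likewise uniform. Let question node a divide C into two partial collections of two data each, and let another question node a' divide C into partial collections of one and three data.

```latex
\begin{align*}
M(C) &= -\sum_{j=1}^{4} \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2 \\
B(C, a) &= \tfrac{2}{4} \cdot 1 + \tfrac{2}{4} \cdot 1 = 1,
  & \mathrm{gain}(C, a) &= 2 - 1 = 1 \\
B(C, a') &= \tfrac{1}{4} \cdot 0 + \tfrac{3}{4} \log_2 3 \approx 1.189,
  & \mathrm{gain}(C, a') &\approx 2 - 1.189 = 0.811
\end{align*}
```

Under these assumptions question node a, having the larger expected value, would be presented ahead of a'.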
In another preferred construction, in said step of computing the information amount M(C) of the collection C of the retrieved subject data,
the existence ratio Pj of the retrieved subject data j in the collection C is made as the retrieval probability pj of the retrieved subject data rj that is computed by the following expression, by use of the number k of the retrieved subject data that is a retrieval result by said retrieval executing means, the retrieved subject data r1, r2, . . . , rm that is the retrieval result, the retrieval times hj of the given retrieved subject data rj (1≤j≤m), and the number of times vj the retrieval result appears at a lower hierarchy below a different answer node.
$$p_j = \frac{h_j}{v_j \sum_{i=1}^{k} h_i}$$
According to the third aspect of the invention, a computer readable memory storing an information retrieval program for retrieving desired data from a retrieved subject database with data to be retrieved stored therein while controlling a computer system, in which
said information retrieval program includes the following steps of:
performing retrieval processing in an arbitrary retrieval method,
obtaining the index information corresponding to the retrieved subject data obtained as a retrieval result, from the hierarchical indexes as for the retrieved subject data stored in said retrieved subject database,
obtaining the number of retrieval times of the retrieved subject data obtained as the retrieval result, of the record of the retrieval times of each retrieved subject data stored in said retrieved subject database,
computing expected value of acquired information based on the obtained retrieval times,
presenting to a user the index information corresponding to the retrieval results, in decreasing order of the expected value of the computed acquired information as for the retrieved subject data that is the retrieval result, and
narrowing down the retrieval result in reply to a user's narrowing request.
According to another aspect of the invention, an information retrieval system for retrieving desired data from a retrieved subject database with data to be retrieved stored therein based on a retrieval condition expression, comprises
retrieved subject retrieval probability storing means for storing record of retrieval times of each retrieved subject data stored in the retrieved subject database, and
narrowing means of obtaining the index corresponding to the retrieved subject data obtained as a retrieval result, from said index storing means storing hierarchical indexes as for the retrieved subject data stored in said retrieved subject database, computing expected value of acquired information based on the retrieval times of the retrieval result data stored in said retrieved subject retrieval probability storing means, and presenting to a user the index information corresponding to the retrieval result, in decreasing order of the expected value of the computed acquired information as for the retrieved subject data that is the retrieval result, for narrowing down a retrieval result obtained by said retrieval executing means, in reply to a user's narrowing request.
In the preferred construction, said narrowing means obtains, from the hierarchical index stored in said index storing means, a question node that is a node positioned at the lowest hierarchy, of the common nodes in all the paths, and an answer node that is a node positioned at a lower hierarchy right beneath the question node, assuming a path starting from a leaf indexed to the retrieved subject data obtained as a retrieval result and tracing the nodes of hierarchical index up to the upper hierarchy, computes the expected value of the acquired information as for the respective obtained question nodes, and presents to a user the question nodes in decreasing order of the expected value of the computed acquired information and the answer nodes corresponding to the respective question nodes.
In another preferred construction, said narrowing means obtains, from the hierarchical index stored in said index storing means, a question node that is a node positioned at the lowest hierarchy, of the common nodes in all the paths, and an answer node that is a node positioned at a lower hierarchy right beneath the question node, assuming a path starting from a leaf indexed to the retrieved subject data obtained as a retrieval result and tracing the nodes of hierarchical index up to the upper hierarchy, computes the expected value of the acquired information as for the respective obtained question nodes, and presents to a user the question nodes in decreasing order of the expected value of the computed acquired information and the answer nodes corresponding to the respective question nodes, and further
said narrowing means computes information amount M(C) of collection C of the retrieved subject data by the following expression, assuming that the collection of the retrieved subject data that is a retrieval result is defined as C, by use of the number k of the retrieved subject data that is the retrieval result and the existence ratio Pj of the retrieved subject data j of the collection C of the retrieved subject data,
$$M(C) = -\sum_{j=1}^{k} P_j \log_2 P_j$$
computes the expected information amount B(C, a) when the collection C is divided into partial collections C1, . . . , Cn based on the answer nodes a1, . . . , an as for the given question node a, by the following expression,
$$B(C, a) = \sum_{i=1}^{n} \frac{|C_i|}{|C|} M(C_i)$$
and computes the expected value gain(C, a) of the acquired information of the respective question nodes, by the following expression.
$$\mathrm{gain}(C, a) = M(C) - B(C, a)$$
In another preferred construction, said narrowing means makes the existence ratio Pj of the retrieved subject data j in the collection C, as the retrieval probability pj of the retrieved subject data rj that is computed by the following expression, by use of the number k of the retrieved subject data that is a retrieval result, the retrieved subject data r1, r2, . . . , rm that is the retrieval result, the retrieval times hj of the given retrieved subject data rj (1≤j≤m), and the number of times vj the retrieval result appears at a lower hierarchy below a different answer node.
$$p_j = \frac{h_j}{v_j \sum_{i=1}^{k} h_i}$$
Other objects, features and advantages of the present invention will become clear from the detailed description given hereinbelow.