(1) Field of the Invention
The present invention relates to a retrieval menu creation device that creates retrieval menus from data according to the contents of the data, a retrieval menu creation method, and a recording medium that stores a retrieval menu creation program.
(2) Description of the Related Art
A large amount of data is now available to ordinary users via networks, for instance, the Internet. "A Method of Clustering Documents Using Classification Patterns" (Information Processing Society of Japan SIG Notes, 97-NL-117-14) introduces a menu retrieval system that creates retrieval menus according to the contents of data so that even a user who is unfamiliar with the use of information technology may easily select desired information from the large amount of available data. FIG. 1 is a block diagram that shows the construction of the menu retrieval system.
The menu retrieval system shown in FIG. 1 includes data storage unit 3501, thesaurus storage unit 3502, data feature extraction unit 3503, data relating unit 3504, menu creation unit 3505, and display unit 3506.
Data storage unit 3501 stores the document data that has been obtained via a network and the features that have been extracted by data feature extraction unit 3503. FIG. 2A shows an example of the data stored in data storage unit 3501. As shown in FIG. 2A, data storage unit 3501 stores document data 3621, 3622, and 3623 that have been obtained via a network, and features 3601, 3602, and 3603 that have been extracted by data feature extraction unit 3503.
Thesaurus storage unit 3502 stores thesauruses. A thesaurus is a dictionary in which words categorized according to meaning are arranged in a tree. In a thesaurus stored in thesaurus storage unit 3502, the word representing the highest concept is arranged at the root, words representing the lowest concepts are arranged at the leaves, and words having similar meanings are arranged close to each other.
FIG. 2B shows thesaurus 3511, a thesaurus that thesaurus storage unit 3502 stores. In thesaurus 3511, "means of transport" 3618 is the root, and "car" 3617 and "railway" 3616 are connected to "means of transport" 3618. "Truck" 3611 and "bus" 3612 are connected to "car" 3617, and "steam train" 3613 and "electric train" 3614 are connected to "railway" 3616.
According to thesaurus 3511, "means of transport" 3618 is a higher concept than "car" 3617 and "railway" 3616, "car" 3617 is a higher concept than "truck" 3611 and "bus" 3612, and "railway" 3616 is a higher concept than "steam train" 3613 and "electric train" 3614.
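The tree relationship described for thesaurus 3511 can be illustrated with a minimal sketch. The representation below (a dictionary that maps each word to its parent) and the names PARENT and higher_concepts are assumptions made for illustration only; the cited system does not specify how the thesaurus is stored.

```python
# Hypothetical encoding of thesaurus 3511: each word maps to the word that is
# its next-higher concept. The root maps to None.
PARENT = {
    "means of transport": None,
    "car": "means of transport",
    "railway": "means of transport",
    "truck": "car",
    "bus": "car",
    "steam train": "railway",
    "electric train": "railway",
}


def higher_concepts(word, parent=PARENT):
    """Return the chain of higher concepts, from the word's parent up to the root."""
    chain = []
    node = parent[word]
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


print(higher_concepts("bus"))  # ['car', 'means of transport']
```

A parent map is used here because it makes the upward walk from a leaf to the root a simple loop; a child map would serve equally well for walking downward.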
Data feature extraction unit 3503 extracts features of document data that data storage unit 3501 stores. More specifically, data feature extraction unit 3503 extracts words that are often used in the document data as the features. FIG. 2A shows that data feature extraction unit 3503 extracts feature 3601 "bus" as the feature of document data 3621, feature 3602 "truck" as that of document data 3622, and feature 3603 "electric train" as that of document data 3623.
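As a rough illustration of extracting a frequently used word as a feature, the following sketch counts single words and returns the most frequent one. The stop-word list, the sample text, and the function name extract_feature are hypothetical; the cited system does not state how word frequency is measured, and a multi-word feature such as "electric train" would need phrase handling that this sketch omits.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "at", "every"}


def extract_feature(text):
    """Return the most frequent non-stop word in text, or None if there is none."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented sample text standing in for document data 3621.
doc = "The bus leaves at noon. Every bus stops in town, and the last bus is late."
print(extract_feature(doc))  # bus
```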
Data relating unit 3504 finds the word in a thesaurus that corresponds to the feature of document data. For instance, data relating unit 3504 finds that "bus" 3612 in thesaurus 3511 corresponds to feature 3601 "bus", "truck" 3611 in thesaurus 3511 corresponds to feature 3602 "truck", and "electric train" 3614 in thesaurus 3511 corresponds to feature 3603 "electric train". Accordingly, each of the features is connected to the corresponding word in thesaurus 3511a as shown in FIG. 3.
Menu creation unit 3505 extracts, from thesaurus 3511a, each word whose corresponding feature has been found by data relating unit 3504 together with the words that are connected to those words, and creates a menu construction. More specifically, menu creation unit 3505 extracts "truck" 3611 corresponding to feature 3602 "truck", "bus" 3612 corresponding to feature 3601 "bus", "electric train" 3614 corresponding to feature 3603 "electric train", "car" 3617 connected to "truck" 3611, "railway" 3616 connected to "electric train" 3614, and "means of transport" 3618 connected to "car" 3617 from thesaurus 3511a shown in FIG. 3, and creates menu construction 3511b shown in FIG. 4A.
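The relating and menu creation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the thesaurus is held as a hypothetical parent map, features are matched to thesaurus words by exact string comparison, and the choices of each menu are sorted alphabetically. The names PARENT, FEATURES, relate, and build_menu are not taken from the cited system.

```python
# Hypothetical parent map for thesaurus 3511 (word -> next-higher concept).
PARENT = {
    "means of transport": None,
    "car": "means of transport",
    "railway": "means of transport",
    "truck": "car",
    "bus": "car",
    "steam train": "railway",
    "electric train": "railway",
}

# Features 3601 to 3603, keyed by the document data they were extracted from.
FEATURES = {3621: "bus", 3622: "truck", 3623: "electric train"}


def relate(features, parent):
    """Link each thesaurus word to the documents whose feature matches it exactly."""
    linked = {}
    for doc_id, feature in features.items():
        if feature in parent:
            linked.setdefault(feature, []).append(doc_id)
    return linked


def build_menu(linked, parent):
    """Keep matched words and all their higher concepts; return title -> choices."""
    kept = set()
    for word in linked:
        node = word
        # Walk upward until the root or an already-kept word is reached.
        while node is not None and node not in kept:
            kept.add(node)
            node = parent[node]
    menu = {}
    for word in kept:
        if parent[word] is not None:
            menu.setdefault(parent[word], []).append(word)
    # Alphabetical order of choices is an assumption of this sketch.
    return {title: sorted(choices) for title, choices in menu.items()}


menu = build_menu(relate(FEATURES, PARENT), PARENT)
```

In this sketch "steam train" does not appear in the result because no document feature is linked to it, which mirrors its absence from menu construction 3511b.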
Display unit 3506 displays retrieval menus based on the created menu construction. More specifically, display unit 3506 displays retrieval menu 3700 in which the title is "means of transport" and the choices are "car" 3702 and "railway" 3703 as shown in FIG. 4B based on menu construction 3511b shown in FIG. 4A. When the user selects "car" 3702, display unit 3506 displays retrieval menu 3710 in which the title is "car" 3711 and the choices are "truck" 3712 and "bus" 3713 as shown in FIG. 4C.
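The way a retrieval menu is shown and then replaced when the user selects a choice can be sketched as below. The text layout, the MENU dictionary, and the functions render and select are hypothetical; FIG. 4B and FIG. 4C define the actual appearance of the menus.

```python
# Hypothetical menu construction corresponding to 3511b: title -> choices.
MENU = {
    "means of transport": ["car", "railway"],
    "car": ["truck", "bus"],
    "railway": ["electric train"],
}


def render(menu, title):
    """Return a plain-text retrieval menu with the title and numbered choices."""
    lines = [f"[{title}]"]
    for number, choice in enumerate(menu.get(title, []), start=1):
        lines.append(f"  {number}. {choice}")
    return "\n".join(lines)


def select(menu, title, choice):
    """Return the title of the next menu after the user picks a choice."""
    if choice not in menu.get(title, []):
        raise ValueError(f"{choice!r} is not a choice of menu {title!r}")
    return choice


print(render(MENU, "means of transport"))
next_title = select(MENU, "means of transport", "car")
print(render(MENU, next_title))
```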
In the example of the conventional art that has been described, retrieval menus are created from one thesaurus based on the data obtained via a network. Even a user who is unfamiliar with the use of information technology may easily retrieve necessary information with these retrieval menus. The above-described example, however, has problems. Firstly, these retrieval menus are created using one thesaurus, so that it is not possible to retrieve data when the word used for retrieval is not included in the thesaurus, in which the words are categorized according to meaning and arranged in a tree. More specifically, while the word "road" can be considered to be related to the word "means of transport", the word "road" is not related to any word in thesaurus 3511, in which the words are arranged in terms of "means of transport".
Secondly, a thesaurus cannot be used to create retrieval menus in which the choices of one retrieval menu and the choices of another retrieval menu categorize data in different terms and are not related to each other in meaning. More specifically, the choices "up to 10", "11 to 20", and "over 21" in the retrieval menu "age" and the choices "student", "salaried worker", and "housewife" in the retrieval menu "occupation" categorize data in different terms and are not related to each other in meaning. As a result, retrieval menus that include the retrieval menu "occupation" below the retrieval menu "age" cannot be created based on one thesaurus.
Thirdly, when data is added to or changed in a thesaurus, it is necessary to maintain the structure of the thesaurus, in which data is categorized according to meaning and arranged in a tree. As a result, adding or changing data in a thesaurus is very complicated.