1. Field of the Invention
The present invention relates to an apparatus and method for calculating the degree of mutual association between a plurality of key words used when carrying out a search, and for associating the key words with one another.
This application is based on patent application No. Hei 9-148519 filed in Japan, the content of which is incorporated herein by reference.
2. Description of the Related Art
Information searching is a technology in which documents are first accumulated in a database, and the documents related to a query (an expression of the user's information needs) are then extracted from that database. A query is either a single key word or an expression combining several key words, for example, "communication AND computer" or "communication OR computer". These two example queries specify, respectively, extraction of documents related to both of the key words "communication" and "computer", and extraction of documents related to at least one of them. Here, a document is said to be related to a word if that word has been assigned to the document in advance as a key word, or if the word appears in the document. A document is a data object, usually textual, though it may also contain other types of data such as pictures, photographs, movies and so on.
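The AND/OR matching rule described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical in-memory collection in which each document is reduced to the set of key words assigned to it or appearing in it; the names `DOCUMENTS`, `search_and` and `search_or` are hypothetical and not part of the invention.

```python
from typing import Dict, List, Set

# Hypothetical collection: document ID -> key words assigned to (or found in) it.
DOCUMENTS: Dict[str, Set[str]] = {
    "doc1": {"communication", "computer", "network"},
    "doc2": {"communication", "satellite"},
    "doc3": {"computer", "graphics"},
    "doc4": {"biology"},
}


def related(doc_words: Set[str], keyword: str) -> bool:
    """A document is related to a key word if the key word is in its word set."""
    return keyword in doc_words


def search_and(docs: Dict[str, Set[str]], *keywords: str) -> List[str]:
    """Documents related to every key word (e.g. "communication AND computer")."""
    return sorted(d for d, words in docs.items()
                  if all(related(words, k) for k in keywords))


def search_or(docs: Dict[str, Set[str]], *keywords: str) -> List[str]:
    """Documents related to at least one key word (e.g. "communication OR computer")."""
    return sorted(d for d, words in docs.items()
                  if any(related(words, k) for k in keywords))


print(search_and(DOCUMENTS, "communication", "computer"))  # ['doc1']
print(search_or(DOCUMENTS, "communication", "computer"))   # ['doc1', 'doc2', 'doc3']
```

Representing each document as a set keeps the "related to a word" test a simple membership check; a real search system would normally use an inverted index instead of scanning every document.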
In information searching, knowing which information many people commonly desire would be valuable both for planning information collection and for providing an effective information search service, for example by making that information accessible through menu selection.
However, different users may use different key words when searching for the same information, because each user views that information from his or her own viewpoint. Therefore, simply summing the use frequency of each key word cannot accurately reveal what information users commonly desire.
On the other hand, if the strength of the association between words used within a predetermined time interval can be determined, words strongly associated with each other can be treated as key words used to obtain the same information. The strength of association of the requested information accumulated in a database can then be determined, based, for example, on those key words.
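The idea of measuring association only within a predetermined time interval can be sketched with a simple co-occurrence measure over a query log. This is an illustrative assumption, not the method of the invention: the log format, the function name `association_degree`, and the choice of the Jaccard coefficient (queries containing both words divided by queries containing either word) are all hypothetical stand-ins, and the log entries are invented sample data.

```python
from datetime import datetime
from typing import List, Set, Tuple

# Hypothetical query log: (timestamp, set of key words used in one query).
QueryLog = List[Tuple[datetime, Set[str]]]


def association_degree(log: QueryLog, w1: str, w2: str,
                       start: datetime, end: datetime) -> float:
    """Jaccard coefficient of the queries containing w1 and w2 in [start, end).

    Illustrative measure only; it is an assumption, not the invention's method.
    """
    # Keep only the queries issued inside the predetermined time interval.
    window = [words for ts, words in log if start <= ts < end]
    with_w1 = {i for i, words in enumerate(window) if w1 in words}
    with_w2 = {i for i, words in enumerate(window) if w2 in words}
    union = with_w1 | with_w2
    if not union:
        return 0.0  # neither word was used in this interval
    return len(with_w1 & with_w2) / len(union)


# Invented sample log with a seasonal pair of key words.
log: QueryLog = [
    (datetime(1997, 1, 2), {"New Years Card", "Lottery Number"}),
    (datetime(1997, 1, 3), {"New Years Card", "Lottery Number"}),
    (datetime(1997, 1, 5), {"New Years Card"}),
    (datetime(1997, 7, 1), {"Lottery Number", "Horse Racing"}),
    (datetime(1997, 7, 2), {"New Years Card", "Design"}),
]

jan = association_degree(log, "New Years Card", "Lottery Number",
                         datetime(1997, 1, 1), datetime(1997, 1, 16))
jul = association_degree(log, "New Years Card", "Lottery Number",
                         datetime(1997, 7, 1), datetime(1997, 8, 1))
print(f"{jan:.2f}")  # 0.67
print(f"{jul:.2f}")  # 0.00
```

Because the score is recomputed per interval, the same pair of words can be strongly associated in one period and unassociated in another, which a statically defined word relationship cannot express.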
Conventionally, an associated word dictionary such as a thesaurus statically defines the relationship between one key word and another. By using such a dictionary, the relationships between key words can be obtained, and the strength of association of the requested information accumulated in a database can thus be determined.
However, the associated word dictionary described above cannot handle current neologisms such as individual product names and abbreviations, nor associations that the user treats as meaningful at the time of the search, that is, associations between key words whose connection becomes strong only temporarily. For example, during the New Year season, "New Years Card" and "Lottery Number" are frequently used together to search for the lottery numbers of New Year's cards, so it is desirable to group them as a single information request during that season, but not outside it. The same applies to "Soccer" and "World Cup", or "Ski" and "Hokkaido", which a static dictionary likewise fails to group even while their connection is temporarily strong.
Thus, because conventional techniques cannot appropriately group key words that request the same information, the association degree between key words cannot be calculated appropriately, and it is therefore difficult to grasp accurately what information many users desire.