1. Field of the Invention
The present invention relates to a document clustering device for appropriately classifying and managing the enormous number of questions and answers accumulated in a support center or the like, and to a document searching system to which this document clustering device is applied. The present invention also relates to a FAQ preparing system that uses the document clustering device to prepare FAQ, namely, combinations of frequently asked questions and their answers.
A support center of a business organization or the like receives an enormous number of questions from users about the specifications and usage of its products, and an accurate answer must be returned to each question without delay.
For this purpose, a technique is needed for appropriately managing the accumulated documents, which consist of the many questions sent from users in the past and the answers to those questions, and for utilizing these accumulated documents effectively.
2. Description of the Related Art
Systems as described in the following three examples have been conventionally used for returning answers to questions from users in a support center and the like.
In a first conventional sample answer searching system, model answers prepared in advance are searched according to the similarity between the question sent from a user and model questions prepared in advance. In this first sample answer searching system, a model question is prepared for each model answer, and each model answer describes a countermeasure for a particular situation.
In the first sample answer searching system, for example, the similarity between the question from the user and each of the model questions is obtained, the model question with the highest similarity is retrieved, and the model answer corresponding to the retrieved model question is returned to the user.
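The retrieval step of this first system can be illustrated with a minimal sketch. The source does not state how similarity is computed, so the bag-of-words cosine similarity used here is an assumption, and the names `bag_of_words`, `cosine_similarity`, and `retrieve_model_answer` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (assumed representation)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_model_answer(question, model_pairs):
    """Return the model answer paired with the most similar model question.

    model_pairs is a list of (model_question, model_answer) tuples.
    """
    query = bag_of_words(question)
    best_pair = max(
        model_pairs,
        key=lambda pair: cosine_similarity(query, bag_of_words(pair[0])),
    )
    return best_pair[1]
```

Note that every (model question, model answer) pair passed to such a function has to be written in advance, which is the source of the labor discussed later in this section.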
In a second conventional sample answer searching system, an answer to be returned to a user is selected according to the similarity between a question received from the user and model answers prepared in advance, and according to the number of times each of the model answers has been used as an answer. For this purpose, the second sample answer searching system stores the number of times each model answer prepared in advance has been used as an answer.
In this second sample answer searching system, the commonality in content between the question received from the user and each model answer is first evaluated by a score indicating their similarity, and the importance of each model answer is evaluated by a score based on the number of times that model answer has been used. For example, the model answer that has been used most frequently is selected from among the model answers whose similarity scores are relatively high, and the selected model answer is returned to the user.
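The two-stage selection of this second system might be sketched as follows. The source does not define the similarity score or what counts as "relatively high", so both the Jaccard similarity over token sets and the cutoff `ratio` (a fraction of the best similarity score) are assumptions made for illustration; all function names are hypothetical.

```python
def token_set(text):
    """Set of lowercase whitespace-separated tokens (assumed representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_answer(question, model_answers, usage_counts, ratio=0.8):
    """Pick the most frequently used answer among the most similar ones.

    usage_counts[i] is the stored number of times model_answers[i] was used.
    An answer counts as "relatively similar" when its score is at least
    ratio times the best score (an assumed rule, not taken from the source).
    """
    query = token_set(question)
    scores = [jaccard(query, token_set(answer)) for answer in model_answers]
    best = max(scores, default=0.0)
    if best == 0.0:
        return None  # nothing shares any token with the question
    candidates = [i for i, s in enumerate(scores) if s >= ratio * best]
    chosen = max(candidates, key=lambda i: usage_counts[i])
    return model_answers[chosen]
```

Under these assumptions, a frequently used but unrelated answer is excluded by the similarity cutoff before the usage count is consulted.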
In a third conventional sample answer searching system, the accumulation of answers used in the past is itself utilized as a set of sample answers to a new question. For this purpose, the third sample answer searching system stores the answers that were returned to questions received in the past.
In this third sample answer searching system, when a new question is input by a user, the above-mentioned set of sample answers is itself searched, and a sample answer whose content is similar to that of the input question is extracted and returned to the user.
The FAQ presented to users by a support center or the like has conventionally been prepared either by manually extracting appropriate questions from an accumulation of received questions or by assuming typical questions regarding various items. In the former operation, a question to which many of the accumulated questions are similar is extracted as an appropriate question. In the latter operation, questions are assumed for items about which many users are considered likely to have questions. The answers to the questions thus prepared are also written manually.
However, in both the first and second sample answer searching systems described above, the model answers basically need to be prepared in advance, and preparing them requires a large amount of labor. Moreover, when a model answer is partly revised and the revised answer is returned to a user, the revised answer cannot be appropriately managed and therefore cannot be effectively utilized. Meanwhile, in the third sample answer searching system, when answers with similar contents were used repeatedly in the past, these similar answers may be retrieved in virtually unlimited numbers. An effective sample answer must then be extracted from the search result in order to return an appropriate answer to a user, and this work becomes more difficult as the number of retrieved sample answers increases.
Furthermore, in the conventional FAQ preparing method described above, all the preparing operations are performed manually, so the work burden on operators (or agents) in the support center is very heavy. Moreover, whether a certain question is selected for the FAQ depends on the subjective judgment of the individual operator (or agent), so the questions extracted as the FAQ differ from one operator (or agent) to another.
Incidentally, cluster analysis is well known as a method of analyzing information such as an enormous number of documents. Applying cluster analysis to the accumulation of sample answers, so as to divide the set of sample answers into clusters, can be expected to provide a clue to solving, for example, the problem in the third sample answer searching system and the problem in the conventional FAQ preparing technique described above.
However, the hierarchical cluster analysis methods in general use require a very long processing time to classify an enormous number of sample answers into clusters. On the other hand, what matters in the sample answer searching operation and the FAQ preparing operation is the classification itself of the enormous number of documents into appropriate clusters. Other information, such as the hierarchy obtained when the accumulation of documents is analyzed by hierarchical cluster analysis, is not necessary.
Therefore, a technique for speedily classifying an enormous number of documents into non-hierarchical clusters is first required in order to solve the above-described problems.
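For orientation only, the following sketch shows one well-known way to form flat (non-hierarchical) clusters in a single pass, often called leader clustering. It is not presented as the technique of the present invention, which this section does not describe; the Jaccard similarity, the `threshold` value, and all function names are assumptions made for illustration.

```python
def token_set(text):
    """Set of lowercase whitespace-separated tokens (assumed representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(documents, threshold=0.5):
    """Assign each document to the most similar existing cluster, or open a new one.

    Each cluster is represented by its first document (its "leader").
    Returns a list of clusters, each a list of document indices.
    No hierarchy between clusters is computed.
    """
    leaders = []   # token set of the first document in each cluster
    clusters = []  # document indices belonging to each cluster
    for index, document in enumerate(documents):
        tokens = token_set(document)
        best_cluster, best_score = None, threshold
        for cluster_id, leader in enumerate(leaders):
            score = jaccard(tokens, leader)
            if score >= best_score:
                best_cluster, best_score = cluster_id, score
        if best_cluster is None:
            leaders.append(tokens)
            clusters.append([index])
        else:
            clusters[best_cluster].append(index)
    return clusters
```

In this sketch each document is compared only with the current cluster leaders rather than with every other document, and the result is a flat partition with no hierarchy information.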