Among retrieval systems that search multiple document data for those whose content is specified by an input retrieval statement, systems have been studied that retrieve appropriate documents reflecting the retrieval intention even if the document data do not completely contain the retrieval statement. Such a retrieval system can serve, for example, as a basic technique for a support system operated by a product manufacturer. In this case, the manufacturer creates a call log database containing, as document data in text form, inquiries received at a call center about products and the replies to those inquiries, and uses the database to reply appropriately to new inquiries (see Non-patent Document 5).
The following documents are considered:
[Patent Document 1] Published Unexamined Patent Application No. 11-259524
[Patent Document 2] Japanese Patent No. 3266586
[Non-Patent Document 1] JUSTSYSTEM, “What is ConceptBase” Technology, [online], Jul. 30, 2003, JUSTSYSTEM, [retrieved on Jun. 30, 2004], Internet <URL: http://www.justsystem.co.jp/km/whats/search_q—104.html>
[Non-Patent Document 2] NRI, “Services (NRI Cyber Patent)”, [online], [retrieved on Jun. 30, 2004], Internet <URL: http://www.patent.ne.jp/01gaiyo/s-point/06.html>
[Non-Patent Document 3] Sasaki et al., “Learning Type Question and Answer System SAIQA-II Using SVM”, Journal from Information Processing Society of Japan, Vol. 45, No. 02, 2004
[Non-Patent Document 4] Matsumura et al., “Evaluation of Information Retrieving Technique Using Modifying Relationships among Words”, Journal from Information Processing Society of Japan, Vol. 41, No. SIG01-003, 2000
[Non-Patent Document 5] T. Nasukawa and T. Nagano, “Text analysis and knowledge mining system”, IBM Systems Journal, Vol. 40, No. 4, 2001
[Non-Patent Document 6] Autonomy, “Conceptual Search”, [online], [retrieved on Jun. 30, 2004], Internet <URL: http://www.autonomy.com/c/content/Products/IDOL/f/Conceptual_Search>
[Non-Patent Document 7] T. Fawcett and F. Provost, “Activity monitoring: Noticing interesting changes in behavior”, In Proc. Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 53-62, 1999
[Non-Patent Document 8] Jon Kleinberg, “Bursty and hierarchical structure in streams”, In Proc. The 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002
[Non-Patent Document 9] Toshiaki FUJIKI, Tomoyuki MINAMINO, Yasuhiro SUZUKI, Manabu OKUMURA, “Discovery of Burst in Document Stream”, Study Report from Information Processing Society of Japan, 2004-NL-160, p. 85-92
[Non-Patent Document 10] Kenji YAMANISHI, “Text Mining and NLP Business”, [online], NEC, [retrieved on Jun. 30, 2004], Internet <URL: http://it.jeita.or.jp/eltech/committee/knowledge/PDF/2003/Yamanishi.pdf>
[Non-Patent Document 11] Nomura Research Institute, “What is True Teller?”, [online], [retrieved on Jun. 30, 2004], Internet <URL: http://www.trueteller.net/about/index.shtml>
[Non-Patent Document 12] JUSTSYSTEM, [Alize], [online], [retrieved on Jun. 30, 2004], Internet <URL: http://www.justsystem.co.jp/km/ssm>
By way of example, a retrieval system has been proposed which takes ambiguity into account when extracting content-word keywords from a retrieval statement or from document data (see Non-patent Documents 1, 2, and 6). Further, a retrieval system has been proposed in which meanings expressed by function words are incorporated as keywords in order to achieve more accurate retrieval (see Non-patent Document 5). Moreover, a retrieval system has been proposed which not only determines whether or not a keyword is contained in a retrieval statement or document data but also considers the relationships between words (see Non-patent Document 4 and Patent Documents 1 and 2). Furthermore, a system has been proposed which outputs an answer to a question and is capable of learning on the basis of examples of correct answers to questions (see Non-patent Document 3).
Further, it is important for businesses to establish trustworthy relations with clients and to improve the quality of products and client support. Businesses therefore want to discover problems with products or services early. Call logs at a call center are expected to be utilized as a means for discovering such problems.
Non-patent Document 7 proposes a method of sensing problems in sequentially accumulated information. Further, as an example of this method, a system has been proposed which senses a problem by finding a portion of a document stream in which the input interval between documents relating to a particular keyword becomes smaller (see Non-patent Document 8). The following have also been proposed: a system that considers the number of writes per unit time in finding the above portion (see Non-patent Document 9), a system that gives an alarm if the number of occurrences of a particular topic exceeds a threshold (see Non-patent Document 10), and a system that senses an increase in the frequency of a keyword to extract a suddenly emerging topic (see Non-patent Document 11). Furthermore, a system has been proposed which executes predictive analysis using examples of known defects in products or the like.
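The threshold-based sensing described above can be illustrated with a minimal sketch. It is not the method of any cited document; it only assumes that each call log entry carries a numeric timestamp and a text, counts the entries mentioning a keyword in each fixed-length window, and reports the windows whose count exceeds a threshold. The function name `windows_over_threshold`, the window length, the threshold, and the sample log are hypothetical.

```python
from collections import Counter


def windows_over_threshold(log, keyword, window, threshold):
    """Return sorted window indices in which the keyword count exceeds threshold.

    log: iterable of (timestamp, text) pairs; timestamps are numbers (e.g. hours).
    window: window length, in the same unit as the timestamps.
    """
    counts = Counter()
    for timestamp, text in log:
        # Naive substring match; a real system would use proper text analysis.
        if keyword in text.lower():
            counts[int(timestamp // window)] += 1
    return sorted(w for w, n in counts.items() if n > threshold)


# Hypothetical call log entries: (hour of receipt, text of the inquiry).
sample_log = [
    (1, "Hard disk is not recognized"),
    (5, "Printer driver question"),
    (26, "hard disk not recognized after update"),
    (27, "Hard disk is not visible"),
    (30, "hard disk makes noise"),
]

# One-day windows; alarm when more than two matching calls fall in a window.
alarms = windows_over_threshold(sample_log, "hard disk", window=24, threshold=2)
```

In this sketch the second day is flagged because three entries mention the keyword, but the alarm identifies only the keyword and not which problem the callers are reporting.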
Problems to be solved by the invention include the following.
In these fields, a staff member receiving a call desirably inputs the contents of the inquiry as a retrieval statement so as to efficiently retrieve document data on the basis of the retrieval intention.
With a retrieval system that takes ambiguity into account in keyword extraction, if only content words are used as keywords, then, for example, the words “hard disk” and “recognize” are extracted from the retrieval statement “no hard disk is recognized”. As a result, the retrieval intention of “not recognized” is lost, and even document data containing “is recognized” are retrieved.
Further, if function words are taken into account as keywords, then the words “hard disk” and “not recognized” are extracted from the retrieval statement “no hard disk is recognized”. Consequently, the retrieval intention of “not recognized” is reflected. However, the retrieval is carried out on the basis of whether or not the specified keywords appear in documents. As a result, document data containing the concept “CD-ROMs cannot be recognized but hard disks are recognized” are also retrieved.
Even if dependencies among words are taken into consideration, it is difficult to match the various expression forms that express the same retrieval intention, for example, “no hard disk can be recognized” and “no hard disk is visible”. This is because, even if the words are extended within the scope of synonyms to analyze the meaning of the retrieval statement, it is impossible to appropriately identify expressions (combinations of words) such as “no hard disk is visible” which are used only in particular situations.
Moreover, if call logs at the call center are utilized as means for discovering problems, word-based processing has few words available that express the individual problems. Accordingly, it is difficult to classify the problems. Further, it is impossible to determine what problems are occurring on the basis of keywords reported to have an increased frequency. Furthermore, for new products, the number of calls tends to increase for all the problems. In such a situation, it is difficult to discover a particular problem early.
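The two keyword-based limitations described above for the retrieval statement “no hard disk is recognized” can be reproduced with a small sketch. It is an illustration under simplifying assumptions, not the implementation of any cited system: keywords are found by naive substring and token matching, a document is retrieved when it contains every keyword of the retrieval statement, and all function names and word lists are hypothetical.

```python
CONTENT_WORDS = ("hard disk", "cd-rom", "recognize")  # hypothetical vocabulary
NEGATION_TOKENS = {"no", "not", "cannot"}             # hypothetical function words


def content_keywords(text):
    """Extract content-word keywords only (negation is ignored)."""
    lowered = text.lower()
    return {w for w in CONTENT_WORDS if w in lowered}


def function_word_keywords(text):
    """Extract keywords, folding any negation into the 'recognize' keyword."""
    lowered = text.lower()
    keys = {w for w in CONTENT_WORDS if w in lowered and w != "recognize"}
    if "recognize" in lowered:
        negated = any(tok in NEGATION_TOKENS for tok in lowered.split())
        keys.add("not recognized" if negated else "recognized")
    return keys


def retrieved(query, document, extract):
    """A document is retrieved if it contains every keyword of the query."""
    return extract(query) <= extract(document)


query = "no hard disk is recognized"
positive_doc = "The hard disk is recognized"
mixed_doc = "CD-ROMs cannot be recognized but hard disks are recognized"

# Content words only: the negation is lost, so the opposite case is retrieved.
lost_negation = retrieved(query, positive_doc, content_keywords)

# Function words included: the opposite case is rejected, but a document that
# negates a different object is still retrieved, because matching only checks
# whether each keyword appears somewhere in the document.
rejects_positive = not retrieved(query, positive_doc, function_word_keywords)
false_positive = retrieved(query, mixed_doc, function_word_keywords)
```

In this sketch `lost_negation`, `rejects_positive`, and `false_positive` all evaluate to true, mirroring the cases in the text: presence-based matching cannot tell which object the negation applies to.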