The present invention relates to an information retrieval system that enables users to easily find the information they seek among a large amount of information.
In recent years, with the widespread use of the Internet, general users have gained access to a large amount of information, for example through various home pages written in Hypertext Markup Language (HTML) and provided on the World Wide Web (WWW). In addition, collections of frequently asked questions (FAQ), which list pairs of frequently asked questions and their answers, have been made open to the public. Users can obtain answers to their questions using such a list. These types of information are convenient for users, since prompt browsing is possible as long as the whereabouts of the information they seek are known. Conversely, however, finding the information they seek among a large amount of information is laborious for users.
A retrieval technique for overcoming the above problem is known, in which keywords are extracted from documents as feature amounts, and an inner product of the feature amounts is calculated to obtain the similarity between two documents. Based on this similarity, a document similar to a question is retrieved.
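The conventional technique described above can be illustrated by the following minimal sketch. It assumes that the feature amounts are simple keyword counts and that the similarity is the unnormalized inner product of those counts; the function names, the tokenization rule, and the stopword list are hypothetical and are not taken from any particular prior-art system.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = frozenset({"a", "an", "the", "is", "to", "of", "for", "how", "do", "i", "my"})


def keyword_features(text):
    """Extract keywords from a document and count them (the feature amounts)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def inner_product(f1, f2):
    """Inner product of two keyword-count feature amounts."""
    return sum(weight * f2[key] for key, weight in f1.items() if key in f2)


def retrieve(question, documents):
    """Return (similarity, document) pairs, most similar first."""
    q = keyword_features(question)
    scored = [(inner_product(q, keyword_features(d)), d) for d in documents]
    # Sort on the similarity only; documents with equal scores keep input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, calling retrieve with the question "reset password" against a collection containing "How to reset my password" and "Password reset steps for email account" gives both documents the same similarity of 2, since each shares the two keywords of the question.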
This technique, however, has the following problems. Since information units on the Internet and FAQ collections accumulated from past cases are provided independently by a number of individual providers, information inevitably overlaps, resulting in the existence of many documents having similar contents. In the conventional technique, therefore, a large number of documents having similar contents are retrieved as documents similar to a given question. As a result, users must search through the large number of documents in the retrieval results to find the information they want. Conversely, if the number of retrieval results displayed is limited to a fixed number, users may fail to find the information they want.
Also, in the conventional technique, even if a user succeeds in finding the desired information among the retrieval results, this match is not reflected in the relevant FAQ collection. Accordingly, the same search procedure is repeated when another user attempts retrieval under the same condition. Furthermore, in order to expand the FAQ collection while avoiding overlap of information, it is necessary to check whether similar information already exists in the collection. This is burdensome to information providers.