This invention relates to an information retrieval system, and in particular to an apparatus that automatically generates search formulae on the basis of given pertinent data and non-pertinent data. More particularly, the invention relates to an apparatus that automatically generates a search formula of high pertinence by selecting or combining terms likely to retrieve as many pertinent data as possible from given data consisting of pertinent data and non-pertinent data.
In an information retrieval system, a user is generally required to form a search formula by combining terms that represent the user's retrieval need, using logical operators such as AND and OR; the retrieval is performed when the search formula is fed into the retrieval system. If the user is not satisfied with the result because fewer pieces of the needed data were retrieved than expected, the user must form another search formula and perform the retrieval again.
One method for automatically generating a search formula on the basis of pertinent data was proposed by Yukio Ebinuma ("On-Line High Performance Automatic Document Retrieval Using Pertinent Information," JOURNAL OF INFORMATION PROCESSING AND MANAGEMENT, Vol. 27, No. 8, pp. 692-703, 1984). In this method, the user selects in advance, for example, 10 search terms from 10 pieces of pertinent data given by the user. Then the effectiveness value of each search term, which represents the capability of that term to retrieve the pertinent data, is calculated by the following formula:
EFFECTIVENESS VALUE=(The number of pieces of pertinent data containing the search term)/(The number of pieces of data containing the search term in the database concerned).
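The ratio above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each piece of data is represented as a set of terms and that the pertinent data are drawn from the same database; the function name `effectiveness_value` and the sample data are hypothetical and do not come from the cited method.

```python
def effectiveness_value(term, pertinent_data, database):
    """Pieces of pertinent data containing the term, divided by
    pieces of data in the database containing the term."""
    in_pertinent = sum(1 for piece in pertinent_data if term in piece)
    in_database = sum(1 for piece in database if term in piece)
    if in_database == 0:
        # Assumption: a term absent from the database gets a value of 0.
        return 0.0
    return in_pertinent / in_database


# Hypothetical database: each piece of data is a set of terms.
database = [
    {"laser", "printer"},
    {"laser", "diode"},
    {"printer", "ink"},
    {"laser", "printer", "toner"},
    {"ink", "paper"},
]
# Hypothetical pertinent data chosen by the user.
pertinent_data = [database[0], database[3]]

toner_value = effectiveness_value("toner", pertinent_data, database)  # 1 / 1
laser_value = effectiveness_value("laser", pertinent_data, database)  # 2 / 3
ink_value = effectiveness_value("ink", pertinent_data, database)      # 0 / 2
```

In this sketch the denominator requires one pass over the whole database for every term evaluated.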
On the basis of the effectiveness values, the search terms are joined together by the AND operator to produce one or more partial search formulae, which are in turn combined using the OR operator to generate the final search formula.
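The AND/OR structure of such a final search formula can be sketched as follows. How the cited method groups terms according to their effectiveness values is not described here, so the grouping below is simply given as input; the names `build_formula` and `matches` and the sample groups are hypothetical.

```python
def build_formula(groups):
    """Join the terms of each group with AND, then join the groups with OR."""
    partial_formulae = ["(" + " AND ".join(group) + ")" for group in groups]
    return " OR ".join(partial_formulae)


def matches(groups, piece_of_data):
    """True if every term of at least one group appears in the piece of data."""
    return any(all(term in piece_of_data for term in group) for group in groups)


# Hypothetical grouping of search terms into partial search formulae.
groups = [["laser", "printer"], ["toner"]]

formula = build_formula(groups)
# formula == "(laser AND printer) OR (toner)"

hit = matches(groups, {"laser", "printer", "paper"})  # first partial formula holds
miss = matches(groups, {"laser", "diode"})            # no partial formula holds
```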
However, in the above-mentioned conventional retrieval system, calculating the effectiveness values requires the database to be searched once for each search term in order to count the pieces of data containing that term, which consumes much time. Moreover, the search terms must be chosen on the basis of the user's judgment, so that as the number of candidate search terms in the pertinent data increases, the burden on the user also increases.
It is an object of the present invention to eliminate the drawbacks of the conventional information retrieval apparatus and to provide an information retrieval apparatus which can retrieve the desired information quickly, easily and with high precision.