1. Field of the Invention
The present invention relates to an information retrieving apparatus and a method thereof, and in particular to an apparatus and a method suitable for the case in which the language of an input keyword is different from the language of a database from which data is retrieved.
2. Description of the Related Art
In a conventional information retrieving apparatus, when the language of a keyword that is input by a user (this language is hereinafter referred to as input side language) is different from the language of a database from which data corresponding to the input keyword is retrieved (hereinafter this language is referred to as database side language), data is retrieved through a machine-translating process.
Here, the word "keyword" is used to mean the user query or the user input to the apparatus.
In other words, the input keyword is converted from the input side language into the database side language. With the converted keyword, data is retrieved from the database. The retrieved results in the database side language are converted into the input side language and then displayed on a monitor.
In an information retrieving apparatus using a conventional machine-translating process, the hit rate is increased by expanding an input keyword into synonyms. In addition, an apparatus that performs logical operations on the expanded keywords so as to retrieve data has been proposed.
Moreover, a ranking retrieving process for ranking retrieved results of an information retrieving apparatus corresponding to match rates of retrieval keywords and retrieved data has been used. In the ranking retrieving process, the retrieved results are ranked with keywords converted into the database side language. The ranked results are converted into the input side language and presented to the user.
Now, assume that by inputting a keyword written in Japanese, data corresponding to the keyword is retrieved from a database described in English. In this case, the input keyword described in Japanese is converted into an equivalent keyword described in English. With the keyword described in English, data is retrieved from the database described in English. The retrieved results described in English are translated into Japanese. Thereafter, the retrieved results described in Japanese are presented to the user. In the ranking retrieving process, the retrieved results described in English are ranked with keywords converted into English. The ranked results are translated into Japanese and then provided to the user.
However, in the information retrieving apparatus using the conventional machine-translating process, when an input keyword is expanded into synonyms and a keyword described in the input side language is translated into the database side language, some variation in meaning may take place. In other words, the nuance of a keyword described in the input side language may be different from the nuance of a keyword described in the database side language. Thus, data that does not directly correlate with a keyword described in the input side language may be retrieved. In such a situation, when the retrieved results described in the database side language are ranked using the keyword translated into the database side language, the nuance of the keyword described in the input side language is not reflected in the ranked results described in the database side language. Consequently, the ranked results may be contrary to the intention of the user.
For example, when data is retrieved from a database described in English with a keyword input in Japanese, the retrieved results are ranked by comparing the keyword converted into English with the retrieved results described in English. Thus, documents containing the keyword converted into English are highly ranked. Unless a keyword is correctly converted from Japanese into English, documents that do not reflect the meaning of the keyword described in Japanese are highly ranked.
An object of the present invention is to provide an information retrieving apparatus that can output retrieved results corresponding to an input keyword even if the language of the input keyword is different from the language of a database from which data is retrieved.
According to an aspect of the present invention, an information retrieving apparatus comprises an inputting unit for inputting a retrieval request described in a first data format, a generating unit for generating retrieval information described in a second data format based on the retrieval request described in the first data format, a retrieving unit for retrieving data described in the second data format based on the retrieval information described in the second data format, a converting unit for converting the retrieved results from the second data format into the first data format, and an evaluating unit for evaluating the retrieved results translated into the first data format based on the retrieval request described in the first data format.
Thus, even if the data format of the retrieved results is different from the data format of the retrieval request, the data format of the retrieved results can be matched with the data format of the retrieval request. Consequently, the retrieved results can be evaluated without need to convert the data format of the retrieval request. As a result, the retrieved results exactly corresponding to the retrieval request can be obtained free of any variation in meaning caused by a conversion process of the data format of the retrieval request.
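By way of illustration only, the flow of the inputting, generating, retrieving, converting, and evaluating units may be sketched as follows. The romanized Japanese request, the toy English database, the lookup tables, and all function names are hypothetical; the stored back-translations merely stand in for a machine-translating process, and the word-frequency score is an assumed evaluation measure.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: a romanized Japanese request and an English database.
FIRST_TO_SECOND: Dict[str, List[str]] = {"ginkou": ["bank"]}
DATABASE: List[str] = ["river bank erosion", "bank loan interest"]
# Stand-in for machine translation of each document into the first data format.
BACK_TRANSLATION: Dict[str, str] = {
    "river bank erosion": "kawa dote shinshoku",
    "bank loan interest": "ginkou yuushi kinri",
}


def generate_retrieval_info(request: str) -> List[str]:
    """Generating unit: derive second-format search terms from the request."""
    return FIRST_TO_SECOND.get(request, [])


def retrieve(terms: List[str], database: List[str]) -> List[str]:
    """Retrieving unit: keep documents containing any second-format term."""
    return [doc for doc in database if any(t in doc.split() for t in terms)]


def convert_to_first_format(document: str) -> str:
    """Converting unit: look up the (assumed) translation of a document."""
    return BACK_TRANSLATION[document]


def evaluate(request: str, converted: str) -> float:
    """Evaluating unit: share of converted words equal to the original request."""
    words = converted.split()
    return words.count(request) / len(words) if words else 0.0


def search(request: str, database: List[str]) -> List[Tuple[float, str]]:
    """Retrieve in the second format, then score against the untranslated request."""
    hits = retrieve(generate_retrieval_info(request), database)
    scored = [(evaluate(request, convert_to_first_format(d)), d) for d in hits]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In this sketch both documents are retrieved because both contain "bank", but only the document whose converted text contains the original request "ginkou" receives a non-zero score, so the variation in meaning introduced by the forward conversion does not affect the evaluation.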
According to a further aspect of the present invention, the retrieval information described in the second data format is generated based on the key information (keyword) extracted from the retrieval request in the first data format.
Thus, since the key information is extracted in the first data format, the key information can be extracted free of a variation in meaning caused by a conversion process of data, in comparison with the case that the key information is extracted after the data format is converted into the second data format. Consequently, the key information can be extracted exactly corresponding to a retrieval request.
According to an aspect of the present invention, the retrieval information described in the second data format is generated based on the expanded results in the first data format.
Thus, since the retrieval request is expanded in the first data format, the retrieval request can be expanded free of a variation in meaning caused by the conversion process of data in comparison with the case that the retrieval request is expanded after the data format is converted into the second data format.
According to an aspect of the present invention, the retrieval information described in the second data format is generated based on the results of a logical operation in the first data format.
Thus, since the logical operation of the retrieval request is performed in the first data format, the logical operation can be performed free of a variation in meaning caused by the conversion process of data, in comparison with the case that the logical operation is performed after the data format is converted into the second data format. Consequently, the logical operation can be performed exactly corresponding to the retrieval request.
According to an aspect of the present invention, the retrieved results described in the second data format are converted into the first data format. The retrieved results converted into the first data format are evaluated based on the key information, the expanded results, or the results of the logical operation.
Thus, even if data whose data format is different from the data format of the retrieval request is retrieved, the results retrieved over a wide range can be evaluated without need to convert the data format of the retrieval request. Consequently, the retrieved results can be evaluated exactly corresponding to the retrieval request, free of a variation in meaning or nuance due to the conversion process of the retrieval request.
According to an aspect of the present invention, the retrieved results are ranked based on the evaluated results thereof.
Thus, the retrieved results can be easily selected.
According to an aspect of the present invention, an information retrieving apparatus comprises a retrieval request inputting unit for inputting a retrieval request described in an input side format, a first format converting unit for converting the retrieval request from the input side format into a database side format, a retrieving process unit for retrieving data from the database based on the converted results of the first format converting unit, a second format converting unit for converting the results retrieved from the database from the database side format into the input side format, a retrieved results arranging unit for arranging the retrieved results converted into the input side format based on the retrieval request described in the input side format, and a retrieved results displaying unit for displaying the data arranged by the retrieved results arranging unit.
Thus, even if the retrieval request whose data format is different from the data format of the database is input, since the data format of the results retrieved from the database is matched with the data format of the retrieval request, the retrieval request can be directly compared with the converted results of the retrieved results without need to convert the data format of the retrieval request. Thus, the retrieved results exactly corresponding to the retrieval request can be extracted.
According to an aspect of the present invention, the conversion between the input side format and the database side format is a language translating process or a dictionary retrieving process.
Thus, even if the language of the input keyword is different from the language of the database from which data is retrieved, the results retrieved from the database can be determined in the language of the input keyword. Consequently, the accuracy of a data retrieving process through a machine-translating process can be improved.
According to an aspect of the present invention, after the database side language is automatically determined, a translating process or a dictionary retrieving process is performed.
Thus, the results retrieved from the database can be converted into the language of the retrieval request without need to recognize the language of the database to be retrieved from on the retrieval request side. Consequently, the results retrieved from the database can be determined based on the language on the retrieval request side.
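One possible way to determine the database side language automatically is sketched below, assuming a simple stopword-overlap heuristic. The heuristic, the tiny stopword lists, and the function name `guess_language` are assumptions made for illustration; the text above does not specify how the determination is made.

```python
from typing import Dict, Optional, Set

# Hypothetical, deliberately tiny stopword lists for three languages.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "of"},
    "fr": {"le", "la", "et"},
    "de": {"der", "die", "und"},
}


def guess_language(text: str) -> Optional[str]:
    """Return the language whose stopwords overlap the text most, or None."""
    words = set(text.lower().split())
    counts = {lang: len(words & stop) for lang, stop in STOPWORDS.items()}
    best = max(counts, key=lambda lang: counts[lang])
    return best if counts[best] > 0 else None
```

The returned code could then be used to select the translating process or the dictionary for the retrieved results, so that the retrieval request side need not specify the language of the database.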
According to an aspect of the present invention, only sentences that contain a retrieval keyword are converted in the results retrieved from the database.
Thus, information irrelevant to a retrieval request is discarded before performing the conversion. Consequently, the process time of the retrieving process through a machine-translating process can be shortened.
According to an aspect of the present invention, only paragraphs that contain the retrieval keyword are converted in the results retrieved from the database.
Thus, information irrelevant to the retrieval request is discarded thereby preserving the accuracy of the retrieval. Consequently, the process time of the retrieving process through a machine-translating process can be shortened.
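The selection of sentences or paragraphs prior to conversion may be sketched as follows. The splitting rules (sentence-ending punctuation, blank lines between paragraphs), the whole-word matching, and the function name `units_containing` are assumptions for illustration.

```python
import re
from typing import List


def units_containing(document: str, db_keyword: str, unit: str = "sentence") -> List[str]:
    """Return only the sentences (or paragraphs) that contain the keyword."""
    if unit == "sentence":
        # Split after ., ! or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", document.strip())
    elif unit == "paragraph":
        # Split on blank lines.
        parts = re.split(r"\n\s*\n", document.strip())
    else:
        raise ValueError("unit must be 'sentence' or 'paragraph'")
    # Whole-word, case-insensitive match of the database side keyword.
    pattern = re.compile(r"\b" + re.escape(db_keyword) + r"\b", re.IGNORECASE)
    return [part for part in parts if pattern.search(part)]
```

Only the returned units would be passed to the conversion process, so the amount of text to be translated shrinks with the number of units that do not mention the keyword.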
According to an aspect of the present invention, the retrieval request is expanded in the input side format. The expanded results are converted into the database side format. Data is retrieved from the database based on the expanded results described in the database side format.
Thus, the retrieval request can be expanded free of a variation in meaning caused by the data conversion process. Consequently, the expanded results can closely reflect the contents of the retrieval request. As a result, the accuracy of the retrieving process for the database through the data conversion process can be improved.
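A minimal sketch of expanding the retrieval request in the input side format before conversion is given below. The thesaurus, the bilingual dictionary, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical input side thesaurus and input-to-database dictionary.
SYNONYMS: Dict[str, List[str]] = {"kuruma": ["jidousha"]}
TO_DATABASE_SIDE: Dict[str, List[str]] = {
    "kuruma": ["car"],
    "jidousha": ["automobile", "car"],
}


def expand(request: str) -> List[str]:
    """Expand the request into synonyms while still in the input side format."""
    return [request] + SYNONYMS.get(request, [])


def convert_terms(terms: List[str]) -> List[str]:
    """Convert each expanded term into the database side format, without repeats."""
    converted: List[str] = []
    for term in terms:
        for candidate in TO_DATABASE_SIDE.get(term, []):
            if candidate not in converted:
                converted.append(candidate)
    return converted


def retrieve(database: List[str], db_terms: List[str]) -> List[str]:
    """Retrieve documents containing any of the converted terms."""
    return [doc for doc in database if any(t in doc.split() for t in db_terms)]
```

Because the synonyms are chosen before any translation takes place, the expansion depends only on the meaning of the request in the input side format.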
According to an aspect of the present invention, the retrieved results converted into the input side format are arranged based on a weight assigned to the expanded results.
Thus, if a plurality of retrieved results corresponding to expanded results are obtained, the retrieved results can be easily arranged corresponding to the contents of the expanded results.
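The arrangement based on weights may be sketched as follows, assuming that the original request term carries a larger weight than its synonyms. The weight values, the scoring formula (a weighted count of term occurrences), and the function name are illustrative assumptions.

```python
from typing import Dict, List


def arrange_by_weight(converted_results: List[str],
                      weighted_terms: Dict[str, float]) -> List[str]:
    """Order converted results by the weighted count of expanded terms they contain."""
    def score(text: str) -> float:
        words = text.split()
        return sum(weight * words.count(term)
                   for term, weight in weighted_terms.items())

    return sorted(converted_results, key=score, reverse=True)
```

For example, with weights `{"kuruma": 1.0, "jidousha": 0.5}`, a converted result that contains the original request term is placed ahead of one that contains only the synonym.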
According to an aspect of the present invention, the retrieved results converted into the input side format are arranged based on a weight assigned to the converted result of the retrieval request.
Thus, if a plurality of retrieved results corresponding to the converted results of the retrieval request are obtained, the retrieved results can be easily arranged corresponding to the converted results of the retrieval request.
According to an aspect of the present invention, when a plurality of pieces of retrieval information are generated corresponding to the retrieval request, data is retrieved based on each piece of retrieval information.
Thus, all information relevant to the retrieval request can be retrieved. Consequently, information corresponding to the retrieval request can be retrieved over a wide range.
According to an aspect of the present invention, the results retrieved from the database are converted in correspondence with each of a plurality of candidates, when a plurality of alternatives are generated for an element of a conversion result from the database side format into the input side format.
Thus, even if a variation in meaning takes place due to a conversion process of a data format, all candidates generated due to the variation in meaning can be presented. Consequently, desired data can be prevented from being lost against a variation in meaning caused by a conversion process of a data format. As a result, the accuracy of the retrieving process can be improved.
According to an aspect of the present invention, at most one converted result is selected for the same retrieved result when a plurality of candidates are generated by the conversion from the database side format into the input side format.
Thus, since redundantly retrieved results are discarded before presenting the retrieved results, the retrieving process can be effectively performed.
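The selection of at most one converted result per retrieved result may be sketched as follows. The tuple layout and the use of a numeric score to choose among candidates are assumptions; the text above does not state the selection criterion.

```python
from typing import Dict, List, Tuple

# Each candidate: (identifier of the retrieved result, converted text, score).
Candidate = Tuple[str, str, float]


def select_one_per_result(candidates: List[Candidate]) -> Dict[str, str]:
    """Keep only the highest-scoring converted text for each retrieved result."""
    best: Dict[str, Tuple[str, float]] = {}
    for result_id, text, score in candidates:
        # The first candidate seen wins ties because the comparison is strict.
        if result_id not in best or score > best[result_id][1]:
            best[result_id] = (text, score)
    return {result_id: text for result_id, (text, _) in best.items()}
```

With this sketch, a document converted once with "ginkou" and once with "dote" for the same source word appears only once in the presented results.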
According to an aspect of the present invention, when a plurality of candidates are generated for elements of converted results from the database side format into the input side format, the plurality of candidates are expanded in the results retrieved from the database.
Thus, even if a variation in meaning takes place in a conversion process of a data format, all candidates generated due to the variation in meaning can be presented in the same retrieved results. Consequently, the result of the arithmetic operation can closely reflect the contents of the retrieval request. As a result, the accuracy of the retrieving process can be improved.
According to an aspect of the present invention, the retrieved results converted into the input side format are arranged based on the results of the logical arithmetic operation for the retrieval request described in the input side format.
Thus, the logical arithmetic operation can be performed for the retrieval request free of a variation in meaning caused by the conversion process. Consequently, the results of the arithmetic operation can closely reflect the contents of the retrieval request. As a result, the accuracy of the retrieving process through the data conversion process can be improved.
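A logical operation on the retrieval request in the input side format may be applied to the converted results roughly as sketched below. The nested-tuple representation of the query and the function name `matches` are hypothetical.

```python
from typing import Tuple, Union

# A query is a term, or a tuple such as ("AND", q1, q2), ("OR", q1, q2), ("NOT", q).
Query = Union[str, Tuple]


def matches(query: Query, converted_text: str) -> bool:
    """Evaluate an input side logical query against a converted retrieved result."""
    words = set(converted_text.split())

    def evaluate(node: Query) -> bool:
        if isinstance(node, str):
            return node in words
        operator, *operands = node
        if operator == "AND":
            return all(evaluate(o) for o in operands)
        if operator == "OR":
            return any(evaluate(o) for o in operands)
        if operator == "NOT":
            return not evaluate(operands[0])
        raise ValueError("unknown operator: " + str(operator))

    return evaluate(query)
```

Converted results for which `matches` returns true could then be arranged ahead of the others, with the operands of the query never having been translated.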
According to an aspect of the present invention, the retrieved results converted into the input side format are arranged based on the correlation rate of the retrieval request described in the input side format and the retrieved results converted into the input side format.
Thus, since the retrieved results described in the database side format are compared in the input side format, the retrieved results exactly reflecting the retrieval request can be easily selected.
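The correlation rate is not defined above; one possible measure, assumed here purely for illustration, is the cosine similarity between word-count vectors of the retrieval request and of each converted result.

```python
import math
from collections import Counter
from typing import List


def correlation_rate(request: str, converted: str) -> float:
    """Cosine similarity of word counts (an assumed stand-in for the correlation rate)."""
    a, b = Counter(request.split()), Counter(converted.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def arrange(request: str, converted_results: List[str]) -> List[str]:
    """Order converted results by descending correlation with the request."""
    return sorted(converted_results,
                  key=lambda text: correlation_rate(request, text),
                  reverse=True)
```

Both arguments are in the input side format, so the comparison never involves the translated form of the request.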
According to an aspect of the present invention, a portion that matches the retrieval request is highlighted in the retrieved results converted into the input side format.
Thus, the user can directly know the correlation between the retrieved results and the retrieval request. Consequently, the user can easily know the retrieved results that match the retrieval request.
According to an aspect of the present invention, a portion that matches the retrieval request and a portion that matches the expanded results of the retrieval request are highlighted in the retrieved results converted into the input side format to be separately distinguishable.
Thus, the correlation between the retrieved results and the retrieval request can be displayed over a wide range. Consequently, retrieved results that match the retrieval request and the retrieved results with correspondence to the retrieval request can be easily identified.
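Distinguishable highlighting may be sketched as follows. The bracket markers stand in for whatever display attributes (color, underline, and so on) an actual apparatus would use, and the function name is hypothetical.

```python
import re
from typing import Set


def highlight(text: str, request_terms: Set[str], expanded_terms: Set[str]) -> str:
    """Mark request matches with [[...]] and expanded-term matches with <<...>>."""
    def mark(match: "re.Match[str]") -> str:
        word = match.group(0)
        if word in request_terms:
            return "[[" + word + "]]"
        if word in expanded_terms:
            return "<<" + word + ">>"
        return word

    # Visit every word; leave whitespace and punctuation untouched.
    return re.sub(r"\w+", mark, text)
```

A term that belongs to both sets is marked as a request match, since that check comes first.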
According to an aspect of the present invention, the retrieving process for the database and the conversion process for the retrieved results from the database side format into the input side format are performed in parallel.
Thus, data can be converted from the database side format into the input side format each time a retrieval is finished. Consequently, the conversion process is performed before all retrieved results are accumulated. As a result, the conversion process can be performed at high speed.
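One way to overlap the retrieving process and the conversion process is sketched below using a thread pool. The generator `retrieve_hits` and the stub `convert` (upper-casing in place of translation) are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def retrieve_hits(database: List[str], term: str) -> Iterator[str]:
    """Yield each matching document as soon as it is found."""
    for doc in database:
        if term in doc.split():
            yield doc


def convert(document: str) -> str:
    """Stub for the conversion process; upper-casing stands in for translation."""
    return document.upper()


def retrieve_and_convert(database: List[str], term: str) -> List[str]:
    """Submit each hit for conversion immediately, while retrieval continues."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(convert, doc)
                   for doc in retrieve_hits(database, term)]
        # Results are collected in retrieval order.
        return [future.result() for future in futures]
```

Each hit is handed to a worker thread as soon as the generator yields it, so conversion of early hits can proceed while later documents are still being scanned.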
According to an aspect of the present invention, the retrieved results are arranged according to the input side format. However, the retrieved results are displayed in the database side format.
Thus, even if the conversion process from the database side format into the input side format is not performed properly, the retrieved results are presented to the user in the database side format, and the ranking determined in the input side format can still be reflected in the results presented to the user.
According to an aspect of the present invention, the results retrieved from the database are ranked in the database side format. The highly ranked retrieved results are selected from among the ranked results described in the database side format. Only the highly ranked retrieved results are converted from the database side format into the input side format. By comparing the retrieval request described in the input side format with the highly ranked retrieved results that have been converted into the input side format, the retrieved results are arranged.
Thus, the conversion of lowly ranked retrieved results described in the database side format can be suppressed. Consequently, the process time necessary for the retrieving process through the data conversion process can be shortened.
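The two-stage arrangement may be sketched as follows. The term-count scores, the parameter `top_k`, and the caller-supplied `convert` function are assumptions for illustration.

```python
from typing import Callable, List


def two_stage_rank(hits: List[str], db_terms: List[str], request: str,
                   convert: Callable[[str], str], top_k: int) -> List[str]:
    """Rank in the database side format, convert only the top_k, then re-rank."""
    def db_score(doc: str) -> int:
        words = doc.split()
        return sum(words.count(term) for term in db_terms)

    # Stage 1: rank with the converted keyword and keep the highly ranked results.
    shortlisted = sorted(hits, key=db_score, reverse=True)[:top_k]
    # Only the shortlisted results incur the cost of conversion.
    converted = [(doc, convert(doc)) for doc in shortlisted]
    # Stage 2: arrange by comparison with the request in the input side format.
    converted.sort(key=lambda pair: pair[1].split().count(request), reverse=True)
    return [doc for doc, _ in converted]
```

The number of calls to `convert` is bounded by `top_k` regardless of how many results the database returns.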
According to an aspect of the present invention, data is retrieved from a plurality of databases whose database side formats are different based on the retrieval request described in an input side format. The results retrieved from the databases are converted from the database side format into the input side format. The retrieved results are arranged.
Thus, even if the databases are described in various data formats, data can be retrieved from these databases based on one retrieval request at a time. The retrieved results described in the various data formats can be evaluated in the input side format. Consequently, data can be accurately retrieved over a wide range.
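Retrieval from a plurality of databases with different database side formats may be sketched as follows. The `Source` structure, the word-by-word dictionaries, and the occurrence-count score are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Source:
    name: str                     # label of the database side format
    documents: List[str]
    to_db: Dict[str, str]         # input side term -> database side term
    to_input: Dict[str, str]      # database side word -> input side word


def convert(doc: str, table: Dict[str, str]) -> str:
    """Word-by-word conversion; unknown words are left unchanged."""
    return " ".join(table.get(word, word) for word in doc.split())


def search_all(request: str, sources: List[Source]) -> List[Tuple[int, str, str]]:
    """Search every source, score in the input side format, and merge the results."""
    results: List[Tuple[int, str, str]] = []
    for source in sources:
        term = source.to_db.get(request)
        if term is None:
            continue  # no conversion available for this source
        for doc in source.documents:
            if term in doc.split():
                score = convert(doc, source.to_input).split().count(request)
                results.append((score, source.name, doc))
    return sorted(results, key=lambda item: item[0], reverse=True)
```

Because every score is computed in the input side format, results from different databases can be merged into a single arrangement, and the source name kept with each result allows the data format to be shown alongside it.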
According to an aspect of the present invention, the types of data formats are displayed corresponding to the results retrieved from the database.
Thus, even if the retrieved results are displayed in the input side format, the user can determine the data format of the database.
According to an aspect of the present invention, data is retrieved from the database using the retrieval request described in the input side format. The retrieved results are displayed. In addition, the retrieval request described in the input side format is converted into the database side format. Data is retrieved from the database using the converted retrieval request. These retrieved results are displayed at the same time.
Thus, data relevant to the retrieval request can be retrieved over a wide range and displayed.
According to an aspect of the present invention, the results retrieved from the database using the retrieval request described in the input side format and the results retrieved from the database using the retrieval request converted from the input side format into the database side format are displayed separately on the same screen.
Thus, the user can easily determine the databases from which the data is retrieved.