The present invention relates to an apparatus and a method for retrieving documents based on voice or the like. More specifically, the present invention provides a document retrieving apparatus and a document retrieving method capable of assuring an effective and reliable document search that is not adversely affected by the sentence recognition accuracy in a voice-based document retrieving operation.
A representative conventional voice-based document retrieving apparatus/method combines voice or speech recognition with whole-sentence retrieval.
FIG. 43 shows a conventional voice-based document retrieving apparatus. The conventional voice-based document retrieving apparatus shown in FIG. 43 comprises an audio input section 4301 which converts a sound or voice, such as a user's utterance, into an electric signal. A sentence recognizing section 4302 receives the electric signal from the audio input section 4301 and recognizes the sound as a sentence. A retrieval condition producing section 4303 produces retrieval conditions for retrieving documents based on the sentence recognized by the sentence recognizing section 4302. A document storing section 4304 stores documents to be retrieved. A document retrieving section 4305 retrieves the documents stored in the document storing section 4304 based on the retrieval conditions produced by the retrieval condition producing section 4303. Finally, an information output section 4306 outputs the result of the document search performed by the document retrieving section 4305.
FIG. 44 is a flowchart showing the document retrieving operation performed in the above-described conventional document retrieving apparatus. First, in the flowchart shown in FIG. 44, the audio input section 4301 converts the user's utterance into an electric signal (step 4401).
Next, the sentence recognizing section 4302 analyzes the electric signal of the user's voice or speech as a character pattern signal and recognizes a sentence based on the analyzed character patterns (step 4402).
The retrieval condition producing section 4303 produces the retrieval conditions for retrieving documents based on the sentence recognized by the sentence recognizing section 4302 (step 4403).
The document retrieving section 4305 retrieves the documents (i.e., retrieval objects) stored in the document storing section 4304 based on the retrieval conditions produced by the retrieval condition producing section 4303 (step 4404).
The information output section 4306 informs an outside device or person, such as the user, of the result of the document search performed by the document retrieving section 4305 (step 4405).
As apparent from the foregoing description, the above-described conventional document retrieving apparatus/method recognizes the voice as a sentence, produces the retrieval conditions based on the recognized sentence, and retrieves the documents (i.e., retrieval objects) based on the produced retrieval conditions, thereby accomplishing the voice-based document retrieval operation.
However, the following problem arises in the above-described conventional document retrieving apparatus/method. In general, voice or speech recognition is subjected to severe input circumstances, including uncertainty in the user's utterance, limited reliability of the voice input device, and inclusion of noise. Thus, there is the possibility that the converted electric signal of the input voice may comprise a strange word (or character) not involved in the original voice or speech but similar to a word (or character) inherently involved in the original voice or speech.
Accordingly, because of the inclusion of such strange words not involved in the original voice or speech, the above-described conventional document retrieving apparatus/method may erroneously recognize such strange words as candidate words constituting the sentence corresponding to the input voice or speech. In some cases, such strange or erroneous words have a higher likelihood than the corresponding true or genuine words inherently involved in the original voice or speech.
FIG. 45 is a sample explaining the voice or speech recognition performed by the above-described conventional document retrieving apparatus/method.
In FIG. 45, someone speaks “san-in e ryokoo shitain desuga”, the sound of which is entered into the audio input section 4301. In this case, the audio input section 4301 may erroneously convert the input sound into an electric signal representing a phonemic string of “sanninderyokooshitaiindesuga.” Namely, “sannin”/“san'in”, “de”, “ryokoo”, “shita”, “iin”, and “desuga” are recognized as candidate words for constituting the sentence. The expression “sannin”/“san'in” means that the word “sannin (three persons)” has a higher likelihood than the word “san'in (San-in area).” Thus, “sannin” is ranked high.
The above conventional voice-based document retrieving apparatus/method, however, constructs only one sentence based on the recognized candidate words in compliance with its own standards for the sentence recognition. In this case, the actually spoken word “san'in (San-in area)” will be deleted or dropped due to its lower likelihood, even though it is the true or genuine word inherently involved in the original utterance.
According to the example shown in FIG. 45, the sentence “sannin de ryokoo shita iin desuga” is finally recognized. The actually spoken word “san'in (San-in area)” has disappeared from the resultant sentence, because the word “san'in (San-in area)” has a lower likelihood than the word “sannin (three persons).” Accordingly, “san'in (San-in area)” is no longer involved in the document retrieval conditions produced by the retrieval condition producing section 4303. Instead, the resultant sentence comprises some strange (erroneous) words, such as “sannin (three persons)” and “iin (doctor's office).” Therefore, in the step 4404, the document retrieval operation is improperly performed based on the wrong sentence, whose meaning does not correspond to the original voice or speech.
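For illustration only, the loss of the spoken word described above can be sketched as follows. The lattice layout, the function name, and all likelihood values are assumptions made for this sketch; FIG. 45 gives only the ranking of “sannin” over “san'in”, not numerical values.

```python
# Hypothetical candidate lattice: each slot lists (word, likelihood) alternatives.
# The numbers are illustrative assumptions, not values from FIG. 45.
lattice = [
    [("sannin", 0.6), ("san'in", 0.4)],
    [("de", 0.9)],
    [("ryokoo", 0.9)],
    [("shita", 0.7)],
    [("iin", 0.6)],
    [("desuga", 0.9)],
]


def one_best_sentence(slots):
    """Keep only the top-scoring word in each slot, as a recognizer
    that constructs a single sentence would."""
    return [max(slot, key=lambda cand: cand[1])[0] for slot in slots]


sentence = one_best_sentence(lattice)
print(" ".join(sentence))
```

Because only the highest-ranked candidate of each slot survives, the lower-ranked “san'in” never reaches the retrieval conditions.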
As described above, there is the problem that the above-described conventional document retrieving apparatus/method possibly deletes or drops the actually spoken word in the sentence recognition and therefore produces wrong retrieval conditions. Thus, it becomes impossible to successfully perform the document retrieval operation.
Furthermore, to realize highly accurate sentence recognition for general sentences of natural language, the above conventional voice-based document retrieving apparatus/method requires a huge amount of general language data covering the various vocabulary and sentence patterns in normal use, and performs the sentence recognition with reference to these language data. Thus, a tremendous cost is required for collecting or establishing such a huge language data base.
In view of the above, the present invention has an object to provide a document retrieving apparatus and a document retrieving method capable of assuring an effective and reliable document search which is not adversely influenced by the sentence recognition accuracy in the voice-based document retrieval operation.
Furthermore, another object of the present invention is to provide a document retrieving apparatus and a document retrieving method capable of suppressing the cost in collecting or establishing a necessary language data base for the voice-based document retrieval operation.
In order to accomplish this and other related objects, a first aspect of the present invention provides a document retrieving apparatus for performing a document search based on sound including voice. The first aspect document retrieving apparatus comprises an audio input means for converting a sound into an electric signal and generating a character pattern data. A language model storing means is provided for storing likelihood information which determines the likelihood of a word recognized from the character pattern data produced from the audio input means. A word choosing means is provided for choosing an estimated word to be involved in the character pattern data produced from the audio input means, as a word selection result based on the likelihood information stored in the language model storing means. A retrieval condition producing means is provided for producing document retrieval conditions based on the word selection result chosen by the word choosing means. A document storing means is provided for storing documents to be retrieved. And, a document retrieving means is provided for retrieving the documents stored in the document storing means based on the document retrieval conditions produced from the retrieval condition producing means.
Furthermore, to accomplish the above objects, the first aspect of the present invention provides a document retrieving method for performing a document search based on sound including voice. The first aspect document retrieving method comprises a step of converting a sound into an electric signal and generating a character pattern data, a step of choosing an estimated word to be involved in the character pattern data, as a word selection result based on likelihood information which determines the likelihood of a word recognized from the character pattern data, a step of producing document retrieval conditions based on the word selection result, and a step of retrieving documents based on the document retrieval conditions.
Accordingly, the document retrieving apparatus and the document retrieving method in accordance with the first aspect of the present invention choose a word estimated to be involved in the original user's utterance whenever this word has a predetermined likelihood. Hence, as long as the true or genuine word inherently involved in the original voice or speech has the predetermined likelihood, it becomes possible to prevent this word from being deleted or dropped, as would happen in sentence recognition where only one sentence is finally constructed based on the recognized candidate words. Accordingly, the first aspect of the present invention makes it possible to realize an effective and reliable document search without being adversely influenced by the sentence recognition accuracy in the voice-based document retrieval operation. Furthermore, it is not necessary to choose all of the words involved in the user's utterance, because the first aspect of the present invention only requires constructing a minimum language model in accordance with the set of documents serving as retrieval objects. Thus, the first aspect of the present invention reduces the cost of collecting or establishing the necessary language data base.
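As a minimal sketch of this word choosing operation, every candidate word whose likelihood reaches a predetermined value may be kept, rather than only the best word per position. The function names, the threshold of 0.3, the likelihood values, and the representation of the retrieval conditions as a term-to-weight mapping are all assumptions made for illustration.

```python
def choose_words(candidates, threshold=0.3):
    """Return every candidate whose likelihood meets the threshold, best first.
    The threshold value is a hypothetical 'predetermined likelihood'."""
    kept = [(word, p) for word, p in candidates if p >= threshold]
    return sorted(kept, key=lambda cand: cand[1], reverse=True)


def make_retrieval_condition(selection):
    """Build a simple weighted-term condition: search term -> weight."""
    return {word: p for word, p in selection}


# Illustrative likelihoods only.
candidates = [("sannin", 0.6), ("san'in", 0.4), ("ryokoo", 0.9), ("iin", 0.2)]
selection = choose_words(candidates)
condition = make_retrieval_condition(selection)
print(condition)
```

Under these assumed values the spoken word “san'in” remains among the search terms even though “sannin” is ranked higher, while a candidate below the threshold is left out.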
Furthermore, in the above-described document retrieving apparatus, it is preferable to further comprise an information output means for outputting search result obtained from the document retrieving means.
With this arrangement, it becomes possible to let the operator know the retrieval result. The retrieval result can be edited in accordance with the operator's instructions. The edited retrieval result can be shown to the client and the operator. Accordingly, it becomes possible to realize a highly accurate document search.
Furthermore, to accomplish the above objects, a second aspect of the present invention provides a document retrieving apparatus for performing a document search based on sound including voice. In this second aspect document retrieving apparatus, a first audio input means is provided for converting a first sound into an electric signal and generating a first character pattern data. A first language model storing means is provided for storing likelihood information which determines the likelihood of a word recognized from the first character pattern data produced from the first audio input means. A first word choosing means is provided for choosing an estimated word to be involved in the first character pattern data produced from the first audio input means, as a first word selection result based on the likelihood information stored in the first language model storing means. A second audio input means is provided for converting a second sound into an electric signal and generating a second character pattern data. A second language model storing means is provided for storing likelihood information which determines the likelihood of a word recognized from the second character pattern data produced from the second audio input means. A second word choosing means is provided for choosing an estimated word to be involved in the second character pattern data produced from the second audio input means, as a second word selection result based on the likelihood information stored in the second language model storing means. A word selection result comparing means is provided for comparing the first word selection result chosen by the first word choosing means with the second word selection result chosen by the second word choosing means to produce a new word selection result. A retrieval condition producing means is provided for producing document retrieval conditions based on the new word selection result produced by the word selection result comparing means. 
A document storing means is provided for storing documents to be retrieved. And, a document retrieving means is provided for retrieving the documents stored in the document storing means based on the document retrieval conditions produced from the retrieval condition producing means.
Furthermore, to accomplish the above objects, the second aspect of the present invention provides a document retrieving method for performing a document search based on sound including voice. The second aspect document retrieving method comprises a step of converting a first sound into an electric signal and generating a first character pattern data, a step of choosing an estimated word to be involved in the first character pattern data, as a first word selection result based on likelihood information which determines the likelihood of a word recognized from the first character pattern data, a step of converting a second sound into an electric signal and generating a second character pattern data, a step of choosing an estimated word to be involved in the second character pattern data, as a second word selection result based on likelihood information which determines the likelihood of a word recognized from the second character pattern data, a step of comparing the first word selection result with the second word selection result to produce a new word selection result, a step of producing document retrieval conditions based on the new word selection result, and a step of retrieving documents based on the document retrieval conditions.
In this manner, two corresponding estimated words are compared. And, the comparison result is used to produce the retrieval conditions. Thus, it becomes possible to realize an effective and reliable document search when two corresponding utterances are cooperatively used in the document search.
For example, in the comparison of the two corresponding estimated words, it may be preferable to give the estimated word involved in the second utterance a higher likelihood than the estimated word involved in the first utterance. It may also be preferable to further increase the likelihood of an estimated word if this word is involved in both the first and second utterances. It may also be preferable to decrease the likelihood of an estimated word if this word is involved in the first utterance but not in the second utterance. In some cases, the second utterance has the role of assisting the document search based on the first utterance. In this respect, the second utterance may include repetition of important words involved in the first utterance. Or, the second utterance may include supplemental or revised words correcting the uncertainty or unclearness of the first utterance. Thus, the second aspect of the present invention makes it possible to use more appropriate retrieval conditions than in the case where only the first utterance is used in the document search, and thereby to realize an effective and reliable document search.
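The three adjustments mentioned above can be sketched as follows. The multiplicative factors, the function name, and the example words and likelihoods are assumptions chosen for illustration; the resulting values are treated as relative scores rather than probabilities.

```python
def compare_selections(first, second,
                       second_boost=1.2, both_boost=1.5, first_only_penalty=0.5):
    """Merge two word selection results (word -> likelihood) into a new one.
    All three factors are hypothetical tuning values."""
    merged = {}
    for word in set(first) | set(second):
        if word in first and word in second:
            # Involved in both utterances: increase further.
            merged[word] = max(first[word], second[word]) * both_boost
        elif word in second:
            # Involved only in the second utterance: favor it.
            merged[word] = second[word] * second_boost
        else:
            # Involved only in the first utterance: decrease.
            merged[word] = first[word] * first_only_penalty
    return merged


first = {"sannin": 0.6, "san'in": 0.4, "ryokoo": 0.9}
second = {"san'in": 0.7, "onsen": 0.5}
merged = compare_selections(first, second)
print(sorted(merged.items(), key=lambda item: -item[1]))
```

With these assumed inputs, the word repeated in the second utterance ends up ranked above the competing word that appeared only in the first utterance.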
Furthermore, in the above-described document retrieving apparatus, it is preferable to further comprise an information output means for outputting retrieval result obtained from the document retrieving means in such a manner that the retrieval result is differently edited for each of a source of the first sound and a source of the second sound.
With this arrangement, it becomes possible to differently edit the retrieval result according to each of the first utterance and the second utterance and to output the edited retrieval result to respective users.
In particular, when the document search based on the first utterance is aided by the second utterance, the entire retrieval result can be output to the second utterance user. On the other hand, the first utterance user receives only the effective retrieval result selected according to the judgement of the second utterance user.
Furthermore, in the above-described document retrieving apparatus, it is preferable that the retrieval condition producing means produces the document retrieval conditions based on all of the new word selection result produced by the word selection result comparing means, the first word selection result chosen by the first word choosing means, and the second word selection result chosen by the second word choosing means.
With this arrangement, it becomes possible to obtain proper retrieval conditions in accordance with the environment in which the apparatus is used. Thus, the document search can be effectively performed.
Furthermore, in the above-described first aspect document retrieving apparatus, it is preferable to further comprise an additional information administrating means for producing additional information to be added to the word selection result chosen by the word choosing means in accordance with its internal condition, so as to renew the internal condition. In this case, the retrieval condition producing means produces the document retrieval conditions based on both the word selection result chosen by the word choosing means and the additional information produced by the additional information administrating means.
Preferably, the above-described additional information administrating means renews the likelihood information stored in the language model storing means based on the additional information.
Furthermore, in the above-described second aspect document retrieving apparatus, it is preferable to further comprise an additional information administrating means for producing additional information to be added to the new word selection result produced by the word selection result comparing means in accordance with its internal condition, so as to renew the internal condition. In this case, the retrieval condition producing means produces the document retrieval conditions based on both the new word selection result produced by the word selection result comparing means and the additional information produced by the additional information administrating means.
Preferably, the above-described additional information administrating means renews the likelihood information stored in the language model storing means based on the additional information.
In general, the likelihood that an utterance includes a specific word tends to vary according to the context including this utterance. Thus, it becomes possible to increase the word choosing accuracy by reflecting the contextual constraint formed by a series of utterances in the reference information referred to in the word choosing operation. Accordingly, when the document search is repetitively performed based on the user's utterances, the above-described additional information administrating means reflects the word selection result in its internal condition. Furthermore, it is possible to reflect the word selection result in the language model which is referred to in the word choosing operation. As a result, the contextual constraint formed by a series of the user's utterances can be reflected in the word choosing operation. This increases the word choosing accuracy and makes it possible to realize an effective and reliable voice-based document search.
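One possible way to picture this renewal of the internal condition and of the likelihood information is sketched below. The class name, the additive boost of 0.15 per earlier occurrence, and the example values are assumptions; the specification does not prescribe a particular update rule.

```python
class ContextManager:
    """Hypothetical stand-in for the additional information administrating means."""

    def __init__(self, boost=0.15):
        # Internal condition: word -> number of earlier utterances containing it.
        self.seen = {}
        self.boost = boost

    def update(self, selected_words):
        """Reflect a word selection result in the internal condition."""
        for word in selected_words:
            self.seen[word] = self.seen.get(word, 0) + 1

    def adjust(self, language_model):
        """Return a renewed copy of the likelihood information, raised for
        words seen in earlier utterances and capped at 1.0."""
        return {
            word: min(1.0, p + self.boost * self.seen.get(word, 0))
            for word, p in language_model.items()
        }


language_model = {"sannin": 0.6, "san'in": 0.4}
context = ContextManager()
context.update(["san'in", "ryokoo"])  # first utterance
context.update(["san'in"])            # second utterance
adjusted = context.adjust(language_model)
print(adjusted)
```

In this sketch a word that recurs over a series of utterances gains likelihood for the next word choosing operation, while words never selected keep their original values.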
Furthermore, in the above-described first aspect document retrieving apparatus, it is preferable to further comprise a word-to-word relationship information storing means for storing word-to-word relationship information relating to the relationship established between predetermined words. In this case, the additional information administrating means produces the additional information to be added to the word selection result chosen by the word choosing means based on both the word-to-word relationship information stored in the word-to-word relationship information storing means and the internal condition of the additional information administrating means, so as to renew the internal condition.
Furthermore, in the above-described second aspect document retrieving apparatus, it is preferable to further comprise a word-to-word relationship information storing means for storing word-to-word relationship information relating to the relationship established between predetermined words. In this case, the additional information administrating means produces the additional information to be added to the new word selection result produced by the word selection result comparing means based on both the word-to-word relationship information stored in the word-to-word relationship information storing means and the internal condition of the additional information administrating means, so as to renew the internal condition.
The likelihood of two specific words appearing in the same utterance, i.e., co-occurrence of two specific words, varies depending on the relationship existing between these specific words. Accordingly, by providing the word-to-word relationship information storing means, it becomes possible to refer to the information relating to the relationship established between the specific words in addition to the internal condition as well as to refer to the word selection result, in the production of the retrieval conditions. Thus, the retrieval conditions can be produced by using the additional information resulting from the word selection result. For example, when a word seldom appears together with another selected word, it is preferable to delete or exclude this strange word from the retrieval conditions. Accordingly, even when any error occurs in the word choosing operation, it becomes possible to surely exclude such erroneous words from the retrieval conditions. Thus, it becomes possible to realize an effective and reliable voice-based document search.
Furthermore, it is possible to refer to the word-to-word relationship to check whether or not a word not involved in the word selection result has any relationship with the word involved in the word selection result. If there is a relationship that the two words tend to appear together, it will be preferable to produce the retrieval conditions so as to include such a word not involved in the word selection result. Or, it will be preferable to produce the retrieval conditions so as to increase the priority order of a resultant document including such a word. This will result in an effective and reliable voice-based document search. On the contrary, as a result of the reference to the word-to-word relationship, it may be concluded that the above two words seldom appear together. In this case, it will be preferable to produce the retrieval conditions so as to exclude the word not involved in the word selection result. Or, it will be preferable to produce the retrieval conditions so as to decrease the priority order of a resultant document including such a word. This will also result in an effective and reliable voice-based document search.
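The use of the word-to-word relationship described above may be sketched as follows. The co-occurrence strengths, the two thresholds, the neutral default for unknown pairs, and the function name are assumptions made for illustration.

```python
def refine_selection(selection, cooccurrence, low=0.1, high=0.7, neutral=0.5):
    """selection: word -> likelihood.
    cooccurrence: frozenset({a, b}) -> strength in [0, 1] (hypothetical scale).
    Returns (kept words, extra words to add to the retrieval conditions)."""
    words = list(selection)
    kept = {}
    for word in words:
        strengths = [
            cooccurrence.get(frozenset((word, other)), neutral)
            for other in words if other != word
        ]
        # A word that seldom appears with every other selected word is
        # treated as a probable recognition error and excluded.
        if strengths and max(strengths) < low:
            continue
        kept[word] = selection[word]

    added = set()
    for pair, strength in cooccurrence.items():
        inside = pair & set(kept)
        # A word outside the selection that tends to appear with a kept word
        # is offered as an additional search term.
        if strength >= high and len(inside) == 1:
            added |= pair - inside
    return kept, added


selection = {"san'in": 0.4, "ryokoo": 0.9, "iin": 0.3}
cooccurrence = {
    frozenset(("san'in", "ryokoo")): 0.8,
    frozenset(("iin", "ryokoo")): 0.05,
    frozenset(("iin", "san'in")): 0.05,
    frozenset(("ryokoo", "onsen")): 0.9,
}
kept, added = refine_selection(selection, cooccurrence)
print(kept, added)
```

Instead of adding or excluding words outright, the same strengths could equally be used to raise or lower the priority order of the resultant documents.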
Furthermore, in the above-described document retrieving apparatus, it is preferable to further comprise a relevant word information choosing means for choosing relevant word information of a characteristic word relating to a document group produced as search result from the document retrieving means. In this case, the additional information administrating means renews the internal condition of the additional information administrating means based on the relevant word information chosen by the relevant word information choosing means.
With this arrangement, to realize an effective and reliable document search, it becomes possible to utilize the cooccurring nature of two specific words in producing the retrieval conditions based on the word selection result obtained from the user's utterance. Furthermore, in the document search, it becomes possible to choose a characteristic word from the document group produced as the search result. The chosen characteristic word can be utilized to correct the retrieval conditions so as to increase the retrieval accuracy, or can be utilized to facilitate the search of related documents.
Furthermore, in the above-described document retrieving apparatus, it is preferable that the additional information administrating means renews the internal condition of the additional information administrating means based on both the relevant word information chosen by the relevant word information choosing means and the word-to-word relationship information stored in the word-to-word relationship information storing means.
In the above-described document retrieving apparatus, the characteristic word is chosen from the document group produced as the search result obtainable from the user's utterance. It is possible to produce additional information indicating the cooccurring nature between the chosen characteristic word and the word chosen from the user's utterance. The produced additional information is reflected in the renewal of the internal condition. Thus, the chosen characteristic word can be utilized in the production of the retrieval conditions for the retrieval operation based on the succeeding user's utterance. This arrangement is advantageous in that knowledge relating to the relationship established between specific words need not be prepared in advance. Even without such knowledge, the characteristic word is chosen from the document group produced as the search result in the document search. The chosen characteristic word can be utilized to correct the retrieval conditions so as to increase the retrieval accuracy, or can be utilized to facilitate the search of related documents.
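As one hedged illustration, a characteristic word may be taken to be a word that occurs in a larger share of the retrieved documents than of the whole collection. This scoring rule, the function names, and the toy documents are assumptions for the sketch and are not specified in the text.

```python
from collections import Counter


def doc_freq(docs):
    """Count, for each word, the number of documents containing it."""
    freq = Counter()
    for doc in docs:
        freq.update(set(doc.split()))
    return freq


def characteristic_words(result_docs, all_docs, query_words, top_n=2):
    """Rank words by (share of result documents) - (share of all documents),
    skipping the words already chosen from the utterance."""
    df_result, df_all = doc_freq(result_docs), doc_freq(all_docs)
    scores = {
        word: df_result[word] / len(result_docs) - df_all[word] / len(all_docs)
        for word in df_result if word not in query_words
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


def learn_cooccurrence(query_words, chosen):
    """Additional information: pairs of a query word and a characteristic word."""
    return {frozenset((q, c)) for q in query_words for c in chosen}


all_docs = [
    "san'in onsen ryokoo",
    "san'in onsen kani",
    "tokyo ryokoo hotel",
    "osaka ryokoo hotel",
]
query_words = {"san'in"}
result_docs = [doc for doc in all_docs if "san'in" in doc.split()]
chosen = characteristic_words(result_docs, all_docs, query_words)
pairs = learn_cooccurrence(query_words, chosen)
print(chosen)
```

The pairs produced in this way could then be reflected in the internal condition and used when producing the retrieval conditions for the succeeding utterance.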
Furthermore, the cooccurring nature chosen from the retrieved documents can be utilized in the following manner. For example, this arrangement is applicable to information on a cooccurring nature found only in a specific field or theme, or to information on a cooccurring nature relevant to a newly coined proper noun. Thus, it becomes possible to realize an effective and reliable document search.
Moreover, when the document search is repetitively performed based on the user's utterances, the word selection result obtainable from each utterance can be reflected in the internal condition and maintained there. Furthermore, this word selection result can be reflected in the production of the retrieval conditions from the word selection result obtainable from the succeeding utterance. As a result, the contextual constraint formed by a series of the user's utterances can be reflected in the retrieving operation. Thus, it becomes possible to increase the retrieving accuracy.