The present invention relates to an apparatus and method for translating a speech input in one language into a sentence in another language, and a recording medium having recorded therein a speech translation controlling program.
Nowadays, as speech recognition techniques have made remarkable advances, apparatuses for translating a recognized speech in one language into a sentence in another language have been developed. To serve a variety of users, such a speech translating apparatus generally uses a recognition engine designed for unspecified users.
FIG. 1 shows a conventional speech translator 100. As shown, the speech translator 100 includes an input unit 102, which is a microphone for example, a feature extractor 103 to extract features of the speech supplied from the input unit 102, a speech recognition collator 104, an acoustic model memory 105 to store acoustic models which represent phonemes and the like, a word dictionary memory 106 to store a word dictionary which represents correspondences between words and acoustic models, and a grammar memory 107 to store a grammar which represents how words connect (generally a stochastic language model).
The input unit 102 converts a supplied speech into a speech signal, digitizes the speech signal, and then supplies the digitized speech signal to the feature extractor 103. The feature extractor 103 calculates a string of feature vectors from the digitized speech signal, and supplies the feature vector string to the speech recognition collator 104.
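The step from a digitized speech signal to a string of feature vectors can be illustrated with a minimal sketch. The text does not say which features the feature extractor 103 computes, so the two features used here (per-frame log energy and zero-crossing rate), the frame length, the hop size and the function name `extract_features` are all assumptions chosen only for illustration.

```python
import math

def extract_features(samples, frame_len=400, hop=160):
    """Split a digitized signal into overlapping frames and return one
    feature vector [log_energy, zero_crossing_rate] per frame.
    The choice of features is an assumption, not taken from the text."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude of the frame; the small floor avoids log(0).
        energy = sum(s * s for s in frame) / frame_len
        log_energy = math.log(energy + 1e-10)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcr = crossings / (frame_len - 1)
        features.append([log_energy, zcr])
    return features

# One second of a 440 Hz tone sampled at 16 kHz stands in for digitized speech.
rate = 16000
signal = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
vectors = extract_features(signal)
print(len(vectors), "feature vectors of dimension", len(vectors[0]))
```

Each element of `vectors` corresponds to one frame, so the list as a whole plays the role of the feature vector string that is passed on to the collator.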
The speech recognition collator 104 uses the feature vector string, the acoustic models stored in the acoustic model memory 105, the word dictionary stored in the word dictionary memory 106 and the grammar stored in the grammar memory 107 to recognize the uttered word string.
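How acoustic evidence and a stochastic grammar might be combined during collation can be sketched as follows. This is a toy under stated assumptions, not the collator 104 itself: the acoustic log-likelihoods are hypothetical numbers standing in for the result of matching the feature vector string against acoustic models through the word dictionary, the grammar is assumed to be a bigram model with made-up probabilities, and the names `grammar_score` and `collate` are illustrative.

```python
import math

# Hypothetical acoustic log-likelihoods for two candidate word strings.
acoustic_scores = {
    ("where", "is", "the", "station"): -12.0,
    ("wear", "is", "the", "station"): -11.5,
}

# Hypothetical bigram probabilities playing the role of the grammar.
bigram_prob = {
    ("<s>", "where"): 0.2,
    ("<s>", "wear"): 0.01,
    ("where", "is"): 0.5,
    ("wear", "is"): 0.05,
    ("is", "the"): 0.4,
    ("the", "station"): 0.1,
}

def grammar_score(words, floor=1e-6):
    """Log-probability of a word string under the bigram grammar.
    Unseen word pairs receive a small floor probability."""
    prev = "<s>"
    total = 0.0
    for word in words:
        total += math.log(bigram_prob.get((prev, word), floor))
        prev = word
    return total

def collate(candidates):
    """Return the candidate with the best combined acoustic and grammar score."""
    return max(candidates, key=lambda ws: candidates[ws] + grammar_score(ws))

best = collate(acoustic_scores)
print(" ".join(best))
```

In this toy the second candidate scores slightly better acoustically, but the grammar makes the first candidate the overall winner, which is the kind of disambiguation a stochastic language model is meant to provide.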
The speech translator 100 further includes a translation block 108 which is supplied with a collation result from the speech recognition collator 104, a conversion information memory 109, a sentence corrector 110, a correction information memory 111 to store correction rule information used to correct an incorrect sentence, a grammar information memory 112 to store information on example sentence substitution, and a result display unit 113, which is a CRT (Cathode-Ray Tube) or an LCD (Liquid Crystal Display) for example.
The translation block 108 roughly translates the collation result by means of the conversion information memory 109, which stores a large amount of example sentence information, and supplies the translation result to the sentence corrector 110. The sentence corrector 110 corrects the translation result from the translation block 108 in detail by the use of the information stored in the correction information memory 111 and the grammar information memory 112.
The result display unit 113 displays the correction result from the sentence corrector 110 as the translation of the supplied speech.
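The two-stage flow of rough translation followed by detailed correction can be sketched as below. The text does not specify the matching or correction algorithms, so this is one possible reading: the rough translation picks the stored example sentence with the largest word overlap, and the correction substitutes the word that differs between the input and the chosen example. The tables `EXAMPLES` and `LEXICON`, their romanized Japanese entries and all function names are hypothetical.

```python
# Hypothetical example sentences (role of the conversion information memory 109).
EXAMPLES = {
    "where is the station": "eki wa doko desu ka",
    "how much is this": "kore wa ikura desu ka",
}

# Hypothetical word substitutions (role of the substitution information).
LEXICON = {"station": "eki", "hospital": "byouin"}

def rough_translate(sentence):
    """Pick the example sentence sharing the most words with the input."""
    words = set(sentence.split())
    example = max(EXAMPLES, key=lambda ex: len(words & set(ex.split())))
    return example, EXAMPLES[example]

def correct(sentence, example, rough):
    """Swap in target words where the input differs from the matched example."""
    src, ex = sentence.split(), example.split()
    if len(src) != len(ex):
        return rough  # this sketch only handles one-for-one substitutions
    out = rough.split()
    for s, e in zip(src, ex):
        if s != e and s in LEXICON and e in LEXICON:
            out = [LEXICON[s] if w == LEXICON[e] else w for w in out]
    return " ".join(out)

def translate(sentence):
    example, rough = rough_translate(sentence)
    return correct(sentence, example, rough)

print(translate("where is the hospital"))
```

Here the input has no exact match among the examples, so the rough stage returns the translation of the nearest example and the correction stage replaces the one word that differs.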
Because its recognition engine is designed for unspecified utterers, the speech translator 100 has the advantage that it can be used by any user. However, it has difficulty discriminating, for each specific utterer, the manner of utterance, the range of vocabulary and the utterer's habits, all of which would remain constant. Thus, since it does not tailor speech recognition processing to each utterer, the speech translator 100 has to perform even unnecessary speech recognition processing.
Also, the speech translator 100 cannot optimize the interaction between the user as an utterer and the translator itself, since it has no user adaptability, that is, no means for saving the utilization frequency and tendency of each user. Thus, the speech translator 100 is likely to incur errors which would not otherwise take place. Furthermore, since the speech translator 100 cannot identify the preference or utilization mode of each user, it cannot be suited to each user.
Accordingly, an object of the present invention is to overcome the above-mentioned drawbacks of the prior art by providing a speech translating apparatus and method, which work at a higher speed and enable optimum interactions between the user and the translator itself, and a recording medium having recorded therein a speech translation controlling program.
The above object can be attained by providing a speech translating apparatus including:
means for extracting features of a supplied speech to provide a feature vector;
means for collating the feature vector from the feature extracting means with a plurality of collation information for speech recognition to recognize the speech and providing a sentence indicative of the recognized speech;
means for translating the sentence indicative of the speech recognized by the speech recognition collating means into a sentence in a language different from that of the supplied speech;
means for correcting the sentence translated by the translation processing means by the use of an optimum one of a plurality of correction information;
means for outputting the correction result of the sentence correction means;
means for accumulating history information indicative of the tendency of the sentence recognized by the speech recognition collating means; and
a cumulative learning means for comparing the speech recognition result with the history information accumulated in the history information accumulating means each time the speech recognition collating means outputs the speech recognition result to update the history information accumulated in the history information accumulating means by a cumulative learning function;
the history information accumulating means controlling, based on the accumulated history information, the selection of the collation information for use by the speech recognition collating means and/or selection of the correction information for use by the sentence correction means.
In the speech translating apparatus, history information indicative of the tendency of the sentence recognized by the speech recognition collating means is accumulated, and the speech recognition result is compared with the history information accumulated in the history information accumulating means each time the speech recognition collating means outputs the speech recognition result to update the history information accumulated in the history information accumulating means by a cumulative learning function. The history information accumulating means controls, based on the accumulated history information, the selection of the collation information for use by the speech recognition collating means and/or selection of the correction information for use by the sentence correction means. Thus, since the AI (artificial intelligence) cumulative learning function is used to select collation information for the speech translation and correction information for the correction of the translated sentence, the speech translation can be done at an improved speed and with an improved performance and can also be optimized for each user.
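The interplay between history accumulation and the selection of collation information can be sketched as follows. The summary does not define the cumulative learning function or the form of the history information, so this sketch assumes the simplest possible reading: the history is a running count of recognized words, and the collation information is a set of topic vocabularies from which the one best matching the history is selected. The class `HistoryAccumulator`, the table `COLLATION_SETS` and the topic names are hypothetical.

```python
from collections import Counter

# Hypothetical collation information: one vocabulary per topic.
COLLATION_SETS = {
    "travel": {"station", "ticket", "hotel", "train"},
    "dining": {"menu", "water", "bill", "table"},
}

class HistoryAccumulator:
    """Keeps word counts over all recognition results (assumed history form)."""

    def __init__(self):
        self.word_counts = Counter()

    def update(self, recognized_sentence):
        # Called each time the recognizer outputs a result, so the
        # history reflects the user's tendency cumulatively.
        self.word_counts.update(recognized_sentence.lower().split())

    def select_collation_set(self, default="travel"):
        # Score each topic by how often its words appear in the history.
        scores = {
            topic: sum(self.word_counts[w] for w in vocab)
            for topic, vocab in COLLATION_SETS.items()
        }
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else default

history = HistoryAccumulator()
for result in ["the menu please", "water and the bill", "where is the station"]:
    history.update(result)
selected = history.select_collation_set()
print(selected)
```

With the three recognition results above, the dining vocabulary matches the history more often than the travel vocabulary, so it would be selected for subsequent collation. The same pattern could be applied to choosing among sets of correction information.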
The above object can also be attained by providing a speech translating method including the steps of:
extracting features of a supplied speech to generate a feature vector;
collating the generated feature vector with a plurality of collation information for speech recognition to recognize the speech;
translating the sentence recognized by the speech recognition into a sentence in a language different from that of the supplied speech;
correcting the translated sentence by the use of an optimum one of a plurality of correction information;
comparing the speech recognition result with the history information indicative of the tendency of the speech-recognized sentence at each speech recognition to update the history information by a cumulative learning function;
controlling, based on the updated history information, the selection of the collation information used for the speech recognition and/or selection of the correction information used for the sentence correction; and
outputting the sentence corrected by the sentence correction.
In the speech translating method, the speech recognition result is compared with the history information indicative of the tendency of the speech-recognized sentence at each speech recognition to update the history information by a cumulative learning function, and, based on the updated history information, the selection of the collation information used for the speech recognition and/or selection of the correction information used for the sentence correction are controlled. Thus, since the AI (artificial intelligence) cumulative learning function is used to control the selection of the collation information used for the speech translation and selection of the correction information used for the translated-sentence correction, the speech translation can be done at a high speed and with a high performance, whereby the adaptability of the speech translation can be optimized for each user.
The above object can also be attained by providing a recording medium having recorded therein a speech translation controlling program which:
extracts features of a supplied speech to generate a feature vector;
collates the generated feature vector with a plurality of collation information for speech recognition to recognize the speech;
translates the sentence recognized by the speech recognition into a sentence in a language different from that of the supplied speech;
corrects the translated sentence by the use of an optimum one of a plurality of correction information;
compares the speech recognition result with the history information indicative of the tendency of the speech-recognized sentence at each speech recognition to update the history information by a cumulative learning function;
controls, based on the updated history information, the selection of the collation information used for the speech recognition and/or selection of the correction information used for the sentence correction; and
outputs the sentence corrected by the sentence correction.
With the recording medium having recorded therein a speech translation controlling program, the program is installed in a computer. The computer compares the speech recognition result with the history information indicative of the tendency of the speech-recognized sentence at each speech recognition to update the history information by a cumulative learning function, and controls, based on the updated history information, the selection of the collation information used for the speech recognition and/or selection of the correction information used for the sentence correction. Thus, since the AI (artificial intelligence) cumulative learning function is used to control the selection of the collation information used for the speech translation and selection of the correction information used for the translated-sentence correction, the speech translation can be done at a high speed and with a high performance, whereby the adaptability of the speech translation can be optimized for each user.