Conventionally, automatic interpretation has been performed mainly on a server, a workstation, or a personal computer (PC). This is because automatic interpretation technology comprises three component technologies: speech recognition, machine translation, and speech synthesis, all of which require a large amount of computation and a mass storage unit. As the performance of portable devices such as smartphones and personal digital assistants (PDAs) improves and these devices come to have enough memory to execute automatic interpretation, various attempts are being made to equip portable devices with automatic interpretation technology. When a portable device is equipped with an automatic interpreter, the user can use the automatic interpretation function at any time and in any place, which considerably enhances the user's convenience.
However, an automatic interpreter incorporated into a portable device merely delivers the automatic interpretation result to the other party through speech synthesis and does not exchange any information with the portable device carried by the other party. As a result, a variety of information that could help improve automatic interpretation performance cannot be utilized. For example, if no information is provided about the language the other party uses, the user does not know which target language should be selected for automatic interpretation.
In addition, depending on the performance of the automatic interpreter, it may be advantageous to specify in advance the place or field in which interpretation is to be applied, that is, an interpretation range such as restaurants, shopping, medical care, or transportation, and to execute interpretation within that limited range rather than across all fields. However, if this information is not consistent between the user of the automatic interpreter and the other party, the other party has to modify his or her settings manually, one by one, to match those of the user.
Automatic interpretation is an interaction between people. Therefore, if the interaction is analyzed through spoken language understanding in the course of automatic interpretation, and information associated with the interaction is provided, this is of great help in speech recognition and translation. For example, if the user asks the other party's name, the other party is expected to say his or her name in the next utterance. By giving a weight to information corresponding to names and separately looking up a lexicon of personal names, the next utterance can be recognized more efficiently than without such processing. Likewise, in the translation process, information such as a name serves as auxiliary information that can be used for transliteration or to resolve ambiguity efficiently. However, such information cannot be used in the conventional technology because the related information is not exchanged.
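The name example above can be illustrated with a minimal sketch in Python. It assumes a recognizer that returns an n-best list with scores; the names `NAME_LEXICON`, `Hypothesis`, `expects_name`, `rescore`, and the `name_bonus` value are all hypothetical and introduced only for illustration. The keyword test in `expects_name` is a crude stand-in for real spoken language understanding.

```python
from dataclasses import dataclass

# Hypothetical lexicon of personal names; a real system would load a far larger list.
NAME_LEXICON = {"kim", "lee", "park", "smith", "tanaka"}


@dataclass
class Hypothesis:
    text: str
    score: float  # recognizer score; higher is better (assumed convention)


def expects_name(previous_utterance: str) -> bool:
    """Crude stand-in for language understanding: detect a question about a name."""
    return "your name" in previous_utterance.lower()


def rescore(hypotheses, previous_utterance, name_bonus=2.0):
    """Return hypotheses best-first, boosting those that contain a lexicon name
    when the previous utterance asked for a name."""
    boost = expects_name(previous_utterance)
    rescored = []
    for hyp in hypotheses:
        score = hyp.score
        if boost:
            words = hyp.text.lower().replace(".", "").split()
            if any(word in NAME_LEXICON for word in words):
                score += name_bonus  # weight given to name information
        rescored.append(Hypothesis(hyp.text, score))
    return sorted(rescored, key=lambda h: h.score, reverse=True)


# Two competing hypotheses for the other party's reply.
n_best = [Hypothesis("my name is key", -4.0), Hypothesis("my name is Kim", -5.0)]
best = rescore(n_best, "What is your name?")[0]
```

In this sketch the hypothesis containing a lexicon name overtakes the acoustically preferred one only when the preceding utterance asked for a name. Without exchanged dialogue information, the recognizer would have no basis for applying the bonus.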
In addition, the conventional speech recognition technology is highly susceptible to noise. When noise is introduced together with a speech signal, speech recognition performance deteriorates significantly. In the case of automatic interpretation on a portable device, the automatic interpreter can be expected to be used in places exposed to various noise sources, such as subway stations, shopping centers, and crowded restaurants. This inevitably degrades speech recognition performance, a problem that is difficult to resolve with the conventional technology.
Moreover, the pitch of synthesized speech may need to be adjusted differently depending on whether the surroundings are quiet or noisy. With the conventional technology, however, the user cannot handle such situations without manual manipulation.
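As a rough illustration of what adjusting pitch without manual manipulation could look like, the following Python sketch maps a measured ambient noise level to a pitch multiplier for the synthesizer. It is not taken from the source: the function names, the dBFS thresholds, the maximum scale of 1.2, and the choice to raise pitch as noise increases are all assumptions made for the example.

```python
import math


def rms_dbfs(samples):
    """RMS level of samples in [-1.0, 1.0], in dB relative to full scale.
    Silence or an empty buffer is reported as -120.0 dB."""
    if not samples:
        return -120.0
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return -120.0
    return max(-120.0, 10.0 * math.log10(mean_square))


def pitch_scale_for_noise(noise_dbfs, quiet_dbfs=-50.0, noisy_dbfs=-20.0, max_scale=1.2):
    """Linearly map a noise level to a pitch multiplier between 1.0 and max_scale.
    Thresholds and direction of adjustment are illustrative assumptions."""
    if noise_dbfs <= quiet_dbfs:
        return 1.0
    if noise_dbfs >= noisy_dbfs:
        return max_scale
    fraction = (noise_dbfs - quiet_dbfs) / (noisy_dbfs - quiet_dbfs)
    return 1.0 + fraction * (max_scale - 1.0)


# Example: a noise-only buffer captured before synthesis starts (hypothetical data).
ambient = [0.05, -0.04, 0.06, -0.05]
scale = pitch_scale_for_noise(rms_dbfs(ambient))
```

The resulting `scale` would be passed to whatever pitch control the speech synthesizer exposes; that interface is outside the scope of this sketch.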