Multilingual speech-to-speech language translation systems have been developed to facilitate communication between people who do not share a common language. One example of such a system is the speech-to-speech translation system developed by Carnegie Mellon University (Pittsburgh, Pa.).
A speech-to-speech translation system allows a user who has been trained with the system (hereinafter “system user”) to communicate with another person who speaks another language (hereinafter “foreign language speaker” or just “foreign speaker”) and is most often not familiar with the system, by providing speech-to-speech translation service between the two parties.
Since conventional speech-to-speech translation systems can handle only one speaker at a time, the two speakers need to take turns during the communication. Therefore, the indication (or prompt) of a switch of turns is important for ensuring a smooth multilingual speech translation conversation.
Various prompts to indicate the switch of turns exist in conventional speech-to-speech translation systems. The most widely adopted prompt uses audio sound effects such as a beep sound. The sound effects can be language dependent so that a specific sound represents a specific language. The drawback of this approach is that both the system user and the foreign language speaker need to be trained to be familiar with the meaning of these sound effects. For a frequent system user, this brings additional inconvenience, as he or she must remember the meaning of the sound effects for each language supported by the system. For a foreign speaker who is not familiar with or has never used this kind of system before, this function is not easily usable, since the system user cannot explain it to the foreign speaker because of the language barrier. The foreign speaker needs to guess the meanings of these sounds, often with great frustration and, consequently, with great dissatisfaction.
Another solution is to use visual prompts. The system user can point a microphone associated with the system at himself or herself when he or she starts to talk, and point the microphone at the foreign speaker to indicate that the foreign speaker should start to talk. Other visual indications or gestures may be used to indicate the switch of turns. However, visual prompts are only helpful in face-to-face speech translation conversations and are useless in other scenarios such as automatic speech translation through call centers. Additionally, in some situations such as emergency medical care, patients speaking another language may keep their eyes closed due to their medical conditions, so that the above-described visual prompts may be completely useless. Furthermore, these visual indications may still be confusing without verbal explanations.