1. Technical Field
The present invention relates to translating speech from one language to another and, more particularly, to a system, apparatus and method for collecting multilingual speech through multiple recording channels and translating the recorded speech accordingly during use of a speech-to-speech translation system.
2. Description of the Related Art
Modern speech-to-speech (S2S) translation systems attempt to enable communication between two people who do not share a common language. To keep a conversation between speakers of different languages flowing smoothly, current S2S translation systems must address two challenges. First, the system needs to know which language the user is currently speaking, based on either user feedback/selection or automatic language identification. Second, the system needs either to prevent the two speakers from talking simultaneously or to be able to focus on one of the speakers during a conversation.
In most state-of-the-art S2S translation systems, these two challenges are handled either ineffectively or in a user-unfriendly way. For the first challenge, a Graphical User Interface (GUI) commonly provides two buttons, one per language, that the speakers use to control recording in their respective languages. This breaks the conversation into fragments and significantly reduces the speed and efficiency of information exchange. Other S2S translation systems apply automatic language identification techniques, at the cost of inevitable identification errors and the system malfunctions that result from them.
The second challenge is even harder: it is difficult for an S2S translation system to focus on one of the speakers when both users are talking at the same time. Moreover, synchronizing the conversation so that the two speakers do not talk over each other (cross talk) is very difficult, especially when the two speakers do not share a common language.