A teleconference generally involves establishing telecommunications connections with three or more terminal devices used by participants of the teleconference. For ease of explanation, the terminal devices may be denoted as terminal devices A, B and C. Generally, one of the participants, such as the user of terminal device A, initiates the teleconference by conferencing in the other participants, e.g., users of terminal devices B and C. A conference management device manages the conference. The conference management device may also be referred to as a multiparty control unit (MCU) or, alternatively, as a mixing bridge. The MCU may be located in a service provider network hosting the teleconference.
The MCU may decode the audio data streams received from the terminal devices and, for each terminal device, sum the audio waveforms represented by the audio data streams received from the other two terminal devices to generate a mixed monaural (i.e., mono) audio waveform. The MCU may then encode the mixed monaural audio waveforms to generate mixed mono data streams and transmit the mixed mono data streams to respective ones of the terminal devices.
For example, the MCU may receive and decode audio data streams from terminal devices A, B and C. The MCU generates three mixed audio data streams based on the received audio data streams. The first mixed audio data stream represents a monaural mix of sounds detected by terminal devices A and B (i.e., A+B). The second mixed audio data stream represents a monaural mix of sounds detected by terminal devices A and C (i.e., A+C). The third mixed audio data stream represents a monaural mix of sounds detected by terminal devices B and C (i.e., B+C). The MCU transmits the first mixed audio data stream to terminal device C, transmits the second mixed audio data stream to terminal device B, and transmits the third mixed audio data stream to terminal device A. Terminal devices A, B, and C decode the mixed audio data streams and generate sound based on (i.e., play back) the mixed audio data streams.
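The three-way mixing described above can be illustrated with a minimal sketch. It assumes the audio data streams have already been decoded into equal-length lists of 16-bit PCM samples and that the summed waveform is saturated to the 16-bit range; neither assumption comes from the description above. The names `mix_minus` and `clip16` are hypothetical, and the decoding and encoding steps are omitted.

```python
from typing import Dict, List

Waveform = List[int]  # decoded PCM samples (assumed 16-bit, equal length)


def clip16(sample: int) -> int:
    """Saturate a summed sample to the signed 16-bit range (an assumption)."""
    return max(-32768, min(32767, sample))


def mix_minus(decoded: Dict[str, Waveform]) -> Dict[str, Waveform]:
    """For each terminal device, sum the waveforms of all other devices."""
    mixes: Dict[str, Waveform] = {}
    for listener in decoded:
        others = [w for name, w in decoded.items() if name != listener]
        # zip(*others) walks the other waveforms sample by sample.
        mixes[listener] = [clip16(sum(samples)) for samples in zip(*others)]
    return mixes


# Illustrative sample values only, not real audio.
decoded = {"A": [100, 200, -300], "B": [10, 20, 30], "C": [1, 2, 3]}
mixes = mix_minus(decoded)
# mixes["C"] is the A+B mix, mixes["B"] is A+C, and mixes["A"] is B+C.
```

Keying each mix by its recipient makes the routing explicit: the mix sent to a terminal device never contains that device's own audio.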
Recently, MCUs and terminal devices have been developed that support three-dimensional (3D) audio in teleconferencing. In 3D audio, the MCU processes the audio data streams received from the terminal devices to generate mixed stereo audio data streams. Each of the mixed stereo audio data streams may represent sound having two or more channels (e.g., a left channel and a right channel). The MCU may transmit these mixed stereo audio data streams to appropriate ones of the terminal devices. Each of the terminal devices may play back its mixed stereo audio data stream on two or more speakers. Because of head-related transfer functions (HRTFs) applied by the MCU in generating the mixed stereo audio data streams, users of the terminal devices may perceive the speech of users of the other terminal devices to come from various points in space. For example, a user of terminal device A may perceive the speech of a user of terminal device B to come from a point in space to the left of the user of terminal device A and may perceive the speech of a user of terminal device C to come from a point in space to the right of the user of terminal device A. Spatially separating the voices of users in this manner may help the users to determine who is speaking during the teleconference, thereby facilitating communication among the participants of the 3D audio teleconference.
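One common way to apply an HRTF in the time domain is to convolve each talker's mono waveform with a pair of head-related impulse responses (HRIRs), one per ear, and sum the results per channel. The sketch below illustrates this for the stereo mix sent to terminal device A. The two-tap HRIR pairs are invented toy values chosen only so that the near ear receives the sound earlier and louder; real HRIRs are measured and much longer. The names `convolve`, `spatialize`, `HRIR_LEFT`, and `HRIR_RIGHT` are hypothetical.

```python
from typing import List, Sequence, Tuple

Hrir = Tuple[List[float], List[float]]  # (left-ear taps, right-ear taps)


def convolve(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Direct-form linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


# Toy HRIR pairs (assumed values, not measured data).
HRIR_LEFT: Hrir = ([1.0, 0.0], [0.0, 0.5])   # source to the listener's left
HRIR_RIGHT: Hrir = ([0.0, 0.5], [1.0, 0.0])  # source to the listener's right


def spatialize(
    sources: Sequence[Tuple[Sequence[float], Hrir]]
) -> Tuple[List[float], List[float]]:
    """Filter each mono source with its HRIR pair and sum per channel."""
    length = max(len(w) + max(len(hl), len(hr)) - 1 for w, (hl, hr) in sources)
    left = [0.0] * length
    right = [0.0] * length
    for waveform, (hrir_l, hrir_r) in sources:
        for i, v in enumerate(convolve(waveform, hrir_l)):
            left[i] += v
        for i, v in enumerate(convolve(waveform, hrir_r)):
            right[i] += v
    return left, right


# Stereo mix for terminal device A: B placed on the left, C on the right.
speech_b = [1.0, 0.0]  # only the user of terminal device B is speaking
speech_c = [0.0, 0.0]
left, right = spatialize([(speech_b, HRIR_LEFT), (speech_c, HRIR_RIGHT)])
```

In this toy case the left channel carries B's speech one sample earlier and at twice the amplitude of the right channel, which is the kind of interaural difference that leads a listener to place the talker on the left.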