Voice conference systems allow a number of voice terminals to be connected together for a telephone conference, so that each participant is supplied, as a mixed signal for audio output, with the audio signals recorded by the microphones of the other participants' voice terminals. The mixed signal for a participant is a superimposition of all incoming audio signals except that participant's own audio signal, since a participant does not need to, and should not, hear his/her own contributions to the conference; doing so would produce an unwanted echo of his/her own utterances. Therefore, for each of the N participants in a telephone conference, a specific mixed signal must be formed in which the (N-1) voice signals of the other participants are combined.
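Writing $s_j(t)$ for the audio signal of participant $j$ and $m_i(t)$ for the mixed signal delivered to participant $i$ (notation introduced here for illustration only), the rule above can be stated as follows. The second form, the sum of all signals minus the participant's own signal, is an algebraically equivalent rearrangement rather than a method given in the text; for sampled audio it holds only before any clipping to the sample range.

```latex
m_i(t) \;=\; \sum_{\substack{j=1 \\ j \neq i}}^{N} s_j(t)
       \;=\; S(t) - s_i(t),
\qquad S(t) = \sum_{j=1}^{N} s_j(t),
\qquad i = 1, \dots, N
```

With N = 3, for example, participant 1 receives $s_2 + s_3$, participant 2 receives $s_1 + s_3$, and participant 3 receives $s_1 + s_2$.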
In packet-based communication systems, for example, voice terminals communicate by means of packet-based methods over a packet-based network, such as an IP-based (IP: Internet Protocol) network. Audio signals recorded by a microphone are converted by a coder into data packets for the network, and data packets received from the network are converted by a decoder into audio signals for output by a speaker, which is located, for example, in a telephone handset. A combined coding and decoding unit is generally referred to as a CODEC (Coding/Decoding). Known coding methods have been standardized, for example, by the ITU-T (ITU-T: Telecommunication Standardization Sector of the ITU; ITU: International Telecommunication Union); examples are the CODECs G.711, G.726 and G.729. These CODECs differ in particular in voice quality, compression rate and the complexity of the coding method. The CODEC G.729, for example, achieves a high level of compression with comparatively good voice quality, but requires computation-intensive operations to do so.
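The difference in compression rate can be made concrete by computing how much voice payload each CODEC places in one packet. The sketch below uses the nominal ITU-T bit rates (64 kbit/s for G.711, 32 kbit/s as one of the G.726 rates, 8 kbit/s for G.729) and assumes a 20 ms packetization interval; the interval and the helper name `payload_bytes` are illustrative assumptions, not part of the text.

```python
# Nominal ITU-T bit rates in bit/s; G.726 also defines 16, 24 and 40 kbit/s modes.
CODEC_BITRATE_BPS = {"G.711": 64_000, "G.726": 32_000, "G.729": 8_000}


def payload_bytes(codec: str, packet_ms: int = 20) -> int:
    """Voice payload (bytes) carried in one packet covering packet_ms milliseconds.

    The 20 ms default is an assumed, commonly used packetization interval.
    """
    bits = CODEC_BITRATE_BPS[codec] * packet_ms // 1000
    return bits // 8


for name in CODEC_BITRATE_BPS:
    print(name, payload_bytes(name), "bytes per 20 ms packet")
```

This yields 160, 80 and 20 bytes respectively. Note that the payload alone understates the transmitted volume: RTP, UDP and IPv4 headers add 12 + 8 + 20 = 40 bytes to every packet, which for G.729 at 20 ms is twice the voice payload itself.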
Voice terminals frequently support several CODECs, and a common CODEC is negotiated between the respective communication partners for a connection and/or for a sub-section of a connection.
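A minimal sketch of such a negotiation, assuming a simplified rule loosely inspired by SDP offer/answer: one side offers CODECs in order of preference, and the first offered CODEC the other side also supports is chosen. The function `negotiate_codec`, the terminal names and their capability lists are hypothetical; real signalling also exchanges payload types, clock rates and other parameters.

```python
from typing import Dict, List, Optional


def negotiate_codec(offered: List[str], supported: List[str]) -> Optional[str]:
    """Return the first offered CODEC (offerer's preference order) the peer supports."""
    peer = set(supported)
    for codec in offered:
        if codec in peer:
            return codec
    return None  # no common CODEC for this connection or sub-section


# Hypothetical conference system negotiating separately with each terminal,
# i.e. one negotiation per sub-section of the overall connection.
bridge_offer = ["G.729", "G.726", "G.711"]
terminals = {"A": ["G.711", "G.729"], "B": ["G.711"], "C": ["G.722"]}
chosen: Dict[str, Optional[str]] = {
    t: negotiate_codec(bridge_offer, caps) for t, caps in terminals.items()
}
print(chosen)  # {'A': 'G.729', 'B': 'G.711', 'C': None}
```

Because each terminal is negotiated independently, different terminals in the same conference can end up with different CODECs, which is why a central system needs a coder suited to each terminal.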
To connect voice terminals together in a telephone conference, the procedure is generally as follows: the coded voice data arriving from the voice terminals is decoded in the voice conference system, a mixed signal is generated from it for each voice terminal, and each mixed signal is converted by a coder suitable for the respective voice terminal. The resulting mixed voice data is then transmitted to the respective voice terminals by means of packet-oriented methods for voice output.
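The decode, mix and re-encode procedure can be sketched for one frame period as below. This is an illustration under stated assumptions: the `Codec` wrapper, the two toy codecs (`L16`, uncompressed 16-bit PCM, and `PCM8`, a lossy "keep the top 8 bits" scheme) and `conference_round` are hypothetical stand-ins for real CODECs such as G.711 or G.729. The mix is formed as the sum of all signals minus each terminal's own signal, then clipped to the 16-bit range.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = List[int]  # one frame of 16-bit linear PCM samples
INT16_MIN, INT16_MAX = -32768, 32767


@dataclass
class Codec:
    """Hypothetical codec wrapper: a name plus encode/decode functions."""
    name: str
    encode: Callable[[Frame], bytes]
    decode: Callable[[bytes], Frame]


# Toy codec 1: uncompressed 16-bit little-endian PCM (lossless).
L16 = Codec(
    "L16",
    encode=lambda f: struct.pack(f"<{len(f)}h", *f),
    decode=lambda b: list(struct.unpack(f"<{len(b) // 2}h", b)),
)

# Toy codec 2: keep only the top 8 bits of each sample (lossy, illustration only).
PCM8 = Codec(
    "PCM8",
    encode=lambda f: struct.pack(f"<{len(f)}b", *(s >> 8 for s in f)),
    decode=lambda b: [s << 8 for s in struct.unpack(f"<{len(b)}b", b)],
)


def clip(sample: int) -> int:
    """Limit a mixed sample to the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, sample))


def conference_round(packets: Dict[str, bytes],
                     codec_of: Dict[str, Codec]) -> Dict[str, bytes]:
    """One frame period: N decodes, N mixed signals, N encodes."""
    # Step 1: decode each incoming stream with that terminal's negotiated codec.
    decoded = {t: codec_of[t].decode(p) for t, p in packets.items()}
    # Sum of all participants' signals, sample by sample.
    total = [sum(samples) for samples in zip(*decoded.values())]
    outgoing = {}
    for terminal, own in decoded.items():
        # Step 2: mixed signal of the other N-1 participants.
        mix = [clip(s - o) for s, o in zip(total, own)]
        # Step 3: re-encode with the coder suited to this terminal.
        outgoing[terminal] = codec_of[terminal].encode(mix)
    return outgoing


if __name__ == "__main__":
    codecs = {"A": L16, "B": L16, "C": PCM8}
    frames = {"A": [1000, -2000], "B": [300, 400], "C": [2560, 0]}
    packets = {t: codecs[t].encode(f) for t, f in frames.items()}
    out = conference_round(packets, codecs)
    print(L16.decode(out["A"]))  # [2860, 400]
```

Terminal C's mix ([1300, -1600]) comes back as [1280, -1792] because its toy codec is lossy, illustrating that the output quality depends on the coder chosen per terminal.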
This means that, for a telephone conference with N participants, the voice conference system decodes N incoming voice data streams simultaneously and converts the N mixed signals formed from them by means of N coders into N outgoing voice data streams. This can result in a significant computation outlay for coding and decoding, particularly in telephone conferences with a large number of participants. In addition, a correspondingly large number of coders and decoders must be kept available so that such large conferences can be supported.
To reduce the complexity of coding and decoding, a voice conference system can be restricted to CODECs requiring little computation power. However, such less computation-intensive CODECs largely prove disadvantageous in terms of voice quality and/or the bandwidth required to transmit the coded voice data; G.711, for instance, needs little computation but occupies 64 kbit/s, eight times the 8 kbit/s of G.729.
Alternatively, and to resolve the problem of high computation outlay, a voice conference system can dispense with decoding and mixing altogether: the coded voice data from each voice terminal is forwarded to all other voice terminals and is decoded and mixed only in the voice terminals themselves. However, such a procedure gives rise to other problems: the bandwidth required at the voice terminals rises significantly, and the voice terminals must be able to process several incoming voice data streams in parallel. This considerably increases the complexity of the voice terminals.
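The growth in terminal load can be quantified with a short calculation. The sketch assumes, for simplicity, that every stream uses the same CODEC and ignores packet headers and silence suppression; `terminal_load` is a hypothetical helper. With central mixing a terminal receives and decodes one stream; with forwarding it receives and decodes one stream per other participant.

```python
from typing import Tuple


def terminal_load(n_participants: int, codec_kbps: float,
                  central_mixing: bool) -> Tuple[float, int]:
    """Return (payload kbit/s received, streams decoded in parallel) per terminal."""
    if n_participants < 2:
        raise ValueError("a conference needs at least two participants")
    # Central mixing: one mixed stream. Forwarding: one stream per other participant.
    streams = 1 if central_mixing else n_participants - 1
    return streams * codec_kbps, streams


for n in (3, 10, 30):
    mixed = terminal_load(n, 64.0, central_mixing=True)
    forwarded = terminal_load(n, 64.0, central_mixing=False)
    print(f"N={n}: mixed {mixed}, forwarded {forwarded}")
```

For a 10-party conference using G.711 (64 kbit/s), each terminal would receive 576 kbit/s of payload and run 9 decoders in parallel instead of receiving 64 kbit/s and running a single decoder.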