Various audio and video conferencing services have been available for a long time, particularly in circuit-switched telecommunications networks. Teleconferencing systems can be divided into distributed and centralized systems, of which the latter have proved more advantageous for providing teleconferencing services, from the point of view of both the service providers and the implementation of terminals.
FIG. 1 illustrates a prior art design for implementing a centralized audio conference service. The teleconferencing system comprises a conference bridge CB and several terminals UE that communicate with it. Each terminal UE receives the terminal user's speech via a microphone and encodes the speech signal with a speech codec known per se. The encoded speech is transmitted to the conference bridge CB, which decodes the speech signal from the received signal. The conference bridge CB combines the speech signals received from the different terminals in an audio processing unit APU using a prior art processing method. The combined signal comprising several speech signals is then encoded by a speech codec known per se and transmitted back to the terminals UE, which decode the combined speech signal from the received signal. An audible audio signal is produced from the combined speech signal by loudspeakers or headphones. To avoid harmful echo phenomena, the audio signal transmitted to the conference bridge by a terminal is typically removed from the combined audio signal to be transmitted back to that terminal.
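The combining and echo-avoidance steps described above can be sketched as a simple "mix-minus" operation: the bridge sums the decoded signals of all participants and, for each terminal, subtracts that terminal's own contribution before sending the mix back. This is a minimal illustration using NumPy and sample-by-sample addition; it is not taken from the referenced systems, and the signal values are hypothetical.

```python
import numpy as np

def mix_for_participants(signals):
    """Given one decoded signal per participant (equal-length arrays),
    return, for each participant, the mix of all the *other*
    participants' signals (the participant's own signal is removed
    to avoid echo)."""
    total = np.sum(signals, axis=0)        # sum of all decoded signals
    return [total - s for s in signals]    # per-terminal mix-minus

# Hypothetical example: three terminals, one short frame of samples each
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, 0.5, 0.5])
c = np.array([0.0, 1.0, 0.0])
mixes = mix_for_participants([a, b, c])
# The mix sent back to terminal A contains only B's and C's speech.
```

In a real bridge the mixing is done frame by frame on decoded speech, and the per-terminal mixes are re-encoded before transmission, as the text describes.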
The combined signal is typically produced in the conference bridge as a single-channel (monophonic) audio signal or as a two-channel (stereophonic) audio signal. In the conference bridge, a spatial effect, known as spatialization, can be created artificially in a two-channel audio signal. The audio signal is then processed to give the listeners the impression that the conference call participants are at different locations in the conference room, in which case the audio signals transmitted on the different audio channels differ from one another. When a single-channel audio signal is used, all speech signals (i.e. the combined signal) are transmitted as mixed on the same audio channel.
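One simple way to create the spatial effect mentioned above is constant-power amplitude panning, where each participant's signal is weighted differently on the left and right channels so that it appears to come from a chosen direction. The sketch below assumes this panning technique for illustration only; the prior art bridge may use other, more elaborate spatialization methods, and the pan positions are hypothetical.

```python
import numpy as np

def spatialize(signals, pan_positions):
    """Place each single-channel signal at a virtual position
    (-1.0 = full left, +1.0 = full right) using constant-power
    amplitude panning, and mix the results into one stereo pair."""
    left = np.zeros_like(signals[0], dtype=float)
    right = np.zeros_like(signals[0], dtype=float)
    for s, pan in zip(signals, pan_positions):
        angle = (pan + 1.0) * np.pi / 4.0   # map [-1, 1] to [0, pi/2]
        left += np.cos(angle) * s           # gain fades from 1 to 0
        right += np.sin(angle) * s          # gain rises from 0 to 1
    return left, right

# Hypothetical example: two talkers placed at opposite sides of the room
s1 = np.ones(4)
s2 = np.ones(4)
left, right = spatialize([s1, s2], [-1.0, 1.0])
```

Because the two output channels receive different weighted mixes, they differ from one another, which is exactly the property the text uses to distinguish a spatialized stereophonic signal from a monophonic one.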
Regardless of whether one or more audio channels are used, typically only one data transmission channel is used for transmitting speech. In a video conference, for example, the same data transmission channel can also be used for transmitting video images. To minimize the bandwidth used on the data transmission path, the audio signals transmitted between the conference bridge and the terminals are encoded and decoded using a speech or audio codec supported by the system.
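The bandwidth motivation can be made concrete with rough numbers. Assuming narrowband speech sampled at 8 kHz with 16-bit samples, and an illustrative low-bit-rate speech codec operating at around 12.2 kbit/s (the figures here are typical examples, not values taken from the referenced systems):

```python
# Uncompressed narrowband speech: 8 kHz sampling, 16 bits per sample
pcm_rate = 8000 * 16          # 128000 bit/s per direction

# Illustrative low-bit-rate speech codec output
codec_rate = 12200            # 12.2 kbit/s

savings = 1 - codec_rate / pcm_rate
print(f"codec uses {savings:.0%} less bandwidth than uncompressed PCM")
```

The roughly order-of-magnitude reduction is why speech coding, rather than raw audio transmission, is used between the bridge and the terminals.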
In this application, a speech or audio codec refers to the means for encoding analogue or digital non-compressed audio information, typically speech, into digital audio/speech parameters before any channel coding that may precede the transmission path. Correspondingly, when audio information is received, the speech or audio codec comprises means for converting the audio/speech parameters, which typically arrive from channel decoding, into digital non-compressed audio information, which can be converted into analogue audio information in reproduction. Thus different speech or audio codecs or codec modes can be used on different audio channels of a stereophonic audio signal, for example, but conceptually these constitute one audio codec. The term codec thus refers both to audio codecs in the traditional sense, such as different waveform codecs, and to speech codecs used in various systems.
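As one concrete instance of a waveform codec in the sense defined above, the encode/decode round trip can be illustrated with mu-law companding, a classical waveform-coding technique. The sketch below shows only the continuous companding curve (encoding samples into compressed "parameters" and expanding them back); a real codec would additionally quantize the encoded values to a fixed number of bits, which this illustration omits.

```python
import numpy as np

MU = 255.0  # standard mu-law compression parameter

def mu_law_encode(x):
    """Compress linear samples in [-1, 1] into mu-law values."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_decode(y):
    """Expand mu-law values back into linear samples."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

# Round trip over a few sample values
x = np.array([0.0, 0.5, -0.9])
y = mu_law_encode(x)       # "audio parameters" before the transmission path
x_hat = mu_law_decode(y)   # reconstructed non-compressed audio
```

Without quantization the round trip is exact; the coding gain of a real waveform codec comes from spending fewer bits on the companded values than on the linear samples.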
Prior art teleconferencing systems, which are variations of the basic design presented above, are described e.g. in U.S. Pat. No. 6,125,115, U.S. Pat. No. 5,991,385 and WO 99/53673.
A problem related to these solutions is that the systems are inflexible with respect to the different speech situations that arise during a conference call, particularly with respect to optimization of the speech coding used. Certain speech codecs with a low bit rate use a narrow bandwidth but often cannot encode speech well in various speech situations. On the other hand, speech codecs with a high bit rate, or waveform codecs, which are capable of high-quality speech coding, use a lot of bandwidth. This inevitably results in non-optimal utilization of the bandwidth used in data transmission, which is a significant disadvantage, especially in packet-switched networks with a limited bandwidth.