Audio conference systems allow a plurality of parties at a plurality of different terminals to communicate with one another. The plurality of terminals (which are also referred to as endpoints) may have different capabilities. By way of example, one or more terminals may be monophonic endpoints which capture a single mono audio stream. Examples of such monophonic endpoints are a traditional telephone, a device with a headset and a boom microphone, or a laptop computer with a built-in microphone. On the other hand, one or more terminals may be soundfield endpoints which capture a multi-channel representation of the soundfield incident at a microphone array. An example of a soundfield endpoint is a conferencing telephone equipped with a soundfield microphone.
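By way of illustration only, the distinction between monophonic endpoints and soundfield endpoints may be modelled as a simple data structure. The following is a minimal sketch and not a definitive implementation. The names `Endpoint`, `EndpointKind` and `has_soundfield_endpoint` are hypothetical, and the rule that an endpoint capturing more than one channel is treated as a soundfield endpoint is a simplifying assumption, as is the channel count of 3 used in the usage example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EndpointKind(Enum):
    MONOPHONIC = "monophonic"   # captures a single mono audio stream
    SOUNDFIELD = "soundfield"   # captures a multi-channel soundfield representation


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of a conference terminal (endpoint)."""
    name: str
    num_channels: int  # number of audio channels captured by the endpoint

    def __post_init__(self) -> None:
        if self.num_channels < 1:
            raise ValueError("an endpoint must capture at least one channel")

    @property
    def kind(self) -> EndpointKind:
        # Assumption for this sketch: more than one captured channel
        # means the endpoint is treated as a soundfield endpoint.
        if self.num_channels == 1:
            return EndpointKind.MONOPHONIC
        return EndpointKind.SOUNDFIELD


def has_soundfield_endpoint(endpoints: List[Endpoint]) -> bool:
    """Return True if at least one endpoint is a soundfield endpoint."""
    return any(e.kind is EndpointKind.SOUNDFIELD for e in endpoints)


if __name__ == "__main__":
    telephone = Endpoint("traditional telephone", num_channels=1)
    # The channel count of 3 is an illustrative assumption.
    conference_phone = Endpoint("conferencing telephone", num_channels=3)
    print(telephone.kind.value, conference_phone.kind.value)
    print(has_soundfield_endpoint([telephone, conference_phone]))
```

In such a sketch, the kind of each endpoint is derived from its capture capability rather than stored separately, so that the two attributes cannot become inconsistent.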
An audio conference system which is configured to take into account the soundfield information provided by a soundfield endpoint, e.g. for designing a conference scene, is referred to herein as a soundfield conference system. The present document addresses the technical problem of creating a conference scene for audio conference systems which comprise soundfield endpoints. In particular, the present document addresses the technical problem of mixing and/or multiplexing the audio signals coming from a plurality of endpoints, wherein at least one of the plurality of endpoints is a soundfield endpoint. A particular aspect of the present document is to provide schemes that integrate one or more soundfields together, so that a listener enjoys a perceptually continuous, natural, and/or enveloping teleconferencing experience in which he/she can clearly understand the speech, can identify who is talking at any particular time and/or can identify at which endpoint each talker is located.