The demand for teleconferencing systems continues to increase steadily because of the importance of such technology in business and personal use. In three-dimensional (3D) video conferencing, a user visually perceives a 3D space shared with the remote participants. The shared space can take the form of a virtual space that augments the physical space the user occupies. Correspondingly, the audio signal needs to match the visual perception. This requirement applies even when the conferencing system has no video component. For example, if a video wall shows the other party on the other side of the wall, the voice of the other party should also be perceived as coming from the other side of the wall. When there are several remote participants, each participant's voice should be perceived as originating from the distinct location at which that participant appears in the visual scene. Moreover, when a remote participant moves around, the perceived location of that participant's voice should follow the movement.
While spatial audio has shown promise in improving the listening experience in audio conferencing, reproducing spatial audio realistically remains a significant challenge when only a small number of loudspeakers is available. In many situations where there is more than one local participant, the audio must be reproduced through loudspeakers. Practical systems with only a few loudspeakers, known as stereophonic systems, suffer from a sweet-spot problem: accurate reproduction of spatial audio is possible only over a very small area. Even in situations where the total number of participants is small enough that each remote participant can be assigned to an individual loudspeaker, the loudspeakers are typically fixed in place. Consequently, even slight movement by a participant makes the sound image unstable. Moreover, this approach does not scale to the typical situation in which the number of participants exceeds the number of loudspeakers.