Several approaches are known for providing personalized spatial audio to multiple listeners simultaneously. A first group of methods uses local sound field synthesis (SFS), such as (higher-order) ambisonics, wave field synthesis and related techniques, and a multitude of least-squares approaches (e.g., pressure matching or acoustic contrast maximization). These techniques aim to reproduce a desired sound field in multiple spatially extended areas (audio zones).
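To make the least-squares idea concrete, the following is a minimal sketch of pressure matching for two audio zones. It is an illustration under stated assumptions and not a method taken from this document: the loudspeakers are modeled as free-field monopoles, the solution uses Tikhonov regularization, and all names (`green_matrix`, `pressure_matching`, the array geometry, the 1 kHz test frequency, the regularization weight) are hypothetical.

```python
import numpy as np

def green_matrix(control_pts, speaker_pts, freq_hz, c=343.0):
    """Free-field monopole transfer functions, shape (n_points, n_speakers).

    Assumption: ideal point sources in free space (no room, no head).
    """
    k = 2.0 * np.pi * freq_hz / c  # wavenumber in rad/m
    diff = control_pts[:, None, :] - speaker_pts[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.exp(-1j * k * dist) / (4.0 * np.pi * dist)

def pressure_matching(G, p_desired, reg=1e-6):
    """Minimize ||G w - p_desired||^2 + reg * ||w||^2 over the speaker weights w."""
    n_speakers = G.shape[1]
    A = G.conj().T @ G + reg * np.eye(n_speakers)
    b = G.conj().T @ p_desired
    return np.linalg.solve(A, b)

# Hypothetical geometry: 8 loudspeakers on a line, two small zones 2 m away.
speakers = np.stack([np.linspace(-1.0, 1.0, 8), np.zeros(8), np.zeros(8)], axis=1)
bright = np.stack([np.linspace(-0.6, -0.4, 5), np.full(5, 2.0), np.zeros(5)], axis=1)
dark = np.stack([np.linspace(0.4, 0.6, 5), np.full(5, 2.0), np.zeros(5)], axis=1)

G = green_matrix(np.vstack([bright, dark]), speakers, freq_hz=1000.0)
# Desired field: unit pressure in the bright zone, silence in the dark zone.
p_desired = np.concatenate([np.ones(5), np.zeros(5)]).astype(complex)

w = pressure_matching(G, p_desired)
p = G @ w
contrast_db = 10.0 * np.log10(
    np.mean(np.abs(p[:5]) ** 2) / np.mean(np.abs(p[5:]) ** 2)
)
print(f"bright/dark level difference: {contrast_db:.1f} dB")
```

The control points sample each zone, so the quality of the result between the points depends on how densely the zones are sampled relative to the wavelength.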
A second group comprises binaural rendering (BR) or point-to-point (P2P) rendering approaches, e.g., binaural beamforming or crosstalk cancellation. Their aim is to generate the desired hearing impression by evoking the proper interaural time differences (ITDs) and interaural level differences (ILDs) at the ear positions of the listeners. As a result, virtual sources are perceived at the desired positions. In contrast to SFS, where the desired sound field is reproduced in spatially extended areas, BR considers only the ear positions.
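As an illustration of the two cues named above, the following sketch estimates an ITD and an ILD from a pair of ear signals. It is a simplified example, not a rendering method described here: the ITD is taken as the lag of the cross-correlation peak and the ILD as a broadband energy ratio, and the function name `estimate_itd_ild`, the sign conventions, and the synthetic test signal are all assumptions.

```python
import numpy as np

def estimate_itd_ild(left, right, fs):
    """Estimate ITD (seconds) and ILD (dB) from two ear signals.

    Assumed conventions: positive ITD means the right ear signal lags the
    left one; positive ILD means the left ear signal is louder.
    """
    xcorr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(xcorr)) - (len(left) - 1)  # lag in samples
    itd_s = lag / fs
    ild_db = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd_s, ild_db

# Synthetic example: the right ear receives a delayed, attenuated copy.
fs = 48000      # sampling rate in Hz
delay = 24      # delay in samples (0.5 ms at 48 kHz)
gain = 0.5      # amplitude ratio right/left
rng = np.random.default_rng(0)
noise = rng.standard_normal(2048)
left = np.concatenate([noise, np.zeros(delay)])
right = gain * np.concatenate([np.zeros(delay), noise])

itd_s, ild_db = estimate_itd_ild(left, right, fs)
print(f"ITD = {itd_s * 1e3:.3f} ms, ILD = {ild_db:.2f} dB")
```

Real ear signals are also shaped by the listener's head and pinnae, so a practical estimator would typically work per frequency band rather than on the broadband signal.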
Both approaches (BR and SFS) have advantages and drawbacks. A fundamental drawback of BR systems is their limited robustness to movements or rotations of the listeners' heads. This is because the sound field is inherently optimized for the ear positions only, i.e., for a specific head position and orientation.
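A rough way to see why a point-optimized solution is sensitive to head movement is to look at the phase shift that a small ear displacement causes. The sketch below assumes a single plane wave and a displacement along its propagation direction, which is a worst-case simplification; the function name `phase_error_deg` and the example values are hypothetical.

```python
def phase_error_deg(freq_hz, displacement_m, c=343.0):
    """Phase shift in degrees seen by a point moved along the propagation
    direction of a plane wave (assumed worst-case geometry)."""
    return 360.0 * freq_hz * displacement_m / c

# The same 1 cm ear displacement at a low and a high frequency.
for f in (500.0, 8000.0):
    print(f"{f:6.0f} Hz: {phase_error_deg(f, 0.01):5.1f} degrees")
```

Under these assumptions the phase error grows linearly with frequency, so techniques that rely on precise cancellation at the ears, such as crosstalk cancellation, degrade first at high frequencies.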
In the case of SFS, many loudspeakers should ideally surround the entire listening area so that virtual sources can be synthesized for all directions. Furthermore, SFS is generally more affected by spatial aliasing, since a proper sound field needs to be generated in an entire area rather than only at single points (the ear positions). Similarly, it is challenging to properly synthesize the sound field with SFS at very low frequencies, again because the sound field must be synthesized in a spatially extended area, whereas for BR the sound field needs to be controlled at the ear positions only. In return, SFS provides much greater robustness to movements and rotations of the listeners' heads, since the desired sound field is synthesized in spatially extended areas rather than by evoking ITDs and ILDs at certain points in space. As a consequence, head rotations and small head movements do not deteriorate the hearing impression. Moreover, SFS is independent of the head-related transfer functions (HRTFs) of the listeners, which play a crucial role in sound perception and in BR.
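The two frequency limits mentioned above can be put into rough numbers. The sketch below uses the common half-wavelength rule of thumb for a uniformly spaced linear array, f_alias = c / (2 * spacing), which is a conservative estimate and not a figure from this document; the function names and the example spacing are assumptions.

```python
def spatial_aliasing_frequency(spacing_m, c=343.0):
    """Rule-of-thumb frequency above which a uniformly spaced array
    starts to show spatial aliasing (half-wavelength criterion)."""
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    return c / (2.0 * spacing_m)

def wavelength(freq_hz, c=343.0):
    """Acoustic wavelength in meters at the given frequency."""
    return c / freq_hz

# Hypothetical array with 10 cm loudspeaker spacing.
print(f"aliasing above about {spatial_aliasing_frequency(0.10):.0f} Hz")
# At low frequencies the wavelength is large compared with typical array sizes.
print(f"wavelength at 50 Hz: {wavelength(50.0):.2f} m")
```

With a 10 cm spacing the estimate lies below 2 kHz, while at 50 Hz the wavelength is several meters, which illustrates why both ends of the audio band are demanding for SFS.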