Sound is central to the interaction of humans with their environment. As a result, a major technological objective has been to control the sound in a particular physical environment for purposes such as communication or entertainment. In the current state of the art, simply reproducing the sound of a single source is straightforward. However, the reproduction or creation of complex audio scenarios is still difficult. This is especially true when rendering several individual three-dimensional (3D) sound environments over multiple listening areas simultaneously, which generally requires a large number of loudspeakers in a 3D setup and results in high computational complexity.
The natural way to create multiple independent sound environments is to create multiple sets of bright and quiet zones over the selected regions, so that inter-zone sound leakage is minimized. This so-called multizone sound field reproduction has received wide attention from researchers.
There is an interest in reproducing various 3D sound environments over multiple listening areas using a single two-dimensional (2D) speaker array. This is achieved by applying at least one of amplification, attenuation, and delay to each of the replicated source signals, based on the predetermined filters for each of the loudspeakers.
The sound field in a space is normally modeled as a linear and time-invariant system. The actual sound field s_a(x, t) at a point x at time t can be written as a linear function of the signal s(t) transmitted by the source. For a fixed source, this relationship is modeled by the position-dependent acoustic impulse response h(x, t): s_a(x, t) = h(x, t) * s(t), where * denotes convolution in time. Taking the Fourier transform with respect to time and expressing frequency through the wave number k, the acoustic transfer function H(x, k) is defined as the complex gain between the frequency-domain source driving signal s(k) and the actual sound field S_a(x, k): S_a(x, k) = H(x, k) s(k).
As mentioned above, the source driving signal s(k) is derived by amplifying, attenuating, and delaying the input signal, or by filtering the input signal with head-related transfer function (HRTF) spectral cues. An HRTF is a frequency response that characterizes how a sound from a specific point in space arrives at the ear (generally at the outer end of the auditory canal).
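The linear time-invariant model above can be checked numerically. The following is a minimal sketch, not the method of any particular system: it assumes a free-field point source, so that the impulse response is a single delayed and attenuated impulse, and it assumes a speed of sound of 343 m/s. The function names (free_field_impulse_response, convolve, dft) and all numeric values are hypothetical and chosen only for illustration. It computes the sound field once by convolution in time and once as a product of spectra, and compares the two.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the text


def free_field_impulse_response(distance_m, sample_rate_hz, length):
    """Hypothetical h(x, t): one impulse delayed by r/c and scaled by 1/(4*pi*r)."""
    delay = round(distance_m / SPEED_OF_SOUND * sample_rate_hz)
    if delay >= length:
        raise ValueError("impulse response is too short for this distance")
    h = [0.0] * length
    h[delay] = 1.0 / (4.0 * math.pi * distance_m)
    return h


def convolve(h, s):
    """Time-domain model: s_a(x, t) = h(x, t) * s(t) (linear convolution)."""
    out = [0.0] * (len(h) + len(s) - 1)
    for i, hi in enumerate(h):
        if hi == 0.0:
            continue  # skip zero taps of the sparse impulse response
        for j, sj in enumerate(s):
            out[i + j] += hi * sj
    return out


def dft(x, n):
    """Naive n-point discrete Fourier transform of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(padded))
        for k in range(n)
    ]


sample_rate = 8000                                # Hz, illustrative
s = [math.sin(0.3 * m) for m in range(32)]        # source signal s(t)
h = free_field_impulse_response(3.43, sample_rate, 96)

# Time domain: convolve the source signal with the impulse response.
sa_time = convolve(h, s)

# Frequency domain: S_a(x, k) = H(x, k) s(k), evaluated bin by bin.
n = len(sa_time)
Sa_from_time = dft(sa_time, n)
Sa_from_product = [Hk * Sk for Hk, Sk in zip(dft(h, n), dft(s, n))]

# The two routes should agree up to floating-point rounding.
max_error = max(abs(a - b) for a, b in zip(Sa_from_time, Sa_from_product))
print("largest difference between the two routes:", max_error)
```

Zero-padding both transforms to the full length of the convolution output is what makes the spectral product match the linear (rather than circular) convolution. In a multi-loudspeaker setting, the same relation would be applied per loudspeaker and the contributions summed, since the model is linear.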
Current surround sound standards (e.g., 5.1 and 10.2 surround) are characterized by a single listener location, or sweet spot, where the audio effects work best, and they present a fixed, forward-facing perspective of the sound field to the listener at this location. Such systems cannot provide multiple individual sound environments over arbitrary listening zones. Some existing multizone sound rendering systems are based on sound field synthesis approaches (e.g., higher order ambisonics (HOA) based methods, planarity control methods, and spectral division methods). However, these systems are restricted to localizing virtual sources on the horizontal plane.
To achieve the sensation of elevated 3D sources (or virtual sources below the horizontal plane), existing systems generally need additional loudspeakers in a third dimension or a change of the reproduction setup to 3D (e.g., 22.2 surround and 3D spherical loudspeaker arrays). However, a 3D array with a relatively large number of speakers is impractical to deploy in real-world settings. The computational complexity also increases significantly as the number of speaker channels goes up.