Reproduction of three-dimensional (“3D”) sound of a sound field using loudspeakers is vulnerable to perceptible distortion due to, for example, spectral coloration and other sound-related phenomena. Conventional devices and techniques to generate three-dimensional binaural audio have generally focused on resolving cross-talk between left-channel audio and right-channel audio. For example, conventional 3D audio techniques, such as ambiophonics, high-order ambisonics (“HOA”), wavefield synthesis (“WFS”), and the like, have been developed to address 3D audio generation. However, some of the traditional approaches are suboptimal. For example, some of the above-described techniques add spectral coloration, require a relatively large number of loudspeakers and/or microphones, or have other such limitations. While functional, the traditional devices and solutions for reproducing three-dimensional binaural audio are not well-suited to fully capturing the acoustic effects of the environment associated with, for example, a remote sound field.
Further, there are drawbacks to using traditional three-dimensional binaural audio devices and solutions to reproduce audio originating from an audio source moving within a sound field, and to change the directivity of spatial audio responsive to a displacement of the audio source. One conventional approach, for example, relies on video and/or image detection of persons to identify audio sources. The capture of images of objects may lead to inadvertent identification of objects to which spatial audio is to be directed. For example, persons viewable through a conference room window may be detected by traditional three-dimensional binaural audio devices and solutions as recipients of audio, even though those persons are not intended to be deemed participants.
Thus, what is needed is a solution for audio capture and reproduction devices without the limitations of conventional techniques.