Reproduction of three-dimensional (“3D”) sound of a sound field using loudspeakers is vulnerable to perceptible distortion due to, for example, spectral coloration and other sound-related phenomena. Conventional devices and techniques to generate three-dimensional binaural audio have generally focused on resolving the issues of cross-talk between left-channel audio and right-channel audio. For example, conventional 3D audio techniques, such as ambiophonics, high-order ambisonics (“HOA”), wavefield synthesis (“WFS”), and the like, have been developed to address 3D audio generation. However, some of the traditional approaches are suboptimal. For example, some of the above-described techniques add spectral coloration, require the use of a relatively large number of loudspeakers and/or microphones, or have other such limitations. While functional, the traditional devices and solutions for reproducing three-dimensional binaural audio are not well-suited for fully capturing the acoustic effects of the environment associated with, for example, a remote sound field.
Accurate reproduction of three-dimensional binaural audio typically requires that a listener be able to perceive the approximate locations of vocal persons located in a remote sound field. For example, if an audio reproduction device is disposed at one end of a long rectangular table at one location, a listener at another location ought to be able to perceive the approximate positions of the persons in the sound field through the reproduced audio. However, conventional techniques of determining locations of the vocal persons in the sound field are generally suboptimal.
One conventional approach, for example, relies on video and/or image detection of the persons to determine approximate points in space from which vocalized speech originates. There are a variety of drawbacks to using visual information to determine the position of the persons in the sound field. First, image capture devices typically require additional circuitry and resources, as well as power, beyond that required for capturing audio. Thus, computational resources are used for video and audio separately, sometimes requiring the use of separate, but redundant, circuits. Second, the capture of visual information and the capture of audio information are asynchronous due to the differing capturing devices and techniques. Therefore, additional resources may be required to synchronize video-related information with audio-related information. Third, image capture devices may not be well-suited for range-finding purposes. Moreover, typical range-finding techniques may have issues as they usually introduce temporal delays and provide relatively coarse spatial resolution. In some instances, the introduction of temporal delay can consume power unnecessarily.
FIG. 1 depicts an example of a conventional range-finding technique that introduces temporal delays. Consider that diagram 100 illustrates a current for driving an ultrasonic transducer for purposes of range-finding. As shown, conventional techniques for generating a drive current 102 include switching, for example, from one signal characteristic to another signal characteristic. This switching introduces a temporal delay 104 as the transducer “rings down” and then “rings up” to the next signal characteristic. Such delays may limit the temporal and/or spatial resolution of this range-finding technique. Further, switching from one signal characteristic to the next wastes energy that otherwise need not be consumed.
Thus, what is needed is a solution for audio capture and reproduction devices without the limitations of conventional techniques.