1. Field of the Invention
The present invention relates to sound signals encoded over multiple speakers to create the perception of specific spatial properties.
2. Background Art
Interest in rendering sound signals with 3-D spatial properties has been motivated by many applications, including enhancing intelligibility in teleconferencing systems, navigation systems for the visually impaired, and enhancing the sense of immersion in virtual environments. The challenge in rendering sound with a limited number of speaker elements is to create stimuli that are perceived as realistic and that have accurate spatial properties. In addition, the complexity of the rendering algorithms presents challenges for fast and efficient implementations.
The basic problem of spatial audio rendering is creating the perception that sound is coming from a location in a space where a speaker cannot be located. For example, consider presenting a holographic representation of a person, as in the currently fictional but generally illustrative example of a “holodeck” from the American science fiction entertainment series and media franchise “Star Trek.” It is not possible to locate a physical speaker at the location of the mouth of the holographic representation. It is desired, however, to give the listener the impression that the voice emanates from the mouth of the holographic representation.
In this example, the location at which the listener must be positioned to perceive the sound as emanating from the holographic representation is known as the “sweet spot,” which, as used herein, means a region where the listener's perception of the rendered sound is correct.
One of the more popular approaches to rendering sound is wave-field synthesis (WFS). It is capable of accurately reconstructing a pressure field within a large area of interest with moderate processing power. The absence of a “sweet spot” makes it well suited for creating realistic spatial audio impressions for large audiences or for listeners moving around in a large area. Unfortunately, it requires a relatively large number of speakers. This can be prohibitive for immersive virtual environments, especially if they need to be portable or set up in many smaller rooms. Another drawback of the method is that it is based on Green's Second Theorem, so the field can be reconstructed only either inside or outside a closed boundary containing the equivalent sources. When a source is located within a reconstruction domain (i.e., an “immersive environment”), a focused source must be located between the listener and the speakers to ensure correct perception. As an example illustrating this limitation, imagine a virtual videoconferencing environment in which the remote person's avatar is standing right beside the local participant. Methods based on Green's Second Theorem cannot render this source without additional modifications that increase computational complexity and invalidate the correctness of the reconstructed field in some regions of the reconstruction domain. In addition, reproducing the sound field at every point in an immersive environment with only a few listeners is often not necessary.
Another popular approach to rendering sound is Dolby 5.1, which delivers spatial and ambient sound to a listener's vicinity (the “sweet spot”) using a regularly-spaced setup of five loudspeakers and one subwoofer. Dolby 5.1 creates only directional perception: distances are not accurately reproduced, and the Doppler effect (which arises when a sound source moves in 3-dimensional space relative to the listener) must be either recorded or manually reproduced by performing a frequency shift.
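By way of illustration only, the frequency shift referred to above may be sketched with the classical Doppler relation for a source moving along the line toward or away from a stationary listener. The function name, parameter names, and the assumed speed of sound (343 m/s, roughly that of air at 20 degrees C) are hypothetical choices made for this sketch; they do not come from any Dolby specification or from the rendering method described herein.

```python
# Illustrative sketch only: classical Doppler shift for a moving source
# and a stationary listener. All names and constants are assumptions.

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def doppler_shifted_frequency(source_hz, approach_speed_m_s,
                              speed_of_sound_m_s=SPEED_OF_SOUND_M_S):
    """Return the frequency heard by a stationary listener.

    approach_speed_m_s is the component of the source velocity along the
    line to the listener: positive when the source approaches, negative
    when it recedes.
    """
    if abs(approach_speed_m_s) >= speed_of_sound_m_s:
        raise ValueError("source speed must be below the speed of sound")
    # f' = f * c / (c - v): pitch rises on approach, falls on recession.
    return source_hz * speed_of_sound_m_s / (speed_of_sound_m_s - approach_speed_m_s)


if __name__ == "__main__":
    for speed in (-20.0, 0.0, 20.0):
        heard = doppler_shifted_frequency(440.0, speed)
        print(f"approach speed {speed:+.1f} m/s: {heard:.2f} Hz")
```

In a channel-based system such a shift would have to be applied to the signal before playback, since the loudspeaker layout itself carries no information about source motion.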