1. Technical Field
The techniques described herein relate generally to audio signal processing and reproduction, and in particular to directional encoding and decoding enabling reproduction of sounds positioned in three-dimensional (3D) space using a two-dimensional (2D) arrangement of speakers.
2. Discussion of the Related Art
Various techniques exist for reproducing sound in a manner that conveys directional information about the position from which the sound originates with respect to a listener. Some techniques attempt to reproduce sounds for a listener in a manner that can simulate sound originating at any point in 3D space. As a result, the listener may perceive sound as coming from one or more selected positions in 3D space, such as above, below, in front of, behind or to the side of the listener. Some techniques use speakers positioned around the listener and above and below the listener to achieve the desired sound positioning effect.
Several conventional techniques for 3D positioning and reproduction of sounds exist, including: 1) binaural synthesis using head-related transfer function (HRTF) based transaural methods; 2) amplitude panning and equalization filters; and 3) ambisonics encoding and decoding.
Conventional binaural techniques can provide 3D audio reproduction using HRTFs and crosstalk cancellation. However, conventional binaural techniques have certain drawbacks. Binaural methods are computationally demanding, and may require significant computing power. HRTFs can only be measured at a set of discrete positions around the head. Designing a binaural system that can faithfully reproduce sounds from all directions can therefore be highly challenging. The sound perceived is highly dependent on the shape of the head, pinnae and torso of the listener. If the listener's head, pinnae and torso are not identical to those of the dummy head used for the HRTF, the fidelity of reproduction can be compromised. In addition, binaural techniques can be highly sensitive to the position of the listener, and may only provide suitable performance at one position (known as a “sweet spot”) due to the positional dependency of crosstalk cancellation.
Amplitude panning and equalization filters can position a sound in a multichannel playback system by weighting an audio input signal using a set of amplifiers that feed the loudspeakers individually. Equalization filters are used to virtually position a sound in the vertical plane. These techniques may provide for 3D audio reproduction, but have certain drawbacks. For example, they may have difficulty providing good localization in the center front of the speaker system. They can also be position-dependent and sensitive to the sweet spot. They can require position-dependent amplitude selection for each channel and elevation-dependent equalization filtering, which can be computationally demanding. Another drawback is that the speaker positions need to be known at the encoding stage. This constrains the end user, as the speaker setup is not configurable after encoding. Another disadvantage is that a large number of channels may be required to faithfully reproduce sounds from all directions.
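By way of illustration only, the per-speaker weighting used in amplitude panning can be sketched for the simplest case of two loudspeakers. The constant-power (sine/cosine) panning law, the function name constant_power_pan and the position parameter are assumptions chosen for this sketch; the techniques discussed above are not limited to this law and may use other gain rules.

```python
import math


def constant_power_pan(sample, position):
    """Split one mono sample between two loudspeakers.

    position is a hypothetical pan control: 0.0 places the sound fully
    at the first speaker and 1.0 places it fully at the second.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0.0, 1.0]")
    # Map the pan control onto a quarter circle so that the squared
    # gains always sum to one (assumed constant-power law).
    angle = position * math.pi / 2.0
    gain_a = math.cos(angle)
    gain_b = math.sin(angle)
    return sample * gain_a, sample * gain_b
```

Because each gain depends on the position of the speaker it feeds, a sketch of this kind also shows why the speaker layout must be fixed before the channel signals are produced.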
First-order ambisonics encoding and decoding, also known as B-format encoding and decoding, is widely accepted as a very efficient way of positioning sounds in 3D space. Ambisonics has several advantages over the other two approaches. For example, it is computationally less demanding. The speaker layout does not need to be known at the encoding stage, and the encoded signal can work with a variety of speaker array configurations. Conventional ambisonics needs only 3 channels (WXY) for reproduction of planar (2D) sounds and 4 channels (WXYZ) for reproduction of full-sphere (3D) sounds. Ambisonics can provide good localization at any position around the listener. Ambisonics is also independent of the listener's features (head, pinnae, torso), and can be less sensitive to the position of the listener. All of the speakers can be used for reproducing a sound, and hence sound positioning can be more accurate.
There are two types of conventional first order ambisonics:
Ambisonics soundfield type | Horizontal order | Vertical order | Number of channels | Channels
Horizontal/2D/planar       | 1                | 0              | 3                  | WXY
Full-sphere/3D/periphonic  | 1                | 1              | 4                  | WXYZ
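For illustration, the W, X, Y and Z channels listed above can be computed from a mono source and a direction as in the following minimal sketch. It assumes the traditional B-format convention in which W carries a 1/sqrt(2) weighting and angles are given in radians, with azimuth measured counterclockwise from the front; the function name encode_b_format is hypothetical, and other normalization conventions exist.

```python
import math


def encode_b_format(sample, azimuth, elevation=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: traditional B-format, W weighted by 1/sqrt(2),
    azimuth and elevation in radians.
    """
    w = sample / math.sqrt(2.0)                           # omnidirectional
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z
```

With the elevation left at zero, Z is zero and only W, X and Y carry signal, which corresponds to the 3-channel planar case. Note that no speaker position appears anywhere in the encoding.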
Planar ambisonics (also called horizontal or 2D ambisonics) is designed for playback of 2D sound using a 2D arrangement of speakers. Full sphere ambisonics (also called 3D or periphonic ambisonics) is designed for playback of 3D sound using a 3D arrangement of speakers. One problem with full sphere ambisonics is that it can be difficult to achieve a suitable 3D arrangement of speakers in the home or similar environments. It can be difficult to mount and wire speakers in suitable positions above the listener's head to achieve the desired 3D sound effect, and a specialized speaker installation may be required.
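As a hedged illustration of planar playback, the following sketch derives speaker feeds from the W, X and Y channels for a regular 2D ring of speakers. It assumes a basic first-order decode in which W was encoded with a 1/sqrt(2) weighting and the speakers are equally spaced starting at azimuth zero; the function name decode_planar and its parameters are hypothetical, and practical decoders may apply different weightings.

```python
import math


def decode_planar(w, x, y, num_speakers):
    """Decode planar B-format (W, X, Y) to a regular ring of speakers.

    Assumes W was encoded with a 1/sqrt(2) weighting and that speaker i
    sits at azimuth 2*pi*i/num_speakers radians.
    """
    if num_speakers < 3:
        raise ValueError("a regular planar array needs at least 3 speakers")
    feeds = []
    for i in range(num_speakers):
        phi = 2.0 * math.pi * i / num_speakers
        # Undo the W weighting, then add the directional components
        # projected onto this speaker's direction.
        directional = x * math.cos(phi) + y * math.sin(phi)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / num_speakers)
    return feeds
```

In this sketch every speaker receives some signal, and the speaker nearest the encoded direction receives the largest feed. The speaker count is chosen only at decode time.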