Audio coding standards such as MPEG-Surround and Spatial Audio Object Coding (SAOC) use a perceptual parameterization of the spatial image to reduce the bit rate needed for transmission and storage. For example, in MPEG-Surround, a multi-channel audio signal can be encoded as a downmix signal along with a set of spatial parameters. The spatial parameters describe the auditory spatial cues of time-frequency tiles of the audio signal. At the decoder, the spatial image can be reconstructed by restoring the original auditory spatial cues when the downmix is converted up to the multi-channel signal. The spatial cues determine the location and width of the perceived image for each time-frequency tile.
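The downmix-plus-parameters idea described above can be illustrated with a minimal sketch for a single time-frequency tile of a two-channel signal. This is not the MPEG-Surround algorithm: it assumes real, non-negative tile magnitudes, uses only a level-difference cue, and ignores phase, coherence, and quantization. The function names `encode_tile` and `decode_tile` are hypothetical.

```python
import math

EPS = 1e-12  # guards against division by zero and log of zero


def encode_tile(left, right):
    """Encode one tile as (downmix magnitude, level difference in dB).

    left, right: non-negative magnitudes of the tile in each channel
    (an assumption of this sketch, not a codec requirement).
    """
    # Energy-preserving mono downmix of the two channel magnitudes.
    downmix = math.sqrt(left ** 2 + right ** 2)
    # Spatial parameter: ratio of channel energies, expressed in dB.
    level_diff_db = 10.0 * math.log10((left ** 2 + EPS) / (right ** 2 + EPS))
    return downmix, level_diff_db


def decode_tile(downmix, level_diff_db):
    """Restore the two channel magnitudes from the downmix and the cue."""
    ratio = 10.0 ** (level_diff_db / 10.0)  # left energy / right energy
    gain_left = math.sqrt(ratio / (1.0 + ratio))
    gain_right = math.sqrt(1.0 / (1.0 + ratio))
    return downmix * gain_left, downmix * gain_right


downmix, cue = encode_tile(0.8, 0.6)
left_hat, right_hat = decode_tile(downmix, cue)
```

In this toy setting the decoder recovers the original per-channel magnitudes from one downmix value and one parameter, which is the sense in which the spatial cue is "restored" at the upmix.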
Similarly, in SAOC, independent audio objects are encoded as a downmix signal along with parameters that describe, for each time-frequency tile, which object is most active. The decoder can then render the objects at different locations by generating the corresponding auditory spatial cues in the multi-channel upmix.
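The object-based scheme described above can be sketched in the same simplified way. The example below is an illustration only, not the SAOC specification: it assumes non-negative tile magnitudes per object, records the index of the most active object in each tile, and renders to stereo with a constant-power panning law chosen for this sketch. The names `encode_objects`, `render_stereo`, and `pan_positions` are hypothetical.

```python
import math


def encode_objects(object_tiles):
    """object_tiles: one list of tile magnitudes per object, equal lengths.

    Returns the per-tile downmix and the index of the most active object.
    """
    n_tiles = len(object_tiles[0])
    downmix, dominant = [], []
    for t in range(n_tiles):
        mags = [obj[t] for obj in object_tiles]
        downmix.append(sum(mags))  # simple additive downmix
        dominant.append(max(range(len(mags)), key=lambda i: mags[i]))
    return downmix, dominant


def render_stereo(downmix, dominant, pan_positions):
    """Render each tile at the position of its dominant object.

    pan_positions[i] lies in [0, 1]: 0 is hard left, 1 is hard right.
    """
    left, right = [], []
    for mag, idx in zip(downmix, dominant):
        theta = pan_positions[idx] * math.pi / 2.0
        left.append(mag * math.cos(theta))   # constant-power panning
        right.append(mag * math.sin(theta))
    return left, right


objects = [[1.0, 0.1], [0.2, 0.9]]  # two objects, two tiles
downmix, dominant = encode_objects(objects)
left, right = render_stereo(downmix, dominant, pan_positions=[0.0, 1.0])
```

Because the rendering positions are supplied only at the decoder, the same encoded downmix can be rendered with the objects placed at different locations.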
Although the systems above represent spatial cues, they do not address three-dimensional audio reproduction. Accordingly, these traditional systems may not effectively represent the distance (depth) and height of sound sources.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.