Virtual rendering of spatial audio over a pair of speakers commonly involves the creation of a stereo binaural signal that represents the desired sound arriving at the listener's left and right ears and is synthesized to simulate a particular audio scene in three-dimensional (3D) space, possibly containing a multitude of sources at different locations. For playback through headphones rather than speakers, binaural processing or rendering can be defined as a set of signal processing operations aimed at reproducing the intended 3D location of a sound source over headphones by emulating the natural spatial listening cues of human subjects. Typical core components of a binaural renderer are head-related filtering, which reproduces direction-dependent cues, and distance cue processing, which may involve modeling the influence of a real or virtual listening room or environment. One example of a present binaural renderer maps each of the 5 or 7 channels of a 5.1 or 7.1 surround channel-based audio presentation to 5 or 7 virtual sound sources in 2D space around the listener. Binaural rendering is also commonly found in games or gaming audio hardware, in which case the processing can be applied to individual audio objects in the game based on their individual 3D positions.
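By way of illustration only, the channel-to-virtual-source mapping described above can be sketched as a filter-and-sum operation: each channel is convolved with a left-ear and a right-ear impulse response for its nominal direction, and the results are summed per ear. The sketch below is not taken from any particular renderer. The sample rate, the channel azimuths, the function names `toy_hrir` and `render_binaural`, and the delay-plus-gain impulse responses (a crude stand-in for measured head-related impulse responses) are all assumptions made for the example.

```python
import numpy as np

SAMPLE_RATE = 48_000   # Hz (assumed)
HRIR_LENGTH = 64       # taps in each toy impulse response (assumed)
MAX_ITD_S = 0.0007     # rough maximum interaural time difference (assumed)

# Assumed nominal azimuths in degrees (positive = listener's right)
# for the five main channels of a 5.1 layout; the LFE channel is omitted.
CHANNEL_AZIMUTHS = {"L": -30.0, "R": 30.0, "C": 0.0, "Ls": -110.0, "Rs": 110.0}


def toy_hrir(azimuth_deg):
    """Return a (left, right) impulse-response pair for one direction.

    Hypothetical stand-in for a measured HRIR: the far ear receives the
    sound later and quieter than the near ear. No spectral cues are modeled.
    """
    s = np.sin(np.radians(azimuth_deg))            # -1 = full left, +1 = full right
    delay = int(round(abs(s) * MAX_ITD_S * SAMPLE_RATE))
    near = np.zeros(HRIR_LENGTH)
    far = np.zeros(HRIR_LENGTH)
    near[0] = 1.0                                  # near ear: no delay, unit gain
    far[delay] = 1.0 - 0.5 * abs(s)                # far ear: delayed and attenuated
    # A source on the right makes the right ear the near ear.
    return (far, near) if s > 0 else (near, far)


def render_binaural(channels):
    """Filter each channel with its HRIR pair and sum into two ear signals.

    channels: dict mapping channel name to a 1-D array (all the same length).
    Returns an array of shape (2, n + HRIR_LENGTH - 1): row 0 left, row 1 right.
    """
    n = len(next(iter(channels.values())))
    out = np.zeros((2, n + HRIR_LENGTH - 1))
    for name, signal in channels.items():
        h_left, h_right = toy_hrir(CHANNEL_AZIMUTHS[name])
        out[0] += np.convolve(signal, h_left)      # full convolution, left ear
        out[1] += np.convolve(signal, h_right)     # full convolution, right ear
    return out
```

In a practical renderer the toy impulse responses would be replaced by measured or modeled head-related filters, and distance or room processing would be applied in addition. The filter-and-sum structure would remain broadly similar.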
Traditionally, binaural rendering is a form of blind post-processing applied to multichannel or object-based audio content. Some of the processing involved in binaural rendering can have undesirable effects on the timbre of the content, such as smoothing of transients or excessive reverberation added to dialog or to some effects and music elements. With the growing importance of headphone listening and the additional flexibility brought by object-based content (such as the Dolby® Atmos™ system), there is greater opportunity and need for mixers to create and encode specific binaural rendering metadata at content creation time, for instance to instruct the renderer to process parts of the content with different algorithms or with different settings. Present systems do not feature this capability, nor do they allow such metadata to be transported as part of an additional specific headphone payload in the codecs.
Current systems are also not optimized at the playback end of the pipeline, insofar as content is not configured to be received on a device with additional metadata that can be provided live to the binaural renderer. While real-time head-tracking has been previously implemented and shown to improve binaural rendering, this generally prevents other features, such as automated continuous head-size sensing and room sensing, and other customization features that improve the quality of the binaural rendering, from being effectively and efficiently implemented in headphone-based playback systems.
What is needed, therefore, is a binaural renderer running on the playback device that combines authoring metadata with real-time locally generated metadata to provide the best possible experience to the end user when listening to channel-based and object-based audio through headphones. Furthermore, for channel-based content it is generally required that the artistic intent be retained by incorporating audio segmentation analysis.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.