Virtual rendering of spatial audio over a pair of speakers commonly involves the creation of a stereo binaural signal that represents the desired sound arriving at the listener's left and right ears and is synthesized to simulate a particular audio scene in three-dimensional (3D) space, possibly containing multiple sources at different locations. For playback through headphones rather than speakers, binaural processing or rendering can be defined as a set of signal processing operations aimed at reproducing the intended 3D location of a sound source over headphones by emulating the natural spatial listening cues of human subjects. Typical core components of a binaural renderer are head-related filtering, which reproduces direction-dependent cues, and distance-cue processing, which may involve modeling the influence of a real or virtual listening room or environment. One example of a present binaural renderer maps each of the 5 or 7 channels of a 5.1 or 7.1 surround-sound, channel-based audio presentation to one of 5 or 7 virtual sound sources in two-dimensional (2D) space around the listener. Binaural rendering is also commonly found in games and gaming audio hardware, in which case the processing can be applied to individual audio objects in the game based on their individual 3D positions. With the growing importance of headphone listening and the additional flexibility brought by object-based content (such as the Dolby® Atmos™ system), there is greater opportunity and need for mixers to create and encode specific binaural rendering metadata at content creation time to maintain the spatial cues of the original content.
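The head-related filtering described above can be illustrated with a minimal sketch, which is not taken from any particular renderer. It assumes each virtual source is a mono signal paired with a left-ear and a right-ear head-related impulse response (HRIR); the function name `render_binaural` and the toy impulse responses are hypothetical, and real HRIRs would come from measurements or a model.

```python
import numpy as np

def render_binaural(sources, hrirs):
    """Mix mono sources into a 2-channel (left, right) binaural signal.

    sources: list of 1-D arrays, one mono signal per virtual source
    hrirs:   list of (left_ir, right_ir) pairs, one pair per source
    """
    # Full convolution length is len(signal) + len(ir) - 1.
    n_out = max(len(s) + max(len(l), len(r)) - 1
                for s, (l, r) in zip(sources, hrirs))
    out = np.zeros((2, n_out))
    for s, (l, r) in zip(sources, hrirs):
        left = np.convolve(s, l)    # filter the source for the left ear
        right = np.convolve(s, r)   # filter the source for the right ear
        out[0, :len(left)] += left
        out[1, :len(right)] += right
    return out

# Toy example (assumed values): a source on the listener's left reaches the
# right ear 3 samples later and at half the amplitude.
click = np.array([1.0, 0.0, 0.0, 0.0])
toy_hrir = (np.array([1.0]), np.array([0.0, 0.0, 0.0, 0.5]))
stereo = render_binaural([click], [toy_hrir])
```

In this toy case the interaural delay and level difference are the only cues; a channel-based renderer of the kind mentioned above would call the same routine with one HRIR pair per surround channel.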
During headphone playback, matching the response at a person's ear drum to a free field response is important for recreating the perception of spatiality and obtaining the correct timbre. Unlike loudspeakers, headphones are generally not designed to have a flat frequency response but instead should compensate for the spectral coloration caused by the sound path to the ear. For correct headphone reproduction it is essential to control the sound pressure at the listener's ears, yet there is no general consensus about the optimal transfer function and equalization of headphones. A large number of headphone models can be derived to model playback through different types of headphones (e.g., open, closed, earbuds, in-ear monitors, hearing aids, and so on) and for different directional placements. The creation and distribution of such models can be a challenge in environments that feature different audio playback scenarios, such as different client devices (e.g., mobile phones, portable or desktop computers, gaming consoles, and so on), as well as different types of audio content (e.g., music, games, dialog, environmental noise, and so on).
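As a rough illustration of the compensation idea, and not a method taken from this document, one simple approach is to compute per-band gains that move a measured headphone response toward a chosen target response. Because there is no agreed optimal target, the target is left as a parameter; the function name `compensation_gains_db`, the band values, and the boost limit are all hypothetical.

```python
import numpy as np

def compensation_gains_db(measured_db, target_db, max_boost_db=12.0):
    """Per-band gain (dB) that moves a measured response toward a target.

    measured_db: headphone magnitude response per band, in dB
    target_db:   desired response per band, in dB (a design choice)
    max_boost_db: cap on boost so deep notches are not over-amplified
    """
    gains = np.asarray(target_db, dtype=float) - np.asarray(measured_db, dtype=float)
    # Limit boost only; cuts are applied in full.
    return np.minimum(gains, max_boost_db)

# Assumed example: four bands measured against a flat (0 dB) target.
measured = [0.0, -3.0, 6.0, -20.0]
target = [0.0, 0.0, 0.0, 0.0]
gains = compensation_gains_db(measured, target)
```

Here the 20 dB notch in the last band is only partially corrected because of the boost cap, which is a common precaution against amplifying noise or driving the transducer too hard.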
What is needed, therefore, is an equalization system that enhances the perceptual quality and spatial representation of object-based audio content for playback through headphones. What is further needed is a system for efficiently defining and distributing headphone models for a variety of different headphone types and listening environments.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.