The purpose of audio monitoring is to evaluate audio presentations in a neutral way to ensure good translation to other reproduction systems. The shapes of the head and outer ear, together with head movements, are the main localization mechanisms of our auditory system. They provide our ability to localize sound sources and enable loudspeaker monitoring to work. Headphones break the link to these natural mechanisms, which we have acquired over our lifetime. For these reasons, on-ear and in-ear headphones have not been the best choice for monitoring. Normal headphones make it difficult to set levels, pan sound locations and equalise important sources, such as the human voice or tonal instruments, because headphones do not have well-controlled frequency responses in the midrange and the variation in sound character from one headphone to another is large. The matter is further complicated by individual differences between persons: what you hear on headphones can be quite different from what other persons hear, even with the same set of headphones. These characteristics are entirely different from those of good in-room monitoring loudspeaker systems. Work done using good loudspeaker monitoring systems translates precisely to other loudspeakers, sounds the same for all listeners, and also works well on headphone reproduction.
To allow for monitoring using headphones, new solutions are needed. The present disclosure aims to provide a reliable path to enable stereo, surround and immersive audio monitoring using headphones. In embodiments of the present disclosure, calculations are performed on how a user's head, external ear and upper body affect and colour audio arriving from any given direction. This effect is called the Head-Related Transfer Function (HRTF). At least some of the embodiments of the present disclosure provide the user's unique personal HRTF in the SOFA file format. The SOFA (Spatially Oriented Format for Acoustics) file format has been standardized by the Audio Engineering Society (AES) and is widely accepted and supported by audio software.
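As an illustration of how an HRTF colours audio arriving from one direction, the following minimal sketch filters a mono signal with a pair of head-related impulse responses (HRIRs), the time-domain counterpart of the HRTF. It assumes the left-ear and right-ear responses for a single direction are already available as lists of samples. The function names and the toy response values are hypothetical and are not taken from the present disclosure.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter a mono source with the HRIR pair for one direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIR pair (made-up values) for a source to the listener's left:
# the right-ear response arrives two samples later and is attenuated.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], hrir_left, hrir_right)
```

In this toy case the right-ear signal is a delayed and attenuated copy of the left-ear signal, which mimics the interaural time and level differences that a real, measured or calculated response pair would encode in far more detail.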
There are methods of offering HRTFs that use data not uniquely measured from the person in question. Such data may come from a mannequin or a dummy head. Typically, these solutions do not result in the best quality, i.e. the generated HRTF does not match the user's anatomy very well. A poor-quality HRTF is not useful for the user and can actually result in lower fidelity, introducing sound colourations as well as localization inaccuracies and errors.
There is also data available in databases that was originally measured from entirely different persons. Previous methods have comprised devices for selecting the best match in such databases. Such selection is usually based on measuring a set of dimensions of the person, such as the size of the head and the dimensions of the ear, commonly called anthropometrics. Anthropometric data of the target person can be matched to data from other persons in a database, with the intention of finding the best match. The assumption is that this would result in selecting the HRTF most likely to create a correct presentation of audio for the given person. Unfortunately, such methods do not perform very well in reality. Problems with unwanted coloration, localization errors and a lack of good externalization are often seen. Selection typically does not result in reliable rendering, as there are still significant individual differences and perfect selection methods do not exist yet.
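The database selection described above can be sketched as a nearest-neighbour search over anthropometric measurements. The sketch below is only an illustration of the general idea, not the method of any particular prior system. The choice of measurements, the Euclidean distance metric, the subject identifiers and all numeric values are assumptions made for the example.

```python
import math

# Hypothetical database: subject id -> (head width, head depth, pinna height) in cm.
DATABASE = {
    "subject_a": (15.2, 19.0, 6.1),
    "subject_b": (14.1, 18.2, 6.8),
    "subject_c": (16.0, 20.1, 5.9),
}


def best_match(target, database):
    """Return the id whose measurements are closest (Euclidean) to target."""
    return min(database, key=lambda sid: math.dist(target, database[sid]))


# Measurements of the target person (made-up values).
match = best_match((14.3, 18.0, 6.6), DATABASE)
```

Here the search selects `subject_b`, whose HRTF would then be used for the target person. The closest anthropometric match is not guaranteed to have the closest HRTF, which is one reason such selection does not render audio reliably.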
The present invention aims to solve the problems and drawbacks of the solutions presented above and to provide the advantageous effects disclosed herein.