Traditional DSP sound personalization methods often rely on equalization techniques that apply compensatory frequency gain according to a user's hearing profile (see e.g. U.S. Pat. Nos. 9,138,178, 9,468,401, 9,680,438 and 9,898,248). Typically, a pure tone threshold (PTT) hearing test is employed to identify frequencies at which a user exhibits raised hearing thresholds. Based on the audiogram data, the frequency output is then modulated accordingly. In this regard, the approach to augmenting the sound experience for the user is one-dimensional. The gain may enable the user to recapture previously unheard frequencies; however, they may subsequently experience loudness discomfort. Listeners with sensorineural hearing loss typically have similar, or even reduced, discomfort thresholds when compared to normal hearing listeners, despite their hearing thresholds being raised. To this extent, their dynamic range is narrower, and simply adding EQ gain would be detrimental to their hearing health in the long run (FIG. 1).
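The narrowing of dynamic range described above can be illustrated with simple decibel arithmetic. The following is a minimal sketch only; the threshold, discomfort and gain values are hypothetical figures chosen for illustration and are not taken from FIG. 1 or from any of the cited patents.

```python
def dynamic_range_db(threshold_db, discomfort_db):
    """Usable range between the quietest audible and the loudest tolerable level."""
    return discomfort_db - threshold_db


def output_level_db(input_db, gain_db):
    """Level after a fixed (level-independent) EQ gain is applied."""
    return input_db + gain_db


# Hypothetical values: same discomfort level, raised hearing threshold.
DISCOMFORT_DB = 100.0
normal_range = dynamic_range_db(threshold_db=10.0, discomfort_db=DISCOMFORT_DB)
impaired_range = dynamic_range_db(threshold_db=50.0, discomfort_db=DISCOMFORT_DB)

# A fixed EQ gain sized to offset the 40 dB threshold shift (assumption).
eq_gain_db = 40.0
quiet_out = output_level_db(30.0, eq_gain_db)  # previously inaudible, now above 50 dB
loud_out = output_level_db(85.0, eq_gain_db)   # pushed past the discomfort level

loud_is_uncomfortable = loud_out > DISCOMFORT_DB
```

In this sketch the same fixed gain that restores audibility of the quiet input also drives the loud input above the discomfort level, which is the limitation of gain-only equalization noted above.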
Dynamic range compression (DRC) can be used to address this issue by amplifying quieter sounds while reducing the volume of loud sounds, thus narrowing the dynamic range of the audio. However, this could pose a problem, as a low frequency rumble could prevent amplification of a high frequency sound of interest. For this reason, hearing aid processors employ wide dynamic range compression, in which the faintest sounds are amplified considerably but high-intensity sounds are not. To this extent, conventional hearing aids are designed for use in real world situations where a wide dynamic range of sounds is relevant to the listener, i.e. the listener wants to make sense of sonic information such as a loud-voiced person speaking in front of them, while at the same time being able to detect the faint sound of a car approaching from a distance while walking down the street. Although this works for practical, real world matters, audio content consumed on mobile devices, or other similar devices, has very different signal statistics from the sounds that someone will encounter in their daily life, so a different processing strategy is required to provide the listener with a beneficial sound personalization experience.
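The low frequency rumble problem mentioned above can be sketched with a static compression curve. This is an illustrative sketch under stated assumptions: the pivot level, the compression ratio and the two signal levels are hypothetical, and the per-band gain is shown only for contrast; it is a common arrangement, not one the text above specifies.

```python
import math


def compressor_gain_db(level_db, pivot_db=60.0, ratio=3.0):
    """Static gain in dB that pulls levels toward pivot_db by a factor of 1/ratio.

    Levels below the pivot receive positive gain and levels above it
    receive negative gain, so the output range is narrower than the input.
    """
    return (pivot_db - level_db) * (1.0 - 1.0 / ratio)


def power_sum_db(levels_db):
    """Combined level in dB of uncorrelated components."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in levels_db))


# Hypothetical scene: loud low frequency rumble plus a faint high frequency tone.
rumble_db, tone_db = 90.0, 40.0

# Broadband compression: one gain derived from the overall level,
# which is dominated by the rumble, so the tone is turned down too.
broadband_gain = compressor_gain_db(power_sum_db([rumble_db, tone_db]))

# Per-band compression: the tone's band is measured on its own,
# so the faint tone is amplified regardless of the rumble.
tone_band_gain = compressor_gain_db(tone_db)
```

With these numbers the broadband gain is roughly -20 dB while the gain in the tone's own band is roughly +13 dB, which shows how a single broadband gain can suppress a faint sound of interest.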
The ability to digitally recreate the functional processing of healthy human hearing would enable a more natural and clear listening experience for a hearing impaired (HI) user. Only recently has the physics of the human ear been well characterised. The human ear pre-processes sounds into a format that is optimal for transmission to the brain to make sense of the sonic environment. The pre-processing can be modelled as a number of hierarchical signal processes and feedback loops, many of which are non-linear, resulting in a complex, non-linear system. Although hearing loss typically begins at higher frequencies, listeners who are aware that they have hearing loss do not typically complain about the absence of high frequency sounds. Instead, they report difficulties listening in a noisy environment and in hearing out the details in a complex mixture of sounds, such as in an audio stream. In essence, for HI individuals, off-frequency sounds more readily mask information with energy in other frequencies: music that was once clear and rich in detail becomes muddled. This is because music itself is highly self-masking.
As hearing deteriorates, the signal-conditioning capabilities of the ear begin to break down, and thus HI listeners need to expend more mental effort to make sense of sounds of interest in complex acoustic scenes (or miss the information entirely). A raised threshold in an audiogram is not merely a reduction in aural sensitivity, but a result of the malfunction of some deeper processes within the auditory system that have implications beyond the detection of faint sounds.
Recent studies have attempted to better model the physics of the human ear, modelling the interconnection of the basilar membrane, the medial olivocochlear complex and the inner and outer hair cells within the inner ear. Building on hearing aid format technology, Clark et al. (2012) developed an algorithm to better model human hearing, mimicking the attenuation effect of the medial olivocochlear complex on the basilar membrane, which their data suggest might improve speech-in-noise robustness (see: Clark et al., A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise. Journal of the Acoustical Society of America, Volume 132, Issue 3, pages 1535 to 1541, 2012). This result is achieved by applying a delayed feedback attenuation control (DFAC) to a dual resonance non-linear (DRNL) algorithm within a spectrally decomposed system (for DRNL see: E. Lopez-Poveda and R. Meddis. A human nonlinear cochlear filterbank. Journal of the Acoustical Society of America, Volume 110, Issue 6, pages 3107 to 3118, 2001). The DRNL algorithm includes instantaneous dynamic range compression.
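The combination of instantaneous compression and delayed feedback attenuation can be sketched for a single channel as follows. This is a heavily simplified illustration and not the published model: the compression function is loosely patterned on the broken-stick non-linearity described for the DRNL, the filters of the DRNL are omitted entirely, the feedback signal is not smoothed, and every parameter value (a, b, c, delay, threshold, strength) is a placeholder assumption.

```python
import math


def broken_stick(x, a=100.0, b=1.0, c=0.25):
    """Instantaneous compression: linear for small |x|, compressive above the knee."""
    mag = abs(x)
    return math.copysign(min(a * mag, b * mag ** c), x)


def process_with_delayed_feedback(samples, delay=4, threshold=0.5, strength=0.8):
    """Attenuate each input sample according to the output `delay` samples earlier.

    When the delayed output magnitude exceeds `threshold`, the input is
    attenuated by `strength` dB for every dB of excess before compression.
    """
    out = []
    for n, x in enumerate(samples):
        past = abs(out[n - delay]) if n >= delay else 0.0
        excess_db = 20.0 * math.log10(past / threshold) if past > threshold else 0.0
        attenuation = 10.0 ** (-strength * excess_db / 20.0)
        out.append(broken_stick(attenuation * x))
    return out


# A steady, high-level input: the first `delay` outputs are unattenuated,
# after which the feedback path reduces the drive into the compressor.
steady = process_with_delayed_feedback([16.0] * 10)
```

In this sketch the output for the steady input drops once the delayed feedback takes effect, which is the qualitative behaviour attributed to the attenuation control above. A practical implementation would run one such path per frequency band of the spectrally decomposed signal.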
However, this algorithm served merely as a framework for modeling the hearing system and was not specifically designed for sound augmentation. To this extent, it has some drawbacks for the subjective hearing experience. These include a reduced ability to control distortion products, limited frequency resolution, and phase distortion that can cause temporal smearing of sound (if used in combination with narrowband filters) and therefore reduced clarity. Thus, although this algorithm could potentially improve some aspects of real world use cases if used by hard of hearing users, it would fail to improve the listening experience for a broader category of listeners in the context of audio. Accordingly, it is the object of this invention to create an improved, biologically inspired DSP that provides a listener with beneficial sound personalization.