1. Field of the Invention
The present invention relates to the field of sound enhancement during reproduction of previously encoded audio signals to compensate for hearing impairment, environmental or other factors and, more specifically, to dynamically adjusting the degree of sound enhancement. Dynamic adjustment includes, in some embodiments, balancing the benefits of sound enhancement against possible detriments resulting from increased audibility of encoding noise.
2. Description of Related Art
The invention presented here relates to the application of sound enhancement means to previously compressed audio signals. Before discussing the invention in detail, the state of the art in audio compression and sound enhancement is reviewed.
Audio compression refers to the process of reducing the number of bits required to represent a digitally sampled audio signal. In general, the higher the number of bits used to represent an audio signal of a given duration (bit rate), the higher the signal quality. If more bits are available to represent a signal of a given duration, the additional bits can be used to sample the signal more densely (i.e., take more samples per time interval), which results in capturing a wider frequency range of the signal. The additional bits can also be used to characterize the signal samples more accurately (i.e., to reduce the quantization error), which results in a lower quantization noise floor. Either approach by itself or a combination of the two will result in a more faithful representation of the signal. However, it is known from psychoacoustic experimentation that a more faithful representation of the audio signal does not necessarily translate into higher perceived fidelity. This is because parts of most signals are inaudible to human listeners: they are “masked” by other signal components. Exploiting this fact, a variety of audio-compression techniques have been developed that attempt to reduce the bit rate of an audio signal without affecting the perceived audio quality by selectively reducing the bit rate for signal components that are largely masked without affecting the bit rate of unmasked signal components. Examples of such audio-compression techniques are MPEG-1, Layer I, II, and III, Advanced Audio Coding (AAC; MPEG-2), AC-3 (Dolby) and Adaptive Transform Acoustic Coding (ATRAC; Sony).
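As a rough illustration of the two uses of additional bits described above, the following Python sketch relates sample rate to representable bandwidth and bits per sample to the quantization noise floor. It relies on the standard textbook approximation that an ideal uniform quantizer driven by a full-scale sine wave yields a signal-to-noise ratio of about 6.02 dB per bit plus 1.76 dB. The function names are illustrative only and do not come from any of the cited codecs.

```python
def pcm_bit_rate(sample_rate_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Bit rate (bits per second) of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def representable_bandwidth_hz(sample_rate_hz: float) -> float:
    """Highest frequency captured at a given sample rate (Nyquist limit)."""
    return sample_rate_hz / 2.0


def quantization_snr_db(bits_per_sample: int) -> float:
    """Approximate SNR of an ideal uniform quantizer for a full-scale sine.

    Textbook approximation: each extra bit lowers the noise floor by ~6.02 dB.
    """
    return 6.02 * bits_per_sample + 1.76


if __name__ == "__main__":
    # CD-quality stereo: 44.1 kHz, 16 bits per sample, 2 channels.
    print(pcm_bit_rate(44100, 16, 2))
    print(representable_bandwidth_hz(44100))
    print(quantization_snr_db(16))
```

Doubling the sample rate doubles the bit rate and the representable bandwidth, while adding bits per sample leaves the bandwidth unchanged and lowers the noise floor instead.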
Typically, these techniques achieve their goal of reducing the overall bit rate without affecting fidelity by using fewer bits (i.e., by allowing a larger quantization error) for the representation of signal components that are estimated to have a high masked threshold, while maintaining the original quantization accuracy for parts of the signal that are estimated to have a low masked threshold. Such an approach requires that the signal be represented in modular form. State-of-the-art compressors parse the signal in time and represent different spectral regions separately. These separate signal parts are then quantized with different levels of accuracy (i.e., with different bit rates). The required degree of quantization accuracy in any signal part is determined by a psychoacoustic model that predicts whether quantization inaccuracies (the quantization noise) will be heard by the listener. To this end, the psychoacoustic model predicts the spectrum and temporal envelope of the broadband signal with the highest possible energy that is not audible when the signal that is to be coded is played simultaneously. In other words, the psychoacoustic model determines the highest-energy signal that is completely “masked” by the original signal. The spectrum of this signal is also known as the “spectral masked threshold” and the time course is known as the “temporal masked threshold”. Once the psychoacoustic model has predicted the masked threshold, the bit rates for the various signal parts are selected. The objective of this selection is to choose the lowest bit rate for which the quantization error, when expressed as the power of an error signal, is smaller than the masked threshold. With such a bit rate allocation, the resulting quantization error is imperceptible and the goal of reducing the overall bit rate without affecting fidelity has been achieved.
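The selection rule described above (choose the lowest bit rate for which the quantization error power stays below the masked threshold) can be sketched per spectral band as follows. This is a minimal illustration and not the allocation procedure of any particular codec. It assumes that the masked threshold for each band has already been supplied by a psychoacoustic model, and it uses the simplifying assumption that each bit lowers the quantization noise by about 6.02 dB relative to the band's signal level. The names `bits_for_band` and `allocate_bits` are hypothetical.

```python
DB_PER_BIT = 6.02  # assumed noise reduction per additional bit


def bits_for_band(signal_db: float, masked_threshold_db: float, max_bits: int = 16) -> int:
    """Smallest bit count whose quantization noise falls below the masked threshold.

    Simplified model (assumption): noise level = signal level - 6.02 dB * bits.
    With zero bits the band is dropped, so the error equals the signal itself.
    """
    for bits in range(max_bits + 1):
        noise_db = signal_db - DB_PER_BIT * bits
        if noise_db < masked_threshold_db:
            return bits
    return max_bits  # threshold cannot be met; use the maximum accuracy


def allocate_bits(bands):
    """Allocate bits for a list of (signal_db, masked_threshold_db) pairs."""
    return [bits_for_band(signal_db, threshold_db) for signal_db, threshold_db in bands]


if __name__ == "__main__":
    # Three hypothetical bands: weakly masked, fully masked, strongly masked.
    print(allocate_bits([(60.0, 30.0), (20.0, 40.0), (50.0, 48.0)]))
```

Bands whose signal already lies below the masked threshold receive no bits at all, which is where most of the bit-rate saving comes from in this simplified picture.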
The term “sound enhancement”, as used here, refers to the process of adjusting audio signals to compensate for an individual's altered sound perception. Sound perception may be altered (relative to that of a young, normally hearing listener in an anechoic quiet room) by hearing loss and/or the impact of environmental noise. To those skilled in the art it is well known that individuals with sensorineural hearing loss perceive the dynamics of an audio signal differently than listeners with normal hearing. (See, e.g., Minifie et al., Normal Aspects of Speech, Hearing, and Language (“Psychoacoustics”, Arnold M. Small, pp. 343-420), 1973, Prentice-Hall, Inc.). Specifically, listeners with sensorineural hearing impairment cannot perceive faint sounds that normally hearing listeners hear clearly. At the other end of the level range, high-level sounds are perceived as loud by the normally hearing and by the hearing impaired alike. Both effects are a manifestation of the reduced dynamic range of the impaired auditory system. A hearing-impaired individual's perception of signal dynamics can be altered to more closely resemble that of normally hearing listeners by the use of properly adjusted multi-band dynamic range compression. (Lippmann et al., “Study of Multichannel Amplitude Compression and Linear Amplification for Persons with Sensorineural Hearing Loss,” J. Acoust. Soc. Am. 69(2) (February 1981).) This kind of processing amplifies relatively faint audio signals to above an individual's elevated perception threshold, but does not amplify high-level signals, because those are already sufficiently loud. In summary, multi-band dynamic range compression maps the dynamic range of the signal onto the reduced (and warped) dynamic range of the hearing-impaired listener. By doing so, the audibility of the desired sound, and hence the sound quality, is greatly improved.
The compressor parameters, such as the compression threshold and the compression ratio, required to restore normal loudness perception depend on the amount of hearing loss and thus vary across frequency for hearing losses that are frequency dependent. Those skilled in the art are familiar with several methods of determining desired compressor settings for any given hearing loss profile (e.g., B. C. J. Moore, B. R. Glasberg and M. A. Stone: “Use of a loudness model for hearing aid fitting: III. A general method for deriving initial fittings for hearing aids with multi-channel compression”, British Journal of Audiology, 1999, Vol. 33, pp. 241-258).
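The behavior of multi-band dynamic range compression with frequency-dependent parameters can be sketched as a static gain rule per band. The following Python sketch is illustrative only: it assumes a simple piecewise-linear compressor in which faint signals receive a fixed maximum gain, the gain shrinks above the compression threshold according to the compression ratio, and the gain never drops below 0 dB. The class name `BandCompressor`, its fields, and the example parameter values are hypothetical and are not derived from any fitting method cited above.

```python
from dataclasses import dataclass


@dataclass
class BandCompressor:
    """Static compression settings for one frequency band (illustrative)."""
    threshold_db: float  # compression threshold (input level where compression starts)
    ratio: float         # compression ratio (input dB change per output dB change)
    max_gain_db: float   # gain applied to faint signals below the threshold

    def gain_db(self, level_db: float) -> float:
        """Gain for a given input level: full gain when faint, none when loud."""
        if level_db <= self.threshold_db:
            return float(self.max_gain_db)
        # Above the threshold the output rises by only 1/ratio dB per input dB.
        reduction = (level_db - self.threshold_db) * (1.0 - 1.0 / self.ratio)
        return max(0.0, self.max_gain_db - reduction)


def profile_gains_db(profile, band_levels_db):
    """Apply a multi-band profile to per-band input levels; returns per-band gains."""
    return [band.gain_db(level) for band, level in zip(profile, band_levels_db)]


if __name__ == "__main__":
    # Hypothetical profile: mild settings at low frequencies, stronger at high.
    profile = [BandCompressor(50.0, 1.5, 5.0), BandCompressor(40.0, 3.0, 30.0)]
    print(profile_gains_db(profile, [30.0, 30.0]))    # faint input
    print(profile_gains_db(profile, [100.0, 100.0]))  # loud input
```

Faint input receives each band's full gain, while sufficiently loud input receives none, which is the mapping of a wide signal dynamic range onto a narrower perceptual one described above.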
Environmental factors also require compensation. Research suggests that the presence of broadband noise affects audio signals in much the same way as sensorineural hearing impairment, inasmuch as it reduces the audibility of soft sounds without reducing the sensitivity to loud sounds (Braida et al., “Review of Recent Research on Multiband Amplitude Compression for the Hearing Impaired,” in: Studebaker, G. A., Bess, F. H., eds. The Vanderbilt Hearing-Aid Report, Upper Darby, Pa.: Monographs in Contemporary Audiology, 1982; 133-40). Therefore, travelers on planes, trains and automobiles, where various forms of background noise are encountered, also benefit from multi-band dynamic range compression.
Deliberately coloring a sound, for instance by applying a linear graphic equalizer, is another typical adjustment of an audio signal. Equalizing a sound may compensate for environmental conditions where the sound is reproduced or may suit the perception of the listener. Either equalizing a sound or adjusting it to compensate for listening impairment or environmental conditions can be described as applying a multi-band audio signal-modification profile, which describes how the signal is to be modified.
When a previously encoded audio signal is enhanced (e.g., a decoded MP3 file is subjected to multi-band dynamic range compression), the masked threshold generated by the enhanced signal differs from the masked threshold that would have been generated by the original signal. Moreover, the signal enhancement algorithm works not only on the original signal but also “enhances” the quantization noise, so that the quantization-noise spectrum differs from the quantization-noise spectrum that would have been observed had the signal not been enhanced. Because the encoder assigned the quantization noise based on a masked threshold that differs from the masked threshold actually encountered, and because the quantization-noise spectrum differs from that intended by the encoder, it is no longer guaranteed that the quantization noise remains inaudible. Accordingly, application of a signal-modification profile may make the perceived sound worse, instead of better, if too much encoding noise is promoted from a masked to an unmasked level. Whether the signal-modification profile is beneficial or not depends on the signal characteristics and will change rapidly over time.
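The unmasking effect described above can be made concrete with a toy example. The following Python sketch is not a real psychoacoustic model: it assumes, purely for illustration, that a masker hides noise lying at least 10 dB below it in its own band and that this masking falls off by 15 dB per band of distance. Both constants and all function names are hypothetical. The per-band gain is applied equally to the signal and to the quantization noise in that band, as happens when a decoded signal is enhanced.

```python
MASK_OFFSET_DB = 10.0      # assumed: in-band noise is masked if this far below the masker
SPREAD_DB_PER_BAND = 15.0  # assumed: masking drops this much per band of distance


def masked_threshold_db(signal_db):
    """Toy masked threshold per band: strongest masking contribution from any band."""
    n = len(signal_db)
    return [
        max(signal_db[j] - MASK_OFFSET_DB - SPREAD_DB_PER_BAND * abs(k - j) for j in range(n))
        for k in range(n)
    ]


def unmasked_bands(signal_db, noise_db, gain_db):
    """Indices of bands whose quantization noise exceeds the masked threshold
    after the per-band gain has been applied to both signal and noise."""
    enhanced_signal = [s + g for s, g in zip(signal_db, gain_db)]
    enhanced_noise = [q + g for q, g in zip(noise_db, gain_db)]
    threshold = masked_threshold_db(enhanced_signal)
    return [k for k, (q, t) in enumerate(zip(enhanced_noise, threshold)) if q > t]


if __name__ == "__main__":
    signal = [70.0, 30.0]  # loud low band, faint high band
    noise = [50.0, 40.0]   # quantization noise placed by the encoder
    print(unmasked_bands(signal, noise, [0.0, 0.0]))   # no enhancement
    print(unmasked_bands(signal, noise, [0.0, 20.0]))  # boost the faint band
```

In this example the noise in the faint band is hidden by the loud neighboring band before enhancement. Boosting the faint band by 20 dB raises its noise above the threshold set by the unchanged loud band, so noise that the encoder placed safely below the masked threshold becomes audible.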
Accordingly, there is an opportunity to introduce a dynamic signal-modification profile adjustment method and device that regulates the signal-modification profile to balance the positive effect of sound enhancement and the possible negative effect of increased quantization noise audibility. This method and device, which will be described in the following sections, will apply an auditory perception model during decoding and signal modification.