Bandlimited Audio Signals
Increasingly, audio signals, such as podcasts, are transmitted over networks, e.g., cellular networks and the Internet, which degrade the quality of the signals. This is particularly true for networks with limited bandwidth.
Audio signals, such as music, are best appreciated at a full bandwidth. A low frequency response and the presence of high frequency components are universally understood to be elements of high quality audio signals. Quite often though, a wideband audio signal is not available.
Often audio signals are sampled at a low rate, thereby losing high frequency information. Audio signals can also undergo processing or distortion, which removes certain frequency regions. The goal of bandwidth expansion is to recover the missing frequency band information.
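The loss of high frequency information at a low sampling rate can be illustrated with a small sketch. The 8 kHz sampling rate and the 5 kHz and 3 kHz tones are assumptions chosen for illustration only. After sampling, the 5 kHz tone produces exactly the same samples as a sign-inverted 3 kHz tone, so the samples alone cannot reveal that content above half the sampling rate was ever present.

```python
import math

SAMPLE_RATE_HZ = 8000            # assumed low sampling rate (illustrative)
NYQUIST_HZ = SAMPLE_RATE_HZ / 2  # highest frequency the samples can represent


def sample_tone(freq_hz, num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return num_samples of a unit-amplitude sine at freq_hz."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


# A 5 kHz tone lies above the 4 kHz Nyquist limit for this rate.
high_tone = sample_tone(5000, 32)
# Its samples coincide with those of a sign-inverted 3 kHz tone.
alias_tone = sample_tone(3000, 32)

# Largest deviation between high_tone and -alias_tone over all samples.
max_difference = max(abs(h + a) for h, a in zip(high_tone, alias_tone))
print("Nyquist limit (Hz):", NYQUIST_HZ)
print("max |high + alias|:", max_difference)
```

In practice, a low-pass filter is usually applied before sampling so that such components are removed rather than folded into the band. Either way, the original high frequency content is absent from the sampled signal.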
Most methods attempt to recover missing high frequency components when the signal is sampled at a low rate. However, recovering high frequency data is difficult. Typically, this information is lost and cannot be inferred. The problem of bandwidth expansion has hitherto been considered chiefly in the context of monophonic speech signals.
Typically, telephonic speech signals contain only frequency components between 300 Hz and about 3500 Hz. The exact frequencies vary for landlines and mobile telephones, but are below 4 kHz in all cases. Bandwidth expansion methods attempt to fill in the frequency components below the lower cutoff and above the upper cutoff, in order to deliver a richer audio signal to the listener. The goal has been primarily that of enriching the perceptual quality of the signal, and not so much high-fidelity reconstruction of the missing frequency bands.
Data Insensitive Methods
The simplest methods for expanding the spectrum of an audio signal apply a memoryless non-linear function, such as a sigmoid function or a rectifier, to the signal; see Yasukawa, “Signal Restoration of Broadband Speech using Non-linear Processing,” Proceedings of the European Signal Processing Conference (EUSIPCO), pp. 987-990, 1996. Such a function has the property of aliasing low-frequency components into high frequencies.
The synthesized high-frequency components are rendered more natural through spectral shaping and other smoothing methods, and are then added back to the original bandlimited signal. Although those methods do not make any explicit assumptions about the signal, they are only effective at extending existing harmonic structures in a signal and are ineffective for broadband sounds such as fricated speech or drums, whose spectral textures at high frequencies differ from those at low frequencies.
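The effect of a memoryless non-linearity can be seen in a minimal sketch. It is not the method of the cited reference. The frame length, the tone position, and the helper `dft_amplitudes` are assumptions made for illustration. A single tone is passed through a half-wave rectifier, and a naive discrete Fourier transform shows that energy appears at a multiple of the tone frequency where the input had none.

```python
import cmath
import math


def dft_amplitudes(x):
    """Single-sided amplitude of DFT bins 0..N/2, computed with a naive DFT."""
    n = len(x)
    amps = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and Nyquist bins have no mirrored partner, so they are not doubled.
        scale = 1.0 if k in (0, n // 2) else 2.0
        amps.append(scale * abs(s) / n)
    return amps


N = 64        # frame length in samples (illustrative)
TONE_BIN = 4  # the input tone completes 4 cycles per frame (illustrative)

narrowband = [math.sin(2 * math.pi * TONE_BIN * t / N) for t in range(N)]
# Half-wave rectifier: a memoryless non-linearity applied sample by sample.
rectified = [max(v, 0.0) for v in narrowband]

amps_in = dft_amplitudes(narrowband)
amps_out = dft_amplitudes(rectified)

print("input  amplitude at bin 8:", amps_in[2 * TONE_BIN])
print("output amplitude at bin 8:", amps_out[2 * TONE_BIN])
```

The rectified signal keeps a component at the original tone and gains components at even multiples of it. This is why such methods can extend an existing harmonic structure but cannot create spectral content that is unrelated to the observed band.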
Example-Driven Methods
The example-driven approach attempts to derive unobserved frequencies in the audio signal from their statistical dependencies on observed frequencies. These dependencies are variously acquired through codebooks, coupled hidden Markov model (HMM) structures, and Gaussian mixture models (GMM); see Enbom et al., “Bandwidth Expansion of Speech based on Vector Quantization (VQ) of Mel Frequency Cepstral Coefficients,” Proceedings IEEE Workshop on Speech Coding, pp. 171-173, 1999; Cheng et al., “Statistical Recovery of Wideband Speech from Narrowband Speech,” IEEE Trans. on Speech and Audio Processing, Vol. 2, pp. 544-548, October 1994; and Park et al., “Narrowband to Wideband Conversion of Speech using GMM Based Transformation,” Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1843-1846, 2000.
The parameters are typically learned from a corpus of parallel broadband and narrow-band recordings. In order to acquire both the spectral envelope and the finer harmonic structure, the signal is typically represented using linear predictive models that can be extended into unobserved frequencies and excited with the excitation of the original signal itself.
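The codebook variant of the example-driven approach can be sketched as a nearest-neighbor lookup. This is a toy illustration and not the procedure of any of the cited references. The two-entry codebook, its feature values, and the function names `nearest_index` and `extend_band` are hypothetical. Each entry pairs a low-band feature vector with the high-band feature vector that accompanied it in parallel training data.

```python
# Hypothetical paired codebook. In practice the entries would be learned from
# parallel broadband and narrow-band recordings; these values are made up.
CODEBOOK_LOW = [
    [1.0, 0.2],  # low-band features of entry 0
    [0.3, 0.9],  # low-band features of entry 1
]
CODEBOOK_HIGH = [
    [0.10, 0.05],  # high-band features paired with entry 0
    [0.60, 0.40],  # high-band features paired with entry 1
]


def nearest_index(codebook, query):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best_index, best_distance = 0, float("inf")
    for i, codeword in enumerate(codebook):
        distance = sum((c - q) ** 2 for c, q in zip(codeword, query))
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index


def extend_band(low_features):
    """Predict high-band features from the best-matching low-band codeword."""
    return CODEBOOK_HIGH[nearest_index(CODEBOOK_LOW, low_features)]


observed_low = [0.9, 0.3]  # features of one narrow-band analysis frame
predicted_high = extend_band(observed_low)
print("predicted high-band features:", predicted_high)
```

The lookup assumes that each frame matches a single codeword, which is reasonable only when one source dominates the frame.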
The following U.S. Patent Publications also describe bandwidth expansion: 20070005351 Method and system for bandwidth expansion for voice communications, 20050267741 System and method for enhanced artificial bandwidth expansion, 20040138876 Method and apparatus for artificial bandwidth expansion in speech processing, and 20040064324 Bandwidth expansion using alias modulation.
Limitations of Conventional Methods
Most of the above methods are directed primarily towards monophonic signals such as speech, i.e., audio signals that are generated by a single source and can be expected to exhibit consistency of spectral structures within any analysis frame.
For instance, the signal in any frame of speech includes the contributions of the harmonics of only a single pitch frequency. It may be expected that aliasing through non-linearities can correctly extrapolate this harmonic structure into unobserved frequencies. Similarly, the formant structures evident in the spectral envelopes represent a single underlying phoneme. Hence, it may be expected that one could learn a dictionary of these structures, which can be represented through codebooks, GMMs, etc., from example data, which could thence be used to predict unseen frequency components.
However, for more complex signals such as polyphonic music, which may contain multiple independent spectral structures from multiple sources, those methods are usually less effective for two reasons. First, audio signals, such as music, often contain multiple independent harmonic structures. Simple extension of these structures through non-linearities introduces undesirable artifacts, such as spurious spectral peaks at harmonics of beat frequencies. Second, spectral patterns from the multiple sources can co-occur in a nearly unlimited number of ways in the signal. It is impossible to express all possible combinations of these patterns in a single dictionary. Explicit characterization of individual sources through dictionaries is not practical because every possible combination of entries from these dictionaries must be considered during bandwidth expansion.
Therefore, it is desired to provide a bandwidth expansion method that provides quality results for complex polyphonic signals as well as simple monophonic signals.
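The first limitation can be illustrated with a small sketch. Two independent tones are passed through a square-law device, which is used here as a stand-in memoryless non-linearity because its output can be checked by hand. It is an assumption and not one of the functions named above. The frame length, the tone positions, and the helper `dft_amplitudes` are likewise illustrative. The output contains components at the difference and sum of the two tone frequencies, which are harmonics of neither tone.

```python
import cmath
import math


def dft_amplitudes(x):
    """Single-sided amplitude of DFT bins 0..N/2, computed with a naive DFT."""
    n = len(x)
    amps = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # DC and Nyquist bins have no mirrored partner, so they are not doubled.
        scale = 1.0 if k in (0, n // 2) else 2.0
        amps.append(scale * abs(s) / n)
    return amps


N = 64               # frame length in samples (illustrative)
BIN_A, BIN_B = 5, 7  # two independent tones, as from two sources (illustrative)

mixture = [
    math.sin(2 * math.pi * BIN_A * t / N) + math.sin(2 * math.pi * BIN_B * t / N)
    for t in range(N)
]
# Square-law device: a memoryless non-linearity applied sample by sample.
squared = [v * v for v in mixture]

amps_in = dft_amplitudes(mixture)
amps_out = dft_amplitudes(squared)

difference_bin = BIN_B - BIN_A  # beat frequency between the two tones
sum_bin = BIN_A + BIN_B
print("output amplitude at difference bin:", amps_out[difference_bin])
print("output amplitude at sum bin:", amps_out[sum_bin])
```

For a single tone, the same device would produce only the second harmonic of that tone. The extra peaks at the difference and sum frequencies arise only because two sources are present in the frame.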