The present invention relates to bandwidth extension, and in particular, to blind bandwidth extension.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
With the increasing popularity of mobile devices (e.g., smartphones, tablets) and online music streaming services (e.g., Apple Music, Pandora, Spotify), the capability of providing high quality audio content with minimal data requirements has become more important. To ensure a smooth user experience, the audio content may be heavily compressed and lose its high-band information during transmission. Similarly, users may possess legacy audio content that was heavily compressed (e.g., due to past storage concerns that may no longer be applicable). This compression process may degrade the perceptual quality of the content. Audio bandwidth extension methods address this problem by restoring the high-band information to improve the perceptual quality. In general, audio bandwidth extension can be categorized into two types of approaches: non-blind and blind.
In non-blind bandwidth extension, the band-limited signal is reconstructed at the decoder using side information provided by the encoder. This type of approach can generate high quality results since more information is available. However, it also increases the data requirement and might not be applicable in some use cases. The most well-known method in this category is Spectral Band Replication (SBR). SBR is a technique that has been used in existing audio codecs such as MPEG-4 (Moving Picture Experts Group) High-Efficiency Advanced Audio Coding (HE-AAC). SBR can improve the efficiency of the audio coder at low bit rates by encapsulating the high frequency content as compact high-band information and recreating it at the decoder from the transmitted low frequency portion together with that high-band information. Another technique, Accurate Spectral Replacement (ASR), explores a similar idea with a different approach. ASR uses a sinusoidal modeling technique to analyze the signal at the encoder, and re-synthesizes the signal at the decoder from the transmitted parameters and bandwidth extended residuals. Although SBR is a simple and efficient algorithm, it still introduces some artifacts into the signal. One of the most obvious issues is the mismatch in the harmonic structure caused by replicating the low band to create the missing high frequency content. To improve the patching algorithm, a sinusoidal modeling based method was proposed to generate the missing tonal components in SBR. Another approach is to use a phase vocoder to create the high frequency content by pitch shifting the low frequency part. Other approaches, such as offset adjustment between the replicated spectra or a better inverse filtering process, have also been proposed to improve the patching algorithm in SBR.
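The replicate-and-rescale idea behind non-blind extension can be illustrated with a minimal sketch. It is not the SBR algorithm used in HE-AAC, which operates on filterbank subbands and transmits more than energies. Here the signal is assumed to be a plain list of spectral magnitudes, and the function names (`band_energies`, `encode`, `decode`), the cutoff, and the band size are hypothetical choices made only for illustration.

```python
import math

def band_energies(mag, start, band_size):
    """Energy of each consecutive band of band_size bins, from start to the end."""
    return [sum(m * m for m in mag[i:i + band_size])
            for i in range(start, len(mag), band_size)]

def encode(full_mag, cutoff, band_size):
    """Keep the low band; describe the high band only by per-band energies.

    Assumes the high band length is a multiple of band_size.
    """
    return full_mag[:cutoff], band_energies(full_mag, cutoff, band_size)

def decode(low_mag, energies, band_size):
    """Copy low-band bins upward, then rescale each band to the sent energy."""
    cutoff = len(low_mag)
    out = list(low_mag)
    for b, target in enumerate(energies):
        # Patch: reuse low-band bins, wrapping around if the high band is wider.
        patch = [low_mag[(b * band_size + i) % cutoff] for i in range(band_size)]
        energy = sum(p * p for p in patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        out.extend(gain * p for p in patch)
    return out

# Toy 16-bin magnitude spectrum; bins 8..15 are treated as the high band.
full_mag = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0,
            0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
low, envelope = encode(full_mag, cutoff=8, band_size=4)
restored = decode(low, envelope, band_size=4)
```

In this sketch the restored high band matches the original band energies, but its fine structure is a copy of the low band. That is the source of the harmonic mismatch described above.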
In blind bandwidth extension, the band-limited signal is reconstructed at the decoder without any side information. This type of approach mainly focuses on general improvement instead of faithful reconstruction. One approach is to use a wave rectifier to generate the high frequency content, and to use different filters to shape the resulting spectrum. This approach has a lower model complexity and does not require a training process. However, the filter design becomes crucial and can be difficult to optimize. Other approaches, such as linear predictive extrapolation and chaotic prediction theory, predict the missing values without any training process. For more complex approaches, machine learning algorithms have been applied. For example, envelope estimation using a Gaussian Mixture Model (GMM), a Hidden Markov Model (HMM), or a neural network has been proposed. These approaches in general require a training phase to build the prediction models.
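The rectifier-based approach can be sketched as follows, as a minimal illustration rather than any particular published design. Full-wave rectification is assumed, the shaping filters are replaced by a simple DC removal and a scalar mixing level, and the names `rectifier_extend`, `dft_magnitude`, and `gain` are hypothetical.

```python
import cmath
import math

def dft_magnitude(x, k):
    """Magnitude of DFT bin k of the real sequence x (naive direct sum)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(x)))

def rectifier_extend(x, gain=0.5):
    """Add harmonics generated by full-wave rectification to x.

    gain is a hypothetical mixing level; a practical system would shape the
    rectified signal with designed filters rather than a single scalar.
    """
    rectified = [abs(v) for v in x]
    mean = sum(rectified) / len(rectified)  # remove the DC offset added by abs()
    return [v + gain * (r - mean) for v, r in zip(x, rectified)]

# Band-limited test tone: 4 cycles in 64 samples, so its energy sits in bin 4.
N = 64
tone = [math.sin(2 * math.pi * 4 * i / N) for i in range(N)]
extended = rectifier_extend(tone)

before = dft_magnitude(tone, 8)      # second harmonic is absent in the input
after = dft_magnitude(extended, 8)   # second harmonic created by rectification
```

The rectified tone contains only even harmonics of the input frequency, so new content appears at twice the tone frequency and above while the original component is left unchanged. How that new content is filtered and weighted is the difficult design question noted above.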
For methods focusing on blind speech bandwidth extension, Linear Prediction Coefficients (LPC) are commonly used to extract the spectral envelope and excitation from the speech. A codebook can then be used to map the envelope or excitation from narrowband to wideband. Other approaches, such as linear mapping, GMM and HMM, have been proposed to predict the wideband spectral envelopes. By combining the extended envelope and excitation, the bandwidth extended speech can then be synthesized through LPC synthesis.
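The envelope and excitation split that these speech methods rely on can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a minimal illustration under stated assumptions: the input is a synthetic second-order autoregressive signal rather than speech, the function names are hypothetical, and the narrowband-to-wideband mapping step (codebook, GMM, or HMM) is not shown.

```python
import random

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for a[0..order], a[0] = 1, so that e[n] = sum_k a[k] * x[n-k]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a

def lpc_residual(x, a):
    """Excitation: filter x with the prediction-error filter A(z)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def lpc_synthesize(e, a):
    """Rebuild the signal by driving the all-pole filter 1/A(z) with e."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[k] * y[n - k]
                            for k in range(1, len(a)) if n - k >= 0))
    return y

# Synthetic test signal (an assumption, standing in for a speech frame):
# x[n] = noise[n] + 1.3 * x[n-1] - 0.6 * x[n-2]
rng = random.Random(0)
x = []
for n in range(2000):
    prev1 = x[n - 1] if n >= 1 else 0.0
    prev2 = x[n - 2] if n >= 2 else 0.0
    x.append(rng.gauss(0.0, 1.0) + 1.3 * prev1 - 0.6 * prev2)

a = levinson_durbin(autocorrelation(x, 2), 2)   # envelope (filter) estimate
excitation = lpc_residual(x, a)                 # excitation estimate
rebuilt = lpc_synthesize(excitation, a)         # synthesis from both parts
```

In a bandwidth extension system, the coefficients `a` and the `excitation` would each be mapped to wideband counterparts before the synthesis step. Here they are passed through unchanged, so `rebuilt` reproduces the input.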