The audible rendering of audio content from a digital representation comprises a known area of endeavor. In some application settings the digital representation comprises a complete corresponding bandwidth as pertains to an original audio sample. In such a case, the audible rendering can comprise a highly accurate and natural sounding output. Such an approach, however, requires considerable overhead resources to accommodate the corresponding quantity of data. In many application settings, such as, for example, wireless communication settings, such a quantity of information cannot always be adequately supported.
To accommodate such a limitation, so-called narrow-band speech techniques can serve to limit the quantity of information by, in turn, limiting the representation to less than the complete corresponding bandwidth as pertains to an original audio sample. As but one example in this regard, while natural speech includes significant components up to 8 kHz (or higher), a narrow-band representation may only provide information regarding, say, the 300-3400 Hz range. The resultant content, when rendered audible, is typically sufficiently intelligible to support the functional needs of speech-based communication. Unfortunately, narrow-band speech processing also tends to yield speech that sounds muffled and may even have reduced intelligibility as compared to full-band speech.
To address these shortcomings, bandwidth extension techniques are sometimes employed. Such techniques artificially generate the missing information in the higher and/or lower bands, based on the available narrow-band information as well as other information, and add that information to the narrow-band content to thereby synthesize a pseudo wide-band (or full-band) signal. Using such techniques, for example, one can transform narrow-band speech in the 300-3400 Hz range to wide-band speech in, say, the 100-8000 Hz range. Towards this end, a critical piece of required information is the spectral envelope in the high-band (3400-8000 Hz). If the wide-band spectral envelope is estimated, the high-band spectral envelope can then usually be easily extracted from it. One can think of the high-band spectral envelope as comprising a shape and a gain (or, equivalently, an energy).
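By way of illustration only, the following minimal sketch shows one way the high-band portion might be extracted from an estimated wide-band spectral envelope and split into a shape and a gain. The function name split_high_band, the representation of the envelope as power per frequency bin, and the sample values are assumptions made for this sketch and do not come from any particular implementation.

```python
def split_high_band(freqs_hz, wb_envelope, lo_hz=3400.0, hi_hz=8000.0):
    """Extract the high-band bins and split them into (shape, gain).

    Assumes wb_envelope holds non-negative power values, one per entry of
    freqs_hz. The gain is the total high-band energy and the shape is the
    high-band envelope normalised to unit energy.
    """
    hb = [p for f, p in zip(freqs_hz, wb_envelope) if lo_hz < f <= hi_hz]
    gain = sum(hb)
    shape = [p / gain for p in hb]
    return shape, gain


# Illustrative (made-up) wide-band envelope sampled every 1000 Hz.
freqs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0]
envelope = [8.0, 6.0, 4.0, 2.0, 1.0, 0.5, 0.25, 0.25]
shape, gain = split_high_band(freqs, envelope)
```

Multiplying the shape by the gain recovers the high-band envelope, which is why the two quantities can be estimated either jointly or separately.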
By one approach, for example, the high-band spectral envelope shape is estimated by estimating the wideband spectral envelope from the narrow-band spectral envelope through codebook mapping. The high-band energy is then estimated by adjusting the energy within the narrow-band section of the wideband spectral envelope to match the energy of the narrow-band spectral envelope. In this approach, the high-band spectral envelope shape determines the high-band energy and any mistakes in estimating the shape will also correspondingly affect the estimates of the high-band energy.
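The codebook mapping approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any specific system: the two-entry CODEBOOK, the bin count NB_BINS, the squared-error distance on energy-normalised shapes, and the function names are all hypothetical.

```python
# Hypothetical paired codebook: each entry couples a narrow-band envelope
# (power per bin) with a wide-band envelope whose first NB_BINS bins cover
# the narrow band and whose remaining bins cover the high band.
NB_BINS = 3
CODEBOOK = [
    ([1.0, 0.8, 0.5], [1.0, 0.8, 0.5, 0.2, 0.1]),
    ([0.4, 0.6, 0.9], [0.4, 0.6, 0.9, 0.7, 0.3]),
]


def energy(envelope):
    """Total energy of an envelope expressed as power per bin."""
    return sum(envelope)


def estimate_high_band(nb_envelope):
    """Map a narrow-band envelope to a high-band envelope via the codebook."""
    nb_energy = energy(nb_envelope)
    nb_shape = [v / nb_energy for v in nb_envelope]

    def distance(entry):
        # Compare energy-normalised shapes so that level does not affect
        # which codeword is selected.
        cb_nb = entry[0]
        cb_energy = energy(cb_nb)
        return sum((s - c / cb_energy) ** 2 for s, c in zip(nb_shape, cb_nb))

    _, wb = min(CODEBOOK, key=distance)
    # Scale the selected wide-band envelope so that the energy in its
    # narrow-band section matches the energy of the input envelope.
    scale = nb_energy / energy(wb[:NB_BINS])
    return [scale * v for v in wb[NB_BINS:]]


high_band = estimate_high_band([2.0, 1.6, 1.0])
```

Because a single scale factor is applied to the whole selected wide-band envelope, the high-band energy in this sketch is fixed entirely by the selected shape, so selecting the wrong codeword also yields the wrong high-band energy.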
In another approach, the high-band spectral envelope shape and the high-band energy are separately estimated, and the high-band spectral envelope that is finally used is adjusted to match the estimated high-band energy. By one related approach, the estimated high-band energy is used, along with other parameters, to determine the high-band spectral envelope shape. However, the resulting high-band spectral envelope is not necessarily assured of having the appropriate high-band energy. An additional step is therefore required to adjust the energy of the high-band spectral envelope to the estimated value. Unless special care is taken, this approach will result in a discontinuity in the wideband spectral envelope at the boundary between the narrow-band and high-band. While the existing approaches to bandwidth extension, and, in particular, to high-band envelope estimation, are reasonably successful, they do not necessarily yield resultant speech of suitable quality in at least some application settings.
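The energy adjustment step, and the boundary discontinuity it can introduce, can be illustrated with the following minimal sketch. The function names, the power-per-bin envelope representation, and the numeric values are assumptions chosen purely for illustration.

```python
import math


def scale_to_energy(hb_shape, target_energy):
    """Scale a high-band envelope so its total energy equals target_energy."""
    gain = target_energy / sum(hb_shape)
    return [gain * v for v in hb_shape]


def boundary_step_db(nb_envelope, hb_envelope):
    """Level jump in dB from the last narrow-band bin to the first high-band bin."""
    return 10.0 * math.log10(hb_envelope[0] / nb_envelope[-1])


# Illustrative (made-up) values: the narrow-band envelope falls off towards
# the band edge, but the separately estimated high-band energy is large.
nb = [1.0, 0.8, 0.5]
hb = scale_to_energy([0.2, 0.1], target_energy=1.2)
step = boundary_step_db(nb, hb)
```

With these values the scaled high-band envelope begins roughly 2 dB above the last narrow-band bin, so the adjusted envelope has the estimated energy but no longer joins the narrow-band envelope smoothly.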
In order to generate bandwidth extended speech of acceptable quality, the number of artifacts in such speech should be minimized. It is known that over-estimation of high-band energy results in annoying artifacts. Incorrect estimation of the high-band spectral envelope shape can also lead to artifacts but these artifacts are usually milder and are easily masked by the narrow-band speech.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present invention. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. It will also be understood that the terms and expressions used herein have the ordinary technical meaning as is accorded to such terms and expressions by persons skilled in the technical field as set forth above except where different specific meanings have otherwise been set forth herein.