In most voice communication systems, the speech bandwidth is limited to the range from 0.3 kHz to 3.4 kHz. A speech signal includes voiced sound sections and unvoiced sound sections, and the sound quality of a reconstructed signal is degraded relative to that of the original signal because of the limited bandwidth. To reduce this degradation in sound quality, wideband speech receiving devices have been suggested. Wideband speech, having a bandwidth from 0.05 kHz to 7 kHz, may cover the full voice bandwidth, including voiced and unvoiced sound sections, and its naturalness and clarity may be superior to those of narrowband speech. However, voice communication applications, such as the public switched telephone network (PSTN), internet phone services such as VoIP and VoWiFi, and voice-related applications installed on mobile devices, are still based on narrowband speech codecs, so replacing a current codec with a wideband codec requires significant time and cost.
Therefore, various bandwidth extension techniques have been suggested for obtaining a wideband signal from a narrowband signal at a decoder. One example is a technique that allocates additional bits for the high band, that is, a guided bandwidth extension. The guided bandwidth extension extends the speech bandwidth by using encoding information transmitted from an encoder, and the additional information for this purpose is included in the bitstream. The encoder analyzes a speech signal, generates the additional information for the high-band signal, and transmits it. The decoder generates a high-band signal based on the transmitted additional information and the low-band signal. Another example is a technique that generates a high-band signal from a low-band signal in the decoder without allocating any additional bits, that is, a blind bandwidth extension. To this end, estimation techniques based on pattern recognition, such as the hidden Markov model and the Gaussian mixture model, have been suggested. However, pattern recognition requires a training process, and its efficiency may vary with the language being recognized. Furthermore, because the amount of calculation needed for prediction or estimation increases significantly, it is difficult to process a speech signal received in real time quickly and effectively. In addition, the sound quality of a high-band signal generated without allocation of additional bits is relatively inferior.
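As a rough illustration of the blind approach described above, the following Python sketch generates a high band from a narrowband signal alone, with no side information from an encoder. It uses spectral folding, a simple classical technique, and not the hidden Markov model or Gaussian mixture model estimators mentioned above. NumPy, the 8 kHz input rate, the function name `spectral_folding_bwe`, and the parameter `high_gain` are all assumptions made for this example and do not come from the text.

```python
import numpy as np


def spectral_folding_bwe(narrow, high_gain=0.3):
    """Blind bandwidth extension by spectral folding (illustrative sketch).

    narrow:    1-D real array of even length, assumed sampled at 8 kHz.
    high_gain: hypothetical fixed attenuation applied to the mirrored high band.
    Returns a signal with twice as many samples (16 kHz under that assumption).
    """
    narrow = np.asarray(narrow, dtype=float)
    n = len(narrow)

    # Zero insertion doubles the sampling rate. Its spectrum holds the
    # original low band plus a mirror image of it in the new upper half.
    up = np.zeros(2 * n)
    up[::2] = narrow

    spec = np.fft.rfft(up)  # bins 0..n cover 0 Hz up to the new Nyquist
    half = n // 2           # bin of the old Nyquist frequency

    # Low band: factor 2 restores the amplitude lost by zero insertion.
    spec[: half + 1] *= 2.0
    # High band: keep the mirrored image, attenuated by high_gain.
    spec[half + 1:] *= 2.0 * high_gain

    return np.fft.irfft(spec, 2 * n)


if __name__ == "__main__":
    fs = 8000
    t = np.arange(800) / fs
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
    wide = spectral_folding_bwe(tone)
    print(len(tone), "->", len(wide), "samples")
```

For the 1 kHz test tone, the output contains the original component plus a mirrored component at 7 kHz scaled by `high_gain`. A fixed gain is a crude stand-in for the high-band envelope, which is one reason the quality of such blind methods tends to be limited.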
Recently, it has become increasingly necessary to provide a user with a wideband or ultra-wideband signal of improved sound quality from a narrowband signal, without an excessive increase in complexity and, even when a bandwidth extension technique is applied, without changing the basic structure of an existing communication system, that is, the basic structure of a telephony system or of a decoder used at a receiving end.