The present invention disclosed herein relates to a method and device for extending a bandwidth of a vocal signal, and more particularly, to a method and device for extending a bandwidth of a vocal signal for improving performance.
In most speech communication systems, the speech bandwidth is limited to a range of 0.3 kHz to 3.4 kHz. This band contains both voiced and unvoiced sounds, but because it is narrow, the quality of the original sound is degraded. To overcome this limitation, wideband speech receivers have been proposed. Wideband speech, whose bandwidth ranges from 50 Hz to 7 kHz, can represent all speech bands, including voiced and unvoiced sounds, and improves naturalness and clarity in comparison with narrowband speech. However, narrowband speech codecs remain in widespread use in many applications, such as voice communications over a public switched telephone network (PSTN), voice over IP (VoIP), and voice applications on smartphones. Replacing these narrowband speech codecs with wideband speech codecs would therefore take considerable time and cost.
To overcome this limitation, it has been proposed to receive narrowband speech and convert the received speech into a wideband signal at a decoder. Accordingly, various methods for extending the speech bandwidth have been proposed.
One of these methods allocates additional bits for the wideband signal. This method relies on side information: high-band speech is generated by using encoding information transmitted from an encoder. The encoder analyzes the high frequency band of an input signal, then generates and transmits auxiliary information based on that analysis. The decoder generates a high frequency band signal based on the transmitted auxiliary information. For instance, the wideband speech codec G.729.1 provides coding at 12 different bit rates between 8 kbit/s and 32 kbit/s. The baseline coder of G.729.1 is fully compatible with G.729, which is a representative narrowband codec, thereby ensuring narrowband speech quality in the 8 kbit/s mode. Wideband speech is generated from the 14 kbit/s mode upward, whose operation mode is called ‘layer 3’, by using the above-described bandwidth extension technique. The encoder allocates additional bits for the bandwidth extension technique used in layer 3 of G.729.1 so that the high frequency band signal is generated during a decoding operation. However, this bandwidth extension technique requires additional bits, which increases the network load. Moreover, this technique also requires modification of the encoder.
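As a minimal sketch of the layered bit-rate structure described above, the following Python snippet enumerates the 12 G.729.1 bit rates and maps each to its layer. The function names are hypothetical. The text above states only the 8 kbit/s core, the 14 kbit/s ‘layer 3’, and the 32 kbit/s upper limit; the intermediate rates (12 kbit/s, then 2 kbit/s steps from 14 kbit/s) and the labeling of the 12 kbit/s layer as narrowband are assumptions based on the G.729.1 recommendation.

```python
def g7291_bit_rates_kbps():
    """Return the 12 bit rates (kbit/s) of G.729.1, lowest first.

    Assumption: 8 and 12 kbit/s, then 2 kbit/s steps from 14 to 32 kbit/s.
    """
    return [8, 12] + list(range(14, 33, 2))


def describe_layer(rate_kbps):
    """Map a bit rate to (layer number, band description).

    Layer 1 is the G.729-compatible core; wideband output starts at
    layer 3 (14 kbit/s), where the bandwidth extension bits are added.
    """
    rates = g7291_bit_rates_kbps()
    if rate_kbps not in rates:
        raise ValueError(f"{rate_kbps} kbit/s is not a G.729.1 bit rate")
    layer = rates.index(rate_kbps) + 1
    if layer == 1:
        band = "narrowband (G.729-compatible core)"
    elif layer == 2:
        band = "narrowband"  # assumption: enhancement of the narrowband core
    else:
        band = "wideband"
    return layer, band


if __name__ == "__main__":
    for rate in g7291_bit_rates_kbps():
        layer, band = describe_layer(rate)
        print(f"layer {layer:2d}: {rate:2d} kbit/s, {band}")
```

The gap between the 8 kbit/s core and the 14 kbit/s mode is the cost referred to above: the extra bits carried on the network are what allow the decoder to produce the high frequency band.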
Methods for generating a high frequency band signal from a low frequency band signal in a decoder without allocating additional bits have also been proposed. For instance, estimation through a pattern recognition algorithm such as a hidden Markov model (HMM) or a Gaussian mixture model (GMM) has been proposed. However, pattern recognition requires a training process, and its performance may vary with the language. Further, in the case where prediction or estimation is needed, additional bits are included and computational complexity is increased. Therefore, it is difficult to efficiently and rapidly process speech received in real time. In addition, the various methods for extending bandwidth without allocating additional bits are limited in the quality of the output speech.
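To illustrate what generating a high frequency band from a low frequency band without additional bits can look like in its simplest form, the sketch below uses spectral folding by zero insertion. This technique is not named in the text above and is not the HMM or GMM estimation it describes; it is chosen here only because it needs no training and no side information. The sampling rates (8 kHz in, 16 kHz out), the 1 kHz test tone, and all function names are assumptions made for illustration.

```python
import cmath
import math


def zero_insert_upsample(samples):
    """Double the sampling rate by inserting a zero after every sample.

    This mirrors the narrowband spectrum into the new upper band.
    """
    out = []
    for s in samples:
        out.extend([s, 0.0])
    return out


def dft_magnitude(samples, k):
    """Magnitude of DFT bin k, computed directly from the definition."""
    n = len(samples)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(samples)))


def bin_for(freq_hz, fs_hz, n):
    """Index of the DFT bin closest to freq_hz."""
    return round(freq_hz * n / fs_hz)


FS_NARROW = 8000   # assumed narrowband sampling rate (Hz)
FS_WIDE = 16000    # sampling rate after zero insertion (Hz)
N = 64             # 64 samples hold exactly 8 cycles of a 1 kHz tone

narrow = [math.sin(2 * math.pi * 1000 * i / FS_NARROW) for i in range(N)]
wide = zero_insert_upsample(narrow)

mag_1k = dft_magnitude(wide, bin_for(1000, FS_WIDE, len(wide)))  # original tone
mag_7k = dft_magnitude(wide, bin_for(7000, FS_WIDE, len(wide)))  # mirrored image
mag_3k = dft_magnitude(wide, bin_for(3000, FS_WIDE, len(wide)))  # no energy here

if __name__ == "__main__":
    print(f"1 kHz: {mag_1k:.3f}  7 kHz: {mag_7k:.3f}  3 kHz: {mag_3k:.3f}")
```

The 1 kHz tone reappears at 7 kHz, mirrored about 4 kHz. The high band is thus a reflection of the low band rather than the true high frequency content of the speech, which is one way to see why methods that allocate no additional bits are limited in output quality.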