In general, an audio signal includes components in various frequency bands. The human audible frequency range is 20 Hz to 20 kHz, whereas a typical human voice lies in a frequency band at or below about 4 kHz.
In some cases, an input audio signal includes, in addition to the band in which a human voice exists, a component in a high frequency band of 7 kHz or above, in which a human voice is rarely present.
As such, if a coding scheme suited to a narrowband (e.g., up to about 4 kHz) is applied to a wideband signal (up to about 8 kHz) or a super wideband signal (up to about 16 kHz), sound quality deteriorates because part of the band is not encoded.
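The relationship between the bands named above can be illustrated with a minimal sketch. It assumes the nominal band edges quoted in the text (4, 8, and 16 kHz) and applies the Nyquist criterion, under which the sampling rate must be at least twice the highest frequency to be represented. The names `BAND_EDGES_HZ`, `min_sampling_rate_hz`, and `unencoded_fraction` are hypothetical and chosen only for illustration.

```python
# Nominal upper band edges in Hz, taken from the figures quoted in the text.
# The dictionary keys are illustrative labels, not standardized identifiers.
BAND_EDGES_HZ = {
    "narrowband": 4000,
    "wideband": 8000,
    "super_wideband": 16000,
}


def min_sampling_rate_hz(band):
    """Nyquist criterion: sample at no less than twice the band edge."""
    return 2 * BAND_EDGES_HZ[band]


def unencoded_fraction(codec_band, signal_band):
    """Fraction of the signal bandwidth lying above the codec's band edge."""
    codec_edge = BAND_EDGES_HZ[codec_band]
    signal_edge = BAND_EDGES_HZ[signal_band]
    return max(0.0, (signal_edge - codec_edge) / signal_edge)


if __name__ == "__main__":
    for band in BAND_EDGES_HZ:
        print(band, min_sampling_rate_hz(band), "Hz minimum sampling rate")
    print(unencoded_fraction("narrowband", "wideband"))
    print(unencoded_fraction("narrowband", "super_wideband"))
```

Under these assumptions, a narrowband scheme applied to a wideband signal leaves half of the signal bandwidth unencoded, and three quarters of it for a super wideband signal.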
Recently, with increasing demand for video telephony, video conferencing, and the like, there is growing interest in encoding/decoding techniques by which an audio signal, that is, a speech signal, is restored so as to be close to the actual voice. More specifically, there is growing interest in encoding/decoding techniques that extend the encoded band, and, in networks for transmitting voice information, interest is shifting from circuit switching network schemes to packet switching network schemes.
In this case, a problem on the network may cause a delay in the process of transmitting an audio signal that has been encoded and packetized. A delay occurring in the transmission process results in an output delay or sound quality deterioration at the output end.
Accordingly, there is a need for a method that can be used at the receiving end to address the delay or loss of an audio signal occurring in the transmission process.
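As a rough illustration of how transmission delay turns into output delay or lost audio at the receiving end, the sketch below classifies packets against a fixed playout deadline. It is an assumption-laden toy model, not a method described in this document: the function name `classify_packets`, the 20 ms frame length, and the 60 ms playout delay are all hypothetical values chosen for the example.

```python
def classify_packets(arrival_ms, frame_ms=20, playout_delay_ms=60):
    """Classify each packet as on time, late, or lost.

    Assumed model: packet n is sent at n * frame_ms and must be played
    at n * frame_ms + playout_delay_ms. An arrival time of None means
    the packet never reached the receiver.
    """
    statuses = []
    for seq, arrival in enumerate(arrival_ms):
        deadline = seq * frame_ms + playout_delay_ms
        if arrival is None:
            statuses.append("lost")      # dropped in the network
        elif arrival > deadline:
            statuses.append("late")      # arrived after its playout time
        else:
            statuses.append("on_time")
    return statuses


if __name__ == "__main__":
    # Arrival times in milliseconds, indexed by sequence number.
    arrivals = [30, 45, None, 150, 110]
    print(classify_packets(arrivals))
    print(classify_packets(arrivals, playout_delay_ms=100))
```

In this model, raising the playout delay rescues packets that would otherwise arrive too late, at the cost of a longer output delay, while packets that never arrive remain lost regardless. This is the trade-off between output delay and sound quality that a receiving-end method has to handle.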