In the transmission of audio and acoustic signals (hereinafter collectively referred to as an “audio signal”) through an IP network or a mobile communication network, the audio signal is encoded into audio packets at regular time intervals and transmitted through the communication network. At the receiving end, the audio packets are received through the communication network and decoded into a decoded audio signal by a server, an MCU (Multipoint Control Unit), a terminal or the like.
The audio signal is generally collected in digital format. Specifically, it is measured and accumulated as a sequence of numerical values, with the number of values per second equal to the sampling frequency. Each element of the sequence is called a “sample”. In audio encoding, each time a predetermined number of samples of the audio signal has accumulated in a built-in buffer, the audio signal in the buffer is encoded. This predetermined number of samples is called the “frame length”, and a set of as many samples as the frame length is called a “frame”. For example, at a sampling frequency of 32 kHz, a frame length of 20 ms corresponds to 640 samples (32,000 samples per second × 0.020 s). Note that the length of the buffer may be more than one frame.
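The relationship between sampling frequency, frame duration and frame length, and the buffering that precedes encoding, can be illustrated with a minimal Python sketch. The names `frame_length_in_samples` and `FrameBuffer`, and the `encode` callback, are hypothetical and introduced only for illustration; they do not correspond to any particular codec, and the encoder here is a stand-in supplied by the caller.

```python
from typing import Callable, List, Sequence


def frame_length_in_samples(sampling_rate_hz: int, frame_duration_ms: int) -> int:
    """Convert a frame duration in milliseconds to a number of samples."""
    return sampling_rate_hz * frame_duration_ms // 1000


class FrameBuffer:
    """Accumulates samples and hands one frame at a time to an encoder.

    Hypothetical helper: `encode` stands in for a real audio encoder and is
    assumed to turn one frame of samples into one audio packet.
    """

    def __init__(self, frame_length: int, encode: Callable[[List[int]], object]):
        self.frame_length = frame_length
        self._encode = encode
        self._buffer: List[int] = []

    def push(self, samples: Sequence[int]) -> list:
        """Add samples; return a packet for every complete frame now available."""
        self._buffer.extend(samples)
        packets = []
        while len(self._buffer) >= self.frame_length:
            frame = self._buffer[: self.frame_length]
            del self._buffer[: self.frame_length]
            packets.append(self._encode(frame))
        return packets

    def pending(self) -> int:
        """Number of samples waiting for the next frame to fill."""
        return len(self._buffer)


# 32 kHz sampling with 20 ms frames gives 640 samples per frame.
frame_length = frame_length_in_samples(32_000, 20)
buffer = FrameBuffer(frame_length, encode=tuple)  # tuple() is a placeholder encoder
packets = buffer.push([0] * 1000)  # one full frame is encoded, the rest is held back
```

In this sketch the buffer may hold more than one frame's worth of samples at a time; each call to `push` emits as many packets as there are complete frames and keeps the remainder for the next call.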
When audio packets are transmitted through a communication network, congestion in the network or the like can cause some of the audio packets to be lost (so-called “packet loss”) or can cause an error in part of the information written in the audio packets. In such a case, the audio packets cannot be correctly decoded at the receiving end, and therefore the desired decoded audio signal cannot be obtained. Further, the decoded audio signal corresponding to an audio packet where packet loss has occurred is perceived as noise, which significantly degrades the subjective quality for a person listening to the audio.