In speech data communication over an IP network, speech encoding with a scalable configuration is anticipated as a means of realizing network traffic control and multicast communication. A scalable configuration is one that enables the receiving side to decode speech data even from only part of the encoded data.
In scalable encoding, the transmitting side encodes an input speech signal in a layered manner and transmits encoded data composed of a plurality of layers, from lower layers including the core layer to higher layers including the enhancement layer. The receiving side can decode a signal using the encoded data from the lowest layer up to an arbitrary layer (for example, see Non-Patent Document 1).
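The layered structure described above can be illustrated with a minimal sketch. It is not the CELP scheme of Non-Patent Document 1; it is a toy residual quantizer, and the function names and step sizes (`encode_layers`, `decode_layers`, `steps`) are assumptions chosen only for illustration. The core layer quantizes the signal coarsely, each higher layer quantizes what the layers below left over, and the decoder can stop at any layer.

```python
def encode_layers(samples, steps=(0.5, 0.1, 0.02)):
    """Toy layered encoder (hypothetical): layer 0 is the core layer.

    Each layer quantizes the residual left by the layers below it,
    using a uniform quantizer with that layer's step size.
    """
    residual = list(samples)
    layers = []
    for step in steps:
        indices = [round(r / step) for r in residual]
        layers.append((step, indices))
        # What this layer could not represent is passed to the next layer.
        residual = [r - i * step for r, i in zip(residual, indices)]
    return layers


def decode_layers(layers, num_layers):
    """Decode using only the first `num_layers` layers (core layer first)."""
    if not 1 <= num_layers <= len(layers):
        raise ValueError("num_layers must be between 1 and len(layers)")
    out = [0.0] * len(layers[0][1])
    for step, indices in layers[:num_layers]:
        out = [o + i * step for o, i in zip(out, indices)]
    return out


def max_error(reference, decoded):
    """Largest absolute sample error between two equal-length signals."""
    return max(abs(x - y) for x, y in zip(reference, decoded))
```

Decoding with `num_layers=1` uses the core layer alone and gives the coarsest reconstruction; each additional layer bounds the error by half of that layer's step size, which is the sense in which partial encoded data remains decodable.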
By controlling packet loss on the IP network so that encoded data in lower layers, including the core layer, is lost at a lower rate than encoded data in higher layers, it is possible to improve robustness against packet loss.
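One way such loss control could work is priority-based discarding at a congested node, sketched below. The text does not specify a mechanism, so this is an assumption: the `Packet` type, the `drop_by_priority` function, and the byte-budget model are all hypothetical. Packets are admitted in ascending layer order until the budget is exhausted, so higher layers are discarded first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    frame: int  # frame number the packet belongs to
    layer: int  # 0 = core layer, larger values = higher layers
    size: int   # payload size in bytes


def drop_by_priority(packets, capacity):
    """Keep packets in ascending layer order until `capacity` bytes are used.

    Admission stops at the first packet that does not fit, so a packet is
    never kept while a higher-priority (lower-layer) packet is dropped.
    """
    kept = []
    used = 0
    for packet in sorted(packets, key=lambda p: (p.layer, p.frame)):
        if used + packet.size > capacity:
            break
        kept.append(packet)
        used += packet.size
    # Return survivors in transmission order: by frame, then by layer.
    return sorted(kept, key=lambda p: (p.frame, p.layer))
```

With two frames of three 10-byte layers each and a 40-byte budget, the core and first enhancement layers of both frames survive and only the topmost layer is lost.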
If loss of encoded data in lower layers, including the core layer, cannot be avoided, it is possible to perform error compensation using encoded data received in the past (for example, see Non-Patent Document 2). That is, suppose an input speech signal is scalably encoded in frame units into layered encoded data, and the encoded data in lower layers, including the core layer, is lost to packet loss and cannot be received. The receiving side can then perform error compensation using the encoded data of a previously received frame and still perform decoding. Therefore, quality degradation of the decoded signal can be suppressed to some extent when packet loss occurs.
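The idea of compensating with past data can be sketched as follows. This is not the procedure of Non-Patent Document 2; it is a simple repeat-and-attenuate concealment, and the function name `conceal_stream`, the `attenuation` parameter, and the use of `None` to mark a lost frame are assumptions made for illustration.

```python
def conceal_stream(frames, attenuation=0.5):
    """Replace lost frames (None) with an attenuated copy of the last good frame.

    `frames` is a list in which each entry is a list of decoded samples,
    or None when the core layer of that frame was lost.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Frame received: output it and reset the concealment gain.
            last_good = frame
            gain = 1.0
            output.append(list(frame))
        elif last_good is not None:
            # Frame lost: reuse the past frame, fading with each further loss.
            gain *= attenuation
            output.append([gain * s for s in last_good])
        else:
            # Nothing has been received yet, so there is nothing to reuse.
            output.append([])
    return output
```

For example, `conceal_stream([[1.0, -1.0], None, None, [0.5, 0.5]])` returns `[[1.0, -1.0], [0.5, -0.5], [0.25, -0.25], [0.5, 0.5]]`: the two lost frames are filled with progressively quieter copies of the first frame, and normal output resumes once a frame arrives.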
Non-Patent Document 1: ISO/IEC 14496-3:2001(E) Part-3 Audio (MPEG-4) Subpart-3 Speech Coding (CELP)
Non-Patent Document 2: ISO/IEC 14496-3:2001(E) Part-3 Audio (MPEG-4) Subpart-1 Main Annex 1.B (Informative) Error Protection tool