In recent years, the spread of third-generation mobile telephones has brought personal speech communication into a new era. Services that send speech by packet communication, such as IP telephony, have also expanded, and the fourth-generation mobile telephone, expected to enter service in 2010, is headed toward telephone connection based entirely on IP packet communication. Such a service is designed to provide seamless communication between different types of networks and therefore requires a speech codec that supports various transmission capacities. Codecs with multiple compression rates, such as the ETSI-standard AMR, are available. However, communication between different networks, where a reduction in transmission capacity partway through transmission is often desired, calls for speech communication whose sound quality is not degraded by transcoding. For this reason, scalable codecs have in recent years been the subject of research and development at manufacturers, carriers, and other research institutes around the world, and have also become a topic in ITU-T standardization (ITU-T SG16, WP3, Q.9 “EV” and Q.10 “G.729EV”).
A scalable codec first codes the data with a core coder and then, in an enhancement coder, finds an enhancement code that further improves sound quality when added to the code of the core coder. Repeating this process step by step raises the bit rate. For example, with three coders (a 4 kbps core coder, a 3 kbps enhancement coder 1, and a 2.5 kbps enhancement coder 2), speech can be output at three bit rates: 4 kbps, 7 kbps, and 9.5 kbps.
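The arithmetic behind the three output rates can be sketched in a few lines of Python. This is only an illustration of the example above; the names `LAYER_RATES_KBPS` and `available_bit_rates` are hypothetical and do not come from any codec specification.

```python
from itertools import accumulate

# Layer bit rates in kbps, taken from the example in the text:
# core coder, enhancement coder 1, enhancement coder 2.
LAYER_RATES_KBPS = [4.0, 3.0, 2.5]


def available_bit_rates(layer_rates):
    """Return the total bit rate reached after each successive layer is added."""
    return list(accumulate(layer_rates))


print(available_bit_rates(LAYER_RATES_KBPS))  # [4.0, 7.0, 9.5]
```

Each entry is the running sum of the layer rates, which is why adding a layer always yields a higher total rate than the previous step.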
With a scalable codec, the bit rate can be changed during transmission. When the three coders above are used for 9.5 kbps transmission, speech can still be output by decoding only the 4 kbps code of the core coder, or only the 7 kbps code of the core coder and enhancement coder 1. A scalable codec thus enables communication between different networks without an intervening transcoder.
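As a rough sketch of how a network node might reduce the bit rate in transit, the following Python drops trailing enhancement layers until the remaining code fits a given capacity. The `Layer` class, the function `truncate_to_capacity`, and the placeholder payloads are assumptions made for illustration; an actual codec defines its own bitstream format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    rate_kbps: float
    payload: bytes  # placeholder for the coded bits of this layer


def truncate_to_capacity(layers: List[Layer], capacity_kbps: float) -> List[Layer]:
    """Keep the core layer and as many following layers as fit the capacity.

    Layers are assumed to be ordered core first. A layer is only useful
    together with all layers before it, so truncation stops at the first
    layer that does not fit.
    """
    kept: List[Layer] = []
    total = 0.0
    for layer in layers:
        if total + layer.rate_kbps > capacity_kbps:
            break
        kept.append(layer)
        total += layer.rate_kbps
    if not kept:
        raise ValueError("capacity is below the rate of the core layer")
    return kept


frame = [
    Layer("core", 4.0, b"\x00"),
    Layer("enhancement 1", 3.0, b"\x00"),
    Layer("enhancement 2", 2.5, b"\x00"),
]

# A link that carries at most 8 kbps keeps the 7 kbps portion of the code.
print([layer.name for layer in truncate_to_capacity(frame, 8.0)])
# ['core', 'enhancement 1']
```

No re-encoding takes place: the node only discards code, which is what makes transcoding unnecessary in this scheme.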
The basic structure of a scalable codec is either a multistage structure or a component structure. The multistage structure, in which the coding distortion of each coder can be identified, is possibly more effective than the component structure and may become mainstream in the future.
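One common reading of a multistage structure is that each stage codes the error left by the stages before it, so the distortion of every stage is known explicitly. The toy Python sketch below follows that assumption. A uniform scalar quantizer stands in for each coder, and the function names and step sizes are hypothetical; it is not a speech coder and is not taken from the cited documents.

```python
def quantize(samples, step):
    """Uniform scalar quantizer used as a stand-in for a real coder."""
    return [step * round(x / step) for x in samples]


def multistage_encode(samples, steps):
    """Code the input in stages; each stage codes the previous stage's error."""
    residual = list(samples)
    layers = []
    for step in steps:
        coded = quantize(residual, step)
        layers.append(coded)
        # The coding distortion of this stage becomes the next stage's input.
        residual = [r - c for r, c in zip(residual, coded)]
    return layers


def multistage_decode(layers, n_layers):
    """Sum the contributions of the first n_layers stages."""
    out = [0.0] * len(layers[0])
    for layer in layers[:n_layers]:
        out = [o + c for o, c in zip(out, layer)]
    return out


def max_error(reference, decoded):
    return max(abs(r - d) for r, d in zip(reference, decoded))


signal = [0.83, -0.41, 0.27, -0.96]
steps = [0.5, 0.1, 0.02]  # coarser core stage, finer enhancement stages
layers = multistage_encode(signal, steps)

for n in range(1, len(layers) + 1):
    err = max_error(signal, multistage_decode(layers, n))
    print(f"{n} stage(s): max error {err:.4f}")
```

Decoding with more stages lowers the reconstruction error, mirroring the way each enhancement code improves sound quality when added to the core code.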
Non-patent Document 1 discloses a two-layer scalable codec that employs the ITU-T standard G.729 as its core coder, together with the algorithm thereof. The document describes how an enhancement coder can utilize the code of the core coder in a component type scalable codec and, in particular, describes the effectiveness of the pitch auxiliary.
Non-patent Document 1: Akitoshi Kataoka and Shinji Mori, “Scalable Broadband Speech Coding Using G.729 as Structure Member,” IEICE Transactions D-II, Vol. J86-D-II, No. 3, pp. 379-387 (March 2003)