In communication systems that handle digitized speech and sound signals, such as mobile communication and Internet communication, speech/sound signal encoding and decoding techniques are essential for making effective use of communication lines, which are a limited resource, and many encoding/decoding schemes have been developed to date.
Among these, the CELP encoding and decoding scheme in particular has been put into practical use as a mainstream scheme (see, for example, Non-Patent Document 1). A CELP speech encoding apparatus encodes input speech based on a speech generation model. Specifically, a digital speech signal is divided into frames of approximately 20 ms, linear prediction analysis is performed on the speech signal of each frame, and the resulting linear prediction coefficients and linear prediction residual vectors are encoded individually.
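The per-frame analysis described above can be sketched as follows. This is a minimal illustration only, not the method of any cited document: it assumes an 8 kHz sampling rate, a prediction order of 10, the autocorrelation method with the Levinson-Durbin recursion, and no windowing or quantization. The function names (`split_frames`, `lpc_coefficients`, `lp_residual`) are hypothetical, and a real CELP encoder would go on to encode the coefficients and search a codebook for the residual.

```python
import math

def split_frames(signal, sample_rate=8000, frame_ms=20):
    """Divide the signal into non-overlapping frames (assumed 8 kHz, 20 ms)."""
    n = sample_rate * frame_ms // 1000
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def lpc_coefficients(frame, order=10):
    """Levinson-Durbin recursion; returns a[1..order] such that
    x[n] is predicted by sum(a[k] * x[n - k])."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err   # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lp_residual(frame, coeffs):
    """Prediction residual of the frame, using zero history before the frame."""
    residual = []
    for n in range(len(frame)):
        pred = sum(c * frame[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(frame[n] - pred)
    return residual

if __name__ == "__main__":
    # Hypothetical test tone: 200 Hz sine plus a small broadband component.
    sig = [math.sin(2 * math.pi * 200 * n / 8000) + 0.1 * math.sin(0.37 * n * n)
           for n in range(480)]
    for frame in split_frames(sig):
        coeffs = lpc_coefficients(frame)
        res = lp_residual(frame, coeffs)
        print(sum(x * x for x in frame), sum(e * e for e in res))
```

For a predictable input, the residual carries less energy than the frame itself, which is why the coefficients and the residual are worth encoding separately.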
In communication systems that transmit packets, such as Internet communication, packet loss may occur depending on the network state, so it is desirable that speech and sound can still be decoded from the remaining encoded information even if part of the encoded information is lost. Similarly, in variable-rate communication systems where the bit rate varies with line capacity, it is desirable, when the line capacity decreases, to reduce the burden on the communication system by transmitting only part of the encoded information. Scalable encoding, a technique capable of decoding the original data from all or only part of the encoded information, has recently attracted attention. Several scalable encoding schemes have been disclosed (see, for example, Patent Document 1).
A scalable encoding scheme generally consists of a base layer and a plurality of enhancement layers, and these layers form a hierarchical structure in which the base layer is the lowest layer. Each layer is encoded by taking as its encoding target a residual signal, which represents the difference between the input signal of the lower layer and the decoded signal of that lower layer, and by using the encoded information of the lower layers. This configuration enables the original data to be decoded using the encoded information of all layers or only the encoded information of the lower layers.

Patent Document 1: Japanese Patent Application Laid-Open No. HEI10-97295

Non-Patent Document 1: Manfred R. Schroeder, Bishnu S. Atal, "CODE-EXCITED LINEAR PREDICTION (CELP): HIGH-QUALITY SPEECH AT VERY LOW BIT RATES," IEEE Proc. ICASSP'85, pp. 937-940
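The layered structure described in the scalable encoding paragraph can be illustrated with a toy sketch. It is not the scheme of Patent Document 1: as a stated simplification, each layer here is a plain uniform scalar quantizer rather than a speech codec, and the step sizes and function names (`scalable_encode`, `scalable_decode`) are hypothetical. The point is only that each enhancement layer encodes the residual left by the layers below it, so decoding can stop after any number of layers.

```python
def quantize(values, step):
    """Uniform scalar quantizer: map each value to an integer index."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Inverse of quantize: map indices back to reconstruction values."""
    return [i * step for i in indices]

def scalable_encode(signal, steps):
    """Encode with one layer per step size; steps[0] is the base layer.

    Each layer's target is the residual between the lower layer's
    input and the lower layer's decoded signal.
    """
    layers = []
    target = list(signal)
    for step in steps:
        indices = quantize(target, step)
        decoded = dequantize(indices, step)
        layers.append(indices)
        # The residual becomes the encoding target of the next layer.
        target = [t - d for t, d in zip(target, decoded)]
    return layers

def scalable_decode(layers, steps):
    """Decode using however many layers were received (base layer first)."""
    if not layers:
        return []
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        for i, d in enumerate(dequantize(indices, step)):
            out[i] += d
    return out

if __name__ == "__main__":
    steps = [0.5, 0.125, 0.03125]          # assumed, coarse to fine
    signal = [0.83, -0.41, 0.07, 0.66, -0.92]
    layers = scalable_encode(signal, steps)
    for k in range(1, len(layers) + 1):
        approx = scalable_decode(layers[:k], steps)
        worst = max(abs(s - a) for s, a in zip(signal, approx))
        print(k, "layer(s): max error", worst)
```

Decoding with only `layers[:1]` corresponds to receiving the base layer alone, for example after packet loss or under reduced line capacity; each additional layer tightens the bound on the reconstruction error to half of that layer's step size.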