In digital wireless communication, in packet communication typified by Internet communication, and in speech storage, speech signal encoding and decoding techniques are essential for making effective use of radio transmission capacity and storage media, and many speech encoding/decoding schemes have been developed. Among these, the CELP (Code Excited Linear Prediction) speech encoding and decoding scheme has been put into practical use as a mainstream scheme (for example, see non-patent document 1).
The speech encoding apparatus of the CELP scheme encodes input speech based on pre-stored speech models. Specifically, a digital speech signal is divided into frames of approximately 10-20 ms, linear prediction analysis is performed on the speech signal of each frame to obtain linear prediction coefficients and linear prediction residual vectors, and the linear prediction coefficients and linear prediction residual vectors are encoded individually. To carry out low bit rate communication, the number of speech models that can be stored is limited, and therefore speech models are mainly stored in conventional CELP type speech encoding and decoding schemes.
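The per-frame linear prediction analysis described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way to obtain linear prediction coefficients, and it omits the codebook-based encoding of the coefficients and residual that a CELP encoder performs afterwards. The function names, the frame length, and the prediction order are hypothetical choices made for this example.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] for one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Convention assumed in this sketch: the predicted sample is
    sum(a[k] * x[n - k]) for k = 1..order.
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: keep remaining coefficients at 0
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a[1:]


def lp_residual(frame, coeffs):
    """Prediction error e[n] = x[n] - sum_k a[k] * x[n - k], zero history assumed."""
    residual = []
    for n, x in enumerate(frame):
        pred = sum(c * frame[n - 1 - k]
                   for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        residual.append(x - pred)
    return residual


def analyze(signal, frame_len, order):
    """Split the signal into frames; return (coefficients, residual) per frame."""
    results = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        r = autocorrelation(frame, order)
        coeffs = levinson_durbin(r, order)
        results.append((coeffs, lp_residual(frame, coeffs)))
    return results
```

For instance, at an assumed sampling rate of 8 kHz, a `frame_len` of 160 samples corresponds to a 20 ms frame, within the range mentioned above. The residual typically carries much less energy than the frame itself, which is what makes encoding the coefficients and the residual separately worthwhile.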
In communication systems where packets are transmitted, such as Internet communication, packet loss may occur depending on the network state, and it is thus desirable that, even if part of the encoded information is lost, speech and sound can be decoded using the remaining part of the encoded information. Similarly, in variable rate communication systems in which the bit rate varies depending on communication capacity, it is desirable that, when the communication capacity decreases, the load on the communication capacity can be readily reduced by transmitting only part of the encoded information. As a technique capable of decoding speech and sound using all or part of the encoded information in this way, scalable coding has lately attracted attention. Several scalable coding schemes have been disclosed (for example, see patent document 1).
A scalable encoding scheme generally consists of a base layer and a plurality of enhancement layers, which form a hierarchical structure with the base layer as the lowest layer. Each layer encodes a residual signal, which is the difference between the input signal and the output signal of the lower layer. This configuration enables speech and sound to be decoded using the encoded information of all layers or using only the encoded information of the lower layers.
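The layered structure described above can be illustrated with a minimal sketch. As an assumption made purely for brevity, each layer here is a uniform scalar quantizer with its own step size; an actual scalable codec would use a speech codec such as CELP at each layer. The function names and step sizes are hypothetical.

```python
def quantize(values, step):
    """Toy layer encoder: uniform scalar quantization to integer indices."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Toy layer decoder: map integer indices back to sample values."""
    return [i * step for i in indices]


def scalable_encode(signal, steps):
    """Encode the signal into len(steps) layers; steps[0] is the base layer.

    Each layer encodes the residual left over after the lower layers
    have been decoded and subtracted from the input.
    """
    layers = []
    residual = list(signal)
    for step in steps:
        indices = quantize(residual, step)
        layers.append(indices)
        decoded = dequantize(indices, step)
        residual = [r - d for r, d in zip(residual, decoded)]
    return layers


def scalable_decode(layers, steps):
    """Decode from whichever layers were received, lowest layer first."""
    if not layers:
        return []
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        out = [o + d for o, d in zip(out, dequantize(indices, step))]
    return out
```

Passing only `layers[:1]` to `scalable_decode` yields a coarse base-layer reconstruction, while passing all layers yields a finer one, which mirrors decoding from lower layers only versus decoding from all layers.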
In a communication system that transmits packets, when the decoding apparatus cannot receive encoded information due to packet loss or the like, deterioration of decoded speech signals can be suppressed to some degree by performing loss compensation (concealment) processing. A method of concealing frame erasure is prescribed as part of the decoding algorithm in, for example, ITU-T Recommendation G.729.
Generally, loss compensation (concealment) processing recovers the current frame based on encoded information contained in a previously received frame. For example, the decoded speech signal of the lost frame is produced by using the encoded information contained in the frame immediately preceding the lost frame as the encoded information for the lost frame, and by gradually attenuating the energy of the decoded signal generated from that information.
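The concealment approach described above, reusing the preceding frame's encoded information while gradually attenuating the output energy, can be sketched as follows. This is a simplified illustration and not the procedure prescribed in G.729: the `FrameParams` structure, the toy `decode_frame` function, and the attenuation factor are all hypothetical stand-ins for a real decoder's parameters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrameParams:
    """Hypothetical stand-in for the encoded information of one frame."""
    gain: float
    shape: List[float]


def decode_frame(params: FrameParams) -> List[float]:
    """Toy decoder: scale the shape vector by the gain."""
    return [params.gain * s for s in params.shape]


def decode_with_concealment(frames: List[Optional[FrameParams]],
                            frame_len: int,
                            attenuation: float = 0.5) -> List[List[float]]:
    """Decode a frame sequence in which a lost frame is represented by None.

    A lost frame reuses the parameters of the preceding frame with the gain
    multiplied by `attenuation` (an illustrative value), so the output fades
    progressively over consecutive losses.
    """
    last: Optional[FrameParams] = None
    output = []
    for received in frames:
        if received is not None:
            last = received
        elif last is not None:
            last = FrameParams(last.gain * attenuation, last.shape)
        if last is None:
            # Nothing has been received yet, so output silence.
            output.append([0.0] * frame_len)
        else:
            output.append(decode_frame(last))
    return output
```

With two consecutive lost frames, the second concealed frame is attenuated twice relative to the last received frame, and decoding returns to normal as soon as a frame is received again.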
Patent Document 1: Japanese Patent Application Laid-Open No. Hei 10-97295
Non-patent Document 1: M. R. Schroeder, B. S. Atal, “Code Excited Linear Prediction: High Quality Speech at Low Bit Rate”, IEEE proc., ICASSP'85 pp. 937-940