In fields such as digital wireless communications, packet communications typified by Internet communications, and speech storage, techniques for coding and decoding speech signals are indispensable for making efficient use of radio transmission channel capacity and storage media, and many speech coding/decoding schemes have been developed. Among these schemes, the CELP (Code Excited Linear Prediction) speech coding/decoding scheme has been put into practical use as a mainstream technique.
A CELP type speech coding apparatus encodes input speech based on speech models stored beforehand. More specifically, the CELP speech coding apparatus divides a digitized speech signal into frames of about 20 ms, performs linear prediction analysis of the speech signal on a frame-by-frame basis to obtain linear prediction coefficients and a linear prediction residual vector, and encodes the linear prediction coefficients and the linear prediction residual vector separately.
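The frame-by-frame analysis described above can be sketched as follows. This is a minimal illustration only, not the analysis used by any particular CELP codec: the 8 kHz sampling rate, the prediction order of 10, the autocorrelation method with the Levinson-Durbin recursion, and all function names are assumptions introduced here, and the subsequent encoding of the coefficients and the residual vector is omitted.

```python
import math
import random

def split_frames(signal, sample_rate=8000, frame_ms=20):
    """Split a signal into non-overlapping frames; a trailing partial frame is dropped."""
    n = sample_rate * frame_ms // 1000  # 160 samples at the assumed 8 kHz rate
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[i] * frame[i - lag] for i in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return coefficients a[1..order] so that x[n] is predicted as sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: leave remaining coefficients at zero
        k_i = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= 1.0 - k_i * k_i
    return a[1:]

def lp_residual(frame, coeffs):
    """Residual e[n] = x[n] - sum_k a[k] * x[n-k]; samples before the frame count as zero."""
    residual = []
    for n in range(len(frame)):
        pred = sum(c * frame[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(frame[n] - pred)
    return residual

def analyze(signal, sample_rate=8000, frame_ms=20, order=10):
    """Return one (coefficients, residual vector) pair per 20 ms frame."""
    results = []
    for frame in split_frames(signal, sample_rate, frame_ms):
        coeffs = levinson_durbin(autocorrelation(frame, order), order)
        results.append((coeffs, lp_residual(frame, coeffs)))
    return results

# Usage: 100 ms of a synthetic tone with a little noise, standing in for speech.
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * n / 8000) + 0.05 * rng.uniform(-1, 1)
          for n in range(800)]
analysis = analyze(signal)
```

In this sketch each frame yields two separate outputs, the coefficient list and the residual vector, which mirrors the separate encoding of the two quantities described above.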
Because the number of speech models that can be stored is limited when communicating at a low bit rate, the conventional CELP type speech coding/decoding scheme chiefly stores phonation speech models.
In communication systems that transmit packets, such as Internet communications, packet loss occurs depending on the state of the network, so it is preferable that speech and sound can still be decoded from the remaining coded information even when part of the coded information is lost. Similarly, in variable-rate communication systems that vary the bit rate according to the communication capacity, it is desirable that, when the capacity decreases, the load on it can be reduced easily by transmitting only part of the coded information. Attention has therefore recently been directed toward scalable coding as a technique that enables speech and sound to be decoded using either all or part of the coded information. Several scalable coding schemes have already been disclosed.
A scalable coding system generally comprises a base layer and one or more enhancement layers, and the layers form a hierarchical structure with the base layer as the lowest layer. Each layer above the base layer codes a residual signal, that is, the difference between the input signal and the output signal of the lower layer. With this configuration, speech and/or sound signals can be decoded using the coded information of all the layers or using only the coded information of the lower layers.
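The layered structure described above can be illustrated with a short sketch. It is a simplified model under stated assumptions: a uniform scalar quantizer stands in for the actual codec of each layer (it is not CELP), and the function names and step sizes are hypothetical choices made for illustration.

```python
def quantize(values, step):
    """Hypothetical stand-in codec: uniform scalar quantization to integer indices."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Inverse of quantize: map integer indices back to signal values."""
    return [i * step for i in indices]

def encode_layers(signal, steps):
    """Encode a signal into len(steps) layers.

    steps[0] belongs to the base layer; each later entry belongs to an
    enhancement layer that codes what the layers below it failed to represent.
    """
    layers = []
    residual = list(signal)
    for step in steps:
        indices = quantize(residual, step)
        layers.append(indices)
        decoded = dequantize(indices, step)
        # Residual passed upward: input minus everything decoded so far.
        residual = [r - d for r, d in zip(residual, decoded)]
    return layers

def decode_layers(layers, steps):
    """Decode from however many layers were received, starting at the base layer."""
    if not layers:
        raise ValueError("at least the base layer is required")
    out = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        out = [o + d for o, d in zip(out, dequantize(indices, step))]
    return out

# Usage: one base layer and two enhancement layers with decreasing step sizes.
signal = [0.13, -0.72, 0.58, 0.91, -0.34]
steps = [0.5, 0.1, 0.02]
layers = encode_layers(signal, steps)
base_only = decode_layers(layers[:1], steps)  # coarse reconstruction
all_layers = decode_layers(layers, steps)     # finest reconstruction
```

Decoding from `layers[:1]` corresponds to the case where only the base layer arrives, and each additional layer tightens the reconstruction error bound to half of that layer's step size.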
However, the conventional scalable coding system uses the CELP type speech coding/decoding scheme as the coding scheme for both the base layer and the enhancement layers, and therefore requires a considerable amount of both computation and coded information.