Scalable coding refers to encoding a speech signal hierarchically, and has the feature that, even if the encoded data (coding information) of a given layer is lost, the speech signal can still be decoded from the encoded data of the other layers. Scalable coding that hierarchically encodes a narrowband speech signal and a wideband speech signal is referred to as “band scalable speech coding.”
Generally, in scalable speech coding, a narrowband signal is encoded in the most basic layer, and signals of progressively wider band become the encoding target as the number of layers increases. In this description, the most basic coding/decoding processing layer is referred to as the “core layer,” and a coding/decoding processing layer that realizes higher quality and a wider band than the core layer is referred to as an “enhancement layer.”
Moreover, a speech codec used in scalable coding has the feature that the speech signal can be decoded even if part of the encoded data of a layer is lost, and is therefore suitable for VoIP (Voice over IP), which exchanges a speech signal as data over a packet communication path such as an IP network.
However, in best-effort packet communication, the transmission band is generally not guaranteed, so packets may be lost or delayed and part of the encoded data is likely to be missing. For example, when the traffic on a communication path is saturated due to congestion, encoded data is lost on the transmission path through packet discard. As a result of such missing encoded data, three cases arise in a decoding apparatus: decoding cannot be carried out at all, only the coding information of the core layer is received, or coding information up to the enhancement layer is received. Furthermore, these cases occur one after another over time. For example, frames for which only the coding information of the core layer is received and frames for which coding information up to the enhancement layer is received may alternate, so that the decoding apparatus has to switch between them periodically. When such layer switching occurs, the sound volume and the bandwidth of the decoded signal become discontinuous, and the sound quality of the decoded signal deteriorates.
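The three reception cases described above, and the layer switching that results when they alternate, can be illustrated with a minimal sketch. The names `DecodeMode`, `select_mode` and `count_layer_switches` are hypothetical and are not taken from any codec. The sketch also assumes that an enhancement layer cannot be decoded without the core layer.

```python
from enum import Enum


class DecodeMode(Enum):
    """Hypothetical labels for the three reception cases."""
    NOT_DECODABLE = "not_decodable"           # nothing usable was received
    CORE_ONLY = "core_only"                   # only core layer data received
    UP_TO_ENHANCEMENT = "up_to_enhancement"   # core and enhancement received


def select_mode(core_received, enhancement_received):
    """Classify one frame by which layers arrived.

    Assumption: the enhancement layer depends on the core layer, so a frame
    without core layer data cannot be decoded at all.
    """
    if not core_received:
        return DecodeMode.NOT_DECODABLE
    if enhancement_received:
        return DecodeMode.UP_TO_ENHANCEMENT
    return DecodeMode.CORE_ONLY


def count_layer_switches(frames):
    """Count frame boundaries at which the decode mode changes."""
    modes = [select_mode(core, enh) for core, enh in frames]
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)


# Each tuple is (core_received, enhancement_received) for one frame.
frames = [(True, True), (True, False), (True, True), (False, False)]
switches = count_layer_switches(frames)
```

Every boundary counted by `count_layer_switches` is a point at which the volume or bandwidth of the decoded signal may change abruptly.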
For example, Non-Patent Document 1 discloses a technique for frame loss interpolation processing in a speech codec using single-layer CELP in which, upon frame loss, the parameters required for synthesizing a signal are interpolated based on past information. In this lost data interpolation technique, the gain used for the interpolated data is, in particular, obtained by applying a monotonically decreasing function to the gain of a past frame that was received normally. Further, in gain control from the time of frame loss to the time encoded data is received again, the decoded pitch gain is used as the pitch gain, and the smaller of the code gain interpolated during the loss period and the currently decoded code gain is used as the code gain.
Non-Patent Document 1: “AMR Speech Codec; Error Concealment of lost frames” TS26.091
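The gain handling described for Non-Patent Document 1 can be sketched roughly as follows. This is an illustration under stated assumptions and not the procedure of TS 26.091 itself. The geometric decay, the `decay` value of 0.9, and the function names `interpolated_gains` and `recovery_gains` are all hypothetical. The only properties taken from the description above are that the interpolated gain decreases monotonically from the last normally received gain, and that on recovery the decoded pitch gain is used as-is while the code gain is the smaller of the interpolated and decoded values.

```python
def interpolated_gains(last_pitch_gain, last_code_gain, lost_count, decay=0.9):
    """Gains for the lost_count-th consecutive lost frame (lost_count >= 1).

    Assumption: a geometric decay stands in for the monotonically decreasing
    function; the factor 0.9 is illustrative only.
    """
    factor = decay ** lost_count
    return last_pitch_gain * factor, last_code_gain * factor


def recovery_gains(decoded_pitch_gain, decoded_code_gain, interpolated_code_gain):
    """Gains for the first frame received after a loss period.

    The decoded pitch gain is used unchanged; the code gain is limited to the
    smaller of the interpolated value and the currently decoded value.
    """
    code_gain = min(interpolated_code_gain, decoded_code_gain)
    return decoded_pitch_gain, code_gain


# Two consecutive lost frames followed by a normally received frame.
_, code_1 = interpolated_gains(0.8, 2.0, lost_count=1)
_, code_2 = interpolated_gains(0.8, 2.0, lost_count=2)
pitch_out, code_out = recovery_gains(0.7, 3.0, code_2)
```

Taking the minimum keeps the code gain from jumping upward at the moment reception resumes, so the output level rises no faster than the decoded data and the decayed state both allow.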