In mobile communication systems, speech signals must be compressed to a low bit rate for transmission in order to use radio wave resources and the like effectively. At the same time, users demand better telephone speech quality and high-fidelity telephone service, which requires not only high-quality coding of speech signals but also high-quality coding of wider-band signals such as audio signals.
One promising way to meet these two conflicting requirements is to combine a plurality of coding techniques hierarchically. Such a scheme combines a first layer, which encodes the input signal at a low bit rate using a model suitable for speech signals, with a second layer, which encodes the differential signal between the input signal and the first-layer decoded signal using a model suitable for signals other than speech signals. Because the bit stream obtained from such a layered coding apparatus is scalable, i.e. a decoded signal can be obtained from only part of the bit stream, this technique is generally called scalable coding. Scalable coding can flexibly support communication between networks with different bit rates, and is therefore regarded as suitable for the future network environment in which various networks will be integrated over IP (Internet Protocol).
One example of scalable coding implemented with techniques standardized by MPEG-4 (Moving Picture Experts Group phase-4) is the technique disclosed in Non-patent Document 1. This technique uses CELP (Code Excited Linear Prediction) coding, which is suitable for speech signals, in the first layer, and in the second layer applies transform coding such as AAC (Advanced Audio Coding) or TwinVQ (Transform Domain Weighted Interleave Vector Quantization) to the residual signal obtained by subtracting the first-layer decoded signal from the original signal. Transform coding transforms a time-domain signal into a frequency-domain signal and encodes the frequency-domain signal.
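The layered structure described above can be illustrated with a minimal sketch. It is not the codec of Non-patent Document 1: plain uniform quantizers stand in for the first-layer speech coder and the second-layer transform coder, and the function names, step sizes, and sample values are assumptions chosen only for illustration. The point it shows is that a decoder can use the first-layer part of the bit stream alone, or add the second layer to refine the result.

```python
# Toy two-layer scalable coder (illustrative assumption, not a real codec).
# Uniform quantizers stand in for the first-layer and second-layer coders.

def quantize(samples, step):
    """Map each sample to an integer index on a uniform grid."""
    return [round(s / step) for s in samples]

def dequantize(indices, step):
    """Map integer indices back to sample values."""
    return [i * step for i in indices]

def encode(signal, core_step=0.5, enh_step=0.05):
    """Encode the signal coarsely, then encode the first-layer residual."""
    core = quantize(signal, core_step)               # first layer (low bit rate)
    core_decoded = dequantize(core, core_step)       # local first-layer decode
    residual = [s - d for s, d in zip(signal, core_decoded)]
    enh = quantize(residual, enh_step)               # second layer (residual)
    return {"core": core, "enh": enh}

def decode(stream, core_step=0.5, enh_step=0.05):
    """Decode from the core layer alone, or from both layers if present."""
    out = dequantize(stream["core"], core_step)
    if "enh" in stream:
        residual = dequantize(stream["enh"], enh_step)
        out = [o + r for o, r in zip(out, residual)]
    return out

def max_error(a, b):
    """Largest absolute sample difference between two signals."""
    return max(abs(x - y) for x, y in zip(a, b))

signal = [0.12, -0.71, 0.33, 0.98, -0.44]
stream = encode(signal)
core_only = decode({"core": stream["core"]})  # partial bit stream
full = decode(stream)                         # complete bit stream
```

Decoding only the `core` entry yields a coarse but usable signal, while decoding both entries reduces the reconstruction error. That is the scalability property in miniature.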
Patent Document 1 discloses a specific example of transform coding. In this technique, the input signal is subjected to pitch analysis to obtain a pitch frequency, and the spectra located at integral multiples of the pitch frequency are encoded collectively. Here, a frequency that is an integral multiple of the pitch frequency, which is a parameter specifying the harmonic structure of a speech signal, is called a harmonic frequency, and a spectrum located at a harmonic frequency is called a harmonic spectrum. The technique of Patent Document 1 decodes the encoded harmonic spectrum, subtracts the decoded spectrum from the input spectrum to obtain an error spectrum, and encodes the error spectrum separately. With this configuration, the harmonic spectrum can be encoded efficiently with a relatively small amount of computation, providing a coding scheme with little degradation of speech quality.
Patent Document 1: Japanese Patent Application Laid-Open No. H09-181611
Non-patent Document 1: “All about MPEG-4”, written and edited by Sukeichi Miki, first print, Kogyo Cyosakai Publishing, Inc., Sep. 30, 1998, pp. 126-127
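As a rough illustration of the harmonic-spectrum scheme of Patent Document 1 summarized above, the following sketch encodes the spectral values at integral multiples of the pitch first, then encodes the remaining error spectrum separately. The representation of the pitch frequency as a positive bin index, the uniform quantizers, the step sizes, and all function names are assumptions made for this example. The source does not specify how Patent Document 1 actually quantizes either spectrum.

```python
# Toy harmonic-spectrum coder (illustrative assumption, not the patented method).

def harmonic_bins(pitch_bin, num_bins):
    """Bin indices at integral multiples of the pitch (pitch_bin must be > 0)."""
    return list(range(pitch_bin, num_bins, pitch_bin))

def encode_spectrum(spectrum, pitch_bin, harm_step=0.25, err_step=0.05):
    """Encode the harmonic spectrum collectively, then the error spectrum."""
    bins = harmonic_bins(pitch_bin, len(spectrum))
    harm_idx = [round(spectrum[k] / harm_step) for k in bins]

    # Decode the harmonic spectrum locally and subtract it from the input.
    decoded = [0.0] * len(spectrum)
    for k, q in zip(bins, harm_idx):
        decoded[k] = q * harm_step
    error = [x - d for x, d in zip(spectrum, decoded)]

    # Encode the error spectrum separately, over every bin.
    err_idx = [round(e / err_step) for e in error]
    return harm_idx, err_idx

def decode_spectrum(harm_idx, err_idx, pitch_bin, harm_step=0.25, err_step=0.05):
    """Rebuild the spectrum from the error spectrum plus the harmonic spectrum."""
    out = [e * err_step for e in err_idx]
    for k, q in zip(harmonic_bins(pitch_bin, len(out)), harm_idx):
        out[k] += q * harm_step
    return out

# Hypothetical 12-bin magnitude spectrum with peaks at bins 3, 6 and 9.
spectrum = [0.1, 0.0, 0.2, 2.1, 0.1, 0.0, 1.4, 0.2, 0.1, 0.9, 0.0, 0.1]
harm_idx, err_idx = encode_spectrum(spectrum, pitch_bin=3)
rebuilt = decode_spectrum(harm_idx, err_idx, pitch_bin=3)
```

Only one index per harmonic is sent for the harmonic spectrum. The error spectrum, whose values are small once the harmonics have been subtracted, is coded with a finer step.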