Many different sampling rates are in use today: 44.1 kHz for compact discs; 32 kHz or 48 kHz for DAT (Digital Audio Tape), digital VCRs and satellite television; and 48 kHz or 96 kHz for DVD audio signals. Therefore, when the internal sampling rate of a decoder in a reproduction apparatus or a recording apparatus differs from the sampling rate of the data to be decoded, the sampling rate must be converted. A conventional apparatus that performs this sampling rate conversion is described, for example, in Patent Document 1.
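As an illustration of the rate mismatch described above, a conversion between two sampling rates can be written as a ratio of integers L/M in lowest terms (interpolate by L, decimate by M). The following is a minimal sketch only; the function name `conversion_ratio` is hypothetical, and the text does not specify which conversion method the conventional apparatus uses.

```python
from fractions import Fraction
from typing import Tuple


def conversion_ratio(fs_in_hz: int, fs_out_hz: int) -> Tuple[int, int]:
    """Return (L, M) in lowest terms such that fs_out = fs_in * L / M."""
    ratio = Fraction(fs_out_hz, fs_in_hz)  # Fraction reduces automatically
    return ratio.numerator, ratio.denominator


# Compact disc (44.1 kHz) data fed to a decoder running at 48 kHz
print(conversion_ratio(44_100, 48_000))  # (160, 147)
# 32 kHz data fed to a decoder running at 48 kHz
print(conversion_ratio(32_000, 48_000))  # (3, 2)
# 96 kHz DVD audio fed to a decoder running at 48 kHz
print(conversion_ratio(96_000, 48_000))  # (1, 2)
```

The size of L and M gives a rough feel for how awkward a given conversion is: 32 kHz to 48 kHz needs only a 3/2 ratio, whereas 44.1 kHz to 48 kHz needs 160/147.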
In recent years, transmission path capacities on networks have also improved significantly, driven by the spread of ADSL (Asymmetric Digital Subscriber Line) and optical fiber in wired systems and by the practical use of W-CDMA (Wideband Code Division Multiple Access) and wireless LAN in wireless systems. In line with this trend, there is demand for voice communications with a greater sense of realism and higher quality, achieved by expanding the signal bandwidth.
At present, typical schemes for coding a narrowband signal include G.726 and G.729, standardized by ITU-T (International Telecommunication Union Telecommunication Standardization Sector). Typical schemes for coding a wideband signal include G.722 and G.722.1 of ITU-T and AMR-WB of 3GPP (3rd Generation Partnership Project).
Moreover, because voice coding schemes are intended for use in various network environments such as IP (Internet Protocol) networks, they have recently been required to provide a scalable function. A scalable function is the ability to decode a voice signal even from only part of the code. With this function, a high quality voice signal can be decoded from the entire code on a communication path in good condition, while only part of the code is transmitted on a communication path in poor condition, which reduces the frequency of packet loss.
The scalable function can also produce effects such as more efficient use of network resources in multicast communication.
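The idea of transmitting only part of the code on a poor communication path can be sketched as follows. This is an illustrative assumption, not a scheme taken from the text: the function name `truncate_to_budget`, the byte budget, and the layer sizes are all hypothetical. Because each higher layer refines the layers below it, the sketch stops at the first layer that does not fit and drops everything above it.

```python
from typing import List


def truncate_to_budget(layer_codes: List[bytes], max_bytes: int) -> List[bytes]:
    """Keep the leading layers that fit within max_bytes (hypothetical budget)."""
    kept: List[bytes] = []
    used = 0
    for code in layer_codes:
        if used + len(code) > max_bytes:
            break  # this layer and all higher layers are dropped
        kept.append(code)
        used += len(code)
    return kept


# Hypothetical frame: a 20-byte first layer and two 10-byte higher layers
layers = [bytes(20), bytes(10), bytes(10)]
good_path = truncate_to_budget(layers, max_bytes=40)  # all codes are sent
bad_path = truncate_to_budget(layers, max_bytes=25)   # only the first layer is sent
print(len(good_path), len(bad_path))  # 3 1
```

A receiver that gets only the first layer can still decode a lower quality signal, which is the property the scalable function relies on.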
To realize a high quality coding scheme with this scalable function, coding must be performed on signals at several sampling rates. For example, if a signal with a sampling rate of 8 kHz is coded using a scheme such as G.726 or G.729 standardized by ITU-T, and the resulting error signal is further coded at a sampling rate of 16 kHz, quality can be improved by extending the signal bandwidth, and scalability is realized.
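The two-rate example above can be sketched in a few lines. This is a toy model under stated assumptions, not G.726 or G.729: a uniform scalar quantizer stands in for both coders, pair averaging and sample repetition stand in for proper resampling filters, and the step sizes and all function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

CORE_STEP = 0.25  # hypothetical step of the 8 kHz core quantizer
ENH_STEP = 0.05   # hypothetical step of the 16 kHz error-signal quantizer


def quantize(x: List[float], step: float) -> List[int]:
    return [round(v / step) for v in x]


def dequantize(code: List[int], step: float) -> List[float]:
    return [i * step for i in code]


def downsample_by_2(x: List[float]) -> List[float]:
    # Average each pair of samples (crude stand-in for an anti-aliasing filter)
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]


def upsample_by_2(x: List[float]) -> List[float]:
    # Repeat each sample (crude stand-in for an interpolation filter)
    return [v for v in x for _ in range(2)]


def encode(x16: List[float]) -> Tuple[List[int], List[int]]:
    """Code a 16 kHz frame (even length) as a core code plus an error code."""
    core_code = quantize(downsample_by_2(x16), CORE_STEP)         # 8 kHz coding
    core_decoded16 = upsample_by_2(dequantize(core_code, CORE_STEP))
    error16 = [a - b for a, b in zip(x16, core_decoded16)]        # error at 16 kHz
    return core_code, quantize(error16, ENH_STEP)


def decode(core_code: List[int], enh_code: Optional[List[int]] = None) -> List[float]:
    """Decode from the core code alone, or from both codes when available."""
    y16 = upsample_by_2(dequantize(core_code, CORE_STEP))
    if enh_code is not None:
        y16 = [a + b for a, b in zip(y16, dequantize(enh_code, ENH_STEP))]
    return y16


x16 = [math.sin(2 * math.pi * k / 16) for k in range(32)]
core_code, enh_code = encode(x16)
err_core = max(abs(a - b) for a, b in zip(x16, decode(core_code)))
err_full = max(abs(a - b) for a, b in zip(x16, decode(core_code, enh_code)))
print(err_core, err_full)
```

Decoding with only `core_code` still yields a signal, and adding `enh_code` bounds the reconstruction error by half the enhancement step, which is the scalability property in miniature.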
FIG. 1 is a block diagram showing a typical configuration of a coding apparatus that performs scalable coding. In this example, the number of layers is N=3, the sampling rate of the signal in layer n is denoted FS(n), and FS(1)=16 [kHz], FS(2)=24 [kHz] and FS(3)=32 [kHz].
An acoustic signal (voice signal, audio signal or the like) input to downsampling section 12 through input terminal 11 is downsampled from a sampling frequency of 32 kHz to 16 kHz and given to first layer coding section 13. First layer coding section 13 determines a first code so that the perceptual distortion between the input acoustic signal and the decoded signal generated after coding is minimized. This first code is sent to multiplexing section 26 and also to first layer decoding section 14. First layer decoding section 14 generates a first layer decoded signal using the first code. Upsampling section 15 upsamples the first layer decoded signal from a sampling frequency of 16 kHz to 24 kHz and gives the upsampled signal to subtractor 18 and adder 21.
Furthermore, the acoustic signal input to downsampling section 16 through input terminal 11 is downsampled from a sampling frequency of 32 kHz to 24 kHz and given to delay section 17. Delay section 17 delays the downsampled signal by a predetermined duration. Subtractor 18 calculates the difference between the output signal of delay section 17 and the output signal of upsampling section 15, generates a second layer residual signal and gives it to second layer coding section 19. Second layer coding section 19 codes the second layer residual signal so that its perceptual quality is improved, determines a second code and gives this second code to multiplexing section 26 and second layer decoding section 20. Second layer decoding section 20 performs decoding processing using the second code and generates a second layer decoded residual signal. Adder 21 calculates the sum of the upsampled first layer decoded signal described above and the second layer decoded residual signal and generates a second layer decoded signal. Upsampling section 22 upsamples the second layer decoded signal from a sampling frequency of 24 kHz to 32 kHz and gives the upsampled signal to subtractor 24.
Moreover, the acoustic signal input to delay section 23 through input terminal 11 is delayed by a predetermined duration and given to subtractor 24. Subtractor 24 calculates the difference between the output signal of delay section 23 and the output signal of upsampling section 22 and generates a third layer residual signal. This third layer residual signal is given to third layer coding section 25. Third layer coding section 25 codes the third layer residual signal so that its perceptual quality is improved, determines a third code and gives this code to multiplexing section 26. Multiplexing section 26 multiplexes the codes obtained from first layer coding section 13, second layer coding section 19 and third layer coding section 25 and outputs the multiplexed result through output terminal 27.
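The signal flow of FIG. 1 can be summarized in a short sketch. It is a toy model under stated assumptions: a uniform quantizer with hypothetical step sizes replaces the perceptual coders, a linear-interpolation resampler without anti-aliasing replaces the downsampling and upsampling sections, and all function names are hypothetical. Delay sections 17 and 23 are omitted because nothing in this toy pipeline introduces latency; presumably they time-align the two inputs of each subtractor in a real apparatus.

```python
import math
from typing import List

FS_KHZ = (16, 24, 32)      # FS(1), FS(2), FS(3) from the FIG. 1 example
STEPS = (0.2, 0.05, 0.01)  # hypothetical quantizer step for each layer


def resample(x: List[float], fs_in: int, fs_out: int) -> List[float]:
    """Toy linear-interpolation resampler (no anti-aliasing filter)."""
    out = []
    for k in range(len(x) * fs_out // fs_in):
        pos = k * fs_in / fs_out
        i = int(pos)
        j = min(i + 1, len(x) - 1)
        out.append(x[i] + (pos - i) * (x[j] - x[i]))
    return out


def quantize(x: List[float], step: float) -> List[int]:
    return [round(v / step) for v in x]


def dequantize(code: List[int], step: float) -> List[float]:
    return [i * step for i in code]


def encode(x32: List[float]) -> List[List[int]]:
    """Return the first, second and third codes for a 32 kHz input frame."""
    codes: List[List[int]] = []
    decoded: List[float] = []
    for n, (fs, step) in enumerate(zip(FS_KHZ, STEPS)):
        target = resample(x32, FS_KHZ[-1], fs)  # sections 12 and 16 (identity at 32 kHz)
        if n == 0:
            prediction = [0.0] * len(target)
        else:
            prediction = resample(decoded, FS_KHZ[n - 1], fs)   # sections 15 and 22
        residual = [t - p for t, p in zip(target, prediction)]  # subtractors 18 and 24
        code = quantize(residual, step)                         # sections 13, 19 and 25
        codes.append(code)
        # Local decoding (sections 14 and 20, adder 21); unused after the last layer
        decoded = [p + r for p, r in zip(prediction, dequantize(code, step))]
    return codes  # multiplexing section 26 would pack these into one stream


def decode(codes: List[List[int]]) -> List[float]:
    """Decode from however many leading layers are available."""
    decoded: List[float] = []
    for n, code in enumerate(codes):
        layer = dequantize(code, STEPS[n])
        if n == 0:
            decoded = layer
        else:
            up = resample(decoded, FS_KHZ[n - 1], FS_KHZ[n])
            decoded = [u + r for u, r in zip(up, layer)]
    return decoded


x32 = [math.sin(2 * math.pi * k / 24) for k in range(96)]
codes = encode(x32)
print([len(decode(codes[:n])) for n in (1, 2, 3)])  # [48, 72, 96]
```

Passing only the first one or two codes to `decode` yields a 16 kHz or 24 kHz signal, while all three codes reproduce the 32 kHz input to within half the third-layer step.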
Patent Document 1: Unexamined Japanese Patent Publication No. 2000-68948