1. Field of the Invention
The present invention relates to speech signal encoding and decoding, and more particularly, to speech compression and decompression apparatuses and methods, by which a speech signal is compressed into a scalable bandwidth structure and the compressed speech signal is decompressed into the original speech signal.
2. Description of the Related Art
With the development of communication technology, speech quality has emerged as a significant competitive factor among communication companies.
Existing public switched telephone network (PSTN)-based communication samples a speech signal at 8 kHz and therefore transmits a speech signal with a bandwidth of at most 4 kHz, which is half the sampling rate. Thus, the existing PSTN-based communication cannot transmit speech components that fall outside the 4 kHz bandwidth, resulting in degradation of speech quality.
To solve this problem, a packet-based wideband speech encoder that samples an input speech signal at 16 kHz and provides a bandwidth of 8 kHz has been developed. When the bandwidth of a speech signal increases, speech quality improves, but the amount of data transmitted over the communication channel also increases. Thus, to use the wideband speech encoder efficiently, a wideband communication channel must be secured at all times.
However, the amount of data transmitted over a packet-based communication channel is not fixed, but varies due to a variety of factors. As a result, the wideband communication channel necessary for the wideband speech encoder may not be secured, resulting in degradation of the speech quality. This is because, if the required bandwidth is not provided at a specific moment, transmitted speech packets are lost and the speech quality is sharply degraded.
Hence, a technique of encoding a speech signal into a scalable bandwidth structure has been suggested. The International Telecommunication Union (hereinafter, referred to as “ITU”) standard G.722 suggests such an encoding technique. The ITU G.722 standard proposes dividing an input speech signal into two bands using low pass filtering and high pass filtering, and encoding each of the bands separately. In the ITU G.722 standard, the information of each band is encoded using adaptive differential pulse code modulation (ADPCM). However, the encoding technique proposed in the ITU G.722 standard has the disadvantages that it is incompatible with existing standard narrowband compressors and that it requires a high transmission rate.
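The two-band division described above can be illustrated with a minimal sketch. It assumes a two-tap (Haar-type) filter pair chosen only for brevity; this is not the filter specified in the ITU G.722 standard, and the function names split_two_bands and merge_two_bands are hypothetical. The ADPCM encoding of each band is not shown.

```python
import math

def split_two_bands(samples):
    """Split an even-length list of samples into a low band and a high band.

    Assumption: a two-tap Haar-type filter pair, not the G.722 filter.
    Each band holds half as many samples as the input.
    """
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    s = math.sqrt(2.0)
    pairs = range(0, len(samples), 2)
    low = [(samples[i] + samples[i + 1]) / s for i in pairs]   # low pass + decimate
    high = [(samples[i] - samples[i + 1]) / s for i in pairs]  # high pass + decimate
    return low, high

def merge_two_bands(low, high):
    """Rebuild the full-rate signal from the two half-rate bands."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s)
        out.append((l - h) / s)
    return out
```

With this filter pair the two bands together carry exactly the information of the input, so merging the unquantized bands returns the original samples. In a real encoder each band would be quantized before transmission.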
Another approach to encoding speech is to transform a wideband input signal into the frequency domain, divide the resulting spectrum into several subbands, and compress the information of each of the subbands. The ITU G.722.1 standard suggests such an encoding technique. However, the ITU G.722.1 standard has the disadvantages that it does not encode a speech packet into the scalable bandwidth structure and that it is incompatible with the existing standard narrowband compressor.
The existing speech encoding techniques that have been developed in consideration of compatibility with the existing standard narrowband compressor obtain a narrowband signal by performing low pass filtering on a wideband input signal and encode the obtained narrowband signal using the existing standard narrowband compressor. The high-band signal is processed using another technique. Packets are transmitted separately for the high band and the low band.
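The separate low-band and high-band packets can be pictured with the following sketch. It rests on assumptions not stated in the text: the names LayeredFrame, fit_to_channel and decoded_bandwidth_khz are hypothetical, the payloads are opaque bytes, and dropping the high-band payload is shown only as one plausible way a channel with insufficient capacity might be handled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LayeredFrame:
    low_band: bytes             # payload from the standard narrowband compressor
    high_band: Optional[bytes]  # separately coded high-band payload, may be absent

def fit_to_channel(frame: LayeredFrame, budget_bytes: int) -> LayeredFrame:
    """Drop the high-band payload when the whole frame exceeds the byte budget.

    Assumption: the low-band payload is always kept, even if it alone
    exceeds the budget; handling that case is outside this sketch.
    """
    high_len = len(frame.high_band) if frame.high_band is not None else 0
    if len(frame.low_band) + high_len <= budget_bytes:
        return frame
    return LayeredFrame(frame.low_band, None)

def decoded_bandwidth_khz(frame: LayeredFrame) -> int:
    """Bandwidth a decoder can reconstruct: 8 kHz with both bands, else 4 kHz."""
    return 8 if frame.high_band is not None else 4
```

Because the low-band payload is produced by the standard narrowband compressor, a receiver that only understands that compressor can still decode a frame whose high-band payload is missing.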
An existing technique for processing the high-band signal includes a method of splitting the high-band signal into a plurality of subbands using a filter bank and compressing information regarding each subband. Another technique for processing the high-band signal includes transforming the high-band signal into the frequency domain by discrete cosine transform (DCT) or discrete Fourier transform (DFT) and quantizing each frequency coefficient.
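The transform-based approach can be sketched as follows, assuming an orthonormal DCT-II computed directly from its definition and a uniform scalar quantizer with a single step size for every coefficient. The function names and the step size are illustrative assumptions; practical coders typically allocate bits unevenly across coefficients.

```python
import math

def dct_ii(x):
    """Orthonormal DCT-II of a list of samples (direct O(N^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out

def idct_ii(coeffs):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            acc += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out

def quantize(coeffs, step):
    """Map each frequency coefficient to the nearest integer multiple of step."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Recover approximate coefficients from the integer indices."""
    return [q * step for q in indices]
```

A frame of high-band samples would be passed through dct_ii and quantize at the encoder, and through dequantize and idct_ii at the decoder. Since the transform is orthonormal, the quantization error energy in the reconstructed samples equals the error energy introduced in the coefficients.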
However, since these speech encoding techniques merely divide an input signal into two bands and process each band separately, the high-band signal processing unit cannot additionally compensate for distortion caused by the narrowband speech compressor.
Also, when the high-band signal is compressed, acoustic characteristics of a speech signal are not used efficiently, resulting in a decrease in quantization efficiency. When the plurality of subband signals obtained by the filter bank are quantized, the correlation between bands is not properly utilized.