(a) Field of the Invention
The present invention relates to an apparatus and method for coding variable bitrate wideband speech and audio signals. More specifically, the present invention relates to an apparatus and method for dividing speech and audio signals and transmitting them at an efficient bitrate in variable bitrate wideband speech and audio coding.
(b) Description of the Related Art
First, general speech coding techniques are described. Although the bandwidth of human speech is 50˜7,000 Hz, speech coding techniques commonly use 300˜3,400 Hz as the intelligible human speech bandwidth, and the speech signal is sampled at 8 kHz in consideration of a guard band.
Waveform coding, sound source coding, and hybrid coding are known methods for coding speech signals into digital signals. PCM (G.711), ADPCM (G.721), SB-ADPCM (G.722), LD-CELP (G.728), CS-ACELP (G.729), and MP-MLQ (G.723.1) are the main techniques thereof.
The G.711 reference is a method of speech coding using a 64 kbps PCM technique, recommended by ITU-T in 1972. PCM samples, quantizes, and codes analog speech signals into digital signals for transmission, and decodes the digital signals back into analog speech signals. PCM uses a nonlinear quantizing technique that compresses speech signals before quantization and expands them after decoding.
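The nonlinear compress-before-quantize, expand-after-decode step can be sketched with μ-law companding, one of the two companding laws associated with G.711. This is a minimal continuous-formula sketch, not the actual G.711 segment tables; the function names and the choice of μ = 255 are assumptions for illustration.

```python
import math

MU = 255  # mu-law constant commonly paired with G.711 (assumption for this sketch)

def mu_law_compress(x):
    """Nonlinearly compress a sample x in [-1, 1] before quantization."""
    sign = 1.0 if x >= 0 else -1.0
    return sign * math.log(1 + MU * abs(x)) / math.log(1 + MU)

def mu_law_expand(y):
    """Inverse operation applied after decoding."""
    sign = 1.0 if y >= 0 else -1.0
    return sign * ((1 + MU) ** abs(y) - 1) / MU

# Small amplitudes occupy a larger share of the quantizer range,
# improving the signal-to-noise ratio for quiet speech.
x = 0.01
y = mu_law_compress(x)
assert abs(mu_law_expand(y) - x) < 1e-9
```

The key property is that a quiet sample like 0.01 is mapped to roughly 0.23 before quantization, so it receives far finer effective quantization steps than a uniform quantizer would give it.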
Further, the G.721 reference is a method of coding and compressing speech using a 32 kbps ADPCM technique, recommended by ITU-T in 1984. ADPCM reduces the transmission bitrate by quantizing the difference between the input signal and a predicted value obtained by exploiting the strong temporal correlation of speech signals. ADPCM provides almost the same sound quality as PCM by using an adaptive quantizer and an adaptive predictor.
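The core idea of quantizing the prediction difference can be sketched as follows. This is a deliberately simplified toy: it uses a fixed quantizer step and a first-order predictor (the previous reconstructed sample), whereas real ADPCM adapts both; the function name and step value are assumptions.

```python
def adpcm_sketch(samples, step=0.1):
    """Toy differential coder: quantize the difference between each sample
    and a first-order prediction (the previous reconstructed sample)."""
    pred = 0.0
    codes, recon = [], []
    for x in samples:
        diff = x - pred          # small, thanks to temporal correlation
        q = round(diff / step)   # only the quantized difference is sent
        pred = pred + q * step   # decoder-side reconstruction
        codes.append(q)
        recon.append(pred)
    return codes, recon
```

Because successive speech samples are highly correlated, the differences are small and can be coded with fewer bits than the samples themselves, which is the source of the bitrate reduction.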
Further, the G.722 reference is a method of coding a wideband speech signal whose bandwidth ranges from 50 Hz to 7 kHz, achieving high quality at a bitrate of 64 kbps or below; it was recommended by ITU-T in 1986. The subband-ADPCM method used in G.722 separates speech signals into two bands, a low frequency band of 0˜4 kHz and a high frequency band of 4˜8 kHz, processes each band with ADPCM, and multiplexes the results for transmission at 64 kbps. Subband-ADPCM is applied to multimedia communication conferences to supplement speech conferencing.
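The two-band split can be illustrated with the simplest possible quadrature pair, a Haar-style sum/difference of adjacent samples. This is only a stand-in: G.722 uses much longer quadrature mirror filters for sharper band separation, and the function names here are assumptions.

```python
def split_two_bands(x):
    """Haar-style two-band split: each pair of input samples yields one
    low-band (average) and one high-band (difference) sample, so each
    band runs at half the original sampling rate."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high

def merge_two_bands(low, high):
    """Perfect reconstruction for the split above (decoder side)."""
    x = []
    for l, h in zip(low, high):
        x += [l + h, l - h]
    return x
```

After the split, each half-rate band can be fed to its own ADPCM coder and the two bitstreams multiplexed, mirroring the G.722 structure described above.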
Further, the G.728 reference is a method of speech coding that obtains better sound quality than G.721; speech is coded at 16 kbps for low speed mobile communication, and it was recommended by ITU-T in 1992. The LD-CELP (Low Delay-Code Excited Linear Prediction) method treats 5 speech samples as one frame and transmits only 10 bits per frame, achieving high sound quality through vector-based processing with a coding delay of 2 ms.
Further, the G.729 (CS-ACELP) reference codes speech at 8 kbps and achieves better sound quality than G.721. Here, CS-ACELP is an abbreviation for Conjugate Structure-Algebraic Code Excited Linear Prediction.
Further, the G.723.1 reference codes speech at 6.3 kbps or 5.3 kbps, achieving sound quality almost equivalent to G.721 at 6.3 kbps with MP-MLQ (Multi Pulse Multi Level Quantization) and somewhat poorer quality at 5.3 kbps with ACELP. It was recommended by ITU-T in 1995 and has been used as a standard speech coder for multimedia communication services.
A detailed comparison for the above methods is shown in Table 1.
TABLE 1
Reference | Method of compression | Speed | MOS | Application
G.711 | PCM | 64 kbps | 4.1 | Digital transfer between central offices
G.721 | ADPCM | 32 kbps | 3.85 | CODEC in home or enterprise
G.722 | SB-ADPCM | 64 kbps (audio signal) | — | Multimedia speech conference, AM broadcast graded sound quality
G.728 | LD-CELP | 16 kbps | 3.61 | Digital mobile communication, ISDN, FR network for speech
G.729 | CS-ACELP | 8 kbps | 3.92 | H.323, H.320, video conference, terminal mobile communication, FR network for speech
G.723.1 | MP-MLQ / ACELP | 6.3 kbps / 5.3 kbps | 3.9 / 3.65 | Mobile communication, H.324 etc., video conference terminal, mobile, VoIP
FIG. 1a and FIG. 1b are diagrams for explaining the division of speech signals into telephone speech, wideband speech, and wideband audio (or music). As shown in FIGS. 1a and 1b, narrowband speech of 300˜3,400 Hz cannot express significant high frequency components, wideband speech of 50˜7,000 Hz provides better sound quality than narrowband, and wideband audio of 20˜20,000 Hz can provide music with the quality of CDs (Compact Discs) or DATs (Digital Audio Tapes).
FIG. 2 is a diagram for explaining types of general ITU-T wideband speech coders. As shown in FIG. 2, the G.711, G.723.1, and G.729 references apply to narrowband speech CODECs, and the G.722, G.722.1, and G.722.2 references apply to wideband speech CODECs.
Meanwhile, EP 1202252A2, filed by NEC Corporation and published on Feb. 5, 2002, discloses an "Apparatus for bandwidth expansion of speech signals," which relates to an apparatus for deciding between a narrowband and a wideband decoding method based on coding parameters input to a CODEC, and decoding the signals according to the result of the decision.
More specifically, EP 1202252A2 discloses a method of dividing input signals into narrowband and wideband, and decoding each divided signal according to its bandwidth. If necessary, the invention decodes speech signals as wideband and improves the sound quality in a decoder. Here, the decision of bandwidth is made using excited signals generated from LSPs (Line Spectral Pairs), an adaptive codebook, and a fixed codebook.
Meanwhile, in May 1998 Toshiyuki Nomura et al. presented a paper, "A bitrate and bandwidth scalable CELP coder," at the International Conference on Acoustics, Speech, and Signal Processing (Vol. 1, pp. 341-344), which relates to an adaptable CELP-type speech CODEC with variable bitrate and bandwidth for multimedia applications, and discloses a method of achieving a variable bitrate by coding a multilevel excited signal.
More specifically, according to the paper, a variable bandwidth is achieved by coding high frequency band parameters using the CELP parameter information of the low frequency band, and the paper reports a 16 kbit/s coder showing the same sound quality as the ITU-T 56 kbit/s G.722 in a Mean Opinion Score (MOS) test. According to this paper, multilevel excited signals are coded by a bitrate-variable tool, low frequency band parameter information is used by a bandwidth-variable tool, and the bitrate is adaptively controlled depending on the conditions of the communication network.
Meanwhile, for example, "Code-excited linear prediction: High quality speech at very low bit rates" (Proc. ICASSP, pp. 937-940, 1985) by M. Schroeder and B. Atal, and "Improved speech quality and efficient vector quantization in SELP" (Proc. ICASSP, pp. 155-158, 1988) by Kleijn et al. disclose CELP (Code Excited Linear Predictive coding), which is known as a highly efficient method for coding speech signals.
First, in CELP, a spectrum parameter representing the spectral properties of the speech signal is extracted for each frame (for example, every 20 ms) using LPC (Linear Predictive Coding) analysis. Next, each frame is further divided into sub-frames (for example, of 5 ms). The parameters of an adaptive codebook (a delay parameter and a gain parameter corresponding to the pitch cycle) are extracted per sub-frame on the basis of past sound source signals, so as to predict the speech signal of a sub-frame from the adaptive codebook over a long period.
Next, the most suitable sound source code vector is selected from a sound source codebook (a vector-quantizing codebook) composed of predetermined kinds of noise signals, the most suitable gain is calculated, and the sound source signal obtained from the long-period prediction is quantized. The sound source code vector is selected so as to minimize the error power between the residual signal and the signal synthesized from the selected noise signal.
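The codebook search described above can be sketched as follows. This is a bare-bones illustration: for brevity it omits the synthesis-filter weighting that a real CELP coder applies before comparing each code vector against the target, and all names are assumptions.

```python
def search_codebook(target, codebook):
    """Pick the code vector index and gain minimizing the error power
    ||target - gain * c||^2 over all code vectors c (synthesis filtering
    omitted for brevity)."""
    best = None
    for idx, c in enumerate(codebook):
        energy = sum(v * v for v in c)
        if energy == 0:
            continue
        corr = sum(t * v for t, v in zip(target, c))
        gain = corr / energy  # closed-form optimal gain for this vector
        # residual error after removing the best-scaled projection onto c
        err = sum(t * t for t in target) - corr * corr / energy
        if best is None or err < best[0]:
            best = (err, idx, gain)
    return best[1], best[2]
```

Note that the loop body runs once per code vector, which is exactly the exhaustive-search cost discussed below: every entry of the codebook must be correlated against the target.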
Then, an index indicating the type of the selected sound source code vector, the gain, the spectrum parameter, and the parameter of the adaptive codebook are multiplexed by a multiplexer and transmitted.
Meanwhile, in the conventional method for coding speech signals described above, selecting the most suitable sound source code vector from the sound source codebook requires a filtering or convolution operation for each code vector, repeated as many times as there are code vectors stored in the codebook; therefore a huge number of operations is needed. For example, when the sound source codebook has B bits, the dimension of the code vector is N, and the filter or response length in the filtering or convolution operation is K, N × K × 2^B × 8000/N operations are needed. In the case of B = 10, N = 40, and K = 10, a huge number of operations, 81,920,000 per second, is needed.
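The arithmetic behind that figure works out as follows (the variable names are just labels for the quantities defined above):

```python
# Exhaustive codebook search cost: each of the 2^B code vectors needs an
# N x K filtering/convolution, and one search is performed per N-sample
# sub-frame, i.e. 8000/N times per second at 8 kHz sampling.
B, N, K = 10, 40, 10
ops_per_second = N * K * (2 ** B) * 8000 // N
assert ops_per_second == 81_920_000
```

The N factors cancel, so the cost per second is K × 2^B × 8000; it grows exponentially in the codebook bit count B, which is why reduced-search methods such as ACELP were developed.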
Thus, various methods have been suggested for reducing the number of operations needed to search for a sound source code vector in the sound source codebook. For example, the ACELP (Algebraic Code Excited Linear Prediction) method, one such approach, is disclosed in a document entitled "16 kbps wideband speech coding technique based on algebraic CELP" (Proc. ICASSP, pp. 13-16, 1991) by C. Laflamme et al.
In the ACELP method, sound source signals are expressed as a plurality of pulses, and the location of each pulse is indicated with a predetermined number of bits and transmitted. Since the amplitude of each pulse is limited to +1 or −1, the number of operations for searching for the pulses can be significantly reduced.
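Why the ±1 restriction helps can be seen in a toy position search: with amplitudes fixed, the sign of each pulse is simply the sign of the target at that position, so only pulse positions need to be searched and coded. This sketch enumerates positions exhaustively and skips the weighted-correlation criterion and interleaved pulse tracks of a real ACELP coder; all names are assumptions.

```python
from itertools import combinations

def acelp_pulse_search(target, n_pulses=2):
    """Toy algebraic search: place n_pulses pulses of amplitude +1 or -1
    so the pulse train best matches the target (maximum correlation)."""
    best = None
    for positions in combinations(range(len(target)), n_pulses):
        # with amplitude fixed to +/-1, each pulse's sign matches the
        # target's sign there, so the score is just a sum of magnitudes
        score = sum(abs(target[p]) for p in positions)
        if best is None or score > best[0]:
            best = (score, positions)
    signs = [1 if target[p] >= 0 else -1 for p in best[1]]
    return list(best[1]), signs
```

No gain search per candidate is needed inside the loop, which is the main saving compared with searching a stochastic codebook of arbitrary-valued vectors.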
However, in the conventional method for coding speech signals described above, satisfactory sound quality can be obtained only at coding bitrates above 8 kbit/s. When the coding bitrate falls below 8 kbit/s, the number of pulses per sub-frame is insufficient, making it difficult to express the sound source signal with adequate accuracy. Thus, there is a problem that the coded speech suffers a loss of sound quality.
Most apparatuses for coding variable bitrate wideband speech and audio use a variable bandwidth method, which either modifies the bitrate within the narrowband or the wideband, or modifies only the bandwidth.
That is, in a speech CODEC according to the conventional method, the bitrate is modified by controlling the bits assigned within the narrowband or the wideband according to the parameters of each CODEC, in consideration of the channel state or CODEC control. Further, the bitrate can be modified by simply adjusting the bandwidth, such as from narrowband to wideband or from wideband to narrowband.
Further, when the input signals are audio signals having significant information in a high frequency band, and only a low frequency band or a narrow band is coded and transmitted, the bitrate modification method is constrained by the low bitrate. That is, the bitrate modification method excludes from coding audio signals such as music signals and natural sounds, causing a loss of sound quality.