1. Field of the Invention
The present invention relates generally to a method and apparatus for expanding a bandwidth of narrowband voice signals, and more particularly, to a method and apparatus for generating expanded-band voice signals by reducing artifacts caused by the bandwidth expansion of the narrowband voice signals.
2. Description of the Related Art
Generally, a human being can hear and recognize a voice ranging over an audible frequency band of 20 Hz-20 kHz. The voice is divided into consonants and vowels (voiceless sounds and voiced sounds) according to its linguistic characteristics. It is known that the voice has a stationary characteristic over a short interval of 10-30 ms, during which the physical characteristics of the vocal tract extending from the vocal cords to the lips, and/or the signal characteristics of the voice, remain unchanged.
The voice is converted into an electric voice signal and then delivered to another party over a telephone or a mobile communication terminal in the form of an analog signal or a digital signal. In order to transmit and receive the voice signal using an electronic apparatus such as the telephone or the mobile communication terminal, the bandwidth of the voice signal is limited to 300 Hz-3.4 kHz, the minimum narrowband that a human being can recognize, due to the capacity limitation of the transmission/reception data. The loss of the voice signal in the lower band (20 Hz-300 Hz) and in the upper band (3.4 kHz-20 kHz) degrades voice signal quality.
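The short-interval stationarity described above is the usual reason a voice signal is processed frame by frame. The following is a minimal sketch, assuming non-overlapping frames and a hypothetical frame length of 20 ms (any value in the 10-30 ms range would serve); the function name and parameters are illustrative only.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Split a sample sequence into consecutive, non-overlapping frames.

    frame_ms=20 is an assumed value inside the 10-30 ms range;
    a trailing partial frame is dropped.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


# 400 samples at 8 kHz give two full 160-sample frames; 80 samples are left over.
frames = split_into_frames(list(range(400)), sample_rate_hz=8000)
```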
Poles of a Linear Predictive Coefficient (LPC) filter for the voice signal, referred to as formant frequencies, represent resonant frequencies caused by the whole or a part of the human vocal tract. The formants are important information in identifying vowels, and are called a first formant, a second formant, a third formant, etc., in order from the lowest frequency. In the case of the major vowels, it is possible to distinguish the vowels using only the information on the first formant and the second formant. A vowel has more than four formants, and in some cases, more than six formants. However, consonants, such as fricative sounds or plosive sounds, have only one or two formant frequencies. This is due to the fact that while the resonance for a vowel is produced by the vocal tract, the resonance for a consonant mainly occurs in a short interval of the oral tract. The voice generated from a consonant also generally has a high-energy component in the high-frequency band of 3.4 kHz or higher.
In artificial bandwidth expansion, vowel-like signals are definite in their signal characteristics and have a relatively stationary characteristic over a long time interval compared to consonants, making it easy to model the vowel signals.
With respect to vowel signals, there is a low possibility that artifacts will occur in estimating information on the expanded band when attempting bandwidth expansion using only information on the narrowband voice signal. More specifically, even when active bandwidth expansion is attempted, the likelihood of artifacts is low. However, consonant-like signals are indefinite in their signal characteristics, have a relatively high-energy component in the high-frequency band, and also have a dynamic characteristic, in that the consonant signals change abruptly with the passage of time. Therefore, it is difficult to model these signals, and there is a high possibility that an error will occur in estimating information on the expanded band when attempting bandwidth expansion using only information on the narrowband voice signal. If active bandwidth expansion is attempted, the likelihood of artifacts increases.
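The relationship between a pole of the LPC filter and a formant can be illustrated as follows. This is a minimal sketch, assuming a second-order all-pole filter so that the poles follow from the quadratic formula; the 700 Hz resonance, the pole radius of 0.97, and the function names are hypothetical values chosen for illustration. The bandwidth expression is the commonly used approximation based on the pole radius.

```python
import cmath
import math


def second_order_poles(a1, a2):
    """Roots of z^2 + a1*z + a2 = 0, i.e. the poles of 1 / (1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


def pole_to_formant(pole, sample_rate_hz):
    """Map one complex pole to (centre frequency, approximate bandwidth) in Hz."""
    freq = abs(cmath.phase(pole)) * sample_rate_hz / (2.0 * math.pi)
    bandwidth = -math.log(abs(pole)) * sample_rate_hz / math.pi
    return freq, bandwidth


# Hypothetical resonance: 700 Hz centre, pole radius 0.97, 8 kHz sampling.
fs = 8000.0
radius, centre_hz = 0.97, 700.0
a1 = -2.0 * radius * math.cos(2.0 * math.pi * centre_hz / fs)
a2 = radius * radius

p1, p2 = second_order_poles(a1, a2)
formant_hz, bandwidth_hz = pole_to_formant(p1, fs)  # recovers the 700 Hz centre
```

A pole closer to the unit circle gives a narrower, sharper resonance, which is consistent with the well-defined formants of vowels.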
FIG. 1 is a diagram illustrating a structure of a voice signal bandwidth expander.
Referring to FIG. 1, a narrowband voice signal input unit 100 extracts a narrowband LPC from a narrowband signal sampled at 8 kHz, and generates a narrowband excitation signal using the LPC. Next, a bandwidth expander 110 estimates an LPC and a gain of the upper band (for example, 4 kHz-8 kHz) from the narrowband LPC using a codebook mapping method, which stores previously calculated LPCs and gains and retrieves them when necessary. The bandwidth expander 110 generates an excitation signal of the upper band from the narrowband excitation signal using an interpolation method that estimates a value between two particular values. The upper-band signal is synthesized using the generated upper-band LPC, upper-band gain, and upper-band excitation signal. Thereafter, the bandwidth expander 110 adds the synthesized upper-band signal to the original narrowband signal to finally synthesize a broadband voice signal (0 Hz-8 kHz) sampled at 16 kHz, thereby performing bandwidth expansion on the narrowband voice signal. Finally, an expanded-band voice signal output unit 120 outputs the expanded-band voice signal.
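The main steps attributed to the bandwidth expander 110 can be sketched as follows. This is an illustrative sketch only, not the implementation of FIG. 1: it assumes a nearest-neighbor codebook lookup, linear interpolation for the excitation, and a direct-form all-pole synthesis filter. The codebook contents, the dictionary keys, and all function names are hypothetical, and the band-splitting and final addition to the narrowband signal are omitted.

```python
def nearest_codebook_entry(narrow_lpc, codebook):
    """Return the (upper_lpc, upper_gain) paired with the closest stored narrowband LPC."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(codebook, key=lambda entry: sq_dist(entry["narrow_lpc"], narrow_lpc))
    return best["upper_lpc"], best["upper_gain"]


def upsample_by_two(signal):
    """Double the sampling rate by inserting the midpoint between neighboring samples."""
    out = []
    for i, s in enumerate(signal):
        nxt = signal[i + 1] if i + 1 < len(signal) else s  # hold the last sample
        out.append(s)
        out.append(0.5 * (s + nxt))
    return out


def all_pole_synthesis(excitation, lpc, gain):
    """Compute y[n] = gain * e[n] - sum_k lpc[k] * y[n - 1 - k]."""
    out = []
    for n, e in enumerate(excitation):
        y = gain * e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                y -= a * out[n - 1 - k]
        out.append(y)
    return out


# Toy codebook with made-up values, standing in for previously calculated entries.
codebook = [
    {"narrow_lpc": [-0.9, 0.2], "upper_lpc": [-0.5], "upper_gain": 0.3},
    {"narrow_lpc": [0.4, 0.1], "upper_lpc": [0.6], "upper_gain": 0.8},
]
upper_lpc, upper_gain = nearest_codebook_entry([-0.8, 0.25], codebook)
wide_excitation = upsample_by_two([1.0, 0.0, 0.0])  # 8 kHz rate -> 16 kHz rate
upper_band = all_pole_synthesis(wide_excitation, upper_lpc, upper_gain)
```

Because the upper-band LPC and gain are looked up rather than measured, any mismatch between the stored entry and the true upper band appears directly in the synthesized signal.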
FIG. 2 is a diagram illustrating a structure of a voice signal bandwidth expander for classifying signal types in a voice signal.
Referring to FIG. 2, a narrowband voice signal input unit 200 extracts a narrowband LPC from a narrowband signal sampled at 8 kHz, and generates a narrowband excitation signal using the narrowband LPC. A signal type classifier 210 classifies the input narrowband signals according to their signal types based on previously input reference values, for example, according to the presence or absence and the characteristics of background noise, and according to whether the signal is a voiced sound or a voiceless sound. A type-based bandwidth expander 220 adjusts characteristics of the expanded-band signal expanded from the narrowband signal based on the classified types. An expanded-band voice signal output unit 230 outputs an expanded-band voice signal that is matched to the signal characteristic of the narrowband input signal or the characteristic of the background noise.
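A classification of this kind can be sketched as follows. The sketch assumes that frame energy and zero-crossing rate are the features compared against the reference values; this is a common choice but is not stated for the classifier 210, and the threshold values and the function name are hypothetical.

```python
import math


def classify_frame(frame, energy_floor=1e-4, zcr_threshold=0.25):
    """Label a frame as 'background', 'voiced', or 'voiceless'.

    energy_floor and zcr_threshold stand in for the "previously input
    reference values"; both defaults are assumed, not taken from the source.
    """
    if len(frame) < 2:
        raise ValueError("frame must contain at least two samples")
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    if energy < energy_floor:
        return "background"
    return "voiceless" if zcr > zcr_threshold else "voiced"


# Synthetic 20 ms frames at 8 kHz (160 samples each).
tone = [0.5 * math.sin(2.0 * math.pi * 100.0 * n / 8000.0) for n in range(160)]
hiss = [0.5 if n % 2 == 0 else -0.5 for n in range(160)]
silence = [0.0] * 160

labels = [classify_frame(f) for f in (tone, hiss, silence)]
```

A low-frequency periodic frame crosses zero rarely and is labeled voiced, while a rapidly alternating frame is labeled voiceless; the type-based bandwidth expander 220 could then select its expansion settings per label.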
FIG. 3 is a diagram illustrating a structure of a voice signal bandwidth expander using a coding bit rate of a voice signal.
Referring to FIG. 3, a coded narrowband voice signal input unit 300 receives a coded narrowband voice signal, and a coding bit rate detector 310 detects the bit rate when the coded narrowband voice signal is coded at a particular bit rate on a frame-by-frame basis. An expanded-band energy controller 320 adjusts the characteristic of the entire energy or the partial interval's energy of the expanded band in the narrowband voice signal such that the energies are inversely proportional to the bit rate of the narrowband signal. A decoder 330 decodes the coded narrowband voice signal into the original narrowband voice signal. A bandwidth expander 340 actively performs band expansion on a narrowband signal coded at a high bit rate, which has relatively little coding noise, because the possibility of distortion and sound quality reduction in the expanded band due to the band expansion is relatively low. However, the bandwidth expander 340 passively performs band expansion on a narrowband signal coded at a low bit rate, which has relatively more coding noise, because the possibility of distortion and sound quality reduction in the expanded band due to the band expansion is relatively high.
The bandwidth expander 340 adjusts the entire energy or the partial interval's energy of the expanded band such that the energies are inversely proportional to the bit rate of the narrowband signal, thereby reducing the distortion and sound quality reduction in the expanded band, which may be caused by the coding noises.
An expanded-band voice signal output unit 350 outputs a voice signal that has undergone bandwidth expansion based on the coding noises.
However, in artificial bandwidth expansion of the bandwidth-limited voice signal, even when the above-stated advanced technologies are used, the synthesized expanded-band signal is significantly lower in sound quality than the original natural sound. In particular, the sound quality deteriorates due to the strength of the artifacts generated by the artificial bandwidth expansion.