1. Field of the Invention
Methods, apparatuses, and systems consistent with the present invention relate to encoding and decoding a broadband voice signal, and more particularly, to encoding and decoding a broadband voice signal using a matching pursuit sinusoidal model to which a damping factor is added.
2. Description of the Related Art
The growing variety of voice communication applications and the increase in network data transmission rates have raised the demand for high-quality voice communication. To meet this demand, a broadband voice signal having a bandwidth of 50-7000 Hz needs to be transmitted; such a signal is superior in various aspects, such as naturalness and clarity, to the existing telephone band of 300-3400 Hz. To compress the broadband voice signal effectively, the development of a new broadband voice compressor is desirable.
In particular, digital communication uses a packet switching method for integrating voice communication and data communication. However, the packet switching method may cause channel congestion, resulting in packet loss and inferior sound quality. Although a technique for concealing a damaged packet may be used to address these problems, it is not a long-term solution. Thus, recent voice compressors have tried to address these problems by reducing traffic using an extension function.
The extension function allows optimal communication to be performed in a given channel environment by forming voice data in multiple stages and, when the voice data is packetized, adjusting how many stages are transmitted according to the level of congestion. The extension function is used for voice communication over a packet network and can provide optimal communication according to the network state. Moreover, if the extension function is provided when a voice packet is transmitted via channels having different bit rates, tandem-free communication can be performed, in which the voice packet is transmitted by adjusting the transmission stage without double coding.
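As a minimal sketch of the staged transmission described above, and not the packetization scheme of any particular voice compressor, the following Python code keeps the stages of one encoded frame in order and transmits only as many as fit a bit budget that shrinks under congestion. The function name `select_layers`, the byte sizes of the stages, and the bit budgets are hypothetical values introduced only for illustration.

```python
def select_layers(layers, budget_bits):
    """Return the leading stages of a frame that fit within budget_bits.

    layers      -- list of bytes objects, ordered from the first (core) stage
                   to the last enhancement stage (hypothetical layout)
    budget_bits -- number of bits the channel can currently carry per frame
    """
    selected = []
    used_bits = 0
    for layer in layers:
        layer_bits = 8 * len(layer)
        # Stop at the first stage that no longer fits; later stages are
        # assumed to depend on earlier ones, so they are dropped as well.
        if used_bits + layer_bits > budget_bits:
            break
        selected.append(layer)
        used_bits += layer_bits
    return selected


# Hypothetical frame: a 20-byte first stage and two 10-byte enhancement stages.
stages = [bytes(20), bytes(10), bytes(10)]
uncongested = select_layers(stages, 320)  # budget covers every stage
congested = select_layers(stages, 200)    # budget covers only the first stage
```

Under these assumptions, a node between channels of different bit rates could forward a packet by dropping trailing stages in this manner rather than decoding and re-encoding the voice data.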
Thus, research on voice encoding and decoding with the extension function has been conducted; more specifically, a voice signal in 16-bit linear Pulse Code Modulation (PCM) format is encoded and decoded using a sinusoidal synthesis model. A sinusoidal model is an efficient technique for encoding a voice signal at a low bit rate, and has recently been used for voice conversion, sound quality improvement, and low data rate audio coding. Owing to its robustness to background noise and non-voice signals, the sinusoidal model is also used in fields of digital signal processing in which analysis and synthesis are performed on a video signal, a bio-signal, or the like.
However, in a related art sinusoidal model used for modeling a voice signal, it is assumed that the sinusoidal parameters at integer multiples of a fundamental frequency are constant within a single frame. Due to this assumption, when a voice signal having a time varying characteristic is synthesized at the decoder end, the time varying characteristic is distorted, and discontinuity between frames occurs. In order to address these problems, the decoder end uses a parameter interpolation method or a waveform interpolation method. However, either interpolation method modifies the voice waveform, resulting in distortion of the waveform during a non-stationary period. In particular, a significant decrease in sound quality occurs due to distortion of the waveform of the voice signal during an onset or offset transition.
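As a minimal illustrative sketch of the constant-parameter assumption described above, and not the model of any particular related art encoder, the following Python code synthesizes one frame as a sum of harmonics of a fundamental frequency whose amplitudes and phases are held fixed within the frame. The function names, the 16 kHz sampling rate, the 20 ms frame length, and the parameter values are assumptions introduced only for illustration.

```python
import math

SAMPLE_RATE = 16000  # assumed sampling rate in Hz
FRAME_LEN = 320      # assumed frame length: 20 ms at 16 kHz


def synthesize_frame(f0, amplitudes, phases,
                     frame_len=FRAME_LEN, sample_rate=SAMPLE_RATE):
    """Sum of harmonics k*f0 with amplitudes and phases constant in the frame."""
    frame = []
    for n in range(frame_len):
        t = n / sample_rate
        sample = 0.0
        # Harmonic index k starts at 1 (the fundamental frequency itself).
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            sample += amp * math.cos(2.0 * math.pi * k * f0 * t + phase)
        frame.append(sample)
    return frame


def boundary_jump(prev_frame, next_frame):
    """Absolute step between the last sample of one frame and the first of the next."""
    return abs(next_frame[0] - prev_frame[-1])


# Hypothetical offset transition: the amplitude drops from 1.0 to 0.2
# between two consecutive frames, each synthesized with fixed parameters.
frame_a = synthesize_frame(100.0, [1.0], [0.0])
frame_b = synthesize_frame(100.0, [0.2], [0.0])
jump = boundary_jump(frame_a, frame_b)
```

Because each frame is synthesized independently with fixed parameters, the change in amplitude appears as an abrupt step at the frame boundary instead of a gradual decay, which is the kind of discontinuity that the interpolation methods mentioned above attempt to smooth.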
In addition, a related art harmonic coding method that has been used by voice encoders having a low transmission rate detects harmonic magnitudes using a peak detection method that assumes a zero phase and performs a Fast Fourier Transform (FFT), so that the phase need not be transmitted. However, the related art harmonic coding method has the limitation that a frequency resolution of less than 512 points must be applied due to restrictions on complexity and data rate. The decreased frequency resolution and the restriction on transmitting a phase parameter obstruct correct harmonic peak detection, and as a result, the performance of the voice encoder decreases due to delays in pulse positions of the synthesized voice signal and phase differences between frames.
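As a simplified sketch of spectral peak detection of the kind described above, the following Python code computes a magnitude spectrum and, for each harmonic of an assumed fundamental frequency, selects the largest-magnitude bin near that harmonic. A direct discrete Fourier transform is used in place of an FFT only to keep the example free of external libraries; the function names, the search width of half the fundamental frequency, and the parameter values are assumptions and are not taken from any related art encoder.

```python
import cmath
import math


def dft_magnitudes(signal, n_points):
    """Magnitudes of bins 0..n_points/2 of an n_points DFT (zero-padded if short)."""
    x = list(signal[:n_points]) + [0.0] * max(0, n_points - len(signal))
    mags = []
    for k in range(n_points // 2 + 1):
        acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n in range(n_points))
        mags.append(abs(acc))
    return mags


def harmonic_peaks(mags, f0, sample_rate, n_points, n_harmonics):
    """For each harmonic k*f0, return (peak frequency in Hz, peak magnitude)."""
    bin_hz = sample_rate / n_points  # frequency spacing of the DFT bins
    half_width = max(1, int(round(0.5 * f0 / bin_hz)))
    peaks = []
    for k in range(1, n_harmonics + 1):
        center = int(round(k * f0 / bin_hz))
        lo = max(0, center - half_width)
        hi = min(len(mags) - 1, center + half_width)
        best = max(range(lo, hi + 1), key=lambda b: mags[b])
        peaks.append((best * bin_hz, mags[best]))
    return peaks
```

Only magnitudes are returned, so a decoder working from these values alone would have to assume the phases. The bin spacing `sample_rate / n_points` also shows why a small transform size limits how accurately a harmonic peak can be located: with fewer points, each bin covers a wider band of frequencies.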