1. Field of the Invention
The present invention relates to a process of encoding a speech signal, and more particularly, to a method, apparatus, and medium for rapidly and reliably classifying an input speech signal when encoding the speech signal and a method, apparatus, and medium for encoding the speech signal using the same.
2. Description of the Related Art
A speech encoder converts a speech signal into a digital bit stream, which is transmitted over a communication channel or stored in a storage medium. The speech signal is sampled and quantized with 16 bits per sample and the speech encoder represents the digital samples with a smaller number of bits while maintaining good subjective speech quality. A speech decoder or synthesizer processes the transmitted or stored bit stream and converts it back to a sound signal.
In a wireless system using code division multiple access (CDMA) technology, the use of a source-controlled variable bit rate (VBR) speech encoder improves system capacity. In the source-controlled VBR encoder, a codec operates at several bit rates, and a rate selection module sets the bit rate used for encoding each speech frame based on the nature of the speech frame (e.g., voiced, unvoiced, transient, or background noise). The aim of encoding with the source-controlled VBR encoder is to obtain optimum sound quality at a given average bit rate, that is, an average data rate (ADR). The codec may operate in different modes by adjusting the rate selection module such that a different ADR is obtained in each mode. The operation mode is determined by the system according to a channel state. This allows the codec to make a trade-off between speech quality and system capacity.
As can be seen from the above description, signal classification is very important for an efficient VBR encoder.
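The relationship between per-frame rate selection, operation mode, and ADR described above can be illustrated with a minimal sketch. The mode names, class labels, and bit rates below are hypothetical values chosen for illustration; they are not taken from any particular codec.

```python
# Hypothetical rate tables: one per operation mode, mapping a frame class
# to a bit rate in kbps. All names and values are illustrative assumptions.
RATE_TABLES_KBPS = {
    "quality": {"voiced": 8.0, "transient": 8.0, "unvoiced": 4.0, "background_noise": 1.0},
    "capacity": {"voiced": 4.0, "transient": 8.0, "unvoiced": 2.0, "background_noise": 1.0},
}


def select_rate(frame_class, mode):
    """Return the bit rate (kbps) chosen for one frame in the given mode."""
    return RATE_TABLES_KBPS[mode][frame_class]


def average_data_rate(frame_classes, mode):
    """Average the per-frame bit rates over a sequence of classified frames."""
    if not frame_classes:
        raise ValueError("at least one frame is required")
    total = sum(select_rate(c, mode) for c in frame_classes)
    return total / len(frame_classes)


if __name__ == "__main__":
    frames = ["voiced", "voiced", "unvoiced", "background_noise"]
    for mode in RATE_TABLES_KBPS:
        print(mode, average_data_rate(frames, mode))
```

In this sketch the same sequence of classified frames yields a lower ADR in the hypothetical "capacity" mode than in the "quality" mode, which is the trade-off between speech quality and system capacity. A misclassified frame changes the selected rate directly, so the classification accuracy bounds how well the target ADR can be met.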
In a standard speech encoder using the CDMA technology, a voice activity detector (VAD) or a selected mode vocoder (SMV) is used as a speech classifying apparatus. The VAD detects only whether an input signal is speech or non-speech. The SMV determines a transmission rate for every frame in order to reduce bandwidth. The SMV supports transmission rates of 8.55 kbps, 4.0 kbps, 2.0 kbps, and 0.8 kbps, and selects one of these rates for each frame when encoding a speech signal. In order to select one of the four transmission rates, the SMV classifies an input signal into six classes, that is, silence, noise, unvoiced, transient, non-stationary voiced, and stationary voiced.
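As a rough illustration of what the four SMV transmission rates mean per frame, the following sketch lists the six classes and converts each rate into a bit budget per frame. The 20 ms frame length is an assumption that is not stated in the text above, and the mapping from classes to rates is deliberately left out because the text does not specify it.

```python
from enum import Enum


class SignalClass(Enum):
    """The six classes into which the SMV classifies an input signal."""
    SILENCE = "silence"
    NOISE = "noise"
    UNVOICED = "unvoiced"
    TRANSIENT = "transient"
    NON_STATIONARY_VOICED = "non-stationary voiced"
    STATIONARY_VOICED = "stationary voiced"


# The four SMV transmission rates in kbps, as listed in the text.
SMV_RATES_KBPS = (8.55, 4.0, 2.0, 0.8)

# Assumption: 20 ms frames. The frame length is not given in the text.
FRAME_MS = 20


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits available for one frame: kbps multiplied by ms gives bits."""
    return round(rate_kbps * frame_ms)


if __name__ == "__main__":
    for rate in SMV_RATES_KBPS:
        print(f"{rate} kbps -> {bits_per_frame(rate)} bits per frame")
```

Under the 20 ms assumption, the four rates correspond to 171, 80, 40, and 16 bits per frame, which shows how strongly the class decision affects the number of bits spent on a frame.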
However, a conventional SMV classifies the speech signal using parameters that the codec computes from the input speech signal, such as linear prediction coefficients (LPC), perceptual weighting filtering, and open-loop pitch detection. Accordingly, the speech classifying apparatus depends on the codec.
Moreover, since the conventional speech classifying apparatus classifies the speech signal in the frequency domain using spectral components, the process is complicated and classification takes a long time.