1. Field of the Invention
The present invention relates to a baseband modem and method for speech recognition, and more particularly, to a baseband modem and method for speech recognition and a mobile communication terminal using the baseband modem and method. Although the present invention is suitable for a wide scope of applications, it is particularly suitable for securing a higher rate of speech recognition.
2. Description of the Related Art
Generally, a conventional baseband modem includes an audio codec. Conventional speech recognition technology, as applied to a mobile communication terminal, generally utilizes the same sampling rate for both vocoding of voice communication and speech recognition. The same sampling rate is utilized because few baseband modems are capable of supporting microphone input sampled at 16 kHz and most baseband modems have difficulty obtaining PCM (pulse code modulation) data.
FIG. 1 is a block diagram illustrating a conventional baseband modem. FIG. 2 is a flowchart illustrating a conventional speech recognition method utilizing the baseband modem illustrated in FIG. 1.
Referring to FIG. 1, a conventional baseband modem includes an audio codec 13, a vocoder 15 and a processor 17. Once a voice signal is received from a microphone, the audio codec 13 performs modulation on the voice signal at a prescribed sampling rate. For example, PCM (pulse code modulation) is performed on the voice signal at a sampling rate of 8 kHz.
The vocoder 15 performs vocoding on an output of the audio codec 13. For instance, QCELP (Qualcomm Code Excited Linear Prediction) or EVRC (Enhanced Variable Rate Codec) coding is performed.
The processor 17 performs speech recognition on an output of the vocoder 15. Specifically, the processor 17 decodes the vocoded data and then extracts a feature vector from the decoded data. The processor 17 performs speech recognition by applying the extracted feature vector to a speech recognition algorithm that was previously prepared. Preferably, the processor 17 includes an MPU (micro processing unit) or DSP (digital signal processor). On the other hand, if the voice signal is for voice communication, the processor 17 performs channel encoding, using either a convolutional code or a turbo code, on the output of the vocoder 15.
A conventional speech recognition method according to the above-explained configuration is explained with reference to FIG. 2.
Once a voice signal is received from a microphone, the conventional baseband modem performs modulation on the voice signal at a prescribed sampling rate (S12). For example, PCM (pulse code modulation) is carried out on the input voice signal at a sampling rate of 8 kHz.
Vocoding of the modulated voice signal is then performed (S14). For example, QCELP (Qualcomm Code Excited Linear Prediction) or EVRC (Enhanced Variable Rate Codec) is utilized for vocoding.
Speech recognition is performed on the vocoded signal in an MPU (micro processing unit) or DSP (digital signal processor). For speech recognition, the vocoded data is decoded (S16) and a feature vector is extracted from the decoded data (S18). The extracted feature vector is then applied to a speech recognition algorithm (S20).
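The conventional flow of steps S12 through S20 can be sketched in Python as follows. This is a minimal illustration only, not an implementation of any actual baseband modem. All function names are hypothetical. The functions toy_vocode and toy_decode are simple bit-dropping stand-ins and are not QCELP or EVRC, and the log-energy feature and nearest-template matcher are assumed placeholders for the feature extraction and the speech recognition algorithm, which are not specified here.

```python
import math

SAMPLE_RATE_HZ = 8000  # sampling rate given in the text for the conventional method
PCM_BITS = 16          # assumed PCM word length (not stated in the text)
KEEP_BITS = 8          # assumed precision kept by the toy coder

def pcm_sample(signal, duration_s, rate_hz=SAMPLE_RATE_HZ):
    """S12: sample a signal (a function of time in seconds) into signed PCM."""
    full_scale = 2 ** (PCM_BITS - 1) - 1
    n = int(round(duration_s * rate_hz))
    return [round(max(-1.0, min(1.0, signal(i / rate_hz))) * full_scale)
            for i in range(n)]

def toy_vocode(pcm):
    """S14 stand-in: lossy coding that drops low-order bits (NOT QCELP/EVRC)."""
    return [s >> (PCM_BITS - KEEP_BITS) for s in pcm]

def toy_decode(coded):
    """S16 stand-in: rebuild approximate PCM from the toy coded data."""
    return [c << (PCM_BITS - KEEP_BITS) for c in coded]

def frame_log_energy(pcm, frame_len=160):
    """S18 stand-in: one log-energy value per 160-sample frame (20 ms at 8 kHz)."""
    feats = []
    for start in range(0, len(pcm) - frame_len + 1, frame_len):
        frame = pcm[start:start + frame_len]
        feats.append(math.log(sum(s * s for s in frame) / frame_len + 1.0))
    return feats

def recognize(features, templates):
    """S20 stand-in: return the label of the closest stored template."""
    def dist(a, b):
        n = max(1, min(len(a), len(b)))
        return sum(abs(x - y) for x, y in zip(a, b)) / n
    return min(templates, key=lambda label: dist(features, templates[label]))

def conventional_recognition(signal, duration_s, templates):
    """Chain S12 -> S14 -> S16 -> S18 -> S20 as in the conventional method."""
    pcm = pcm_sample(signal, duration_s)     # S12
    coded = toy_vocode(pcm)                  # S14
    decoded = toy_decode(coded)              # S16
    features = frame_log_energy(decoded)     # S18
    return recognize(features, templates)    # S20
```

In this sketch the recognizer only ever sees the output of the lossy encode and decode round trip, never the original PCM samples, which mirrors the conventional flow of FIG. 2.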
In the conventional method, the sampling rate for modulation is set to 8 kHz. This is because speech of a recognizable quality can be provided using voice components below 4 kHz, and a sampling rate of 8 kHz is sufficient to represent components up to that frequency.
However, when speech recognition is performed in a mobile communication terminal according to the conventional method, the data used has been sampled and processed for voice communication. Therefore, the conventional method is unable to guarantee a satisfactory speech recognition rate. Furthermore, in the conventional method, unnecessary vocoding and decoding are performed as illustrated in FIG. 2.
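The relationship between an 8 kHz sampling rate and the 4 kHz voice band follows from the Nyquist criterion, which can be illustrated with a short Python sketch. The function names are hypothetical and the tone frequencies are chosen only for illustration. The sketch shows that a component above half the sampling rate cannot be distinguished from a lower-frequency component once sampled.

```python
import math

def nyquist_limit(rate_hz):
    """Highest frequency that sampling at rate_hz can represent unambiguously."""
    return rate_hz / 2

def alias_frequency(f_hz, rate_hz):
    """Apparent frequency of a pure tone of f_hz after sampling at rate_hz."""
    f = f_hz % rate_hz
    return min(f, rate_hz - f)

def samples_match(f1_hz, f2_hz, rate_hz, count=64, tol=1e-9):
    """True if cosine tones at f1_hz and f2_hz yield the same sample values."""
    return all(
        abs(math.cos(2 * math.pi * f1_hz * n / rate_hz)
            - math.cos(2 * math.pi * f2_hz * n / rate_hz)) < tol
        for n in range(count)
    )

# At 8 kHz, a 5 kHz tone produces the same samples as a 3 kHz tone.
same_at_8k = samples_match(5000, 3000, 8000)
# At 16 kHz, the two tones remain distinguishable.
same_at_16k = samples_match(5000, 3000, 16000)
```

Under these assumptions, sampling at 8 kHz preserves only components below 4 kHz, whereas sampling at 16 kHz preserves components below 8 kHz.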
Optionally, a digital signal processing chip or a speech recognition chip for speech recognition may be included in the mobile communication terminal. However, this increases the cost of a terminal.
In some conventional baseband modems, a method such as DTW (dynamic time warping) has been used for speech recognition. Since the data is processed according to the sampling used for voice communication, this method fails to guarantee a satisfactory speech recognition rate. In the conventional speech recognition method, either the sampling rate of the audio codec provided in the baseband modem is increased or extraction of the feature vector is not implemented in hardware.
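For reference, DTW can be sketched in Python as follows. This is the textbook dynamic-programming formulation, not necessarily the variant used in any particular baseband modem. The function names, the use of one-dimensional feature sequences, and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b repeats a frame
                cost[i][j - 1],      # b advances while a repeats a frame
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]

def classify(features, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

Because the alignment may stretch or compress either sequence in time, an utterance spoken more slowly than its stored template can still match it closely. The match quality nevertheless depends on the features, which in the conventional method are derived from data sampled for voice communication.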
There is another conventional method for speech recognition. In this method, a separate audio codec having a sampling rate suitable for speech recognition is installed outside the baseband modem. However, the corresponding hardware implementation is very complicated.
Conventional mobile communication terminals that perform speech recognition are unable to set the sampling rate of the baseband modem separately for voice communication and for speech recognition. Furthermore, conventional baseband modems have difficulty obtaining PCM (pulse code modulation) data.
Therefore, there is a need for an apparatus and method that can perform speech recognition and voice communication such that an optimized sampling rate is utilized for speech recognition to guarantee a satisfactory speech recognition rate without performing unnecessary vocoding and decoding. The present invention addresses these and other needs.