1. Field of the Invention
The present invention relates to a voice modulation apparatus and method in a voice telecommunication apparatus such as a wired or wireless telephone.
2. Discussion of the Background Art
In general, a telephone is an instrument for voice telecommunication between two parties at a distance, by wire or wirelessly, and is the most basic form of communication in modern society.
In recent years, with the development of mobile communication network technology, the popularity of wireless telephones, namely mobile communication terminals, has rapidly increased.
The role of the mobile communication terminal has expanded from voice transmission to transmitting and receiving data, exchanging character (text) messages, and providing services such as weather forecasts, stock transactions, money deposit or withdrawal, breaking news, e-mail, and remote meter reading.
Besides the character (text) message service, a multimedia message service (MMS) is now also available through the mobile communication terminal.
The multimedia messages include still images, voice messages, voice mails, and moving images using MPEG-4.
Therefore, many application technologies for mobile communication terminals that support the multimedia message service are being developed in a steady stream. For example, when sending a still image, a user can add different effects to the still image by making the image black and white or by inverting the image.
However, few application programs other than the voice mailbox have been developed for voice messages, and special effects like those above are hardly ever applied to them.
When a caller wants to send a voice message or a voice mail to the other party, a vocoder converts the voice to appropriate digital signals for transmission.
Voice coders typically used for the telephone are AMR (Adaptive Multi-Rate), EVRC (Enhanced Variable Rate Codec), QCELP (Qualcomm Code Excited Linear Prediction), and so forth. On the whole, voice coders can be divided into three types: the source coder, which uses a voice model; the waveform coder; and the hybrid coder, which is a combination of the source coder and the waveform coder.
The source coder analyzes a voice (or speech) model instead of the waveform of the voice, and codes the analyzed data.
Source coders include the LPC vocoder, the channel vocoder, the formant vocoder, the phase vocoder, and so on.
The source coder extracts a characteristic parameter from a voice signal based on the generation model of a voice signal, and a decoder regenerates the voice using the characteristic parameter.
In other words, the source coder represents voice signals by modeling the human voice generation process. It does not regenerate the waveform of the voice signal, but regenerates a sound that, to the human ear, is as close as possible to the original voice signal.
The source coder typically operates at a low transmission rate, usually around 4.8-13.2 Kbps.
A typical example of the source coder is the LPC (Linear Predictive Coding) vocoder.
On the other hand, the waveform coder, like PCM, codes the voice waveform itself. Its primary objective is to ensure that the restored signal at the data sink preserves the shape of the original signal from the data source.
Accordingly, the waveform coder is applicable not only to voice signals, but also to other band-limited signals (e.g., PSK (Phase Shift Keying) signals used in PC communication).
For the same reason, a waveform coder usually operates on a sample-by-sample basis, and an objective measure such as SNR (signal-to-noise ratio) can be used to evaluate the performance of the waveform coder.
Examples of the waveform coder include PCM (Pulse Code Modulation), DM (Delta Modulation), APCM (Adaptive PCM), DPCM (Differential PCM), ADPCM (Adaptive Differential PCM), and so on.
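The sample-by-sample operation of a waveform coder and its evaluation by SNR can be illustrated with a minimal sketch. The sketch assumes plain uniform quantization rather than the companded quantization used in practical 64 Kbps telephone PCM, and the function names and the 200 Hz test tone are hypothetical choices made only for illustration.

```python
import math

def pcm_quantize(sample, bits):
    """Uniformly quantize one sample in [-1.0, 1.0) to a signed integer code."""
    levels = 2 ** (bits - 1)
    code = int(math.floor(sample * levels))
    # Clamp so that out-of-range samples saturate instead of overflowing.
    return max(-levels, min(levels - 1, code))

def pcm_reconstruct(code, bits):
    """Map an integer code back to the midpoint of its quantization interval."""
    levels = 2 ** (bits - 1)
    return (code + 0.5) / levels

def snr_db(original, restored):
    """Signal-to-noise ratio in dB between an original and a restored signal."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, restored))
    return 10.0 * math.log10(signal_power / noise_power)

# Hypothetical test signal: a 200 Hz tone sampled at 8 kHz.
tone = [0.9 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]

# Each sample is coded and restored independently of its neighbors.
restored_8 = [pcm_reconstruct(pcm_quantize(x, 8), 8) for x in tone]
restored_4 = [pcm_reconstruct(pcm_quantize(x, 4), 4) for x in tone]

snr_8 = snr_db(tone, restored_8)
snr_4 = snr_db(tone, restored_4)
print(f"8-bit SNR: {snr_8:.1f} dB, 4-bit SNR: {snr_4:.1f} dB")
```

Because every sample is handled on its own, the SNR depends only on the quantizer resolution, which is why fewer bits per sample translate directly into a lower SNR.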
The first commercially used voice coder was 64 Kbps PCM, which was accepted as an international standard back in 1972. This coder is still widely used in many digital systems, especially telephone systems. Twelve years later, in 1984, 32 Kbps ADPCM replaced the 64 Kbps PCM. Compared to the 64 Kbps PCM, the 32 Kbps ADPCM has a lower transmission rate, and thus it is often used as a criterion for the voice quality of low-transmission-rate coders.
A problem with the waveform coder is that voice quality is severely degraded below 16 Kbps. However, since the waveform coder is relatively simple to implement and requires little computation, it still has applications in many diverse fields.
Lastly, the hybrid coder, which combines the advantages of the waveform coder and the source coder, codes the difference between the original sound and the restored sound.
The hybrid coder first converts a voice signal to digital PCM, and a vocoder then extracts only the characteristics of the voice from the 64 Kbps PCM signal.
Therefore, the hybrid coder can maintain superior voice quality even at a low transmission rate around 8 Kbps.
In accordance with modeling of an error signal, the hybrid coder can be divided into RELP (Residual Excited Linear Prediction), MPLPC (Multi-Pulse LPC), CELP (Code Excited Linear Prediction), VSELP (Vector Sum Excited Linear Prediction), RPE-LTP (Regular Pulse Excited-Long Term Prediction), and IMBE (Improved Multi-Band Excitation).
The hybrid coder codes an error signal between the original sound and the restored signal and transmits the coded signal. To this end, vector quantization is employed.
The vector quantization process searches a codebook for the entry that gives the minimum mean square error between the original signal and the reconstructed signal, and transmits only the index of that entry, thereby achieving compression.
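The codebook search described above can be sketched as follows. This is only an illustrative sketch: the four-entry codebook, the vector length, and the function names are hypothetical, and practical coders use much larger, trained codebooks.

```python
def mse(a, b):
    """Mean square error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def vq_encode(vector, codebook):
    """Return the index of the codeword with minimum MSE to the input vector."""
    return min(range(len(codebook)), key=lambda i: mse(vector, codebook[i]))

def vq_decode(index, codebook):
    """The receiver holds the same codebook and looks the codeword up by index."""
    return codebook[index]

# Hypothetical codebook shared by the transmitter and the receiver.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [-1.0, 1.0, -1.0, 1.0],
]

# Hypothetical error-signal segment to be coded.
error_vector = [0.9, 0.1, -0.8, 0.05]

index = vq_encode(error_vector, codebook)      # only this index is transmitted
reconstructed = vq_decode(index, codebook)
print(index, reconstructed)
```

With four codewords, two bits are enough to represent the index, whereas sending the four samples themselves would take far more bits; that difference is the compression effect.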
FIG. 1 is a block diagram of a related general voice codec and transmission system.
Generally, voice is largely divided into voiced sounds and unvoiced sounds, depending on whether or not vocal cords vibrate.
The voiced sounds are generated when airflow with a period set by the vibration of the vocal cords passes through the vocal tract, which resonates between the glottis and the lips. The unvoiced sounds are generated by forming a constriction at some point along the vocal tract and forcing air through the constriction to produce turbulence, in the absence of vibration of the vocal cords.
When a person speaks, the physical shape of the vocal tract changes over time. Thus, voice signals are nonstationary.
One example of a voice generation model uses a time-varying digital filter to represent the characteristics of the vocal tract and, depending on whether the sound is voiced or unvoiced, excites the filter with a periodic impulse train or with white noise.
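A minimal sketch of such a voice generation model is given below for one frame. It assumes an all-pole filter whose coefficients are held fixed within the frame (a time-varying filter would update them frame by frame); the coefficient values, the pitch period of 80 samples, and the function names are hypothetical.

```python
import random

def excitation(voiced, length, pitch_period=80, seed=0):
    """Periodic impulse train for voiced sound, white noise for unvoiced sound."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

def all_pole_filter(excite, coeffs):
    """Vocal tract model: y[n] = x[n] + sum over k of coeffs[k-1] * y[n-k]."""
    out = []
    for n, x in enumerate(excite):
        y = x
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                y += a * out[n - k]
        out.append(y)
    return out

# Hypothetical, stable second-order vocal tract filter for one 160-sample frame.
tract_coeffs = [1.3, -0.8]

voiced_frame = all_pole_filter(excitation(True, 160), tract_coeffs)
unvoiced_frame = all_pole_filter(excitation(False, 160), tract_coeffs)
print(voiced_frame[:3], unvoiced_frame[:3])
```

The same filter produces a pitched, periodic output or a noise-like output depending only on which excitation is selected.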
Referring to FIG. 1, the voice transmission system, in which a user transmits his or her voice to the other party using a voice communication apparatus, includes an LPC (Linear Predictive Coding) analyzer 100 to which a voice signal illustrated in FIG. 3 is input, a pitch detector 110, a coder 120, a decoder 130, and an LPC synthesizer 140.
To code the voice signal, the voice transmission system represents the voice signal in terms of pitch and envelope before transmission.
The LPC analyzer 100, to which the voice signal is input, obtains a filter factor that characterizes the envelope of the voice spectrum.
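One common way to obtain such a filter factor is the autocorrelation method with the Levinson-Durbin recursion. The sketch below assumes that method, which the description of the LPC analyzer 100 does not itself specify; the function names and the synthetic test frame are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Return predictor coefficients a[1..order] and the residual energy,
    where the prediction is x[n] ~ sum over k of a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for model order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error

# Hypothetical test frame: impulse response of a known second-order
# all-pole filter, x[n] = 1.3 * x[n-1] - 0.8 * x[n-2].
frame = [1.0, 1.3]
for n in range(2, 200):
    frame.append(1.3 * frame[n - 1] - 0.8 * frame[n - 2])

coeffs, residual = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs, residual)
```

For this synthetic frame the recovered coefficients should closely match the 1.3 and -0.8 used to generate it, which is a convenient sanity check before applying the analysis to real speech frames.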
The pitch detector 110 determines whether the voice signal is voiced or unvoiced. When the voice signal is voiced, the pitch is selected as the input signal; when the voice signal is unvoiced, white noise is selected as the input signal.
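The voiced/unvoiced decision and the pitch estimate can be sketched with a normalized autocorrelation peak search. This is one common approach, assumed here for illustration only; the lag range, the threshold of 0.3, the function name, and the test signals are hypothetical and are not taken from the pitch detector 110.

```python
import math
import random

def detect_pitch(frame, min_lag=20, max_lag=160, threshold=0.3):
    """Return (voiced, pitch_period_in_samples) for one frame.

    The frame is treated as voiced when the normalized autocorrelation
    has a sufficiently strong peak within the searched lag range.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False, 0
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        corr /= energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    if best_corr >= threshold:
        return True, best_lag
    return False, 0

# Hypothetical voiced-like frame: 100 Hz tone at 8 kHz (period of 80 samples).
voiced_like = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(320)]

# Hypothetical unvoiced-like frame: white noise with a fixed seed.
rng = random.Random(1)
unvoiced_like = [rng.gauss(0.0, 1.0) for _ in range(320)]

print(detect_pitch(voiced_like))
print(detect_pitch(unvoiced_like))
```

A voiced result would select a periodic excitation with the detected period, while an unvoiced result would select white noise, matching the two cases described above.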
The coder 120 codes the voice signal based on the filter factor and the variable obtained from the LPC analyzer 100 and the pitch detector 110, and transmits the coded signal to the other party through a wired or wireless channel.
The decoder 130 demultiplexes the stream transmitted through the channel and decodes it.
The LPC synthesizer 140 synthesizes voice from the decoded voice stream and outputs the synthesized voice.
The related art voice coder with the above organization simply serves to amplify an analog voice signal, or to convert the analog voice signal to a digital signal, and enables the signal to be exchanged through a wired or wireless interface. Its primary role is to minimize sound distortion and noise, and thus to restore the original sound as faithfully as possible.
However, considering that most people now use telephones very often, simply exchanging one's voice is not sufficient to meet the diverse demands of users.
For example, because many women perceive today's society as dangerous and insecure, they often want to answer the phone in a male voice, especially when they are home alone.
Also, there are people who want to create voice messages or voice mails using a voice different from their own, hoping that their callers will enjoy the messages.