1. Field of the Invention
The present invention relates to a telephone system and a voice encoding/decoding method, and is applicable to a mobile terminal device of the cellular system referred to as the PCS (Personal Communication Service) system in the United States of America, for example.
2. Description of the Related Art
In the United States, a cellular system referred to as the PCS system has been proposed, which applies the CDMA (Code Division Multiple Access) method defined in EIA/TIA (Electronic Industries Association/Telecommunications Industry Association) IS-95A and ANSI (American National Standards Institute, Inc.) J-STD-008.
The PCS system is mainly composed of a mobile terminal device and a plurality of base station devices, which communicate with each other by the CDMA method over a radio circuit provided between the mobile terminal device and the base station devices.
By applying the CDMA method as the communication method, the PCS system obtains the merit that the system capacity can be increased to approximately ten times that of the conventional analog method.
The mobile terminal device in this PCS system generally has an echo-canceling function and a side-back-toning function. Here, the echo-canceling function eliminates a short-delayed disturbance of approximately 2 [ms] in a voice signal, which enters the microphone by propagating through the box part of the device itself from the speaker of the device.
The side-back-toning function supplies the user's voice entered from the microphone to the speaker together with the voice of the other party, so that the user can hear his own voice through the speaker when talking.
FIG. 1 shows the construction of a mobile terminal device having the echo-canceling function and the side-back-toning function. The mobile terminal device 1 is composed of a receiving system (the lower part of FIG. 1) and a transmitting system (the upper part of FIG. 1).
In the receiving system, a receiving signal S1 received by an antenna 2 is sent to an RF amplifier 4 via a multiplexer 3. The RF amplifier 4 amplifies the receiving signal S1 to a predetermined power level and sends the amplified receiving signal S1 to a demodulator 5. The demodulator 5 extracts voice packet data S2 from the receiving signal S1 and sends the voice packet data S2 to a voice decoder 6.
The voice decoder 6 decodes the voice packet data S2 and temporarily writes the resulting voice data S3 to a buffer 7. The buffer 7 transfers the voice data S3 to an output buffer 8 one output voice frame at a time, as voice sample data S4. The output buffer 8 has capacity enough to store one output voice frame.
Then a digital-to-analog converter 9 converts the voice sample data S4 from the output buffer 8 into an analog signal S5, which is output as voice from a speaker 10.
In the transmitting system, an analog signal S6, such as voice picked up by a microphone 11, is sent to an analog-to-digital converter 12. The analog-to-digital converter 12 converts the analog signal S6 into digital data and writes the resulting voice sample data S7 to an input buffer 13. At this time, one frame of the voice sample data S4 (one output voice frame) is transferred from the buffer 7 and written to the output buffer 8 at the same timing that one frame of the voice sample data S7 is written to the input buffer 13 as an input voice frame.
Since the echo-canceling and the voice decoding require time, the voice data S3 is first written to the buffer 7, and the voice sample data S4 (the output voice frame) is then transferred to the output buffer 8 frame by frame, at the timing that one input voice frame of the voice sample data S7 has been written to the input buffer 13.
FIG. 2 shows the relation between frames of the voice sample data S4 and S7 written to the output buffer 8 and the input buffer 13 at the same timing. As shown in FIG. 2, the voice sample data S7 written to the input buffer 13 and the voice sample data S4 written to the output buffer 8 have the same boundaries between frames.
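The frame alignment described above can be sketched as follows. This is an illustrative sketch, not from the specification: the frame length of 160 samples (20 ms at 8 kHz, typical of IS-95 vocoders) is an assumption, and the function and variable names are hypothetical.

```python
# Sketch (not from the patent): splitting the input and output sample
# streams into frames with identical boundaries, as in FIG. 2.
# FRAME_LEN is an assumption; IS-95 vocoders typically use 20 ms frames
# (160 samples at 8 kHz).

FRAME_LEN = 160

def split_into_frames(samples, frame_len=FRAME_LEN):
    """Split a sample stream into whole frames; a partial tail is dropped."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

mic_samples = list(range(480))       # stand-in for voice sample data S7
spk_samples = list(range(480))       # stand-in for voice sample data S4

input_frames = split_into_frames(mic_samples)
output_frames = split_into_frames(spk_samples)

# Because both buffers are written at the same timing, frame k of the
# input always pairs with frame k of the output for echo canceling.
assert len(input_frames) == len(output_frames) == 3
```

Because both streams are cut at the same boundaries, the echo canceler can always pair an input voice frame with the output voice frame that produced its echo.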
Therefore, in the mobile terminal device 1, when echo-canceling is performed, an echo canceler 14 can eliminate the delayed disturbance in the voice signal from an input voice frame 1 based on an output voice frame 1 having the same boundary as the input voice frame 1, and rewrite the voice sample data S7 accordingly.
FIG. 3 shows the operational timing of the echo-canceling. In the DSP unit of the mobile terminal device 1, the output voice frame is written to the output buffer 8 at the same timing that the input voice frame is written to the input buffer 13. Decoding of the voice packet data S2 is started by the voice decoder 6 at the timing "A" in FIG. 3, and one output voice frame is transferred from the buffer 7 to the output buffer 8.
At this time, one input voice frame has been written to the input buffer 13, and the echo-canceling is performed at the timing "B" in FIG. 3 based on the output voice frame written to the output buffer 8. Thereafter, the echo-canceled input voice frame is encoded at the timing "C" in FIG. 3.
Specifically, the echo canceler 14 estimates the delayed disturbance in the voice signal based on the voice sample data S4 of the one output voice frame written in the output buffer 8, eliminates the delayed disturbance from the voice sample data S7, writes the resulting voice sample data S8 back to the input buffer 13, and outputs the voice sample data S8 to a voice encoder 15.
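One way such an echo canceler can operate is sketched below. The patent does not name an algorithm; a normalized LMS (NLMS) adaptive filter is a common choice in practice, so the filter here, its tap count, and its step size are all assumptions for illustration.

```python
# Sketch (not from the patent): estimating the delayed disturbance from
# the output voice frame and subtracting it from the input voice frame,
# using a normalized LMS adaptive filter (an assumed algorithm).

def nlms_cancel(out_frame, in_frame, taps=8, mu=0.5, eps=1e-6):
    """Return in_frame with the estimated echo of out_frame removed."""
    w = [0.0] * taps                    # adaptive filter coefficients
    hist = [0.0] * taps                 # recent far-end (speaker) samples
    cleaned = []
    for x, d in zip(out_frame, in_frame):
        hist = [x] + hist[:-1]
        y = sum(wi * xi for wi, xi in zip(w, hist))   # echo estimate
        e = d - y                                     # echo-canceled sample
        norm = sum(xi * xi for xi in hist) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, hist)]
        cleaned.append(e)
    return cleaned
```

For a microphone signal that is a pure attenuated echo of the speaker output, the residual energy after cancellation falls well below the echo energy as the filter converges within the frame.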
The voice encoder 15 encodes the echo-canceled voice sample data S8, stored for several frames, to generate voice packet data S9 and outputs it to a modulator 16. The modulator 16 modulates the voice packet data S9 and outputs the result to an RF amplifier 17 as a transmission signal S10. The RF amplifier 17 amplifies the transmission signal S10 to a predetermined power level and transmits it from the antenna 2 via the multiplexer 3.
On the other hand, the voice sample data S7 converted into digital data by the analog-to-digital converter 12 is subjected to the side-back-toning by being output from the speaker 10 via the digital-to-analog converter 9. The side-back-toning allows the user to hear his own voice together with the other party's voice from the speaker 10 when talking.
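The side-back-toning can be sketched as a simple mix of the microphone samples into the speaker path. This is an illustrative sketch, not from the specification: the attenuation factor is an assumption (handsets commonly apply a small sidetone gain so the user's own voice stays quieter than the far-end voice).

```python
# Sketch (not from the patent): side-back-toning as an attenuated mix of
# the microphone signal into the received (speaker) signal.

SIDETONE_GAIN = 0.25   # assumed attenuation for the user's own voice

def mix_sidetone(far_end_frame, mic_frame, gain=SIDETONE_GAIN):
    """Add an attenuated copy of the mic samples to the received samples."""
    return [f + gain * m for f, m in zip(far_end_frame, mic_frame)]

mixed = mix_sidetone([0.5, -0.5], [1.0, 1.0])
assert mixed == [0.75, -0.25]
```

Note that in the device of FIG. 1 this mixing happens before echo canceling, which is the source of the quality problem described later.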
FIG. 4 shows a series of processing to perform the echo-canceling and the side-back-toning. Here the processing of one voice frame will be described.
In the mobile terminal device 1, the processing starts at step RT1 and proceeds to step SP1. At step SP1, when talking is started, each module of the echo canceler 14, the voice encoder 15 and the voice decoder 6, as well as the input buffer 13 and the output buffer 8, is initialized.
At step SP2, the reception signal S1 received via the antenna 2, the multiplexer 3 and the RF amplifier 4 is extracted as the voice packet data S2 by the demodulator 5 and output to the voice decoder 6 of the DSP unit. At this time, the mobile terminal device 1 determines whether the voice decoder 6 has received the voice packet data S2.
When the voice decoder 6 has not received the voice packet data S2, a negative result is obtained and the process returns to step SP2. This processing is repeated until the voice decoder 6 receives the voice packet data S2. When the voice decoder 6 has received the voice packet data S2, an affirmative result is obtained and the process proceeds to step SP3.
At step SP3 in the mobile terminal device 1, the voice packet data S2 is decoded into the voice data S3 by the voice decoder 6, and the process proceeds to step SP4.
At step SP4, the voice sample data S7, which has been picked up by the microphone 11 when talking was started and converted via the analog-to-digital converter 12, is output to the speaker 10 and subjected to the side-back-toning processing. The side-back-toning is performed from the beginning to the end of talking.
Then, at step SP5, the voice data S3 decoded by the voice decoder 6 is first written to the buffer 7, and the voice sample data S4 (the output voice frame) is thereafter transferred to the output buffer 8 frame by frame. At this time, whether one frame of the voice sample data S7 has been written to the input buffer 13 is determined by confirming the presence of the boundary between frames.
When the boundary cannot be confirmed, a negative result is obtained, and this processing is repeated until the voice sample data S7 has been completely transferred and the boundary can be confirmed. On the other hand, when the boundary can be confirmed, an affirmative result is obtained; it is determined that one frame of the voice sample data S7 (the input voice frame 1) has been completely transferred, and the process proceeds to step SP6.
At step SP6, the echo canceler 14 eliminates the delayed disturbance in the voice signal from the one-frame voice sample data S7 (the input voice frame 1) based on the one-frame voice sample data S4 (the output voice frame 1), and rewrites the input buffer 13 with the new voice sample data S8. Then the process proceeds to step SP7.
At step SP7, the following one frame of the voice sample data S4 (the output voice frame 2) is transferred from the buffer 7 to the output buffer 8, in preparation for eliminating the delayed disturbance in the voice signal from the next input voice frame 2.
At step SP8, the echo-canceled voice sample data S8 (the input voice frame 1) is encoded by the voice encoder 15, and the process proceeds to step SP9.
At step SP9, the mobile terminal device 1 determines whether the talking should be stopped or not. When the talking should not be stopped, a negative result is obtained and the process returns to step SP2 to perform the echo-canceling and the side-back-toning of the next voice frame. When the talking should be stopped, an affirmative result is obtained and the process proceeds to step SP10 and the processing is stopped.
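The per-frame control flow of FIG. 4, steps SP1 through SP10, can be sketched as a loop. This is an illustrative sketch, not from the specification: the function names stand in for the voice decoder 6, the echo canceler 14 and the voice encoder 15, and are hypothetical.

```python
# Sketch (not from the patent): the per-frame processing of FIG. 4 as a
# loop. decode, cancel_echo and encode stand in for the voice decoder 6,
# the echo canceler 14 and the voice encoder 15; all names are illustrative.

def run_call(packets, mic_frames, decode, cancel_echo, encode):
    """Process one packet and one input voice frame per iteration (SP2-SP9)."""
    sent = []
    for packet, in_frame in zip(packets, mic_frames):  # SP2: wait for a packet
        decoded = decode(packet)         # SP3: decode into voice data S3
        # SP4: side-back-toning of in_frame would be performed here
        output_frame = decoded           # SP5/SP7: one frame to output buffer 8
        clean = cancel_echo(output_frame, in_frame)    # SP6: echo canceling
        sent.append(encode(clean))       # SP8: encode the canceled frame
    return sent                          # SP9/SP10: loop ends with the call

# Toy stand-ins to show the flow with numbers instead of voice frames.
out = run_call([1, 2], [10, 20],
               decode=lambda p: p,
               cancel_echo=lambda o, i: i - o,
               encode=lambda c: c)
assert out == [9, 18]
```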
As described above, in the mobile terminal device 1, the echo-canceling has been performed based on the voice sample data S4 (the output voice frame) having the same boundary between frames as the voice sample data S7 (the input voice frame).
By the way, in the mobile terminal device 1 having the above construction, when echo-canceling, the echo canceler 14 eliminates the delayed disturbance in the voice signal from the voice sample data S7 (e.g., the input voice frame 1) based on the voice sample data S4 transferred to the output buffer 8 (e.g., the output voice frame 1) and writes the new voice sample data S8.
However, since the output buffer 8 has memory capacity for only one frame, if the buffer 7 were to transfer the voice data S3 to the output buffer 8 in succession, the output voice frame 1, written at the same timing that the input voice frame 1 is written to the input buffer 13, would be driven out of the output buffer 8 when the output voice frame 2 is transferred, so that the echo-canceling could not be performed.
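The overwrite problem can be made concrete with a trivial sketch (not from the patent, with illustrative names): a one-frame output buffer that receives frames in succession retains only the most recent one.

```python
# Sketch (not from the patent): a one-frame output buffer overwritten by
# successive transfers, so the frame needed for echo canceling is lost.

output_buffer = None                   # capacity: one frame only

for frame in ["output voice frame 1", "output voice frame 2"]:
    output_buffer = frame              # each transfer overwrites the last

# By the time input voice frame 1 is completely written, output voice
# frame 1 has already been driven out of the buffer.
assert output_buffer == "output voice frame 2"
```

This is why the intermediate buffer 7 is needed to hold decoded frames and release them one at a time at the correct timing.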
Therefore, when performing the echo-canceling, the mobile terminal device 1 must first write the voice data S3 decoded by the voice decoder 6 to the buffer 7, and then transfer one frame of the voice sample data S4 (the output voice frame 1) at the timing that the voice sample data S7 (the input voice frame 1) is written to the input buffer 13. There has been a problem that this processing is complicated.
Furthermore, the mobile terminal device 1 sends the voice sample data S7, converted by the microphone 11 and the analog-to-digital converter 12, to the speaker 10 via the digital-to-analog converter 9 to perform the side-back-toning. Therefore, there has been a problem that a voice that is not echo-canceled is generated from the speaker 10, so that the tone quality is low.