The present invention relates to a telephone system, and especially, to a telephone system having a voice recognition function which is used under transmission of a compressed voice signal.
CTI (Computer Telephony Integration) is in use. It is a system in which a telephone system (a PBX and so forth) and an information processing system (a computer, a LAN and so forth), which were conventionally constructed separately, are integrated so that each supplements the functions the other lacks. It is applied, for example, to reception work at a telecommunication sales company and to telephone banking at a financial institution, and labor is saved by replacing conventional customer reception by an operator with a voice recognition device and a voice response device.
On the other hand, in such CTI, conversation by telephone via a conventional analog circuit is being changed to conversation by telephone (an internet phone and so forth) via a LAN, an ISDN, an internet and so forth. Accordingly, voice transmission by means of a conventional analog signal is shifting to voice transmission by means of a digital signal, and further, in order to suppress increase of transmission volume, a compression technology of voice is being used.
However, various problems arise in a system in which a voice compression technology is combined with a voice recognition device. These conventional problems are explained below.
With regard to voice transmission by means of a digital signal, unified standards are recommended by the Telecommunication Standardization Sector (ITU-T) of the International Telecommunication Union (ITU). Presently, there are several standards, such as G.711 (PCM: Pulse Code Modulation, 64 kbits/sec), G.726 (ADPCM: Adaptive Differential PCM, 32 kbits/sec), G.728 (LD-CELP: Low Delay Code Excited Linear Prediction, 16 kbits/sec), G.729 (CS-ACELP: Conjugate Structure Algebraic CELP, 8 kbits/sec), and G.723.1 (MP-MLQ/ACELP, 6.3/5.3 kbits/sec). Among them, hybrid coding methods such as G.728, G.729 and G.723.1 have a higher compression ratio than a waveform coding method such as G.711, and are expected to be promising coding methods in the future.
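As a rough illustration of the compression ratios implied by these bit rates, the following sketch divides the G.711 rate by the rate of each codec. Only the bit rates come from the text; the dictionary name, the function name and the choice of G.711 as the reference are assumptions made for this example.

```python
# Bit rates in kbit/s as listed in the text above.
BIT_RATES_KBPS = {
    "G.711": 64.0,
    "G.726": 32.0,
    "G.728": 16.0,
    "G.729": 8.0,
    "G.723.1 (high)": 6.3,
    "G.723.1 (low)": 5.3,
}


def compression_ratio(codec, reference="G.711"):
    """Return how many times smaller the codec's bit rate is than the reference."""
    return BIT_RATES_KBPS[reference] / BIT_RATES_KBPS[codec]


# Print the ratio of every codec relative to 64 kbit/s PCM.
for name in BIT_RATES_KBPS:
    print(f"{name}: {compression_ratio(name):.1f}x")
```

Under this measure, G.729 carries the same speech channel in one eighth of the bandwidth of G.711.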
FIGS. 6A and 6B are explanatory views of a waveform coding method and a hybrid coding method. As shown in FIG. 6A, the waveform coding method conducts coding by sampling and quantizing a voice waveform. Accordingly, although voice of high quality can be obtained if the bit rate is above a certain value, there are problems in that the compression ratio is low because a high bit rate must be maintained, and in that voice quality deteriorates remarkably when the bit rate is lowered.
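The trade-off between bit rate and quality in waveform coding can be sketched as follows. This is a simplified illustration that assumes uniform quantization of a sine tone; actual G.711 uses companded PCM, and the function names, the 440 Hz tone and the bit depths are hypothetical choices for this example.

```python
import math


def sample_sine(freq_hz=440.0, rate_hz=8000, n=80):
    """Sample a sine tone at the given rate (values lie in [-1, 1])."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def quantize(samples, bits):
    """Uniformly quantize samples in [-1, 1] to 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    out = []
    for s in samples:
        idx = round((s + 1.0) / step)          # nearest quantization level
        idx = max(0, min(levels - 1, idx))     # clamp to the valid range
        out.append(idx * step - 1.0)           # map the level back to amplitude
    return out


def mean_squared_error(a, b):
    """Average squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


samples = sample_sine()
for bits in (8, 5, 3):
    error = mean_squared_error(samples, quantize(samples, bits))
    print(f"{bits} bits per sample: mean squared error {error:.6f}")
```

Fewer bits per sample mean a lower bit rate but a coarser approximation of the waveform, which corresponds to the quality deterioration described above.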
On the other hand, as shown in FIG. 6B, the hybrid coding method uses two kinds of information in combination: normalized information, which is a previously prepared basic waveform pattern, and sound source information, which is the difference between a waveform made of this normalized information and the original voice waveform. The normalized information is information in which, for example, a bit sequence of three bits is associated with a basic waveform pattern, and it is stored in code books provided on a transmission side and a reception side. The sound source information is obtained by coding, by means of PCM, the difference between the original voice waveform and a waveform in which a plurality of basic waveform patterns are superimposed to reproduce the rough shape of the voice waveform, and it is a signal that includes information specific to the voice of a speaker, background noise and so forth. Accordingly, since most parts of a voice waveform are represented in the hybrid coding method by normalized information of about three bits, the compression ratio of the hybrid coding method is higher than that of the waveform coding method. Also, by adding the sound source information, which is the difference from the original voice waveform, there is an advantage that the characteristics of the voice of a speaker can be accurately reproduced and voice of high quality can be produced.
As a telephone system in which such a hybrid coding method is adopted, there is a system defined in H.323. H.323 is a standard for a conference system used with a packet exchange network such as a LAN or the internet, and is based on H.320, which is an international standard recommendation of ITU-T. It is mainly used for a personal computer conference system and so forth, and places importance on real-time characteristics. G.723.1 and G.729 correspond to the coding of voice, and H.261 and H.263 correspond to the coding of an image.
FIG. 7 is a block diagram showing a conventional telephone system based on the H.323. As shown in the figure, the conventional telephone system is constructed of a telephone set 200 that is set on a side of a person who utilizes service, and a telephone set 201 for providing automatic response service of voice. The telephone set 200 and the telephone set 201 are connected to each other via a network 202 such as an internet. Also, a gate keeper 202a for conducting call control between the telephone sets, address conversion, bandwidth control and so forth is connected to the network 202.
The telephone set 200 is constructed of a microphone 203, an A/D converter 204, an encoder 205, a packeting device 206, a network interface card (referred to as an NIC, hereinafter) 207, a receiving buffer 208, a depacketing device 209, a decoder 210, a D/A converter 211, a speaker 212, and a call controller 213.
The telephone set 201 is a telephone set having an automatic voice response function, and is constructed of an NIC 214, a receiving buffer 215, a depacketing device 216, a decoder 217, a D/A converter 218, a speaker 219, a voice recognition and response device 220, an encoder 221, a packeting device 222, and a call controller 223.
The operation of such a conventional telephone set is as follows:
First, when the telephone set 201 is phoned from the telephone set 200, call control is conducted by the call controllers 213 and 223, and setting of a call, and so forth are conducted. Thereafter, information notification in relation to a terminal function is mutually conducted between the telephone set 200 and the telephone set 201, and a channel in relation to voice is set.
When voice is input to the microphone 203, the telephone set 200 on a dialing side converts it into an analog electric signal, and thereafter, supplies it to the A/D converter 204. The A/D converter 204 converts the supplied analog electric signal into a digital signal, and thereafter, supplies it to the encoder 205. The encoder 205 encodes the supplied signal, and thereafter, supplies it to the packeting device 206. The packeting device 206 packets the supplied signal, and thereafter, supplies a packet signal to the NIC 207. The NIC 207 transmits the supplied packet signal to the telephone set 201 via the network 202.
When receiving the packet signal from the telephone set 200, the NIC 214 of the telephone set 201 successively stores it in the receiving buffer 215. The depacketing device 216 reads out the packet signal stored in the receiving buffer 215, and converts it into a signal prior to being packeted and supplies it to the decoder 217. The decoder 217 decodes the supplied signal, and supplies it to the D/A converter 218 or the voice recognition and response device 220. In case that the signal is supplied to the speaker 219 via the D/A converter 218, the voice sent from the telephone set 200 can be heard also on a side of the telephone set 201. Also, the voice can be sent via a microphone 225 and an A/D converter 224.
The voice recognition and response device 220 conducts voice recognition of the signal supplied from the decoder 217, and makes a predetermined response. For example, the voice recognition and response device 220 outputs a synthetic sound (a digital signal) in accordance with the result of the voice recognition. The encoder 221 encodes the synthetic sound supplied from the voice recognition and response device 220, and supplies it to the packeting device 222. The packeting device 222 packets the supplied signal, and thereafter, supplies it to the NIC 214. The NIC 214 transmits the supplied packet signal to the telephone set 200 via the network 202.
The telephone set 200 that has received the packet signal from the telephone set 201 receives the packet signal by means of the NIC 207, and successively stores it in the receiving buffer 208. The depacketing device 209 reads out the packet signal stored in the receiving buffer 208, and converts it into a signal prior to being packeted. The decoder 210 decodes the supplied signal, and supplies it to the D/A converter 211. The D/A converter 211 converts the supplied signal into an analog electric signal, and thereafter, supplies it to the speaker 212, and thereby, a user can hear voice from the telephone set 201 via the speaker 212.
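The packeting, buffering and depacketing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the header layout (a 16-bit sequence number and a 16-bit payload length), the payload size and the function names are hypothetical, and the text does not specify how the packeting devices 206 and 222 format a packet.

```python
import struct

# Hypothetical header: sequence number and payload length, both 16-bit, network order.
HEADER = struct.Struct("!HH")


def packetize(encoded: bytes, payload_size: int = 4):
    """Split an encoded byte stream into small packets with a header each."""
    packets = []
    for seq, start in enumerate(range(0, len(encoded), payload_size)):
        payload = encoded[start:start + payload_size]
        packets.append(HEADER.pack(seq, len(payload)) + payload)
    return packets


def depacketize(receiving_buffer):
    """Restore the signal prior to packeting from the buffered packets."""
    parts = []
    for packet in receiving_buffer:
        seq, length = HEADER.unpack_from(packet)
        parts.append((seq, packet[HEADER.size:HEADER.size + length]))
    parts.sort(key=lambda item: item[0])  # reorder by sequence number
    return b"".join(payload for _, payload in parts)


encoded = bytes(range(10))             # stand-in for the encoder output
receiving_buffer = packetize(encoded)  # stand-in for transmission over the network
assert depacketize(receiving_buffer) == encoded
```

The sequence number is included here only so that the receiving side can restore the original order if packets arrive out of order on a packet network.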
FIG. 8 is a block diagram showing a relation between the encoder 205 and the decoder 217. As shown in this figure, the encoder 205 includes a code analysis circuit 205a, a code book 205b, and a difference detecting circuit 205c. The decoder 217 includes a normalized information regenerating circuit 217a, a code book 217b, a sound source information regenerating circuit 217c, and an adder 217d.
Digital voice that has been input to the encoder 205 is analyzed in the code analysis circuit 205a, and the code whose waveform shape most closely resembles that of the digital voice is selected from the code book 205b. In the code books 205b and 217b, codes in which a bit sequence of three bits is associated with a basic waveform pattern are stored. The selected three-bit code is output as a normalized signal. Also, the difference detecting circuit 205c calculates the difference between the digital voice signal and the normalized signal, and outputs the obtained difference as a difference signal. The normalized signal and the difference signal are input to the decoder 217. The normalized information regenerating circuit 217a reads out the waveform corresponding to the normalized signal from the code book 217b and outputs it, and the sound source information regenerating circuit 217c decodes the difference signal and outputs it. The adder 217d adds the outputs from the normalized information regenerating circuit 217a and the sound source information regenerating circuit 217c to each other, and outputs the added result.
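The relation between the encoder 205 and the decoder 217 can be sketched as follows. This is a simplified illustration, not an implementation of G.728, G.729 or G.723.1: the eight code book patterns, the frame length of four samples and the squared-error matching rule are assumptions made for this example, and the difference signal is kept as raw sample values rather than being PCM-coded.

```python
# Hypothetical 3-bit code book: 8 basic waveform patterns of 4 samples each.
# The same table is assumed to exist on the encoder and decoder sides.
CODE_BOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [-0.5, -0.5, -0.5, -0.5],
    [0.0, 0.5, 1.0, 0.5],
    [0.0, -0.5, -1.0, -0.5],
    [1.0, 0.5, 0.0, -0.5],
    [-1.0, -0.5, 0.0, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]


def encode(frame):
    """Return (normalized signal, difference signal) for one frame."""
    def distance(pattern):
        return sum((f - p) ** 2 for f, p in zip(frame, pattern))

    # Code analysis: pick the pattern closest in shape to the input frame.
    code = min(range(len(CODE_BOOK)), key=lambda i: distance(CODE_BOOK[i]))
    # Difference detection: what the chosen pattern fails to capture.
    difference = [f - p for f, p in zip(frame, CODE_BOOK[code])]
    return code, difference


def decode(code, difference=None):
    """Rebuild a frame; without a difference signal only the pattern is output."""
    pattern = CODE_BOOK[code]
    if difference is None:
        return list(pattern)
    # Adder: basic waveform pattern plus the decoded difference signal.
    return [p + d for p, d in zip(pattern, difference)]


frame = [0.1, 0.4, 0.9, 0.6]
code, difference = encode(frame)
print("normalized signal:", format(code, "03b"))
print("decoded with difference:", decode(code, difference))
print("decoded without difference:", decode(code))
```

Decoding without the difference signal yields only the basic waveform pattern, whereas adding the difference signal restores the speaker-specific detail of the input frame.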
In this manner, the conventional telephone system can reduce the data size by converting most parts of the voice signal into the normalized signal. Also, by transmitting the difference signal, it is possible to send information specific to the voice of a speaker, and there is an advantage that identification of the speaker on a reception side is facilitated.
However, in a telephone system in which such a hybrid coding method is used, there is a problem that voice recognition becomes difficult. In other words, although the difference signal makes identification of a speaker easy, and natural voice with a greater sense of presence is obtained since background noise and so forth are added, for voice recognition such a difference signal is merely noise that hinders recognition. Accordingly, the recognition rate is reduced by the addition of the difference signal, and there is a problem that the quality of service deteriorates.
The present invention is for solving such problems, and has the objective of providing a telephone system applicable to CTI work in which compressed voice can be reliably recognized.
In order to accomplish such an objective, a telephone system according to the present invention compresses a voice signal using a hybrid coding method and transmits it, and includes a telephone set on a dialing side and a telephone set on a termination side which has a voice recognition function. The above-described telephone set on a dialing side includes a voice channel control circuit which, in case that the above-described telephone set on a termination side has a voice recognition function, sets the channel on which a voice signal is sent when the above-described telephone set on a termination side is called from the above-described telephone set on a dialing side to a channel for transmitting only a normalized signal in the above-described hybrid coding method.
The objective of the present invention is also achieved by a telephone method for compressing a voice signal using a hybrid coding method and transmitting it between a telephone set on a dialing side and a telephone set on a termination side which has a voice recognition function, characterized in that said telephone set on a dialing side, in case that said telephone set on a termination side has a voice recognition function, sets the channel on which a voice signal is sent when said telephone set on a termination side is called from said telephone set on a dialing side to a channel for transmitting only a normalized signal in said hybrid coding method.
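The decision made on the dialing side can be sketched as follows. This is a minimal illustration: the function names and the representation of a channel as a tuple of signal names are hypothetical, and the fallback to both signals when the termination side has no voice recognition function is an assumption based on the conventional system described above.

```python
def select_voice_channel(termination_has_voice_recognition: bool) -> tuple:
    """Choose which signals the dialing side sends on its voice channel."""
    if termination_has_voice_recognition:
        # Only the normalized signal, so the recognizer is not disturbed.
        return ("normalized",)
    # Assumed conventional behavior: send both signals.
    return ("normalized", "difference")


def signals_to_transmit(normalized_signal, difference_signal, channel):
    """Keep only the encoder outputs that the selected channel carries."""
    available = {"normalized": normalized_signal, "difference": difference_signal}
    return {name: available[name] for name in channel}


channel = select_voice_channel(termination_has_voice_recognition=True)
print(signals_to_transmit(0b011, [0.1, -0.1], channel))
```

With a termination side that has a voice recognition function, only the normalized signal is passed on for transmission.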
The following other forms are also included in the present invention.
Specifically, the above-described telephone set on a termination side may include a voice response device for conducting a response by means of voice.
Also, the above-described telephone set on a termination side may be any one of a voice modem, a facsimile device which includes an automatic voice response function, a CTI server and an internet telephone gateway device.
Also, the above-described telephone system may be used for an internet telephone.
Also, the above-described hybrid coding method may be G.728 or G.729.
Also, the above-described encoder may include a code book in which a predetermined basic waveform pattern is stored, a code analysis circuit for analyzing a digital voice signal that has been input, generating a normalized signal by referring to the above-described basic waveform pattern, and outputting it, and a difference detecting circuit for outputting a difference signal between the above-described digital voice signal and the above-described normalized signal, and the above-described decoder may include a code book in which the same basic waveform pattern as that of the code book in the above-described encoder is stored, a normalized information regenerating circuit for decoding the above-described normalized signal that has been input by referring to the above-described basic waveform pattern, and outputting it, a sound source information regenerating circuit for decoding the above-described difference signal and outputting it, and an adder for adding the outputs from the above-described normalized information regenerating circuit and the above-described sound source information regenerating circuit to each other and outputting the result.
Furthermore, the above-described telephone set on a dialing side and the above-described telephone set on a termination side may transmit and receive information in relation to a terminal function to and from each other after call control is conducted therebetween, and the above-described telephone set on a dialing side may transmit a channel open demand of only the normalized signal to the above-described telephone set on a termination side, and when receiving the above-described channel open demand, the above-described telephone set on a termination side may send back channel open confirmation and transmit a channel open demand of a normalized signal and a difference signal to the above-described telephone set on a dialing side, and when receiving the above-described channel open demand, the above-described telephone set on a dialing side may send back channel open confirmation to the above-described telephone set on a termination side.
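The message sequence described above can be sketched as follows. This is a minimal illustration in which the class, the function names and the message strings are hypothetical; only the order of the messages and the type of each channel come from the text.

```python
from dataclasses import dataclass, field

NORMALIZED_ONLY = "normalized"
NORMALIZED_AND_DIFFERENCE = "normalized+difference"


@dataclass
class Terminal:
    name: str
    # Outgoing voice channels opened by this terminal: destination -> channel type.
    open_channels: dict = field(default_factory=dict)


def open_channel(sender, receiver, channel_type, log):
    """Record a channel open demand and the confirmation sent back for it."""
    log.append((sender.name, receiver.name, "channel open demand", channel_type))
    log.append((receiver.name, sender.name, "channel open confirmation", channel_type))
    sender.open_channels[f"to {receiver.name}"] = channel_type


def negotiate(dialing, termination):
    """Run the exchange that follows call control and return the message log."""
    log = [
        (dialing.name, termination.name, "terminal function information", None),
        (termination.name, dialing.name, "terminal function information", None),
    ]
    # Dialing side to termination side: normalized signal only.
    open_channel(dialing, termination, NORMALIZED_ONLY, log)
    # Termination side to dialing side: normalized signal and difference signal.
    open_channel(termination, dialing, NORMALIZED_AND_DIFFERENCE, log)
    return log


dialing = Terminal("dialing")
termination = Terminal("termination")
for message in negotiate(dialing, termination):
    print(message)
```

The two directions end up asymmetric: the voice sent toward the recognizer carries only the normalized signal, while the response voice sent back to the user carries both signals.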