1. Technical Field of the Invention
The present invention relates to a method for carrying out information transmission by using speech sounds over a portable telephone, the Internet or the like.
2. Description of the Related Art
Speech sound communication systems are constructed by connecting transmitters and receivers via wire communication paths such as coaxial cables or radio communication paths using electromagnetic waves. In the past, analog communications were the mainstream, in which acoustic signals are propagated on those communication paths directly or after being modulated onto carrier waves. Digital communications have since become the mainstream, in which acoustic signals are coded before being propagated in order to improve communication quality with respect to noise and distortion and to increase the number of communication channels.
Recent communications systems, such as portable telephones, use the CELP (Schroeder M. R. and Atal B. S.: "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proc. IEEE ICASSP '85, 25.1.1, (April 1985)) system to cope with the shortage of radio transmission bands caused by the rapid spread of such communications systems.
FIG. 7 shows an exemplary configuration of the CELP speech coding and decoding system.
The processing on the coding end, that is, at the transmission terminal, is as follows. Speech sound signals are processed after being partitioned into frames of, for example, 10 ms. The inputted speech sound undergoes LPC (Linear Prediction Coding) analysis at the LPC analysis part 200 and is converted to an LPC coefficient αi representing a vocal tract transmission function.
The LPC coefficient αi is converted to an LSP (Line Spectrum Pair) coefficient αqi and quantized at an LSP parameter quantization part 201. αqi is given to a synthesizing filter 202, which synthesizes a speech sound wave form from a voicing source wave form read out from an adaptive code book 203 corresponding to a code number ca. The speech sound wave form is inputted as a periodic wave form in accordance with a pitch period T0 calculated by using an auto-correlation method or the like in parallel with the previous processing.
The synthesized speech sound wave form is subtracted from the inputted speech sound, and the difference is inputted into a distortion calculation part 207 via an auditory weighting filter 206. The distortion calculation part 207 repeatedly calculates the energy of the difference between the synthesized wave form and the inputted wave form while changing the code number ca for the adaptive code book 203, and determines the code number ca that minimizes the energy value.
Then the voicing source wave form read out under the determined ca and the noise source wave form read out from the noise code book 204 according to the code number cr are added, and the code number cr that minimizes the distortion is determined by similar processing. The gain values to be applied to both the voicing source and noise source wave forms are also determined through the processing described above, and the most suitable gain vector corresponding to them is selected from the gain code book to determine the code number cg.
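The code book search described above can be illustrated by the following minimal sketch. It is not the implementation of FIG. 7: the auditory weighting filter 206 and the gain are omitted, the synthesizing filter is assumed to be a simple all-pole filter driven directly by predictor coefficients, and the names `synthesize` and `search_codebook` are hypothetical.

```python
def synthesize(excitation, coeffs):
    """Assumed all-pole filter: s[n] = e[n] + sum_k coeffs[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, c in enumerate(coeffs):
            if n - 1 - k >= 0:
                acc += c * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, coeffs):
    """Try every code number and keep the one with minimum error energy."""
    best_code, best_energy = None, float("inf")
    for code, entry in enumerate(codebook):
        synth = synthesize(entry, coeffs)
        # Energy of the difference between inputted and synthesized wave forms.
        energy = sum((t - s) ** 2 for t, s in zip(target, synth))
        if energy < best_energy:
            best_code, best_energy = code, energy
    return best_code, best_energy


# Toy code book with three entries (illustrative values only).
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
target = synthesize(codebook[1], [0.5])
code, energy = search_codebook(target, codebook, [0.5])
```

The same exhaustive loop would apply to the noise code book 204 once the adaptive code number has been fixed.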
The LSP coefficient αqi, the pitch period T0, the adaptive code number ca, the noise code number cr, and the gain code number cg, which have been determined as described above, are collected into one data series to be transmitted on the communication path.
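As an illustration of collecting the parameters into one data series, the sketch below packs one frame into bytes and recovers the fields again. The field widths, the use of four LSP indices, and the names `FRAME_FORMAT`, `pack_frame` and `unpack_frame` are assumptions made for this example; the source does not specify a bit allocation.

```python
import struct

# Hypothetical layout (big-endian, no padding):
#   4 x 1-byte LSP indices, 1-byte pitch period T0, 1-byte adaptive code ca,
#   2-byte noise code cr, 1-byte gain code cg.
FRAME_FORMAT = ">4BBBHB"


def pack_frame(lsp_indices, t0, ca, cr, cg):
    """Collect the coded parameters of one frame into one data series."""
    return struct.pack(FRAME_FORMAT, *lsp_indices, t0, ca, cr, cg)


def unpack_frame(data):
    """Divide a received data series back into its parameters."""
    fields = struct.unpack(FRAME_FORMAT, data)
    return list(fields[:4]), fields[4], fields[5], fields[6], fields[7]


frame = pack_frame([1, 2, 3, 4], 40, 17, 300, 5)
```

Under this assumed layout one frame occupies nine bytes; an actual coder would allocate bits far more tightly.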
On the other hand, the processing on the decoding end, that is, on the reception terminal end, is as follows.
The data series received from the communication path is again divided into the LSP coefficient αqi, the pitch period T0, the adaptive code number ca, the noise code number cr, and the gain code number cg. The periodic voicing source is read out from the adaptive code book 208 in accordance with the pitch period T0 and the adaptive code number ca, and the noise source wave form is read out from the noise code book 209 in accordance with the noise code number cr.
Each voicing source receives an amplitude adjustment by the gain represented by the gain vector read out from the gain code book 210 in accordance with the gain code number cg and is then inputted into the synthesizing filter 211. The synthesizing filter 211 synthesizes speech sound in accordance with the LSP coefficient αqi.
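The read-out and amplitude adjustment on the decoding end can be sketched as follows. The code books are modeled as plain lists indexed by code number, the gain vector is assumed to hold one gain for each of the two sources, and the function name `decode_excitation` is hypothetical. The resulting wave form is what would be inputted into the synthesizing filter 211.

```python
def decode_excitation(ca, cr, cg, adaptive_book, noise_book, gain_book):
    """Look up both sources and the gain vector, then mix them."""
    adaptive = adaptive_book[ca]      # periodic voicing source
    noise = noise_book[cr]            # noise source wave form
    gain_a, gain_r = gain_book[cg]    # assumed: one gain per source
    return [gain_a * a + gain_r * r for a, r in zip(adaptive, noise)]


# Toy code books (illustrative values only).
adaptive_book = [[1.0, 0.0], [0.0, 1.0]]
noise_book = [[0.5, 0.5]]
gain_book = [(2.0, 4.0)]

excitation = decode_excitation(1, 0, 0, adaptive_book, noise_book, gain_book)
```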
The speech sound communication system as described above has the main purpose of propagating speech sound efficiently over a communication path of limited capacity by compression-coding the inputted speech sound. That is to say, the communication object is solely speech sound emitted by human beings.
Today's communications services, however, are not limited to speech sound communications between human beings in distant locations. Services such as e-mail and short messages, in which text inputted at a transmission terminal is transmitted as data to a remote reception terminal, are becoming widely used. It has also become important for apparatuses to provide speech sound to human beings, for example by supplying a variety of information by speech sound, as typified by CTI (Computer Telephony Integration), or by explaining the operating methods of the apparatuses in speech sound. Moreover, speech sound rule synthesizing technology, which converts text information into speech sound, has made it possible to listen to the contents of e-mails, news or the like on the phone, and this has been attracting attention recently.
Thus, a form of communication service that converts text information into speech sound has come to be required. The following two methods can be considered for implementing such services.
One is a method for transmitting speech sound synthesized on the service supplying end to the users by using normal speech sound transmission. In the case of this method, the terminal apparatuses on the reception end only receive and reproduce the speech sound signals in the same way as the prior art, so common hardware can be used.
Vocalizing a large amount of text, however, means keeping speech sounds flowing into the communication path for a long period of time, and in the case of communication systems such as portable telephones it becomes necessary to maintain the connection for a long period of time. Accordingly, there is the problem that communication charges become too expensive.
The other is a method for letting the users hear the speech sound converted by a speech sound synthesizing apparatus of the reception terminal after the information is transmitted on the communication path in the form of text. In the case of this method, the amount of information transmitted is extremely small, on the order of one several-hundredth of that of speech sound, which makes it possible to transmit it in a very short period of time. Accordingly, communication charges are held low, and if the text is stored in the reception terminal the user can listen to the information converted into speech sounds whenever desired. There is also the advantage that the type of voice, such as male or female, the speech rate, high or low pitch, and the like can be selected at the time of conversion to speech sounds.
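The difference in transmission amount can be checked with simple arithmetic. All figures in the sketch below are illustrative assumptions and do not come from the source: a coded speech bit rate of 8 kbit/s, a speaking rate of 5 characters per second, and 2 bytes per character of text.

```python
def coded_speech_bytes(duration_s, bit_rate_bps):
    """Bytes needed to carry coded speech of the given duration."""
    return duration_s * bit_rate_bps // 8


def text_bytes(duration_s, chars_per_second, bytes_per_char):
    """Bytes needed to carry the text that would be spoken in that time."""
    return duration_s * chars_per_second * bytes_per_char


# Assumed figures: 10 s of speech, 8 kbit/s, 5 characters/s, 2 bytes/character.
speech = coded_speech_bytes(duration_s=10, bit_rate_bps=8000)
text = text_bytes(duration_s=10, chars_per_second=5, bytes_per_char=2)
ratio = speech // text
```

Under these assumptions the text is one hundredth the size of the coded speech; a higher bit rate or a more compact text encoding would push the ratio toward several hundred.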
The speech sound synthesizing apparatus to be installed as a terminal apparatus on the reception end, however, has circuits different from those used in an ordinary reception terminal such as a portable telephone. New circuits for synthesizing speech sounds therefore have to be mounted, which leads to the problem that the circuit scale and the cost of the terminal apparatus are increased.
Considering such a conventional problem of the communication method, it is the purpose of the present invention to provide a speech sound communication system which has a smaller communication burden and has a simpler speech synthesizing apparatus on the reception end.
To solve the above described problems the present invention provides a speech sound communication apparatus.
One aspect of the present invention is a speech sound communication system comprising:
a transmission part having a text input means and a transmission means;
a reception part having a reception means, a language analysis means, a prosody generation means, a segment data memory means, a segment read-out means and a synthesizing means,
wherein, said text input means inputs text information;
said transmission means transmits said text information to a communication path;
said reception means receives said text information from said communication path;
said language analysis means analyses said text information so that said text information is converted to phonetic transcription information;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information, to which the prosody information is added;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sound by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
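The synthesis recited at the end of this aspect can be pictured with the following minimal sketch. The representation of the segment data (a pulse shape for the voicing source characteristics and predictor coefficients for the vocal tract transmission characteristics), the all-pole filter form, and all names and values are assumptions for illustration only.

```python
# Hypothetical segment data memory: one entry per segment.
SEGMENT_MEMORY = {
    "a": {"pulse_shape": [1.0, 0.5], "vocal_tract": [0.5]},
}


def voicing_source(period, num_samples, pulse_shape):
    """Place one pulse shape every `period` samples (period from prosody)."""
    source = [0.0] * num_samples
    for start in range(0, num_samples, period):
        for i, value in enumerate(pulse_shape):
            if start + i < num_samples:
                source[start + i] += value
    return source


def vocal_tract_filter(source, coeffs):
    """Assumed all-pole filter: s[n] = x[n] + sum_k coeffs[k] * s[n-1-k]."""
    out = []
    for n, x in enumerate(source):
        past = sum(c * out[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(x + past)
    return out


def synthesize_segment(segment, period, num_samples):
    data = SEGMENT_MEMORY[segment]  # segment read-out
    source = voicing_source(period, num_samples, data["pulse_shape"])
    return vocal_tract_filter(source, data["vocal_tract"])


wave = synthesize_segment("a", period=4, num_samples=8)
```

Changing `period` alters the pitch without touching the stored segment data, which reflects the separation of prosody from segment characteristics recited above.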
Another aspect of the present invention is a speech sound communication system comprising a transmission part having a text input means, a language analysis means and a transmission means as well as a reception part having a reception means, a prosody generation means, a segment data memory means, a segment read-out means and a synthesizing means,
wherein, said text input means inputs text information;
said language analysis means converts said text information into phonetic transcription information;
said transmission means transmits said phonetic transcription information into a communication path;
said reception means receives said phonetic transcription information from said communication path;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sound by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
Still another aspect of the present invention is a speech sound communication system comprising a transmission part having a text input means, a language analysis means, a prosody generation means and a transmission means as well as a reception part having a reception means, a segment data memory means, a segment read-out means and a synthesizing means,
wherein, said text input means inputs text information;
said language analysis means converts said text information into phonetic transcription information;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said transmission means transmits said phonetic transcription information with prosody information into a communication path;
said reception means receives said phonetic transcription information with prosody information from said communication path;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes a speech sound by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sound by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
Yet another aspect of the present invention is a speech sound communication system comprising:
a transmission part having a text input means and a first transmission means;
a repeater part having a first reception means, a language analysis means and a second transmission means; and
a reception part having a second reception means, a prosody generation means, a segment data memory means, a segment read-out means and a synthesizing means;
wherein, said text input means inputs text information;
said first transmission means transmits said text information to a first communication path;
said first reception means receives said text information from said first communication path;
said language analysis means converts said text information into phonetic transcription information;
said second transmission means transmits said phonetic transcription information into a second communication path;
said second reception means receives said phonetic transcription information from said second communication path;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes speech sounds by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
Still yet another aspect of the present invention is a speech sound communication system comprising:
a transmission part having a text input means and a first transmission means;
a repeater part having a first reception means, a language analysis means, a prosody generation means and a second transmission means; and
a reception part having a second reception means, a segment data memory means, a segment read-out means and a synthesizing means;
wherein, said text input means inputs text information;
said first transmission means transmits said text information to a first communication path;
said first reception means receives said text information from said first communication path;
said language analysis means converts said text information into phonetic transcription information;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said second transmission means transmits said phonetic transcription information with prosody information into a second communication path;
said second reception means receives said phonetic transcription information with prosody information from said second communication path;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes speech sounds by utilizing said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores voicing source characteristics and vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
A further aspect of the present invention is a speech sound communication system comprising a transmission part having a text input means, a language analysis means and a first transmission means, a repeater part having a first reception means, a prosody generation means and a second transmission means, and a reception part having a second reception means, a segment data memory means, a segment read-out means and a synthesizing means,
wherein, said text input means inputs text information;
said language analysis means converts said text information into phonetic transcription information;
said first transmission means transmits said phonetic transcription information into a first communication path;
said first reception means receives said phonetic transcription information from said first communication path;
said prosody generation means converts said phonetic transcription information into phonetic transcription information with prosody information;
said second transmission means transmits said phonetic transcription information with prosody information to a second communication path;
said second reception means receives said phonetic transcription information with prosody information from said second communication path;
said segment read-out means reads out segment data from said segment data memory means in accordance with said phonetic transcription information with prosody information;
said synthesizing means synthesizes speech sounds by using said phonetic transcription information with prosody information and said segment data;
said segment data memory means stores the voicing source characteristics and the vocal tract transmission characteristics information; and
said synthesizing means synthesizes speech sounds by generating a voicing source wave form having a period in accordance with said prosody information and having characteristics in accordance with said voicing source characteristics and by filter processing said voicing source wave form in accordance with said vocal tract transmission characteristics information.
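The six aspects described above differ only in where the language analysis and the prosody generation are carried out, and therefore in what form the information takes on each communication path; the segment read-out and the synthesis always remain in the reception part. The following sketch tabulates this. The ordinal labels simply follow the order of appearance above and, like the names `FORMS`, `ASPECTS` and `path_payloads`, are introduced here for illustration.

```python
# Form of the information after 0, 1 or 2 conversion stages
# (language analysis first, then prosody generation).
FORMS = (
    "text",
    "phonetic transcription",
    "phonetic transcription with prosody",
)

# aspect -> (stages in the transmission part,
#            stages in the repeater part, or None if there is no repeater)
ASPECTS = {
    "first": (0, None),
    "second": (1, None),
    "third": (2, None),
    "fourth": (0, 1),
    "fifth": (0, 2),
    "sixth": (1, 1),
}


def path_payloads(tx_stages, repeater_stages):
    """Return what is carried on each communication path, in order."""
    payloads = [FORMS[tx_stages]]
    if repeater_stages is not None:
        payloads.append(FORMS[tx_stages + repeater_stages])
    return payloads


for name, (tx, rep) in ASPECTS.items():
    print(name, path_payloads(tx, rep))
```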