This invention relates to a method for communicating speech data, preferably from a memory device, to a speech synthesis circuit.
Several techniques for digitizing human speech are known in the prior art, including pulse code modulation, differential pulse code modulation, adaptive predictive coding, delta modulation, channel vocoders, cepstrum vocoders, formant vocoders, voice-excited vocoders and linear predictive coding. These techniques are briefly explained in "Voice Signals: Bit by Bit" on pages 28-34 of the October 1973 issue of IEEE Spectrum.
Certain applications, particularly those in which the digitized speech is to be stored in a memory, tend to use the Linear Predictive Coding technique because it produces very high quality speech at rather low data rates. Linear Predictive Coding systems usually make use of a multi-stage digital filter. In the past, such digital filters have typically been implemented by appropriately programming large-scale digital computers. However, U.S. Patent application Ser. No. 807,461, filed 6/17/77, teaches a particularly useful digital filter for a speech synthesis circuit, which may be implemented on an integrated circuit using standard MOS or equivalent technology. A theoretical discussion of Linear Predictive Coding can be found in "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" at Volume 50, number 2 (part 2) of The Journal of the Acoustical Society of America.
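To make "multi-stage digital filter" concrete, here is a minimal sketch of a textbook all-pole lattice synthesis filter driven by reflection coefficients k_1 ... k_p, a form commonly used for Linear Predictive Coding synthesis. It is an assumption, not something the text states, that the filter of Ser. No. 807,461 has this structure; the function name `lattice_synthesize` and the sign convention (a single stage gives y[n] = x[n] - k_1 y[n-1]) are illustrative choices.

```python
from typing import List, Sequence

def lattice_synthesize(excitation: Sequence[float], k: Sequence[float]) -> List[float]:
    """All-pole lattice synthesis filter with reflection coefficients k[0..p-1].

    Stage i (1-based) computes, for each sample n:
        f_{i-1}[n] = f_i[n] - k_i * b_{i-1}[n-1]
        b_i[n]     = k_i * f_{i-1}[n] + b_{i-1}[n-1]
    with f_p[n] = excitation[n] and output y[n] = f_0[n] = b_0[n].
    """
    p = len(k)
    b_prev = [0.0] * p  # b_prev[i] holds b_i[n-1] for i = 0..p-1
    out: List[float] = []
    for x in excitation:
        f = float(x)
        b_new = [0.0] * p
        for i in range(p, 0, -1):          # stage p down to stage 1
            f = f - k[i - 1] * b_prev[i - 1]
            if i < p:                      # b_p is never used, so it is not stored
                b_new[i] = k[i - 1] * f + b_prev[i - 1]
        if p:
            b_new[0] = f                   # b_0[n] = f_0[n] = y[n]
        b_prev = b_new
        out.append(f)
    return out

# Usage: impulse response of a two-stage filter (coefficient values are arbitrary).
impulse = [1.0] + [0.0] * 4
print(lattice_synthesize(impulse, [0.5, 0.25]))
```

A synthesizer of the kind described would run one such stage per filter coefficient, ten stages in the example of the following paragraph, feeding it a pulse train for voiced speech or noise for unvoiced speech.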
Disclosed herein is a talking learning aid that uses speech synthesis technology to produce human speech. Because a complete talking learning aid is disclosed, this patent describes not only the speech synthesis circuits in detail but also the learning aid's controller and the Read-Only-Memory devices used to store the digitized speech. Of course, the present invention may be practiced in conjunction with a talking learning aid such as that described herein, with other learning aids, or in any other application in which the generation of human speech from digital data is desirable. The techniques described in the aforementioned U.S. Patent application Ser. No. 807,461, together with the teachings of this patent, permit those wishing to use digital speech technology to do so with one, or a small number of, relatively inexpensive integrated circuit devices.
As noted above, Linear Predictive Coding permits human speech to be synthesized from digital data at a relatively low data rate. For example, using ten-bit parameters for speech energy, pitch and ten filter coefficients, and updating these twelve parameters fifty times per second (12 × 10 bits × 50 per second), yields very high quality speech at a bit rate of only 6000 bits per second (bps). If this data were stored in a 131 K bit Read-Only-Memory, about 21.8 seconds of human speech could be stored therein. However, if the bit rate could be dropped to an average of 1000-1200 bps, that same 131 K bit Read-Only-Memory could store 109 to 131 seconds of speech. The advantage of lowering the bit rate while maintaining the quality of the resulting speech is clear: less Read-Only-Memory capacity is required to store a given amount of speech. Thus, the primary objective of this invention is to reduce the bit rate without unduly degrading speech quality.
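The arithmetic in the preceding paragraph can be checked with a few lines of Python. The sketch assumes that "131 K bit" means 2^17 = 131,072 bits, which the text does not state; the function names are illustrative.

```python
# Bit-rate and storage arithmetic for fixed-length LPC frames.
# Assumption: "131 K bit" is read as 2**17 = 131,072 bits.
ROM_BITS = 2 ** 17

def lpc_bit_rate(num_params: int, bits_per_param: int, frames_per_second: int) -> int:
    """Bits per second when every frame carries every parameter at full width."""
    return num_params * bits_per_param * frames_per_second

def seconds_of_speech(rom_bits: int, bits_per_second: float) -> float:
    """Seconds of speech that fit in a memory holding rom_bits bits."""
    return rom_bits / bits_per_second

# Energy + pitch + ten filter coefficients = 12 parameters, 10 bits each, 50 frames/s.
full_rate = lpc_bit_rate(num_params=12, bits_per_param=10, frames_per_second=50)
print(full_rate)                                          # 6000
print(round(seconds_of_speech(ROM_BITS, full_rate), 1))   # 21.8
for target_bps in (1200, 1000):
    print(target_bps, round(seconds_of_speech(ROM_BITS, target_bps), 1))
```

Storage time scales inversely with the average bit rate, so cutting the rate from 6000 bps to 1000-1200 bps multiplies playing time by roughly five to six.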
The foregoing objectives are achieved as now described. Data is provided to the speech synthesis circuit in frames of variable length. Preferably, a full-length frame includes a pitch parameter, an energy parameter, a repeat bit and a plurality of speech coefficients. Each parameter is encoded according to a preselected coding scheme and has a preselected length; the more significant a parameter is to speech synthesis, the longer its encoding. A particular encoded pitch value signifies that the speech is unvoiced, and an unvoiced frame includes fewer speech coefficients than a regular voiced frame. The repeat bit preferably signifies that the frame contains pitch and energy parameters but no speech coefficients. Thus, a frame may include all, some or none of the speech coefficients. To reduce the data rate further, special encoded energy values preferably indicate that a frame is either a pause or the last frame sent.
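The variable-length frame scheme just described can be sketched as a decoder. This is a minimal sketch under loudly stated assumptions: the field widths, the field order (energy first, so that a pause or stop code ends the frame before any other field is read), the code values meaning unvoiced, pause and stop, and the four coefficients kept in an unvoiced frame are all hypothetical, since the text specifies none of them. The bit stream is modelled as a string of '0'/'1' characters for readability, and `BitReader`, `Frame` and `parse_frames` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical field widths and codes, chosen for illustration only (not from the text).
ENERGY_BITS = 4
PITCH_BITS = 5
COEFF_BITS = [5, 5, 4, 4, 4, 4, 4, 3, 3, 3]  # K1..K10: wider where more significant
UNVOICED_COEFFS = 4                          # assumed number of coefficients in an unvoiced frame
ENERGY_PAUSE = 0                             # assumed energy code meaning "pause"
ENERGY_STOP = (1 << ENERGY_BITS) - 1         # assumed energy code meaning "last frame"
PITCH_UNVOICED = 0                           # assumed pitch code meaning "unvoiced"

@dataclass
class Frame:
    kind: str                                # "pause", "stop", "repeat", "unvoiced" or "voiced"
    energy: int
    pitch: Optional[int] = None
    coeffs: List[int] = field(default_factory=list)

class BitReader:
    """Reads unsigned fields, most significant bit first, from a '0'/'1' string."""
    def __init__(self, bits: str):
        self.bits, self.pos = bits, 0

    def read(self, width: int) -> int:
        if self.pos + width > len(self.bits):
            raise ValueError("bit stream ended mid-frame")
        value = int(self.bits[self.pos:self.pos + width], 2)
        self.pos += width
        return value

def parse_frames(bits: str) -> List[Frame]:
    """Decodes frames until the stop code; raises ValueError if the stream runs out first."""
    reader, frames = BitReader(bits), []
    while True:
        energy = reader.read(ENERGY_BITS)
        if energy == ENERGY_STOP:            # last frame: nothing else follows
            frames.append(Frame("stop", energy))
            return frames
        if energy == ENERGY_PAUSE:           # pause: energy field only
            frames.append(Frame("pause", energy))
            continue
        repeat = reader.read(1)
        pitch = reader.read(PITCH_BITS)
        if repeat:                           # pitch and energy, but no coefficients
            frames.append(Frame("repeat", energy, pitch))
            continue
        unvoiced = pitch == PITCH_UNVOICED
        widths = COEFF_BITS[:UNVOICED_COEFFS] if unvoiced else COEFF_BITS
        coeffs = [reader.read(w) for w in widths]
        frames.append(Frame("unvoiced" if unvoiced else "voiced", energy, pitch, coeffs))

def bits(value: int, width: int) -> str:
    """Encodes value as a zero-padded binary string of the given width."""
    return format(value, f"0{width}b")

# Usage: one voiced frame followed by a stop frame (field values are arbitrary).
demo = (bits(9, ENERGY_BITS) + "0" + bits(20, PITCH_BITS)
        + "".join(bits(1, w) for w in COEFF_BITS) + bits(ENERGY_STOP, ENERGY_BITS))
print([fr.kind for fr in parse_frames(demo)])  # ['voiced', 'stop']
```

Under these assumed widths a voiced frame costs 49 bits, an unvoiced frame 28, a repeat frame 10 and a pause or stop frame 4, compared with 120 bits for a fixed frame of twelve ten-bit parameters; the average bit rate therefore depends on how often each frame type occurs in the stored speech.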
The frames of speech data are preferably stored in a memory, such as a Read-Only-Memory. The data is communicated to the speech synthesis circuit in response to certain control signals applied to the Read-Only-Memory. In the disclosed embodiment, these control signals are generated by circuits disposed on the same integrated circuit as the speech synthesis circuits.