The present invention relates to improvements in low data rate speech apparatuses and may be employed in electronic learning aids, electronic games, computers and small appliances. The problem addressed by low data rate speech apparatuses is to provide electronically produced synthetic speech of modest quality while retaining a low data rate. A low data rate is required either to reduce the amount of memory needed to store the desired speech or to reduce the amount of information which must be transmitted to specify the desired speech.
Previous solutions to the problem of providing low data rate speech of acceptable quality have employed the technique of storing or transmitting data indicative of the string of phonological linguistic units corresponding to the desired speech. The speech synthesis apparatus would include a phonetic memory for storing the speech synthesis parameters corresponding to each of these phonological linguistic units. Upon reception of the string of phonological linguistic units, either by recall from a phrase memory or by data transmission, the speech synthesis apparatus would recall the speech synthesis parameters corresponding to the first phonological linguistic unit indicated, generate the speech corresponding to that unit, and repeat this process for each successive unit. This technique has the advantage that the phonetic memory need include the speech parameters for each phonological linguistic unit only once, although that unit may be employed many times in the production of a single phrase. The amount of data required to specify one of these phonological linguistic units from the phonetic library is much less than that required to specify the speech parameters for generation of that unit. Therefore, whether the phrase specifying data is stored in an additional memory or transmitted to the apparatus, an advantageous reduction in the data rate is achieved.
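By way of illustration only, the prior technique described above may be sketched in Python as follows. The library contents, the unit codes, the three-value frames and the assumed 8 bits per parameter value are all hypothetical and are chosen only to make the data rate comparison concrete; they are not taken from any particular apparatus.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical frame of speech synthesis parameters (three small integers).
Frame = Tuple[int, int, int]

# Hypothetical phonetic library: the frames for each phonological
# linguistic unit are stored exactly once, keyed by a small unit code.
PHONETIC_LIBRARY: Dict[int, List[Frame]] = {
    0: [(10, 3, 7), (11, 3, 6)],
    1: [(20, 5, 1), (21, 5, 2), (22, 4, 2)],
    2: [(30, 2, 9)],
    3: [(40, 6, 4), (41, 6, 3)],
}


def expand_phrase(unit_codes: Sequence[int]) -> List[Frame]:
    """Recall the stored frames for each unit code in turn and concatenate them."""
    frames: List[Frame] = []
    for code in unit_codes:
        frames.extend(PHONETIC_LIBRARY[code])
    return frames


def bits_for_codes(unit_codes: Sequence[int], library_size: int) -> int:
    """Bits needed to specify the phrase as a string of unit codes."""
    bits_per_code = max(1, (library_size - 1).bit_length())
    return len(unit_codes) * bits_per_code


def bits_for_frames(frames: Sequence[Frame], bits_per_value: int = 8) -> int:
    """Bits needed to specify the same phrase directly as parameter frames."""
    return sum(len(frame) for frame in frames) * bits_per_value


# A phrase in which unit 2 is used twice but stored only once in the library.
phrase = [0, 1, 2, 2, 3]
frames = expand_phrase(phrase)
code_bits = bits_for_codes(phrase, len(PHONETIC_LIBRARY))
frame_bits = bits_for_frames(frames)
```

Under these assumed figures the five-unit phrase is specified by 10 bits of unit codes, whereas the nine frames it expands to would require 216 bits if stored or transmitted directly.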
This technique has the problem that the naturalness and intelligibility of the speech thus produced are low. Because the speech synthesis parameters are recalled for the individual phonological linguistic units occurring in the phrase to be spoken, rather than being stored for that phrase directly, the natural intonation contour of the speech is destroyed. The naturalness and intelligibility, and hence the quality, of the speech thus produced may be increased by storing or transmitting an indication of the original, natural intonation contour for intonation control upon synthesis. Storage or transmission of an indication of the natural intonation contour, however, increases the data rate required for specification of a particular phrase or word. Thus, it is highly advantageous to provide a manner of specifying the natural intonation contour at a low bit rate. By combining the technique of specifying phonological linguistic units with a coded form of the natural intonation contour, a low data rate speech system having the required speech quality may be achieved.
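The trade-off between intonation detail and data rate may be illustrated by a minimal sketch. The coding shown here, one index per phonological linguistic unit into a small table of pitch levels, is an assumption made purely for illustration and is not asserted to be the coding of the present invention. The table values, the function names and the bit widths are likewise hypothetical.

```python
from typing import List, Sequence

# Assumed table of four pitch levels in Hz; an index into it needs 2 bits.
PITCH_LEVELS_HZ = (90, 110, 130, 150)


def encode_contour(pitch_hz: Sequence[float]) -> List[int]:
    """Replace each per-unit pitch value by the index of the nearest level."""
    return [
        min(range(len(PITCH_LEVELS_HZ)),
            key=lambda i: abs(PITCH_LEVELS_HZ[i] - pitch))
        for pitch in pitch_hz
    ]


def decode_contour(codes: Sequence[int]) -> List[int]:
    """Recover the coarse intonation contour for use at synthesis time."""
    return [PITCH_LEVELS_HZ[code] for code in codes]


def phrase_bits(n_units: int, unit_bits: int, pitch_bits: int) -> int:
    """Total bits when every unit carries a unit code plus a pitch code."""
    return n_units * (unit_bits + pitch_bits)


# Hypothetical natural contour, one pitch value per unit of a 5-unit phrase.
natural_contour = [95, 128, 149, 112, 88]
pitch_codes = encode_contour(natural_contour)
coarse_contour = decode_contour(pitch_codes)
total_bits = phrase_bits(n_units=5, unit_bits=2, pitch_bits=2)
```

With these assumed figures the coded contour adds 2 bits per unit, so a five-unit phrase requires 20 bits in total, and the decoded contour follows the rise and fall of the original only approximately.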