1. Field of the Invention
The present invention relates to a speech synthesizer and speech synthesis technique. More specifically, the present invention relates to a speech synthesizer and operating method that produces an improved, robust sound through utilization of wavetable synthesis techniques.
2. Description of the Related Art
Speech synthesis is the computer generation of sound that resembles human speech. Speech synthesizers have evolved from systems that store and replay speech sounds as simple phonics, to systems that use more elemental common particles of sound, to systems that use sound bites such as words and phrases. Common to digital speech processing systems throughout this evolution is the playback of fundamentally flawed speech: lifeless, monotonic sounds that are unnaturally stilted and formal because a limited library of sounds is played back repetitiously.
Speech synthesis is accomplished using a speech synthesizer operating on stored sounds and algorithms. The speech synthesizer is a device that converts a numerical code representing a digital speech signal into recognizable speech sounds. The digital speech signal is derived from sampled, recorded speech that is divided into small sound units. The small sound units have characteristics, such as pitch, loudness, and timbre, that are represented as excitation and filter parameter numbers; these numbers form a digital code representing speech. Human speech sounds are either stored, generally in ROM, EPROM, RAM, CD, or disk memory, or created by a program. The sounds are then generated from the stored digital code by exciting a time-varying digital filter and are played over a loudspeaker. A processor supplies overall control of speech production. Speech production is typically a digital process up to the point of a digital-to-analog converter, which supplies an analog signal to drive a speaker.
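The excitation-and-filter scheme can be illustrated with a minimal sketch. It assumes a linear-prediction style all-pole filter whose gain and coefficients (a hypothetical per-frame list `a`) are replaced once per frame, driven by a pulse train for voiced frames and by white noise for unvoiced frames. The function names and the frame tuple format are illustrative assumptions, not taken from any particular synthesizer.

```python
import random


def excitation(start, num_samples, voiced, pitch_period, rng):
    """Pulse train (voiced) or uniform white noise (unvoiced) for one frame."""
    if voiced:
        # Pulse positions use the global sample index, so the train stays
        # periodic across frame boundaries.
        return [1.0 if (start + n) % pitch_period == 0 else 0.0
                for n in range(num_samples)]
    return [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]


def synthesize(frames, frame_len, seed=0):
    """Drive an all-pole filter whose coefficients change every frame.

    frames: list of (gain, a, voiced, pitch_period), with a = [a1, ..., ap] and
    y[n] = gain * e[n] + a1 * y[n-1] + ... + ap * y[n-p].
    The coefficients are assumed to describe a stable filter.
    """
    rng = random.Random(seed)
    out = []
    for gain, a, voiced, pitch_period in frames:
        e = excitation(len(out), frame_len, voiced, pitch_period, rng)
        for n in range(frame_len):
            acc = gain * e[n]
            # Past outputs (the filter memory) carry over across frames.
            for k, ak in enumerate(a, start=1):
                if len(out) >= k:
                    acc += ak * out[-k]
            out.append(acc)
    return out
```

For example, `synthesize([(1.0, [0.5], True, 4)], 8)` yields a decaying response to each pulse, restarting every fourth sample. Carrying the filter memory across frame boundaries, rather than resetting it, avoids discontinuities when the coefficients change.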
An alternative to the time-varying filter approach is a speech generation system that stores digitized speech data signals, samples the speech data at a constant rate, such as 8 kHz, and interpolates the data to a higher rate, for example 100 kHz.
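The interpolation step can be sketched as follows, assuming simple linear interpolation; the text does not state which interpolation method is used, and practical systems often use a low-pass interpolation filter instead. Because 100 kHz / 8 kHz = 12.5 is not an integer, the sketch tracks each output position exactly with integer arithmetic. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, in_rate, out_rate):
    """Linearly interpolate samples from in_rate to out_rate (integer rates in Hz)."""
    if not samples:
        return []
    # Number of output points that fall within the span of the input samples.
    num_out = (len(samples) - 1) * out_rate // in_rate + 1
    out = []
    for m in range(num_out):
        # Exact input position m * in_rate / out_rate, split into an integer
        # index and a fraction so a non-integer ratio does not drift.
        i, rem = divmod(m * in_rate, out_rate)
        frac = rem / out_rate
        if i + 1 < len(samples):
            out.append(samples[i] + frac * (samples[i + 1] - samples[i]))
        else:
            out.append(samples[-1])
    return out
```

Under these assumptions, two input samples at 8 kHz span 125 microseconds and produce 13 output points at 100 kHz.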
In a further alternative, logarithmically compressed amplitude data are used. These data are analogous to the data processed by digital telephone systems and result in a data rate of 64 kbits/second with very good sound quality. The time-varying filter techniques supply acceptable speech quality at a much lower digital input data rate; for example, average rates as low as about 1200 bits/second are obtained with a ten-pole filter derived from a linear prediction model of speech. These low data rates are possible because of the redundancy in speech and the use of a simplified model of the human speech-generating system. The vocal tract is simulated by a dozen or so connected pipes of different diameters, and the excitation is represented by a pulse stream at the vocal-cord rate for voiced sounds or by a random noise source for the unvoiced parts of speech. The reflection coefficients at the junctions of the pipes are obtained from a linear prediction analysis of the speech waveform.
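The pipe model maps naturally onto an all-pole lattice filter in which each stage applies one reflection coefficient, corresponding to one junction between adjacent pipes. The sketch below assumes one common sign convention (several are in use) and holds the coefficients fixed; a real synthesizer would update the list `k` every frame. A ten-pole filter corresponds to ten reflection coefficients. The names `lattice_synthesize` and `pulse_train` are hypothetical.

```python
def pulse_train(period, num_samples):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def lattice_synthesize(excitation, k):
    """All-pole lattice synthesis filter 1/A(z) built from reflection coefficients k.

    Assumed convention: A(z) is the analysis lattice
        f_i[n] = f_{i-1}[n] + k_i * b_{i-1}[n-1]
        b_i[n] = k_i * f_{i-1}[n] + b_{i-1}[n-1]
    The synthesis filter is stable when every |k_i| < 1.
    """
    p = len(k)
    b_prev = [0.0] * p            # b_prev[i] holds b_i[n-1] for i = 0..p-1
    out = []
    for x in excitation:
        f = x                      # f_p[n] is the excitation sample
        b_new = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f - k[i - 1] * b_prev[i - 1]          # f_{i-1}[n]
            b_new[i] = k[i - 1] * f + b_prev[i - 1]   # b_i[n]
        b_new[0] = f                                   # b_0[n] = f_0[n]
        b_prev = b_new[:p]
        out.append(f)              # output y[n] = f_0[n]
    return out
```

For a voiced frame, `lattice_synthesize(pulse_train(80, 800), k)` with ten coefficients in `k` would model a 100 Hz pitch at an 8 kHz sample rate. A lattice is often preferred over a direct-form filter here because stability can be checked simply by confirming that each reflection coefficient has magnitude below one.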
Techniques for synthesizing speech sounds are substantially different from the techniques that have been developed to synthesize music. Some music synthesis techniques attempt to mimic the acoustical characteristics of an actual musical instrument. Other techniques generate musical sounds based on mathematical analysis and relationships.
One type of synthesis for generating musical sounds is called subtractive synthesis. Subtractive synthesis closely imitates the physical basis of sound generation inherent in acoustic musical instruments. A harmonic-rich periodic signal is generated that contains energy at every partial frequency existing in the sound to be produced. Selected frequency components are then altered using filters. The filters subtract unwanted frequencies; electronic filters can also supply a frequency-dependent gain so that selected frequencies are enhanced. Subtractive synthesis further employs an envelope generator, together with a voltage-controlled amplifier or analog multiplier, to shape how the sound evolves over time. Because subtractive synthesis generates musical sounds in a manner analogous to an actual acoustic instrument, the physics of the instrument serves as a model for designing the subtractive synthesis technique. Subtractive synthesis using digital techniques is relatively difficult and complex, since substantial computation is necessary to generate a harmonic-rich signal that is properly band-limited.
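A sketch of the subtractive chain follows. It assumes a sawtooth source, a one-pole low-pass filter, and a linear attack/decay amplitude envelope; these particular choices and all function names are illustrative. The source is built by summing only the harmonics below the Nyquist frequency, which is one straightforward (and computationally expensive) way to obtain a properly band-limited, harmonic-rich signal.

```python
import math


def bandlimited_saw(freq, rate, num_samples):
    """Harmonic-rich sawtooth summed only from harmonics below Nyquist (rate / 2)."""
    harmonics = [h for h in range(1, int(rate / 2 / freq) + 1) if h * freq < rate / 2]
    return [
        (2 / math.pi) * sum(math.sin(2 * math.pi * h * freq * n / rate) / h
                            for h in harmonics)
        for n in range(num_samples)
    ]


def one_pole_lowpass(x, cutoff, rate):
    """Attenuate (subtract) content above `cutoff` with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2 * math.pi * cutoff / rate)
    y, prev = [], 0.0
    for v in x:
        prev += alpha * (v - prev)
        y.append(prev)
    return y


def ad_envelope(num_samples, attack, decay):
    """Linear attack to 1.0, then linear decay to 0.0 (amplitude over time)."""
    return [n / attack if n < attack else max(0.0, 1.0 - (n - attack) / decay)
            for n in range(num_samples)]


def subtractive_note(freq, cutoff, rate, num_samples, attack, decay):
    """Source -> filter -> envelope, in the order described for subtractive synthesis."""
    src = bandlimited_saw(freq, rate, num_samples)
    filtered = one_pole_lowpass(src, cutoff, rate)
    env = ad_envelope(num_samples, attack, decay)
    return [s * g for s, g in zip(filtered, env)]
```

The one-pole filter only attenuates; enhancing selected frequencies, as the text also mentions, would require a resonant filter such as a two-pole design.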
Additive synthesis is a musical synthesis technique in which each partial frequency is generated separately, arbitrarily, and independently. The separate partial frequencies are added to form a music signal. For a harmonic sound, each partial frequency is an integer multiple of the fundamental frequency of the sound to be generated. Additive synthesis functions by providing a plurality of separate oscillators, each of which generally produces a sine wave, and combining the separate sine waves to form a signal that approximates a particular sound as closely as possible.
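The bank of independent sine oscillators can be sketched as follows. Each partial is given as a hypothetical `(multiple, amplitude, phase)` tuple, and the oscillator runs at `multiple * fundamental`; integer multiples give the harmonic case described above. The function name and tuple layout are assumptions for illustration.

```python
import math


def additive(fundamental, partials, rate, num_samples):
    """Sum independent sine oscillators, one per partial.

    partials: list of (multiple, amplitude, phase); each oscillator runs at
    multiple * fundamental Hz with its own amplitude and starting phase.
    """
    out = [0.0] * num_samples
    for multiple, amp, phase in partials:
        w = 2 * math.pi * multiple * fundamental / rate   # radians per sample
        for n in range(num_samples):
            out[n] += amp * math.sin(w * n + phase)
    return out
```

Because each partial is computed independently, its amplitude could also be varied over time per oscillator; that flexibility is what makes additive synthesis general, at the cost of one oscillator per partial.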
A further music synthesis method is wavetable synthesis, a method of generating sound by playing back digitally stored samples. Sounds performed on actual musical instruments are sampled and stored in a digital recording format in a storage such as a read-only memory (ROM). The stored recordings are then mapped across the instrument's range so that its full acoustic range is accurately reproduced.
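Playing a stored sample back at a pitch other than the one at which it was recorded can be sketched by reading it at a fractional rate with linear interpolation. The parameter `ratio` and the function name are hypothetical; linear interpolation is an assumed, simple choice. Note that this kind of resampling changes duration along with pitch.

```python
def play_sample(sample, ratio, num_out):
    """Read a stored sample at a fractional rate with linear interpolation.

    ratio > 1 raises the pitch (reads faster); ratio < 1 lowers it.
    Playback stops at the end of the recording (no looping here).
    """
    out = []
    pos = 0.0
    for _ in range(num_out):
        i = int(pos)
        if i >= len(sample):
            break                  # ran off the end of the recording
        nxt = sample[i + 1] if i + 1 < len(sample) else sample[i]
        frac = pos - i
        out.append(sample[i] + frac * (nxt - sample[i]))
        pos += ratio
    return out
```

With `ratio = 2.0` the sample plays an octave higher and lasts half as long.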
In wavetable synthesis, a sample is a recorded sound stored in digital data form. An instrument is a selectable entry that defines a particular type of sound corresponding to the sound produced by a specific musical instrument. A wave is a sample or group of samples used to reproduce the sound of an instrument over its entire range of frequencies. Instruments are either single-sampled or multi-sampled, depending on the timbral characteristics of the corresponding musical instrument, the sampling characteristics of the data and sampling system, and the playback characteristics of the data and playback system. Some instruments, such as a flute, are typically single-sampled. Other instruments, such as a piano, have a more complex data structure and are nearly always sampled, stored, and played back as multiple samples. A program is a set of parameters selected to completely define a wavetable synthesizer's generation of a particular sound.
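The sample, wave, and instrument terms can be modeled as a small data structure. This sketch assumes MIDI note numbers, equal-tempered tuning, and per-sample key ranges; the class and field names are hypothetical. A single-sampled instrument holds one entry whose key range covers the whole instrument, while a multi-sampled instrument splits the range among several recordings. A program would bundle such an instrument with further playback parameters.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    data: list        # stored PCM values
    root_note: int    # MIDI note at which the recording plays back unshifted
    low_note: int     # lowest note this sample covers
    high_note: int    # highest note this sample covers


@dataclass
class Instrument:
    name: str
    samples: list = field(default_factory=list)   # the "wave": one or more samples

    def sample_for(self, note):
        """Pick the sample whose key range contains the requested note."""
        for s in self.samples:
            if s.low_note <= note <= s.high_note:
                return s
        raise ValueError(f"note {note} is outside the range of {self.name}")

    def playback_ratio(self, note):
        """Equal-tempered read-rate ratio relative to the chosen sample's root note."""
        s = self.sample_for(note)
        return 2.0 ** ((note - s.root_note) / 12.0)
```

Multi-sampling keeps each recording close to its root note, so the pitch ratio, and the timbral distortion that comes with large shifts, stays small.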
Wavetable synthesis may be practiced by sampling and playing back a virtually limitless amount of data. However, system performance requirements, circuit and memory size, and cost are advantageously reduced through various data reduction techniques. One such data reduction technique is termed "looping". Many musical sounds are sustained and highly repetitive. Looping exploits this sustained, repetitive nature by repeatedly playing back a section of a sample. Different types of looping are typically supported, including forward looping, reverse looping, bi-directional looping, and the like.
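The loop types can be sketched as the sequence of stored-sample indices a player reads. This sketch assumes the portion before the loop (for example, the attack) is played once, after which the region from `loop_start` to `loop_end` (inclusive) repeats; the function name, parameters, and mode strings are hypothetical. The bi-directional mode reverses at each end without repeating the endpoint samples.

```python
def loop_indices(loop_start, loop_end, count, mode="forward"):
    """Return `count` sample indices: play up to the loop once, then repeat the loop.

    mode is "forward", "reverse", or "bidirectional"; loop_end is inclusive.
    """
    if not 0 <= loop_start <= loop_end:
        raise ValueError("require 0 <= loop_start <= loop_end")
    out = list(range(min(count, loop_start)))      # pre-loop portion, played once
    fwd = list(range(loop_start, loop_end + 1))
    if mode == "forward":
        cycle = fwd
    elif mode == "reverse":
        cycle = fwd[::-1]
    elif mode == "bidirectional":
        cycle = fwd + fwd[-2:0:-1]                 # up, then back down, no repeated ends
    else:
        raise ValueError(f"unknown loop mode: {mode}")
    i = 0
    while len(out) < count:
        out.append(cycle[i % len(cycle)])
        i += 1
    return out
```

However long the note is sustained, only `loop_end + 1` stored samples are needed, which is the memory saving that makes looping attractive.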
Conventional computer-generated speech devices create sounds that are unnaturally stilted and formal due to the repetitive use of a limited library of sound elements. What is needed is a speech synthesis apparatus and technique that improves the sound of computer-generated speech. What is further needed is a speech synthesis device that generates interesting, robust-sounding speech.