1. Field of the Invention
This relates to speech analysis systems and, more particularly, to systems for analyzing connected speech and for automatically recognizing its content.
2. Description of the Prior Art
Automatic speech recognition has long been a goal of speech researchers although, until the computer-age explosion, very little was done in the area. In the past few years considerable effort has been expended in this field, resulting in the realization of various systems which permit one, for the first time, to "talk" directly to a computer.
One major obstacle to progress in the field of automatic speech recognition has been the great variation in speech characteristics between individuals, particularly between men, women, and children. To circumvent this obstacle, some researchers have chosen to develop systems tailored or adapted to a particular speaker, while others have chosen to develop "universal speaker" systems that are capable of responding to any speaker but recognize only a limited vocabulary.
One system in the latter class has been described by T. R. Martin in "Acoustic Recognition of a Limited Vocabulary in Continuous Speech," University of Pennsylvania, Ph. D. Thesis, 1970. Martin describes a system that recognizes a limited vocabulary by abstracting particular features from the speech signal and matching the derived sequence of features against a preselected set of feature sequences representing the vocabulary to be recognized. The features Martin selects for abstraction are characteristic of the elemental sounds of speech, and he distinguishes three levels of characterization for them. The first level represents the broad class features, the second level subdivides the class features into narrower categories, and the third level, which Martin does not employ in his speech recognition apparatus, comprises the actual phonemes of the speech.
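The idea of matching a derived feature sequence against stored vocabulary sequences can be illustrated with a short Python sketch. The broad-class labels ("V" vowel-like, "F" fricative, "S" stop, "N" nasal), the vocabulary templates, and the use of edit distance as the matching criterion are hypothetical choices made for illustration only; they are not taken from Martin's thesis.

```python
# Hypothetical sketch: recognize a word by comparing an observed sequence of
# broad-class feature labels with stored template sequences. The label set,
# the templates, and the edit-distance criterion are assumptions.

def collapse_repeats(labels):
    """Merge consecutive identical labels (one sound may span many frames)."""
    out = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out

def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]

def recognize(frame_labels, vocabulary):
    """Return the vocabulary word whose template is closest to the observation."""
    observed = collapse_repeats(frame_labels)
    return min(vocabulary, key=lambda word: edit_distance(observed, vocabulary[word]))

# Hypothetical vocabulary of broad-class templates.
VOCABULARY = {
    "six": ["F", "V", "S", "F"],
    "two": ["S", "V"],
    "nine": ["N", "V", "N"],
    "five": ["F", "V", "F"],
}

if __name__ == "__main__":
    print(recognize(["S", "S", "V", "V", "V"], VOCABULARY))
```

Working at the broad-class level keeps the template inventory small, which is consistent with the limited-vocabulary, speaker-independent goal described above; a real system would also need to tolerate segmentation and labeling errors, which the edit distance only partly addresses.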
To abstract the features employed, Martin computes the areas of spectral rise and fall in the speech and the formants contained therein. To do so, he divides the speech spectrum into a plurality of contiguous bands and detects the energy contained in each band. The presence of the various features is then determined by logic circuitry that responds to the output signals of the various bands.
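As an illustration only, a band-energy front end of this general kind might be sketched in Python as follows. The equal-width bands, the direct DFT, and the decision rule `high_band_dominant` are assumptions made for this sketch; Martin's actual band edges and logic circuitry are not specified here.

```python
import cmath
import math

def band_energies(frame, num_bands):
    """Compute |DFT|^2 for bins 1..N/2 (DC excluded) of one frame, split those
    bins into contiguous equal-width bands, and return the energy per band.
    Bins left over when N/2 is not a multiple of num_bands are ignored."""
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):
        x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(x) ** 2)
    width = len(power) // num_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]

def high_band_dominant(energies):
    """Hypothetical logic rule: flag a frame when the top band holds more
    energy than all lower bands combined (e.g., a fricative-like sound)."""
    return energies[-1] > sum(energies[:-1])

if __name__ == "__main__":
    n = 64
    low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]    # energy at bin 2
    high = [math.sin(2 * math.pi * 28 * t / n) for t in range(n)]  # energy at bin 28
    for name, frame in (("low", low), ("high", high)):
        e = band_energies(frame, 4)
        print(name, [round(v, 1) for v in e], high_band_dominant(e))
```

A direct DFT is used only to keep the sketch dependency-free; the band outputs play the role of the per-band energy detectors, and the boolean rule stands in for the feature logic that responds to them.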
In the area of the physiological study of speech, R. Houde has investigated tongue body motion during speech. In "A Study of Tongue Body Motion During Selected Speech Sounds," University of Michigan, Ph. D. Thesis, 1967, Houde reported that the tongue body trajectories of different speakers pronouncing the same utterance, e.g., / i'gugi /, are quite similar, particularly with respect to the target position of the tongue movement.
Also in the area of the physiological study of speech, C. H. Coker has developed a physical model of the vocal tract that can be controllably altered to produce the various sets of signal formants characteristic of human speech. In particular, for each vocal tract length and tongue body position, Coker's model generates a set of formants that characterizes the sound a human speaker would produce with that configuration. Coker has successfully employed this model to synthesize speech, as described in "A Model of Articulatory Dynamics and Control," Proceedings of the IEEE, Vol. 64, No. 4, 1976. The model is also described in U.S. Pat. No. 3,530,248, issued to C. Coker on Sept. 22, 1970.