1. Field of the Invention
This invention relates to automatic discrete utterance voice recognition systems, and particularly to an adaptive automatic discrete utterance recognition system which requires only that a prototype vocabulary set be established by multiple repetition techniques. Subsequent talkers interact with the voice recognition system in an adaptive mode, in which the new talker is required merely to retrain the system for that limited subset of the vocabulary set which the system cannot recognize.
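By way of illustration only, the adaptive mode described above may be sketched as follows. The sketch rests on assumptions not stated in this section: each utterance is represented as a fixed-length feature vector, matching uses Euclidean distance, and an utterance is rejected when no prototype lies within a hypothetical `reject_threshold`. The names `recognize` and `adapt` are illustrative and are not taken from the invention.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(prototypes, features, reject_threshold):
    """Return the closest vocabulary word, or None if even the best match is too far."""
    best_word = min(prototypes, key=lambda w: distance(prototypes[w], features))
    if distance(prototypes[best_word], features) > reject_threshold:
        return None
    return best_word


def adapt(prototypes, utterances, reject_threshold):
    """Retrain only those words that the system fails to recognize for a new talker.

    prototypes: word -> feature vector of the established prototype set
    utterances: word -> feature vector spoken by the new talker
    Returns the adapted prototype set and the list of retrained words.
    """
    adapted = dict(prototypes)  # leave the original prototype set untouched
    retrained = []
    for word, features in utterances.items():
        if recognize(adapted, features, reject_threshold) != word:
            adapted[word] = list(features)  # replace the prototype for this word only
            retrained.append(word)
    return adapted, retrained
```

In this sketch a new talker whose utterances already match the prototype set causes no retraining at all; only rejected or misrecognized words receive new prototypes.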
2. Description of the Prior Art
Extant automatic discrete utterance voice recognition systems involve separate procedures for utterance prototype establishment in which multiple repetitions of each vocabulary item are taken from each talker. If only a single repetition is taken, there is a risk that the entire vocabulary prototype set will require re-establishment should an inappropriate prototype representation occur.
Typical of such procedures--each talker repeating the prototype word list--are the following:
U.S. Pat. No. 3,333,248, Greenberg et al, SELF-ADAPTIVE SYSTEMS, July 25, 1967. Greenberg et al shows a self-adaptive pattern recognizer which, after initial training, may be switched to the operate mode and remains in operate mode until a character is presented which results in a reject signal. At this time the operator must assist by placing the identification switch in the position which corresponds to the rejected pattern. The operator must also determine whether the rejected pattern is a slightly modified one of the initial sample patterns or a new sample pattern, and must set the appropriate switches, including the switch to begin the training mode. In order to effectively update the self-adaptive circuit, a sufficient number of the initial sample patterns must be stored and re-presented to the self-adaptive circuit along with the rejected pattern.
U.S. Pat. No. 3,369,077, French et al, PITCH MODIFICATION OF AUDIO WAVEFORMS, Feb. 13, 1968. French et al shows a speech synthesizing system wherein pitch periods are adjusted according to a predetermined time base.
U.S. Pat. No. 3,440,617, Lesti et al, SIGNAL RESPONSIVE SYSTEMS, Apr. 22, 1969. Lesti et al shows a technique for recognition, independent of amplitude and duration of the signals to be recognized, by segmenting the signal into a series of component signals. The system extrapolates and interpolates inputs which it has never before received to the response which most closely resembles the signal. Data might be lost when it becomes replaced by new data. Lesti et al shows a technique in which newly coded samples are not discarded when the transmit buffer portion is already full but rather the oldest untransmitted coded sample is discarded to make room for storage of the new sample.
U.S. Pat. No. 3,665,450, Leban, METHOD AND MEANS FOR ENCODING AND DECODING IDEOGRAPHIC CHARACTERS, May 23, 1972. Leban shows a technique for handling ideographic characters.
U.S. Pat. No. 3,718,768, Abramson et al, VOICE OR ANALOG COMMUNICATION SYSTEM EMPLOYING ADAPTIVE ENCODING TECHNIQUES, Feb. 27, 1973. Abramson et al shows a technique for transmitting communications to remote stations which can detect their own identification signals and have their own sampling rates.
U.S. Pat. No. 4,069,393, Martin et al, WORD RECOGNITION APPARATUS AND METHOD, Jan. 17, 1978. Martin et al shows a technique for time normalizing training words and words for recognition. Martin et al deals with spoken input training words and generates a correlation function, and with feature extraction. During the training mode, the equipment is trained with new vocabulary words, preferably spoken by the person who is to later use the machine. It is desirable to use multiple samples of the same training word to obtain a faithful average sample.
U.S. Pat. No. 4,092,493, Rabiner et al, SPEECH RECOGNITION SYSTEM, May 30, 1978. Rabiner et al shows a speech recognition system in which test signals are time aligned to the average voiced interval of repetitions of each speech segment having a previously generated voiced interval linear prediction characteristic.
U.S. Pat. No. 4,297,528, Beno, TRAINING CIRCUIT FOR AUDIO SIGNAL RECOGNITION COMPUTER, Oct. 27, 1981. Beno shows a training circuit technique in which each training pattern, to be accepted for merging, must match the previously merged patterns by a threshold amount. The threshold is automatically varied as the number of previously merged training patterns increases.
C. C. Tappert, A PRELIMINARY INVESTIGATION OF ADAPTIVE CONTROL IN THE INTERACTION BETWEEN SEGMENTATION AND SEGMENT CLASSIFICATION IN AUTOMATIC RECOGNITION OF CONTINUOUS SPEECH, IEEE Trans. on Systems, Man, and Cybernetics, Vol. SMC-2, No. 1, 1/72, pp. 66-72. Tappert shows feedback control of the interaction of segmentation and segment classification in continuous speech recognition.
C. C. Tappert, et al, APPLICATION OF SEQUENTIAL DECODING FOR CONVERTING PHONETIC TO GRAPHIC REPRESENTATION IN AUTOMATIC RECOGNITION OF CONTINUOUS SPEECH (ARCS), IEEE Trans. on Audio and Electroacoustics, Vol. AU-21, No. 3, 6/73, pp. 225-228. Tappert et al shows conversion of machine-contaminated phonetic descriptions of speaker performance into standard orthography. Distinction is made between speaker-dependent and machine-dependent corruption of phonetic input strings.
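By way of illustration only, the multiple-repetition training noted above for Martin et al (averaging several samples of the same word) and for Beno (accepting a training pattern for merging only if it matches the previously merged patterns within a threshold that varies with their number) may be sketched together as follows. This is not the circuit of either reference. The sketch assumes fixed-length feature vectors, a running mean as the merged pattern, and Euclidean distance as the match measure. The threshold schedule is supplied by the caller as a hypothetical function `max_distance_for`, since the direction in which the threshold varies is not stated here.

```python
import math


class MergingTrainer:
    """Merge repeated training patterns into one averaged prototype (illustrative only)."""

    def __init__(self, max_distance_for):
        # Hypothetical schedule: number of merged patterns -> largest allowed distance.
        self.max_distance_for = max_distance_for
        self.mean = None   # running average of accepted patterns
        self.count = 0     # number of patterns merged so far

    def offer(self, pattern):
        """Try to merge one training pattern; return True if it was accepted."""
        if self.count == 0:
            self.mean = list(pattern)  # the first repetition seeds the prototype
            self.count = 1
            return True
        if math.dist(self.mean, pattern) > self.max_distance_for(self.count):
            return False  # too far from the merged patterns; the talker would repeat the word
        self.count += 1
        # Incremental update of the running mean with the newly accepted pattern.
        self.mean = [m + (p - m) / self.count for m, p in zip(self.mean, pattern)]
        return True
```

For example, `MergingTrainer(lambda n: 1.0 / n)` assumes a schedule that tightens as more patterns are merged; any other schedule may be substituted.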