1. Field of the Invention
The present invention relates generally to a speaker adapted speech recognition system for recognizing the speech of an unknown speaker. More specifically, the invention relates to a speaker adapted speech recognition system that can realize a high recognition rate.
2. Description of the Related Art
A speech recognition system typically manages the correspondence between spectral patterns of speech and the content of speech and, when speech is input, recognizes it by identifying the content represented by its spectral pattern. With such a construction, it is possible to implement speaker dependent speech recognition. However, at present, systems for speaker independent speech recognition are not practically useful because of their low recognition rate. Recently, a speaker adapted speech recognition system has been developed that modifies the management data of the correspondence between the spectral pattern and the content of speech depending upon the unknown speaker, in order to recognize speech independently of the speaker. In such a speaker adapted speech recognition system, it must be possible to modify the management data of the correspondence between the spectral pattern and the content of speech to suit the unknown speaker.
In one of the typical prior art approaches, data for a plurality of mutually different speakers are stored as acoustic templates of the speakers. When a speech input is given by the unknown speaker, the spectral pattern of that speech is checked against the acoustic templates, and the template having the closest spectral pattern is selected for speech recognition.
In such a case, a sufficient number of variations of the spectral patterns have to be preliminarily stored for achieving a satisfactorily high recognition rate. This clearly requires a large memory capacity for storing a large number of the acoustic templates of the speakers.
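The template-matching approach described above can be illustrated with a minimal sketch. It assumes that each spectral pattern is a fixed-length list of numbers and that closeness is measured by Euclidean distance; neither assumption comes from the prior art itself, and the names `select_closest_template` and `templates` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length spectral patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_closest_template(templates, input_pattern):
    """Return the key of the stored template closest to the input pattern.

    templates: dict mapping a speaker id to a spectral pattern
    (assumed here to be a plain list of floats).
    """
    return min(templates, key=lambda sid: euclidean(templates[sid], input_pattern))


# Illustrative data only: three stored speakers and one unknown input.
templates = {
    "speaker_a": [1.0, 2.0, 3.0],
    "speaker_b": [4.0, 4.0, 4.0],
    "speaker_c": [0.0, 0.0, 1.0],
}
best = select_closest_template(templates, [3.5, 4.2, 3.9])
print(best)
```

Because every stored template is compared with the input, both memory use and search time grow with the number of stored speakers, which is the drawback noted above.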
In another approach, a single standard acoustic template is provided. The management data of the standard acoustic template is modified to adapt its spectral pattern to the speech input to be recognized, thereby enhancing the recognition rate. For this purpose, a neural network is employed to learn the association factors between neurons so as to achieve an adaptive modification of the management data.
Even in the latter approach, in order to cover a variety of the speech characteristics of the speech inputs, it is necessary to have a neural network of sufficient size. This, in turn, requires substantial learning capacity to enable the neural network to appropriately determine the modification of the management data and achieve a satisfactory recognition rate.
Examples of prior art documents are as follows:
1. Japanese Unexamined Patent Publication (Kokai) No. 59-180596
2. Japanese Unexamined Patent Publication (Kokai) No. 01-291298
a plurality of acoustic templates of speakers for managing correspondence between an acoustic feature of the speech and a content of the speech;
a converting portion for converting the acoustic feature of the speech managed by the acoustic templates according to a set parameter;
a learning portion for learning, when a speech input for learning is provided, the parameter at which the acoustic feature of the acoustic template, as converted by the converting portion, is approximately coincident with the acoustic feature of the corresponding speech input for learning;
a selection portion for selecting, when a speech input for selection is provided, one or more of the acoustic templates having the acoustic features closest to that of the speech input for selection, by comparing the corresponding acoustic feature of the speech input for selection with the corresponding acoustic features converted by the converting portion; and
an acoustic template for the unknown speaker being created by converting, by the converting portion, the acoustic features of the acoustic templates of the speakers that are selected by the selection portion, for performing recognition of the content of speech of the speech input of the unknown speaker by using the created acoustic template.
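The interplay of the converting, learning, and selection portions listed above can be sketched as follows. This is only an illustration under stated assumptions: the conversion is taken to be a simple affine map y = a*x + b fitted by least squares, which stands in for whatever conversion the invention actually employs; a single template is selected although the text allows one or more; and all names (`fit_affine`, `convert`, `adapt`) and the data are hypothetical.

```python
def fit_affine(src, dst):
    """Learning step (assumed form): least-squares a, b so that a*x + b ~ y."""
    n = len(src)
    mx = sum(src) / n
    my = sum(dst) / n
    var = sum((x - mx) ** 2 for x in src)
    cov = sum((x - mx) * (y - my) for x, y in zip(src, dst))
    a = cov / var if var else 1.0  # fall back to unit gain for a constant feature
    return a, my - a * mx


def convert(feature, params):
    """Converting step: apply the learned parameter to one acoustic feature."""
    a, b = params
    return [a * x + b for x in feature]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length features."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def adapt(templates, learn_word, learn_input, select_word, select_input):
    """Learn a parameter per template, select the closest, build the new template."""
    params = {sid: fit_affine(t[learn_word], learn_input)
              for sid, t in templates.items()}
    best = min(templates,
               key=lambda sid: sq_dist(convert(templates[sid][select_word],
                                               params[sid]), select_input))
    adapted = {word: convert(feat, params[best])
               for word, feat in templates[best].items()}
    return best, adapted


# Illustrative data: the unknown speaker behaves like 2 * speaker_a + 1.
templates = {
    "speaker_a": {"learn": [1, 2, 3], "select": [2, 2, 4], "yes": [0, 1, 0]},
    "speaker_b": {"learn": [1, 3, 2], "select": [5, 1, 1], "yes": [1, 1, 2]},
}
best, adapted = adapt(templates, "learn", [3, 5, 7], "select", [5, 5, 9])
print(best, adapted["yes"])
```

In this sketch the parameter is learned separately for each stored template, so the selection compares the speech input for selection with already-converted features, as the selection portion above describes.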