The invention relates to a circuit arrangement for speech recognition comprising an evaluation circuit for determining spectral feature vectors of time frames of a digital speech signal by means of a spectral analysis, for logarithmizing the spectral feature vectors, and for comparing the logarithmized spectral feature vectors with reference feature vectors.
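The processing chain named above (spectral analysis of time frames, then logarithmizing the resulting spectral feature vectors) can be illustrated with a minimal Python sketch. It is not the circuit arrangement claimed here. The frame length of 256 samples, the hop of 128 samples, the Hamming window, the 16 equal-width frequency bands, the natural logarithm with a small floor, and all function names are assumptions chosen for illustration. A naive DFT built from the standard library keeps the example self-contained.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a sampled speech signal into overlapping time frames of frame_len samples."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Hamming-windowed power spectrum |DFT|^2 for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(w * cmath.exp(-2j * math.pi * k * i / n)
                    for i, w in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]


def log_feature_vector(frame, num_bands, floor=1e-10):
    """Group the power spectrum into equal-width bands and logarithmize each band energy."""
    power = power_spectrum(frame)
    band_width = len(power) // num_bands  # bins beyond num_bands * band_width are dropped
    # the floor prevents log(0) for bands that contain no energy
    return [math.log(max(sum(power[b * band_width:(b + 1) * band_width]), floor))
            for b in range(num_bands)]


def log_spectral_features(samples, frame_len=256, hop=128, num_bands=16):
    """Return one logarithmized spectral feature vector per time frame."""
    return [log_feature_vector(frame, num_bands)
            for frame in frame_signal(samples, frame_len, hop)]
```

For a 1125 Hz tone sampled at 8 kHz (1024 samples, values chosen only for the example), `log_spectral_features` returns 7 vectors of 16 components, each peaking in band 4 (DFT bin 36). Doubling the signal amplitude adds log 4 to every band energy above the floor, which shows how logarithmizing turns a multiplicative gain change into an additive offset.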
Speaker-dependent speech recognition devices are used successfully in many fields, for example in systems that recognize spoken text, understand it and convert it into an action (spoken commands for controlling appliances), where the speech signal to be recognized is often also transmitted over a telephone line (remote control by telephone).
The book "Automatische Spracheingabe und Sprachausgabe" by K. Sickert, Haar bei München, Verlag Markt und Technik, 1983, pp. 223-230 and 322-326, describes the basic structure of a speech recognition device in which the speech signal is first analysed in order to extract its information-carrying features. These features are represented by so-called feature vectors, which a recognition unit compares with reference feature vectors; the reference feature vectors are determined during a training phase and stored in a reference memory.
The publication "Verfahren für Freisprechen, Spracherkennung und Sprachcodierung in der SPS51" by W. Armbruster, S. Dobler and P. Mayer, PKI Technische Mitteilungen 1/1990, pp. 35-41, discloses a technical realisation of a speaker-dependent speech recognition device. When this device analyses a digital speech signal, it observes the evolution of the signal over time in the spectral domain and determines spectral feature vectors suitable for describing the characteristic features of the speech signal. During a learning or training phase, referred to hereinafter as training, each word to be recognized is recorded several times. Spectral feature vectors are determined for each recording, and word-specific reference feature vectors are generated from them by averaging. Once training has been concluded, reference feature vectors for each trained word are available in a reference sample memory. During normal operation, the test phase, spectral feature vectors are determined for a speech signal to be recognized and supplied to a recognition unit, which compares them with the stored reference feature vectors by a method based on dynamic programming.
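The training and test phases described above can be sketched in Python. The publication states only that reference feature vectors are formed by averaging and that the comparison uses a method based on dynamic programming; everything else here is an assumption for illustration: each repetition is first brought to a fixed number of frames by a simple linear time normalization (hypothetical, since repetitions of a word differ in length), the local distance is Euclidean, and the comparison is the classic unconstrained dynamic time warping (DTW) recursion. All function names are hypothetical.

```python
import math


def local_distance(a, b):
    """Euclidean distance between two spectral feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def time_normalize(vectors, num_frames):
    """Hypothetical linear time normalization: pick num_frames evenly spaced frames."""
    n = len(vectors)
    return [vectors[(i * n) // num_frames] for i in range(num_frames)]


def train_reference(repetitions, num_frames):
    """Average several recordings of one word, frame by frame, into reference feature vectors."""
    normalized = [time_normalize(rep, num_frames) for rep in repetitions]
    dim = len(normalized[0][0])
    return [[sum(rep[t][d] for rep in normalized) / len(normalized) for d in range(dim)]
            for t in range(num_frames)]


def dtw_distance(test, reference):
    """Minimum cumulative local distance over all monotonic alignments (dynamic programming)."""
    n, m = len(test), len(reference)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(test[i - 1], reference[j - 1])
            # predecessors: diagonal step; test advances while the reference frame is held;
            # reference advances while the test frame is held
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def recognize(test, reference_memory):
    """Return the word whose stored reference sequence has the lowest alignment cost."""
    return min(reference_memory, key=lambda word: dtw_distance(test, reference_memory[word]))
```

Because the recursion may hold a frame on either side, a test utterance spoken more slowly than the reference (repeated frames) can still align at zero cost, for example `dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [1.0], [2.0]])` is `0.0`. A practical recognizer would normally add slope constraints or a search window; they are omitted to keep the sketch short.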
Reliable recognition is impaired above all by interference superimposed on the speech signal, for example distortions of the frequency characteristic or quasi-stationary noise signals. Such interference is introduced mainly during transmission of the signal over a telephone line and/or by background noise during recording. Recognition results are further impaired when the reference feature vectors are determined during training under different recording conditions from those under which the feature vectors are determined during the test phase. In this case the recognition unit can no longer reliably compare the feature vectors with the reference feature vectors, and the recognition error rate increases.
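The effect of a distortion of the frequency characteristic on logarithmized feature vectors can be worked out in a few lines. Assume, for illustration only, that the transmission channel acts as a linear time-invariant filter with frequency response $H$, that its impulse response is short compared with a time frame, and that $|H|^2$ is roughly constant within each frequency band $k$. Writing $X_t(k)$ for the band energy of the clean speech in frame $t$ and $Y_t(k)$ for that of the received signal:

```latex
\begin{align}
Y_t(k)      &\approx |H(k)|^2 \, X_t(k) \\
\log Y_t(k) &\approx \log X_t(k) + \log |H(k)|^2
\end{align}
```

Under these assumptions the distortion becomes an additive offset $c(k) = \log |H(k)|^2$ that is the same in every frame. If training and test recordings pass through different channels, each logarithmized test vector is shifted by $\log |H_{\text{test}}(k)|^2 - \log |H_{\text{train}}(k)|^2$ relative to the conditions under which the reference vectors were formed. Quasi-stationary additive noise $N_t(k)$ behaves differently: assuming speech and noise are uncorrelated, $Y_t(k) \approx X_t(k) + N_t(k)$, so $\log Y_t(k) \approx \log X_t(k) + \log\bigl(1 + N_t(k)/X_t(k)\bigr)$, an offset that depends on the speech itself rather than a constant.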
In addition, the applicability of speech recognition devices is restricted above all by the fact that most technical realisations achieved so far are suitable only for speaker-dependent speech recognition, which requires training by the relevant user. Such speaker-dependent devices are therefore hardly practicable in systems that must recognize and/or answer the spoken input of frequently changing users (for example, fully automated information systems with spoken dialogue).