1. Field of the Invention
This invention relates to a speech recognition apparatus for recognizing information contained in a speech signal.
2. Description of the Related Art
Various algorithms for speech recognition have been proposed, but the speech recognition rate or performance for all types of speech units to be recognized cannot be improved without increasing the size and cost of the apparatus. Therefore, in the prior art, permissible limitations, such as those on the number of words or speakers to be recognized, are determined in advance according to the application of the speech recognition apparatus, and an optimum recognition system is selected under the predetermined conditions.
FIG. 1 is a block diagram showing the construction of a conventional speech recognition apparatus. A band-pass filter (BPF) is arranged in an acoustic analyzing section 11, and a speech input to the acoustic analyzing section 11 is analyzed with short-time spectra for each preset period of time by means of the BPF so as to derive characteristic parameters. An output of the acoustic analyzing section 11 is processed by one of recognition units 15 and 16 which are selectively operated by a switching control unit 12 and switching circuits 13 and 14, and an optimum one of the recognition systems is selected to effect the recognition for the input speech.
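By way of illustration only, the short-time spectral analysis performed by the acoustic analyzing section 11 may be sketched in software as follows. The sketch assumes a digital stand-in for the BPF: each frame is transformed by a naive discrete Fourier transform and the energy is summed over groups of frequency bins. The function name, the frame length, the hop size, and the band edges are hypothetical and are not taken from the apparatus of FIG. 1.

```python
import cmath
import math


def short_time_band_energies(signal, frame_len, hop, bands):
    """Return one list of band energies per frame.

    bands is a list of (lo_bin, hi_bin) DFT-bin ranges, hi_bin exclusive.
    This is an assumed digital substitute for a band-pass filter bank.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Naive DFT power spectrum for bins 0 .. frame_len // 2.
        power = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            power.append(abs(coeff) ** 2)
        # Characteristic parameters: the energy in each band for this frame.
        frames.append([sum(power[lo:hi]) for lo, hi in bands])
    return frames


# Example: a pure tone falling in DFT bin 2 of a 16-sample frame.
tone = [math.cos(2 * math.pi * 2 * n / 16) for n in range(32)]
params = short_time_band_energies(tone, frame_len=16, hop=16,
                                  bands=[(0, 2), (2, 4), (4, 9)])
```

In this example the second band, which contains bin 2, carries nearly all of the energy in each frame, so the resulting parameter vectors distinguish the tone from signals concentrated in other bands.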
The recognition unit 15 is a recognition unit for a specified speaker in which it is necessary to register the reference speech patterns every time the speaker is changed. The recognition for a speech input relating to the specified speaker and proper nouns is effected by the recognition unit 15. A changeable reference pattern memory 17 in which reference patterns can be alterably registered may be formed by a random access memory (RAM), for example. The memory data thereof is read out by the recognition unit 15, after which a similarity calculation process for obtaining the similarity between the memory data and the input speech analyzed by the acoustic analyzing section 11 is performed. A DP matching (DTW: dynamic time warping) method is performed by the recognition unit 15. A matching degree (distance) is calculated and a category of reference patterns having the smallest distance is output as the recognition result.
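By way of illustration only, DP matching of the kind performed by the recognition unit 15 may be sketched as follows. The sketch assumes a basic symmetric DTW recursion with a Euclidean local distance and one reference pattern per category; the function names and the dictionary standing in for the changeable reference pattern memory 17 are hypothetical, and the actual path constraints of the conventional apparatus are not specified here.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def recognize_by_dtw(input_pattern, reference_memory):
    """Return (category, distance) of the reference with the smallest distance."""
    best_category, best_distance = None, math.inf
    for category, reference in reference_memory.items():
        distance = dtw_distance(input_pattern, reference)
        if distance < best_distance:
            best_category, best_distance = category, distance
    return best_category, best_distance


# Hypothetical registered patterns (one-dimensional features for brevity).
reference_memory = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (5.0,)],
}
# The input repeats its first frame, which time warping absorbs.
result = recognize_by_dtw([(0.0,), (0.0,), (1.0,), (2.0,)], reference_memory)
```

Because the reference patterns are held in an ordinary mutable mapping, re-registering a pattern for a new speaker amounts to overwriting the entry for that category, which corresponds to the alterable registration described above.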
The recognition unit 16 is a recognition unit for unspecified speakers. It stores general words or vocabularies, which are used by many people, in the form of reference patterns that cannot be alterably registered. The recognition for specified general words, such as numerals, is effected by the recognition unit 16. An unchangeable reference pattern memory 18 in which reference patterns cannot be alterably registered may be formed by a read only memory (ROM), for example. The memory data thereof is read out by the recognition unit 16, and a similarity calculation process for obtaining a similarity between the memory data and the input speech analyzed by the acoustic analyzing section 11 is performed by the recognition unit 16. A discriminant function method is performed by the recognition unit 16, and a category of reference patterns having the largest calculated similarity is output as the recognition result.
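By way of illustration only, a discriminant function method of the kind performed by the recognition unit 16 may be sketched as follows. The sketch assumes a linear discriminant function for each category, evaluated on a fixed-length feature vector; the weights, biases, and names are hypothetical, and the actual form of the discriminant function in the conventional apparatus is not specified here.

```python
from types import MappingProxyType

# Hypothetical fixed reference data: category -> (weights, bias).
# The read-only mapping stands in for the unchangeable reference pattern memory 18.
FIXED_REFERENCES = MappingProxyType({
    "one": ((1.0, 0.0), 0.0),
    "two": ((0.0, 1.0), 0.0),
})


def discriminant(features, weights, bias):
    """Assumed linear discriminant: inner product of weights and features plus bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def recognize_by_discriminant(features, references=FIXED_REFERENCES):
    """Return (category, similarity) of the category with the largest similarity."""
    best_category, best_score = None, float("-inf")
    for category, (weights, bias) in references.items():
        score = discriminant(features, weights, bias)
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score


category, similarity = recognize_by_discriminant((0.2, 0.9))
```

Note that this unit selects the largest similarity, whereas the DP matching of the recognition unit 15 selects the smallest distance, so the two outputs are not directly comparable without some further determination.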
The switching method for switching a plurality of recognition units according to the application of the apparatus as described above is disclosed in Japanese Patent Disclosure (KOKAI) No. 49-3507. In the switching method disclosed therein, a logical determination is made according to the recognition results obtained in the recognition unit 15 or 16 and an optimum recognition result is selected and output based on the logical determination by means of the switching control unit 12.
However, with the above construction, two types of recognition units 15 and 16 must be provided to meet the respective applications. Furthermore, the switching circuits 13 and 14 and the switching control unit 12 for switching them are also required, making the construction of the apparatus complex and increasing the size and cost thereof.
In this way, in the prior art, it is necessary to provide recognition units having different algorithms corresponding to different types of limitations on the speakers, such as specified speakers or unspecified speakers. Conventional apparatuses also place limitations on the ways of utterance in order to enhance the speech recognition performance. As a result, the apparatus is of complex construction and increased size and cost.