The invention relates to a speech recognition system, a speech recognition method and a storage medium in which a single application program can be executable based on speeches of plural speakers.
In recent years, there has been a rapid growth in various applications using an auto speech recognition (ASR) system. For example, by applying an auto speech recognition system to a car navigation system, various effects are produced, such as that a car can certainly arrives at a destination while safety in driving is secured.
On the other hand, since such an auto speech recognition system automatically responds to a user speech, the system is likely to cause wrong recognition in a case where speeches of plural users are simultaneously inputted, resulting in difficulty in executing an application program so as to meet to user's intention. In this case, a direction from which a received speech is inputted is determined based on the received speech, and a speaker is specified based on a characteristic quantity of the speech or the like and speech recognition is performed on only the speech delivered by the specified speaker, thereby enabling a speech recognition application program to be executed without wrong recognition of the received speech.
For example, disclosed in Japanese Patent Application Laid-Open No. 2001-005482 is a speech recognition apparatus with a construction in which a speaker is specified by analyzing a speech, optimal recognition parameters are prepared for each specified speaker and the parameters are sequentially optimized according to a speaker, and with such an apparatus, speeches of plural speakers, even if being inputted alternately, are not confused in recognition, thereby enabling an application program to be executed.
Moreover, disclosed in Japanese Patent Application Laid-Open No. 2003-114699 is a car-mounted speech recognition system in which speeches of plural speakers are received by a microphone array, the received speeches are separated into speech data of individual speakers, and thereafter, speech recognition is conducted on the separated speech data. With such a system adopted, for example, in a case where speakers take a driver's seat, a passenger seat and the like, respectively, it is possible that speech data is collected while a directivity characteristic range of the microphone array is changed with ease to recognize a speech of each of the speakers, thereby enabling a significant reduction in occurrence of wrong recognition.