Owing to advances in speech recognition technology and improvements in device capabilities, speech recognition is beginning to be introduced even in commonplace devices. Since controlling a device by voice does not require complicated manipulation, such control is helpful for children, the elderly, and people with disabilities. In addition, since an input device such as a keyboard is unnecessary, voice control contributes to a reduction in device size.
In general, speech recognition is subject to some degree of recognition error. Noise that is input to a speech recognition apparatus simultaneously with the utterance of the user has a particularly serious effect upon recognition, and it is therefore necessary to take ambient noise into account. The following noise countermeasures are often employed: (1) spectrum subtraction, a method in which a noise spectrum is subtracted from the input speech spectrum and the resulting spectrum is used in speech recognition, and (2) parallel model combination, a method in which estimated noise is incorporated in an acoustic model beforehand so that a decline in recognition rate in a noisy environment is prevented.
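By way of illustration only, spectrum subtraction as described above may be sketched as follows. The function name `spectral_subtraction`, the `floor` parameter, and the four-bin magnitude spectra are assumptions introduced for this example and do not appear in the cited art. The flooring step is a commonly used safeguard against negative magnitudes and is likewise an assumption.

```python
def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from a noisy one, bin by bin.

    `floor` (hypothetical parameter) keeps each bin at no less than a small
    fraction of the noisy input, so that over-subtraction never yields a
    negative magnitude.
    """
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for noisy, noise in zip(noisy_mag, noise_mag):
        cleaned.append(max(noisy - noise, floor * noisy))
    return cleaned


# Hypothetical 4-bin magnitude spectra (illustrative values only).
noisy = [5.0, 3.0, 1.0, 4.0]   # speech plus noise
noise = [1.0, 1.0, 2.0, 0.5]   # noise estimated while the user is silent
result = spectral_subtraction(noisy, noise)
print(result)
```

In the third bin the noise estimate exceeds the input, so the output is clamped to the floor rather than becoming negative. The cleaned spectrum would then be passed to feature extraction in place of the raw input spectrum.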
Further, there are techniques that prevent a decline in recognition rate by an approach that lies outside speech recognition processing itself. For example, the specification of Japanese Patent Application Laid-Open No. 11-126092 discloses suppressing ambient noise, for example by closing windows and turning off music, when speech recognition is carried out. Another approach is to notify the user of the present magnitude of ambient noise (that is, the difficulty of speech recognition), thereby preventing needless utterances by the user.
Our surroundings include many devices that emit noise of their own. In order to operate such devices comfortably by voice, noise adaptation based upon the spectrum subtraction method or the parallel model combination method is considered to be effective. However, there are instances where the noise from a device changes greatly depending upon the operating mode of the device. For example, in the case of a facsimile machine, the noise produced at the time of data reception and the noise produced at the time of data transmission differ greatly from each other. Where the noise environment changes in this manner, if adaptation is performed using only the noise produced in one specific operating mode, a decline in recognition rate is to be expected whenever the device operates in a mode for which no adaptation has been made. It is of course possible to perform adaptation using the noise produced in all of the operating modes together, but the results of such adaptation tend to be less than satisfactory.
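The difficulty described above may be illustrated numerically. The following is a minimal sketch, assuming hypothetical four-bin noise magnitude spectra for the reception and transmission modes of a facsimile machine. The values, the helper `residual_noise`, and the use of a simple average as the "pooled" estimate are all assumptions made for illustration; they are not measurements and are not taken from the cited art.

```python
def residual_noise(actual, estimate):
    """Total per-bin mismatch between the actual noise and the estimate used."""
    return sum(abs(a - e) for a, e in zip(actual, estimate))


# Hypothetical noise magnitude spectra for two operating modes.
reception = [4.0, 1.0, 0.5, 0.5]      # energy concentrated in low bins
transmission = [0.5, 0.5, 1.0, 4.0]   # energy concentrated in high bins

# One possible way of "using all of the noise": average the two modes.
pooled = [(r + t) / 2 for r, t in zip(reception, transmission)]

# Mismatch while the device is actually in reception mode.
matched = residual_noise(reception, reception)         # estimate from the same mode
averaged = residual_noise(reception, pooled)           # estimate pooled over modes
mismatched = residual_noise(reception, transmission)   # estimate from the other mode

print(matched, averaged, mismatched)
```

Under these assumed values, the estimate taken from the other mode leaves the largest mismatch, the pooled estimate leaves a smaller but still substantial mismatch, and only the estimate taken from the mode actually in operation matches the noise.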