1. Field of the Invention
The present invention relates to a speaker distance detection apparatus and method capable of detecting, by using a microphone array, the distance at which a speaker is uttering speech, and to a speech input/output apparatus using the speaker distance detection apparatus.
2. Description of the Related Art
With the recent rapid advancement of computer technology, communication equipment such as mobile telephones and portable terminals is being enhanced in function or reduced in size. In particular, various kinds of speech-based applications, which used to be difficult to put into practical use because of their computer processing load, are moving beyond the commercialization stage to a stage in which convenience is demanded.
Recently, speech input/output apparatuses are also being put into practical use that detect the utterance direction of a speaker by using a plurality of microphones so as to enhance directivity, thereby making environmental noise less likely to be picked up.
However, particularly in mobile telephones and the like, the recognition precision for speech input is often affected by environmental noise. Therefore, the only effective way to enhance the recognition precision is to input speech with the handset held close to the face.
Recently, a technique has been developed in which the mobile telephone is provided with an infrared sensor or the like to detect the distance between the speaker and the mobile telephone, the level of recognition precision is estimated in accordance with that distance, and the recognition engine and the method for outputting recognition results are changed accordingly. Such a technique is disclosed in JP 6(1994)-124097 A, JP 9(1997)-162772 A, JP 2002-111801 A, and the like.
However, the above-mentioned technique has the following problems. First, even when the distance between a speaker and a mobile telephone is detected and a recognition engine is used in accordance with the detected distance, in practice the recognition mode is often switched manually. Therefore, when a speaker speaks while frequently moving the handset toward and away from the face, the switching operation itself is cumbersome.
Furthermore, even when the recognition engine is switched automatically, if the speaker speaks while frequently moving the handset toward and away from the face, a time lag inevitably arises between the actual state and the switching of the recognition engine. Consequently, the use mode of the mobile telephone does not match the recognition mode, and the speech input/output level becomes inappropriate, which makes it impossible to ensure the desired recognition precision.
Furthermore, detecting the distance between the speaker and the mobile telephone requires an additional sensor configuration such as an infrared sensor. However, a mobile telephone is under strong demand for miniaturization and is therefore subject to physical constraints, so that it is difficult in practice to provide such a sensor configuration.