Speech recognition devices record the speech of a target speaker with a microphone, recognize it, and convert the recognition result into text (characters). Depending on the environment, however, it can be difficult to distinguish speech from background noise. In particular, when the speech of more than one person is recorded, the speech of some speakers may be difficult to obtain depending on their distance from the microphone or the direction of the microphone. Even when the speech of a single person is recorded, sound that is not suitable for speech recognition may be included because of reverberation in a room or at a meeting. Conversely, if low-volume sound is also recorded so that speech is reliably obtained, distinguishing speech from noise becomes even more difficult.
As described above, in the prior art it is difficult to set the volume threshold for speech recognition appropriately for the environment.
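The trade-off described above can be illustrated with a minimal sketch of a fixed volume threshold. It is not taken from the embodiments. The function names (`rms_dbfs`, `frames_above_threshold`, `make_frame`), the frame length, and the signal levels are all hypothetical and chosen only for illustration.

```python
import math


def rms_dbfs(frame):
    """Return the RMS level of a frame of samples in [-1, 1], in dB relative to full scale."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def frames_above_threshold(frames, threshold_db):
    """Return the indices of frames whose level is at or above the threshold."""
    return [i for i, f in enumerate(frames) if rms_dbfs(f) >= threshold_db]


def make_frame(amplitude, n=160):
    """Hypothetical test signal: alternating +/- amplitude, so its RMS equals the amplitude."""
    return [amplitude if i % 2 == 0 else -amplitude for i in range(n)]


# Assumed levels, for illustration only.
frames = [
    make_frame(0.5),   # speaker close to the microphone (about -6 dBFS)
    make_frame(0.05),  # speaker far from the microphone (about -26 dBFS)
    make_frame(0.02),  # background noise (about -34 dBFS)
]

high = frames_above_threshold(frames, -20.0)    # only the near speaker passes
middle = frames_above_threshold(frames, -30.0)  # both speakers pass, noise is rejected
low = frames_above_threshold(frames, -40.0)     # the noise frame passes as well
```

With these assumed levels, a threshold of -20 dBFS drops the distant speaker, while a threshold of -40 dBFS admits the noise frame. Only a value between the two, which depends on the environment, keeps both speakers and rejects the noise.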
Embodiments described herein aim to provide a speech recognition device, a speech recognition method and a storage medium capable of obtaining speech in the range desired by the user, based on adjustment instructions exchanged interactively with the user.