The present disclosure relates to an audio signal processing device, an audio signal processing method, and an audio signal processing program, which determine the state of a voice signal.
In recent years, users have been restricted from operating a telephone to make a call while driving an automobile, so hands-free call systems using short-range radio communication or the like have been introduced.
In a hands-free call system, a so-called echo voice signal hinders the call: a voice emitted by the near end during the call is output from a speaker of the far end, is input to a microphone, and is heard again by the near end via a telephone line, a network, or the like.
In order to suppress such an echo voice signal, various echo cancellation and echo suppression technologies have been proposed. However, when the voice signal is suppressed while both the far end and the near end are speaking, that is, in a state of so-called double talk, not only the unnecessary echo voice signal on the far end but also the necessary voice signal on the near end is suppressed. Hence, it is necessary to determine whether or not the current state is double talk.
Here, with regard to the voice signal on the far end, it is only necessary to determine whether or not a voice signal on the other end of the call is present, and accordingly, a publicly known voice determination technology can be used. Meanwhile, with regard to the voice signal on the near end, it is necessary to determine not only whether or not a voice signal is present, but also whether that voice is the voice signal of the near end or the echo voice signal. Hence, with publicly known voice determination technology, it has been difficult to determine whether or not the voice of the near end is included in the voice signal on the near end.
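The limitation described above can be illustrated with a minimal sketch of one common kind of voice determination, a frame-energy threshold. This is only an assumed example of such a technology; the function names, the threshold value, and the synthetic test frames are hypothetical and do not come from the disclosure. The sketch shows that a presence-only detector reports "voice" both for a near-end talker and for an echo, so it cannot tell the two apart.

```python
import math

def frame_rms(frame):
    """Root-mean-square level of one frame of samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def is_voice_present(frame, threshold=0.01):
    """Hypothetical energy-based voice determination.

    Returns True when the frame RMS exceeds an assumed threshold.
    It only answers "is there a signal?", not "whose voice is it?".
    """
    return frame_rms(frame) > threshold

# Synthetic 20 ms frames at an assumed 8 kHz sampling rate (illustrative only).
silence = [0.0] * 160
near_end_talker = [0.2 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
echo_only = [0.1 * math.sin(2 * math.pi * 300 * n / 8000) for n in range(160)]

print(is_voice_present(silence))          # no signal present
print(is_voice_present(near_end_talker))  # near-end talker detected as voice
print(is_voice_present(echo_only))        # echo is also detected as voice
```

Because the detector returns the same result for the near-end talker and for the echo, an additional criterion is needed to decide whether the near-end voice is actually included in the near-end signal.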
Japanese Unexamined Patent Application Publication No. 2007-53512 describes a technology for determining the state of the voice on the near end based on the volume ratio of the voice output signal to the voice input signal.
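As a rough illustration of a volume-ratio criterion of this general kind, the sketch below compares the level of the voice input signal (microphone) with the level of the voice output signal (speaker). It is not the method of the cited publication, whose details are not given here. The function names, state labels, and thresholds are all hypothetical, and the sketch assumes that an echo reaches the microphone noticeably attenuated relative to the speaker output.

```python
import math

def level_db(frame, floor=1e-12):
    """Mean power of a frame in dB; `floor` avoids log10(0) for silent frames."""
    power = sum(s * s for s in frame) / len(frame) if frame else 0.0
    return 10.0 * math.log10(max(power, floor))

def classify_near_end(output_frame, input_frame,
                      echo_margin_db=-6.0, silence_db=-50.0):
    """Hypothetical state decision from the input/output volume ratio.

    Assumption: an echo arrives at the microphone at least
    |echo_margin_db| dB below the speaker output level.
    """
    out_db = level_db(output_frame)
    in_db = level_db(input_frame)
    if in_db < silence_db:
        return "silent"              # nothing at the microphone
    if out_db < silence_db:
        return "near_end_voice"      # speaker is quiet, so no echo is possible
    ratio_db = in_db - out_db        # volume ratio expressed in dB
    # Input much weaker than output: consistent with echo alone.
    # Input comparable to or stronger than output: near-end voice is likely present.
    return "echo_only" if ratio_db <= echo_margin_db else "double_talk"

# Illustrative constant-amplitude frames (not real speech).
speaker = [0.5] * 160
print(classify_near_end(speaker, [0.1] * 160))      # weak microphone signal
print(classify_near_end(speaker, [0.5] * 160))      # strong microphone signal
print(classify_near_end([0.0] * 160, [0.1] * 160))  # speaker silent
```

A fixed margin such as this is sensitive to the actual echo path gain and to the speaker volume setting, so the thresholds here should be read as placeholders rather than recommended values.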