1. Field
The following description relates to an apparatus and a method for detecting endpoints of sound sources, and more particularly, to an apparatus and a method for detecting, from a plurality of sound sources, the endpoints of each sound source according to its direction.
2. Description of the Related Art
In general, in various fields related to sound source technologies, such as speech recognition, speaker recognition, and video calling, the surrounding environment in which a sound source is input contains disturbing or interfering sounds in addition to the sound source of the speaker.
In an environment containing such various sound sources, sound source endpoint detection is used to find the region in which the speaker's sound source exists. For example, to control a television with a speech command, the start point and the endpoint of a signal containing the command “Turn on the TV” or “Turn off the TV” are recognized, and the sound source data corresponding to the command is transmitted to a sound source recognition apparatus. This function is referred to as sound source endpoint detection.
Sound source endpoint detection is designed to detect, in a signal containing a sound source, the point at which the sound source starts and the point at which it ends. It distinguishes a sound source section from a noise section in the signal input from a microphone, so that only the sound source section is processed and unnecessary information in the noise section is discarded, thereby reducing unnecessary computation and enabling efficient use of memory.
The sound source endpoint detection currently used in most sound source recognition apparatuses relies on a single microphone, and energy-related information of the microphone input is the main factor used to distinguish the sound source section from the noise section. Because the energy or entropy of a speech signal increases when speech begins, the point at which the energy or entropy rises to a threshold value or above is determined as the start point of the sound source signal, and the point at which it falls back below the threshold value is determined as the endpoint of the sound source signal.
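The single-microphone, energy-threshold scheme described above can be sketched as follows. This is a minimal illustration, not the method of any particular apparatus: the frame length, hop size, and threshold are hypothetical parameters chosen by the caller, and short-time energy (sum of squared samples per frame) stands in for the "energy-related information"; an entropy measure could be substituted in the same loop.

```python
from typing import List, Optional, Tuple


def frame_energies(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy: the sum of squared samples in each (overlapping) frame."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def detect_endpoints(signal: List[float], frame_len: int, hop: int,
                     threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the first region whose frame energy
    is at or above `threshold`, or None if no frame reaches the threshold.

    Hypothetical helper: boundaries are only as precise as the frame grid.
    """
    energies = frame_energies(signal, frame_len, hop)
    start_frame = None
    for i, e in enumerate(energies):
        if start_frame is None and e >= threshold:
            # Energy rises to the threshold: treat this frame as the start point.
            start_frame = i
        elif start_frame is not None and e < threshold:
            # Energy falls below the threshold: the endpoint is the end of the
            # last frame that was still at or above the threshold.
            return start_frame * hop, (i - 1) * hop + frame_len
    if start_frame is None:
        return None
    # The active region continues to the last complete frame of the signal.
    return start_frame * hop, (len(energies) - 1) * hop + frame_len
```

For example, with 400 silent samples, 800 samples of a ±0.5 alternating signal, and 400 more silent samples, `detect_endpoints(sig, 160, 80, 1.0)` returns `(320, 1280)` rather than the true `(400, 1200)`, because any frame that overlaps the active region counts as active. Note that this detector only sees loudness: an interfering talker or music at a similar level would be marked active just the same.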
Alternatively, sound source endpoint detection may use the energy in the frequency band in which the sound source exists, or other sound characteristics, taking into account changes in the voice frequency band.
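A band-limited energy measure of the kind mentioned above can be sketched as below. The 300 to 3400 Hz range is an assumed voice band (the classic telephone band), not a value taken from this description, and `band_energy` is a hypothetical helper. A naive DFT is used for readability; a practical implementation would use an FFT. Such a per-frame value could replace full-band energy in a threshold-based detector so that strong out-of-band noise, such as low-frequency hum, does not trigger a start point.

```python
import cmath
import math
from typing import List


def band_energy(frame: List[float], sample_rate: float,
                f_lo: float = 300.0, f_hi: float = 3400.0) -> float:
    """Energy of the DFT bins whose frequency lies in [f_lo, f_hi].

    Assumption: 300-3400 Hz is used as an illustrative voice band.
    Naive O(N^2) DFT for clarity only.
    """
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        freq = k * sample_rate / n
        if f_lo <= freq <= f_hi:
            coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                        for t in range(n))
            total += abs(coeff) ** 2
    return total
```

With an 8 kHz sample rate and 160-sample frames, a unit-amplitude 1000 Hz tone falls on bin 20 and yields a band energy of about 6400, whereas a 100 Hz hum of the same amplitude has the same full-band energy but a band energy of essentially zero.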
However, such conventional sound source endpoint detection extracts sound characteristics from a sound source signal input through a single microphone in order to detect the boundaries of the sound. Accordingly, sound whose frequency characteristics differ from those of the sound source, for example, stationary noise, can be detected to some extent, but music arriving from a particular direction or sound that includes speech from another speaker is not easily removed by signal processing alone. In particular, when the input includes speech from a plurality of speakers, endpoint detection cannot be achieved using frequency characteristics alone.