In recent years, various interfaces for controlling devices such as TVs, audio systems, and robots have been developed, and demand has increased for an interface that allows a user to control devices without using any apparatus.
To satisfy this demand, research has been conducted into image-based gesture recognition technology and into interfaces that use voices or sounds. In particular, research on controlling various devices by recognizing sounds generated by a person has progressed. However, existing approaches may suffer from low recognition rates and poor performance in noisy environments.