Field of the Invention
The present invention relates to a sound recognition apparatus, a sound recognition method, and a sound recognition program.
Description of Related Art
Sounds are classified into speech uttered by a person and other sounds. Such other sounds, which do not carry language information, are called usual sounds. Examples of usual sounds include an operation sound generated by the operation of an instrument, environmental sounds such as noise generated by the contact of objects with each other, and musical sounds not accompanied by words. Usual sounds serve as a key for understanding an object, an event, an operation state, or the like that acts as the sound source. When speech uttered by a person is used to identify an object, an event, or a state as a sound source while the language information expressed by the speech is ignored, the speech may also be treated as a “usual sound”.
For example, a monitoring system described in Japanese Unexamined Patent Application, First Publication No. 2008-241991 observes surrounding sounds with multiple microphones and thereby acquires a signal indicating an observed sound in which sounds emitted from multiple sound sources are mixed. The monitoring system generates a separated signal for each sound source, passes the separated signals through a noise reducing circuit, and uses a sound recognition circuit to determine whether the sound expressed by each separated signal is a target environmental sound.
An image processing apparatus described in Japanese Unexamined Patent Application, First Publication No. 2011-250100 performs a blind sound source separating process on speech data, extracts sound data for each sound source, and generates direction data indicating the direction of each sound source. The image processing apparatus determines whether the sound of each sound source is an environmental sound other than speech uttered by a person, converts the environmental sound into a text, generates an effect image visually presenting the environmental sound based on the text-converted environmental sound, and overlays the effect image on a content image. The image processing apparatus includes an environmental sound identifying unit that converts an environmental sound into a text.