Technologies exist for detecting a predetermined acoustic event from an acoustic signal and for separating an acoustic signal into signals from different sound sources. An acoustic event is detected as a pattern of an acoustic signal corresponding to a physical event. The acoustic event is associated with a physical state induced by the physical event and with an acoustic signal pattern in the period corresponding to that physical state. For example, when an acoustic event of “glass crushing” is defined as a detection target, the acoustic event is associated with the acoustic signal pattern generated when glass is broken and with the physical state “glass being broken”.
For example, NPL 1 describes a method that uses NMF (Nonnegative Matrix Factorization) to calculate activation levels of a basis matrix of an acoustic event from a spectrogram of an acoustic signal, and that detects an acoustic event included in the acoustic signal using the activation levels as a feature. More specifically, the method described in NPL 1 performs NMF on a spectrogram of an acoustic signal, using as a teacher basis a basis matrix calculated in advance from learning data, and thereby calculates activation levels of the respective spectral bases included in the basis matrix. The method then detects an acoustic event by identifying, based on a combination of the calculated activation levels, whether or not a specific acoustic event is included in the acoustic signal.
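The calculation of activation levels against a fixed teacher basis can be illustrated with a minimal sketch. The sketch is not the implementation of NPL 1: the Euclidean multiplicative update, the synthetic Gaussian-shaped bases, the function name `nmf_activations`, and the simple threshold used in place of the identification step are all assumptions introduced for illustration.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 such that V is approximately W @ H, with W kept fixed.

    V: (n_bins, n_frames) nonnegative spectrogram.
    W: (n_bins, n_bases) nonnegative teacher basis (assumed learned in advance).
    """
    H = np.ones((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        # Multiplicative update for the squared Euclidean cost ||V - W H||^2.
        # Only H is updated; the teacher basis W is never modified.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

# Synthetic teacher basis (assumption): three Gaussian-shaped spectral peaks.
bins = np.arange(32)[:, None]
centers = np.array([6.0, 16.0, 26.0])[None, :]
W = np.exp(-0.5 * ((bins - centers) / 3.0) ** 2)      # shape (32, 3)

# Synthetic ground-truth activations over 10 frames (assumption).
H_true = np.zeros((3, 10))
H_true[0, 2:5] = 1.0      # basis 0 (treated as the target event) in frames 2..4
H_true[1, :] = 0.2        # basis 1 acts as steady background
H_true[2, 6:9] = 0.5      # basis 2 is another, non-target sound
V = W @ H_true            # observed spectrogram

H = nmf_activations(V, W)

# Hypothetical stand-in for the identification step: flag frames in which
# the activation of basis 0 exceeds a hand-picked threshold of 0.5.
detected = H[0] > 0.5
```

In practice the activation levels would more likely be passed to a trained classifier than compared with a fixed threshold; the threshold is used here only to keep the sketch self-contained.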
NMF is also often used for sound source separation of an acoustic signal that includes sounds from a plurality of sound sources. For example, using NMF, a spectrum of an acoustic signal specified as a separation target is factorized into a basis matrix representing spectral bases of the respective sound sources and an activation matrix representing activation levels of those spectral bases. The factorization yields a spectrum for each sound source. The method described in NPL 1 assumes that an acoustic signal generated by a predetermined sound source may also be specified as a detection target acoustic event. That is, the method assumes that acoustic events also include an acoustic signal pattern corresponding to a physical event that is the generation of a sound from a predetermined sound source.
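The factorization used for sound source separation can be sketched as follows. This is an illustrative example under stated assumptions, not a method taken from NPL 1: it uses the standard multiplicative updates for the squared Euclidean cost, treats each NMF component as one sound source, and reconstructs the per-source spectrograms with a soft mask. The function names `nmf` and `separate` and the synthetic mixture are hypothetical.

```python
import numpy as np

def nmf(V, n_components, n_iter=300, eps=1e-12, seed=0):
    """Factorize V (n_bins, n_frames) into nonnegative W and H with V ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_components)) + eps
    H = rng.random((n_components, V.shape[1])) + eps
    for _ in range(n_iter):
        # Alternating multiplicative updates for ||V - W H||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V, W, H, groups, eps=1e-12):
    """Return one spectrogram per group of component indices.

    Each source estimate is the mixture V weighted by the share of the
    model W @ H that the components of that group explain (a soft mask).
    """
    model = W @ H + eps
    return [V * (W[:, g] @ H[g, :]) / model for g in groups]

# Synthetic mixture of two sources (assumption): each source has one
# Gaussian-shaped spectrum and its own temporal envelope.
bins = np.arange(32)[:, None]
W_true = np.exp(-0.5 * ((bins - np.array([[8.0, 24.0]])) / 3.0) ** 2)
H_true = np.zeros((2, 20))
H_true[0, :10] = np.linspace(1.0, 0.1, 10)   # source A fades out
H_true[1, 5:] = 0.7                          # source B starts at frame 5
V = W_true @ H_true

W, H = nmf(V, n_components=2)
# Assumption: component 0 and component 1 each correspond to one source.
sources = separate(V, W, H, groups=[[0], [1]])
```

Because the component order of an unsupervised factorization is arbitrary, the assignment of components to actual sound sources has to be decided separately, for example by using bases learned in advance for each source.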