A technology for separating sound sources by performing non-negative matrix factorization (hereinafter referred to as NMF) is known. NMF is a matrix decomposition method that, as shown in Formula (1) below, approximates a matrix V with f rows and t columns, which is a spectrogram, by the product of a spectral basis matrix W with f rows and k columns and an activation matrix H with k rows and t columns. The parameter k indicates the number of bases.
V ≈ WH   Formula (1)
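As an illustration of Formula (1), the following is a minimal sketch of NMF in Python with NumPy. The source does not specify a cost function or an update rule, so the multiplicative updates for the Euclidean distance used here are an assumption, and the function name nmf, its parameters, and the toy matrix are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (f x t) as W (f x k) @ H (k x t).

    Assumption: multiplicative updates minimizing the Euclidean distance
    between V and WH; the source does not name a specific update rule.
    """
    rng = np.random.default_rng(seed)
    f, t = V.shape
    W = rng.random((f, k)) + eps  # spectral basis matrix, f rows and k columns
    H = rng.random((k, t)) + eps  # activation matrix, k rows and t columns
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": a non-negative matrix of rank 2 with f = 6 and t = 10.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 10))

W, H = nmf(V, k=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape, rel_err)
```

Here k = 2 matches the rank of the toy matrix; for an actual spectrogram, k is a design parameter chosen in advance.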
NPL 1 discloses a technology for detecting an acoustic event included in an acoustic signal using the NMF.
The acoustic event is a pattern of the acoustic signal corresponding to a physical event. According to a transition of a physical state caused by the physical event, an acoustic signal pattern in the corresponding section changes.
An acoustic element is an acoustic signal pattern corresponding to a predetermined physical state. The acoustic signal pattern has the same amplitude as that obtained in the predetermined physical state. That is, the acoustic element is an acoustic signal pattern having an amplitude.
The acoustic element corresponds to the acoustic signal for one frame, or to a fragment of the acoustic signal spanning a plurality of frames, on a spectrogram. That is, the acoustic signal pattern is decomposed into the acoustic elements corresponding to the respective physical states so that the acoustic event can be easily detected.
Hereinafter, an outline of a method of detecting the acoustic event disclosed in NPL 1 will be described.
In the detection method described in NPL 1, first, a short-time Fourier transform is applied to the acoustic signal to convert the acoustic signal into a spectrogram. Next, by performing the NMF on the converted spectrogram, the expression degree of each basis of a spectral basis dictionary in the spectrogram is computed.
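The conversion of an acoustic signal into a spectrogram can be sketched as follows. This is only an illustration under stated assumptions: the Hann window, the frame length of 256 samples, the hop of 128 samples, the use of the magnitude spectrum, and the function name magnitude_spectrogram are all hypothetical choices that do not come from NPL 1.

```python
import numpy as np

def magnitude_spectrogram(x, frame_len=256, hop=128):
    """Short-time Fourier transform magnitude, returned with f rows and t columns.

    Assumption: Hann window and magnitude spectrum; the source does not
    state the window, the frame length, or the hop size.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft over each frame gives frame_len // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Toy signal: one second of a 1000 Hz sine sampled at 8000 Hz.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1000 * t)

V = magnitude_spectrogram(x)
peak_bin = int(np.argmax(V[:, 0]))
print(V.shape, peak_bin * sr / 256)  # peak frequency of the first frame in Hz
```

The resulting matrix V is non-negative, which is what allows it to serve as the input of the NMF.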
In the detection method described in NPL 1, whether the acoustic event is included in the acoustic signal is identified using a combination of the computed expression degrees. Through the above procedure, the detection method described in NPL 1 detects the acoustic event included in the acoustic signal.
Hereinafter, the method of detecting the acoustic event described in NPL 1 will be described more specifically. In the detection method described in NPL 1, first, the NMF is performed on a spectrogram in which acoustic signals (known acoustic signals) including the sounds to be detected are concatenated, thereby generating the spectral basis dictionary.
Next, in the detection method described in NPL 1, an unknown acoustic signal is converted into a spectrogram by performing the short-time Fourier transform, and the activation (expression degree) of each basis forming the generated spectral basis dictionary is computed for the converted spectrogram.
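The computation of activations against a previously generated dictionary can be sketched as an NMF in which the spectral basis matrix W is held fixed and only H is updated. The Euclidean multiplicative update, the function name compute_activations, and the toy dictionary below are assumptions made for illustration, not details taken from NPL 1.

```python
import numpy as np

def compute_activations(V, W, n_iter=200, eps=1e-9, seed=0):
    """Estimate H (k x t) such that V is approximately W @ H, with W fixed.

    Assumption: Euclidean multiplicative update applied to H only.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V  # constant because W is fixed
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= WtV / (WtW @ H + eps)
    return H

# Hypothetical dictionary: two bases with non-overlapping spectral peaks.
W = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
# Spectrogram of an "unknown" signal built from known activations.
H_true = np.array([[1.0, 0.0, 2.0],
                   [0.0, 3.0, 1.0]])
V = W @ H_true

H = compute_activations(V, W)
print(np.round(H, 3))
```

Because the dictionary is not updated, each row of H can be read as the expression degree over time of one known basis.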
Next, in the detection method described in NPL 1, whether the acoustic event is included is identified using a combination of the computed activations, whereby the acoustic event included in the unknown acoustic signal is detected. In the detection method described in NPL 1, the acoustic event is detected on the assumption that the activations of the respective bases relating to the same acoustic event show a similar tendency.
The activation is computed when the spectrogram is decomposed by performing the NMF using the spectral basis dictionary. In addition, for example, a hidden Markov model (HMM) is used to identify the presence or absence of the acoustic event from the combination of the activations.
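One possible way to identify the presence or absence of an event from activations with an HMM is sketched below. The source does not describe the HMM of NPL 1, so everything here is an assumption: a two-state model (absent, present), Gaussian emissions over the sum of the activations of the bases of one event, hand-set means, standard deviations, and transition probabilities, and Viterbi decoding. The function name viterbi_two_state is hypothetical.

```python
import numpy as np

def viterbi_two_state(obs, means, stds, p_stay=0.9):
    """Most likely state path (0 = absent, 1 = present) for a 1-D observation.

    Assumption: two states, Gaussian emissions, symmetric transitions.
    """
    log_trans = np.log(np.array([[p_stay, 1 - p_stay],
                                 [1 - p_stay, p_stay]]))
    log_init = np.log(np.array([0.5, 0.5]))
    # Gaussian log-likelihood per frame and state (constant term omitted).
    log_emit = -0.5 * ((obs[:, None] - means) / stds) ** 2 - np.log(stds)
    T = len(obs)
    delta = np.empty((T, 2))
    back = np.zeros((T, 2), dtype=int)
    delta[0] = log_init + log_emit[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # rows: from, columns: to
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(2)] + log_emit[t]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Hypothetical activations of two bases belonging to the same acoustic event;
# both rise and fall together, in line with the similar-tendency assumption.
H = np.array([[0.1, 0.0, 2.0, 2.2, 1.9, 0.1, 0.0],
              [0.0, 0.1, 1.8, 2.1, 2.0, 0.0, 0.1]])
obs = H.sum(axis=0)

path = viterbi_two_state(obs, means=np.array([0.1, 4.0]),
                         stds=np.array([0.5, 0.5]))
print(path.tolist())
```

In practice the emission and transition parameters would be learned from labeled data rather than set by hand as in this sketch.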