The present technology relates to a music search apparatus and method, a program, and a recording medium, and more particularly, to a music search apparatus and method, a program, and a recording medium, which are capable of identifying music from an input signal.
In the past, music input as an input signal has been identified by matching a feature quantity of the input signal against the feature quantity of a reference signal, which is a candidate for the music to be identified. However, when, for example, the audio of a television program such as a drama is used as the input signal, a noise component (hereinafter referred to simply as “noise”) is frequently mixed with a music signal component such as background music (BGM). The noise includes non-music signal components such as conversation, ambient sounds (ambient noise), white noise, pink noise, and sound effects. The change that such noise causes in the feature quantity of the input signal affects the result of the matching process.
In this regard, techniques have been proposed that use a mask pattern to mask low-reliability components in the feature quantity of an input signal, so that the matching process is performed using only high-reliability components.
Specifically, the input signal is transformed into a signal in the time-frequency domain, and its feature quantity is represented as a feature matrix. A plurality of kinds of mask patterns, each masking the matrix components corresponding to a predetermined time-frequency region, are prepared in advance for this feature matrix. The matching process between the feature quantity of the input signal and the feature quantities of a plurality of reference signals in a database is performed using every mask pattern. The music of the reference signal for which the highest similarity is calculated is identified as the music of the input signal (for example, see Japanese Patent Application Laid-Open (JP-A) No. 2009-276776).
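The masked matching described above can be illustrated with a minimal sketch. It assumes that feature matrices are plain lists of rows (frequency bins) by columns (time frames), that a mask pattern is a same-shaped Boolean matrix in which True marks a component that is kept, and that similarity is cosine similarity over the kept components only. The similarity measure, the function names `masked_similarity` and `identify`, and the toy data are hypothetical illustrations, not the method of the cited publication. The search simply tries every mask pattern against every reference signal and keeps the best score.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = List[List[float]]  # rows: frequency bins, columns: time frames
Mask = List[List[bool]]     # True = component kept (assumed reliable)


def masked_similarity(a: Matrix, b: Matrix, mask: Mask) -> float:
    """Cosine similarity over the components where mask is True (assumed measure)."""
    dot = norm_a = norm_b = 0.0
    for row_a, row_b, row_m in zip(a, b, mask):
        for x, y, keep in zip(row_a, row_b, row_m):
            if keep:
                dot += x * y
                norm_a += x * x
                norm_b += y * y
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # nothing usable remains after masking
    return dot / math.sqrt(norm_a * norm_b)


def identify(query: Matrix, references: Dict[str, Matrix],
             masks: Sequence[Mask]) -> Tuple[Optional[str], float]:
    """Match the query against every reference with every mask; return the best."""
    best_name: Optional[str] = None
    best_score = -math.inf
    for name, ref in references.items():
        for mask in masks:
            score = masked_similarity(query, ref, mask)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score


# Toy example (hypothetical data): the upper frequency row of the query is
# dominated by noise such as conversation.
query = [[1.0, 0.0, 1.0, 0.0],
         [5.0, 5.0, 5.0, 5.0]]
references = {
    "song_a": [[1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    "song_b": [[0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]],
}
full = [[True] * 4, [True] * 4]       # no masking
low_only = [[True] * 4, [False] * 4]  # masks the noisy upper row

print(identify(query, references, [full, low_only]))  # ('song_a', 1.0)
```

With only the unmasked pattern, the noisy row pulls the best match toward `song_b`; adding the pattern that masks that row lets `song_a` reach the highest similarity. Trying every pattern against every reference is exhaustive, so the cost grows with the number of mask patterns times the number of references.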