The present technology relates to a sound processing device, a sound processing method, and a program, and more particularly to a sound processing device, a sound processing method, and a program capable of identifying content with higher accuracy.
As an example, a sound signal constituting content is set as a reference signal, and an input signal is obtained by picking up sound that some device reproduces based on the reference signal. Content can then be identified by performing match retrieval between the input signal and the reference signal. In this case, the sound output from the original sound source is picked up with reverberation and noise mixed in, so the sound represented by the input signal is the content sound with reverberation and noise superimposed on it.
One example of such a content identification technique is musical piece identification, in which a signal of noiseless music recorded on a CD (Compact Disc) or the like is set as the reference signal, and the background music contained in an input signal that is mixed with non-musical sound is identified.
In the musical piece identification technique, a musical piece is identified by matching an acoustic feature quantity extracted from the noiseless reference signal against an acoustic feature quantity extracted from the input signal. In the following description, the input signal is assumed to be mixed with noise, so the acoustic feature quantity obtained from the input signal is affected by that noise.
To deal with this, a mask pattern, for example, is used in the matching process. The mask pattern is information indicating which of the elements constituting an acoustic feature quantity are reliable. In matching with a mask pattern, the elements of a multi-dimensional acoustic feature quantity are divided into reliable and unreliable elements based on the mask pattern, and only the reliable elements are used for matching.
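As an illustration of matching with a mask pattern, the following minimal sketch compares two feature vectors using only the elements the mask flags as reliable. It assumes the mask is a Boolean vector (True means reliable) and uses cosine similarity purely as an illustrative measure; the function name masked_cosine_similarity is hypothetical and not taken from the source.

```python
import math
from typing import Sequence


def masked_cosine_similarity(
    input_feat: Sequence[float],
    ref_feat: Sequence[float],
    mask: Sequence[bool],
) -> float:
    """Cosine similarity computed only over elements the mask marks as reliable.

    Hypothetical helper: cosine similarity is an illustrative choice of measure.
    """
    if not (len(input_feat) == len(ref_feat) == len(mask)):
        raise ValueError("feature vectors and mask must have the same length")
    # Keep only the element pairs flagged as reliable (mask value True).
    pairs = [(x, y) for x, y, m in zip(input_feat, ref_feat, mask) if m]
    dot = sum(x * y for x, y in pairs)
    norm_x = math.sqrt(sum(x * x for x, _ in pairs))
    norm_y = math.sqrt(sum(y * y for _, y in pairs))
    if norm_x == 0.0 or norm_y == 0.0:
        # No reliable energy left to compare; treat as no evidence of a match.
        return 0.0
    return dot / (norm_x * norm_y)
```

For example, if the third element of the input is corrupted by noise (input [1.0, 2.0, 9.0] against reference [1.0, 2.0, 0.0]), masking it out with [True, True, False] gives a similarity of approximately 1.0, whereas using all elements gives roughly 0.24.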
As a musical piece identification technique that uses mask patterns in this way, one proposed approach prepares in advance a plurality of mask patterns, each masking a given time-frequency region of a feature matrix having time-frequency components, and performs musical piece identification with them (for example, refer to Japanese Unexamined Patent Application Publication No. 2009-276776).
In the above-described approach, the similarity between an input signal and a musical piece is taken to be the maximum of the similarities computed, using every previously prepared mask pattern, between the feature matrix of the input signal and the feature matrix of the musical piece in a database (that is, the feature matrix of the reference signal), and musical piece identification is performed based on this similarity. In this identification, a plurality of fixed mask patterns that are independent of the input signal are stored, and the matching process is performed using these mask patterns.