As shown in FIG. 6, detecting a segment that includes a specific sound signal means detecting, within sound signals called the stored signals (stored sound signals), similar segments that contain a sound similar to the specific sound signal, which is called the reference signal (reference sound signal). The stored signals are longer than the reference signal.
It is to be noted that, in the present application, the detection of the similar segment is defined as the detection of the starting time at the head of the similar segment.
In the prior art, the time-series active search method is known as a high-speed method of detecting segments similar to the reference signal in the stored signals (for example, Japanese patent No. 3065314, “HIGH SPEED SIGNAL RETRIEVAL METHOD, APPARATUS AND MEDIUM FOR THE SAME”).
However, most methods of searching the stored signals for the reference signal, including the one described above, assume that a segment of the stored signals similar to the reference signal is almost identical to the reference signal.
Thus, when another sound such as narration is superimposed on the music to be detected in the stored signals (that is, when additive noise overlaps it), the sound signal of that segment differs greatly from the reference signal, and the search cannot be performed.
Moreover, the prior art offers few methods for detecting a segment including a specific sound signal that are also aimed at detecting music used as background music (BGM). The only one is “Self-optimized spectral correlation method for background music identification (Proc. IEEE ICME '02, Lausanne, vol. 1, 333/336 (2002))”.
However, “Self-optimized spectral correlation method for background music identification” requires a very long time for detection because of the huge amount of calculation involved.
A divide and locate method has been proposed as a much faster method for detecting the segment including the specific sound signal (for example, Japanese Patent Application First Publication No. 2004-102023, “SPECIFIC SOUND SIGNAL DETECTION METHOD, SIGNAL DETECTION APPARATUS AND SIGNAL DETECTION PROGRAM AND MEDIUM”).
<Outline of the Divide and Locate Method>
FIG. 7 shows the outline of the divide and locate method, and steps of the divide and locate method are explained below.
First, as shown in step (a) of FIG. 7, a power spectrum is calculated from the waveform signals of the reference signal and of the stored signals, and a spectrogram is obtained for each.
Spectrograms of small areas with a predetermined size are then cut out of the spectrogram of the reference signal.
These spectrograms of small areas are generated by cutting a certain number of points of the original spectrogram in the direction of the frequency axis and in the direction of the time axis. The small areas may overlap one another.
The spectrograms of small areas cut in such a manner are called small-region spectrograms.
When a starting time is “ti”, and a frequency band is “ωm”, the small-region spectrogram in the reference signal is expressed as “Fti, ωm”.
If the starting time is “t”, the frequency band is “ωm” and the size is the same as “Fti, ωm”, then the small-region spectrogram in the stored signal is expressed as “Gt, ωm”.
The set of all time points ti in the reference signal spectrogram at which the small-region spectrograms Fti, ωm are cut out is expressed as TR (TR={t1, t2, . . . }), and the set of all frequency bands is expressed as W (W={ω1, ω2, . . . }).
The power values within each small-region spectrogram are normalized in order to reduce the effect of fluctuations in sound volume.
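Step (a) can be illustrated with a minimal sketch. The function name `cut_small_regions`, the list-of-lists spectrogram layout, the band boundaries and the choice of normalizing by the mean power of each small region are all assumptions made for illustration; the text above only states that the power values are normalized and does not specify how.

```python
from typing import Dict, List, Tuple

Spectrogram = List[List[float]]  # spectrogram[frame][frequency_bin]


def cut_small_regions(
    spectrogram: Spectrogram,
    start_times: List[int],
    bands: List[Tuple[int, int]],
    width: int,
) -> Dict[Tuple[int, int], Spectrogram]:
    """Cut one small region per (start time ti, band index m) and normalize it."""
    regions: Dict[Tuple[int, int], Spectrogram] = {}
    for ti in start_times:
        for m, (lo, hi) in enumerate(bands):
            # Take `width` frames along the time axis and bins lo..hi-1
            # along the frequency axis.
            patch = [frame[lo:hi] for frame in spectrogram[ti:ti + width]]
            # Assumed normalization: divide by the mean power of the patch,
            # so that a louder copy of the same sound yields the same values.
            total = sum(sum(row) for row in patch)
            count = sum(len(row) for row in patch)
            mean = total / count if count and total > 0 else 1.0
            regions[(ti, m)] = [[v / mean for v in row] for row in patch]
    return regions
```

With this assumed normalization, a small region and a copy of it played at twice the power map to the same normalized values, which is the volume-independence that the normalization is meant to provide.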
Next, as shown in step (b) of FIG. 7, for each Fti, ωm in the reference signal, the stored signal is searched for similar time points in the frequency band ωm.
This search is performed by applying the time-series active search method (TAS: Japanese patent No. 3065314, “HIGH SPEED SIGNAL RETRIEVAL METHOD, APPARATUS AND MEDIUM FOR THE SAME”).
It should be noted here that a time point similar to Fti, ωm is a time point t at which the degree of small-region similarity s′p (Fti, ωm, Gt, ωm) between Fti, ωm and Gt, ωm is larger than the small-region search threshold s′pth.
In the divide and locate method, TAS is applied when searching for the time points at which such similar small-region spectrograms are detected. Therefore, the ratio of histogram overlapping between Fti, ωm and Gt, ωm is used as the degree of small-region similarity s′p (Fti, ωm, Gt, ωm).
The degree of small-region similarity in accordance with the ratio of histogram overlapping is called a small-region histogram similarity.
Here, the time-series active search method is explained briefly. The time-series active search method (TAS) is outlined in FIG. 8.
In accordance with the time-series active search method, segments whose spectrogram has a ratio of histogram overlapping, with respect to the spectrogram of the reference signal, larger than a threshold θ are detected from the stored signals.
First, the ratio of histogram overlapping between a spectrogram X and a spectrogram Y is explained.
Here, X and Y are spectrograms with the same size in the direction of the frequency axis and in the direction of the time axis.
First, after normalizing the spectral features at each time point of the spectrograms, a string of codes (vector quantization codes, that is, codes generated by vector quantization) is generated for each spectrogram.
Next, to calculate the ratio of histogram overlapping, a histogram (histogram feature) is generated for each spectrogram by counting the number of occurrences of each vector quantization code.
Here, the histogram features of X and Y are expressed as hX and hY, and the ratio of histogram overlapping Sh(hX, hY) between X and Y is calculated in accordance with formula (1) shown below.
$$S_h(h^X, h^Y) = \frac{1}{D} \sum_{\gamma=1}^{L} \min\left(h_\gamma^X, h_\gamma^Y\right) \qquad (1)$$
Here, it should be noted that hγX and hγY are the counts (numbers of occurrences of the vector quantization codes) in the γ-th bins of hX and hY. L is the number of bins in the histogram. D is the total count over all bins of the histogram.
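Formula (1) can be sketched in a few lines. The function name `histogram_overlap` and the representation of a spectrogram as a sequence of integer vector quantization codes are assumptions for illustration; how the codes are produced is outside this sketch.

```python
from collections import Counter
from typing import Sequence


def histogram_overlap(codes_x: Sequence[int], codes_y: Sequence[int]) -> float:
    """Ratio of histogram overlapping between two equally long code strings."""
    if len(codes_x) != len(codes_y) or len(codes_x) == 0:
        raise ValueError("code strings must be non-empty and of equal length")
    hist_x = Counter(codes_x)  # h^X: occurrences of each code in X
    hist_y = Counter(codes_y)  # h^Y: occurrences of each code in Y
    d = len(codes_x)           # D: total count in each histogram
    # Sum of bin-wise minima; codes absent from Y count as 0.
    shared = sum(min(count, hist_y[code]) for code, count in hist_x.items())
    return shared / d
```

For example, the code strings `[0, 0, 1, 2]` and `[0, 1, 1, 3]` share one occurrence of code 0 and one of code 1, so the ratio is 2/4 = 0.5. Because only counts are compared, the order of the codes within the window does not affect the result.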
In the time-series active search method, the ratio of histogram overlapping described above is used as the similarity between spectrograms.
The ratio of histogram overlapping between the spectrogram of the reference signal and the spectrogram of the segment starting at time t in the stored signal is defined as S″ (t). After the comparison at time t, a skip width z to the next comparison position is calculated in accordance with formula (2) using S″ (t). The comparison position is then shifted by z, the comparison is performed again, and a new skip width is calculated.
$$z = \begin{cases} \operatorname{floor}\left(D\left(\theta - S''(t)\right)\right) + 1 & \text{if } S''(t) < \theta \\ 1 & \text{otherwise} \end{cases} \qquad (2)$$
In formula (2), floor(x) is the largest integer not exceeding x.
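The skip width in formula (2) can be motivated by the following sketch. It assumes that shifting the compared window by one time point removes one vector quantization code from the window and adds one, and that D equals the number of codes in the window; neither assumption is stated explicitly above. Under these assumptions the sum of bin-wise minima in formula (1) changes by at most 1 per shift, so after k shifts:

$$S''(t+k) \le S''(t) + \frac{k}{D}$$

A segment is detected only if its ratio is larger than θ, which requires

$$k > D\left(\theta - S''(t)\right), \qquad \text{that is,} \qquad k \ge \operatorname{floor}\left(D\left(\theta - S''(t)\right)\right) + 1 = z$$

As an illustration with hypothetical values D = 100, θ = 0.8 and S″(t) = 0.5, the skip width is z = floor(100 × 0.3) + 1 = 31, so the next 30 positions cannot exceed the threshold and need not be compared.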
In the time-series active search method, the search proceeds by repeating the operation described above.
If the ratio of histogram overlapping of the compared segment is larger than the threshold θ, the segment is detected as being similar to the reference signal.
By skipping in this manner, the time-series active search method reduces the total number of comparisons while still detecting, without omission, every segment whose ratio of histogram overlapping is larger than the threshold θ.
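The search loop described above can be sketched as follows. This is a minimal illustration and not the implementation of the cited patent: the names `overlap` and `active_search` are hypothetical, the signals are assumed to be already converted into sequences of integer vector quantization codes, and the histogram of each window is rebuilt from scratch for simplicity.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def overlap(hist_a: Counter, hist_b: Counter, d: int) -> float:
    """Ratio of histogram overlapping, as in formula (1)."""
    return sum(min(c, hist_b[k]) for k, c in hist_a.items()) / d


def active_search(
    reference: Sequence[int], stored: Sequence[int], theta: float
) -> List[Tuple[int, float]]:
    """Return (start time, ratio) for every window whose ratio exceeds theta."""
    d = len(reference)  # D: number of codes in the compared window
    ref_hist = Counter(reference)
    hits: List[Tuple[int, float]] = []
    t = 0
    while t + d <= len(stored):
        s = overlap(ref_hist, Counter(stored[t:t + d]), d)
        if s > theta:
            hits.append((t, s))
        # Skip width of formula (2): larger skips for less similar windows.
        z = math.floor(d * (theta - s)) + 1 if s < theta else 1
        t += z
    return hits
```

Searching for the reference `[1, 2, 3, 4]` in `[0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0]` with θ = 0.9 visits the start times 0, 4, 6, 7, 8 and 10 instead of all eleven windows, and reports the single match at start time 6.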
Next, returning to FIG. 7, as shown in step (c) of FIG. 7, based on the search results for all small-region spectrograms Fti, ωm, the degrees of small-region similarity are integrated for each time point t in the stored signal, and a similarity to the reference signal at t (a degree of segment similarity) S′ (t) is calculated by applying formula (3) below.
$$S'(t) = \frac{1}{|TR|} \sum_{t_i \in TR} \left( \max_{\omega_m \in W} \left( s'_p\left(F_{t_i,\omega_m}, G_{t+t_i,\omega_m}\right) \right) \right) \qquad (3)$$
In this formula (3), |TR| is the number of elements in TR. If, as a result of searching for Fti, ωm, Gt+ti, ωm is not detected as a small-region spectrogram similar to Fti, ωm at time t in the stored signals, in other words, if formula (4) shown below holds, then the degree of similarity between the small-region spectrograms (degree of small-region similarity) is set as shown in formula (5).
$$s'_p\left(F_{t_i,\omega_m}, G_{t+t_i,\omega_m}\right) \le s'_{pth} \qquad (4)$$
$$s'_p\left(F_{t_i,\omega_m}, G_{t+t_i,\omega_m}\right) = 0 \qquad (5)$$
Accordingly, in a practical search, s′p (Fti, ωm, Gt+ti, ωm) is summed in formula (3) only when Gt+ti, ωm is detected as a small-region spectrogram similar to Fti, ωm.
In formula (3), as shown in formula (6) below, the frequency band ωm is selected from the set W of all frequency bands such that the value of s′p (Fti, ωm, Gt+ti, ωm) is the maximum.
$$\max_{\omega_m \in W} \left( s'_p\left(F_{t_i,\omega_m}, G_{t+t_i,\omega_m}\right) \right) \qquad (6)$$
This operation is executed for the following reason. When small-region spectrograms of several different frequency bands at the same time point in the reference signal are each detected as similar to small-region spectrograms of the corresponding frequency bands at the same time point in the stored signals, the frequency band with the maximum small-region histogram similarity is selected. In other words, the selected band is the one in which the sound overlapping the reference signal is considered to be closest to silence, that is, the band in which the overlap on the reference signal is smallest.
Based on the degree of segment similarity obtained in the above manner, the reference signal is detected in each region whose starting time t gives a degree of segment similarity S′ (t) larger than the threshold S′th.
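The integration of step (c) and the final thresholding can be sketched as below. The data layout is an assumption made for illustration: `small_hits[(ti, m)]` is taken to map each stored-signal time point at which Fti, ωm was detected to its degree of small-region similarity, so that undetected combinations are simply absent and contribute 0, as in formula (5). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# small_hits[(ti, m)][time] = small-region similarity of a detected match
SmallHits = Dict[Tuple[int, int], Dict[int, float]]


def segment_similarity(
    small_hits: SmallHits, start_times: List[int], num_bands: int, t: int
) -> float:
    """Degree of segment similarity S'(t), following formula (3)."""
    total = 0.0
    for ti in start_times:
        # Formula (6): keep the best band; missing entries count as 0.
        total += max(
            small_hits.get((ti, m), {}).get(t + ti, 0.0)
            for m in range(num_bands)
        )
    return total / len(start_times)  # divide by |TR|


def detect_segments(
    small_hits: SmallHits,
    start_times: List[int],
    num_bands: int,
    stored_length: int,
    threshold: float,
) -> List[int]:
    """Starting times t whose segment similarity exceeds the threshold."""
    return [
        t
        for t in range(stored_length)
        if segment_similarity(small_hits, start_times, num_bands, t) > threshold
    ]
```

Taking the maximum over bands before averaging means that a time point ti still contributes fully when only one of its frequency bands is free of overlapping sound.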
However, when the divide and locate method described above searches for similar small-region spectrograms in a frequency band ωm, it calculates the ratio of histogram overlapping between Fti, ωm and Gt+ti, ωm, and this calculation takes time. Moreover, the ratio of histogram overlapping may also be calculated for combinations of Fti, ωm and Gt+ti, ωm that are not similar. As a result, it takes a long time to detect the segment including the specific sound signal.
The present invention makes it possible to check quickly whether two small-region spectrograms, one in the reference signal and one in the stored signals, are similar, which is the step that takes a long time in the prior art described above. An object of the present invention is to provide a detection system for a segment including a specific sound signal that detects the segment faster than the prior art by skipping the similarity check for combinations of small-region spectrograms that have no possibility of being similar.