(1) Field of the Invention
The present invention relates to a sound identification apparatus which identifies an inputted sound, and outputs the type of the inputted sound and an interval of each type of inputted sound.
(2) Description of the Related Art
Conventionally, sound identification apparatuses have been widely used as means for obtaining information regarding the source, emitting device, and so on of a certain sound by extracting acoustic characteristics of the sound. Such apparatuses are used, for example, for detecting the sound of ambulances, sirens, and so on occurring outside of a vehicle and notifying the occupants of the vehicle of such sounds, and for discovering defective devices by analyzing the sound that a product manufactured in a factory emits during operation and detecting abnormalities in that sound. However, recent years have seen a demand for a technique for identifying the type, category, and so on of sounds from mixed ambient sounds, in which various sounds are mixed together or emitted alternately, without limiting the sound to be identified to a specific sound.
Patent Reference 1 (Japanese Laid-Open Patent Application No. 2004-271736; paragraphs 0025 to 0035) can be given as an example of a technique for identifying the type, category, and so on of an emitted sound. The information detection device described in Patent Reference 1 divides inputted sound data into blocks based on predetermined units of time and classifies each block as sound “S” or music “M”. FIG. 1 is a diagram that schematically shows the result of classifying sound data on the time axis. Next, the information detection device averages, per time t, the results of classification in a predetermined unit of time Len, and calculates an identification frequency Ps(t) or Pm(t), which indicates the probability that the sound type is “S” or “M”, respectively. The predetermined unit of time Len at time t0 is schematically shown in FIG. 1. For example, in the case of calculating Ps(t0), the number of blocks classified as sound type “S” within the predetermined unit of time Len is divided by the predetermined unit of time Len, resulting in the identification frequency Ps(t0). Then, Ps(t) or Pm(t) is compared with a predetermined threshold P0, and an interval of the sound “S” or the music “M” is detected based on whether or not Ps(t) or Pm(t) exceeds the threshold P0.
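The fixed-window procedure described above can be illustrated with the following minimal sketch. It is not taken from Patent Reference 1; the function names, the representation of the per-block results as a sequence of “S”/“M” labels, the expression of Len as a number of blocks, and the placement of the window so that it ends at block t are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def identification_frequency(labels: Sequence[str], t: int, length: int,
                             target: str = "S") -> float:
    """Ps(t) or Pm(t): number of blocks labelled `target` in the window,
    divided by the fixed window length Len (`length`, in blocks).

    Assumption: the window covers the `length` blocks ending at block t.
    Near the start of the data the window is truncated, but the divisor
    remains Len, as in the description above.
    """
    start = max(0, t - length + 1)
    window = labels[start:t + 1]
    return window.count(target) / length


def detect_intervals(labels: Sequence[str], length: int, p0: float,
                     target: str = "S") -> List[Tuple[int, int]]:
    """Return (first_block, last_block) pairs over which the identification
    frequency of `target` exceeds the threshold P0 (`p0`)."""
    intervals: List[Tuple[int, int]] = []
    start = None
    for t in range(len(labels)):
        above = identification_frequency(labels, t, length, target) > p0
        if above and start is None:
            start = t                      # an interval begins
        elif not above and start is not None:
            intervals.append((start, t - 1))  # the interval ended at t - 1
            start = None
    if start is not None:                  # interval still open at the end
        intervals.append((start, len(labels) - 1))
    return intervals


# Hypothetical per-block classification results, one label per block.
labels = "SSSSMMMM"
print(detect_intervals(labels, length=4, p0=0.5, target="S"))
print(detect_intervals(labels, length=4, p0=0.5, target="M"))
```

Because `length` is the same at every t in this sketch, each detected interval lags the true change in sound type by an amount that depends on Len, and isolated misclassified blocks are weighted identically regardless of how reliable the surrounding blocks are.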
However, with Patent Reference 1, in the case of calculating the identification frequency Ps(t) and the like at each time t, the same predetermined unit of time Len, or in other words, a predetermined unit of time Len which has a fixed value, is used, which gives rise to the following problems.
The first problem is that interval detection becomes inaccurate in the case where sudden sounds occur in rapid succession. When sudden sounds occur in rapid succession, the judgment of the sound type of the blocks becomes inaccurate, and differences between the actual sound type and the sound type judged for each block occur at a high rate. When such differences occur at a high rate, the identification frequency Ps and the like in the predetermined unit of time Len become inaccurate, which in turn causes the detection of the final sound or sound interval to become inaccurate as well.
The second problem is that the recognition rate of the sound to be identified (the target sound) depends on the length of the predetermined unit of time Len because of the relationship between the target sound and background sounds. In other words, in the case where the target sound is identified using a predetermined unit of time Len which has a fixed value, the recognition rate for the target sound drops due to background sounds. This problem shall be discussed in detail later.
Having been conceived in light of the aforementioned problems, an object of the present invention is to provide a sound identification apparatus which reduces the chance of a drop in the identification rate, even when sudden sounds occur, and furthermore, even when a combination of the target sound and background sounds changes.