In processing such as audio recognition or pitch detection, it is necessary to discriminate a sound generating period, that is, a period during which a voice or the sound of a musical instrument is generated, from a non-sound generating period, that is, a period during which no voice or sound of a musical instrument is generated. Because environmental noise is inevitably present in an ordinary acoustic space even during the non-sound generating period, if processing such as audio recognition or pitch detection is carried out over all periods without discriminating the sound generating period from the non-sound generating period, an erroneous result may be obtained due to the environmental noise in the non-sound generating period. Further, carrying out audio recognition or pitch detection on the sound of the non-sound generating period, for which such processing is inherently unnecessary, is meaningless and is not preferable because it wastefully consumes the resources of a processor.
As a method for discriminating the sound generating period from the non-sound generating period in an audio signal, a method is widely used in which a period during which the S/N (signal-to-noise) ratio of an obtained audio signal exceeds a predetermined threshold value is specified as the sound generating period. However, the level of the environmental noise in the non-sound generating period varies depending on the environment in which the audio signal is obtained. Accordingly, when the sound generating period is specified by an S/N ratio that uses a fixed noise level, the non-sound generating period is erroneously specified as the sound generating period in an audio signal obtained in an environment where the level of the environmental noise is high, and the sound generating period is erroneously specified as the non-sound generating period in an audio signal obtained in an environment where the level of the environmental noise is low.
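The failure of a fixed noise level can be illustrated with a minimal sketch. The frame values, the noise power `FIXED_NOISE_POWER`, the threshold `THRESHOLD_DB`, and all function names are hypothetical and chosen only for illustration; the S/N ratio is assumed here to be the ratio of mean frame power to the assumed noise power, expressed in decibels.

```python
import math

def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def snr_db(frame, noise_power):
    """S/N ratio of a frame in decibels against an assumed noise power."""
    power = max(frame_power(frame), 1e-12)  # guard against log10(0)
    return 10.0 * math.log10(power / noise_power)

def detect_fixed(frames, noise_power, threshold_db):
    """Mark a frame True (sound generating) when its S/N ratio exceeds the threshold."""
    return [snr_db(f, noise_power) > threshold_db for f in frames]

# Hypothetical frames: noise in a quiet room, noise in a loud room, and a tone.
quiet_noise = [0.01, -0.01] * 4
loud_noise = [0.2, -0.2] * 4
tone = [0.5, -0.5] * 4

FIXED_NOISE_POWER = 1e-4  # assumed noise level, matched to the quiet room only
THRESHOLD_DB = 10.0       # assumed S/N threshold

quiet_env = detect_fixed([quiet_noise, tone], FIXED_NOISE_POWER, THRESHOLD_DB)
loud_env = detect_fixed([loud_noise, tone], FIXED_NOISE_POWER, THRESHOLD_DB)
```

Under these assumptions, the noise-only frame of the quiet room is rejected, whereas the noise-only frame of the loud room exceeds the threshold and is erroneously marked as a sound generating period.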
To solve the above-described problem, for instance, Patent Document 1 discloses a technique in which, when audio information is extracted from video information accompanied by audio, different noise levels are used depending on the genre of the contents indicated by the video information.
Patent Document 1: JP-A-2003-101939
Further, for instance, Patent Document 2 discloses a technique in which an audio signal is divided into frames of a prescribed time length, and the noise level used for calculating the S/N ratio in a subsequent frame is updated on the basis of the attribute value of a frame specified in the past as a sound generating period.
Patent Document 2: JP-A-2001-265367
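The general idea of updating the noise level frame by frame can be sketched as follows. The update rule of Patent Document 2 is not given here, so this sketch substitutes a common alternative as an assumption: the noise estimate is exponentially smoothed toward the power of each frame judged to be a non-sound generating period. The function names, the smoothing factor `alpha`, and the frame values are hypothetical.

```python
import math

def frame_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def detect_adaptive(frames, initial_noise_power, threshold_db, alpha=0.5):
    """Classify frames in order, updating the noise estimate as it goes.

    Assumed rule (not taken from Patent Document 2): after a frame is judged
    non-sound generating, the noise estimate moves toward that frame's power
    by exponential smoothing. alpha=0.0 reproduces a fixed noise level.
    """
    noise_power = initial_noise_power
    decisions = []
    for frame in frames:
        power = max(frame_power(frame), 1e-12)  # guard against log10(0)
        snr = 10.0 * math.log10(power / noise_power)
        is_sound = snr > threshold_db
        decisions.append(is_sound)
        if not is_sound:
            noise_power = (1.0 - alpha) * noise_power + alpha * power
    return decisions, noise_power

# Hypothetical signal: environmental noise that grows gradually, then a tone.
frames = [
    [0.01, -0.01] * 4,  # noise
    [0.02, -0.02] * 4,  # louder noise
    [0.04, -0.04] * 4,  # still louder noise
    [0.5, -0.5] * 4,    # tone
]

adaptive, final_noise = detect_adaptive(frames, 1e-4, 10.0, alpha=0.5)
fixed, _ = detect_adaptive(frames, 1e-4, 10.0, alpha=0.0)
```

With these values, the fixed noise level marks the third noise frame as a sound generating period, while the adaptive estimate follows the rising noise and marks only the tone. A rule of this kind cannot recover if the initial estimate is so low that every frame is judged to be a sound generating period, since the estimate is then never updated.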