Applications have been developed that recognize, from an audio signal, a phrase uttered by a speaker and then either translate the recognized phrase into another language or use it as a query to search a network or a database. In such an application, to identify the section in which the speaker is speaking, the speaker is typically required to operate the apparatus in which the application is implemented, for example to instruct the apparatus to start and to stop recording the audio signal. However, depending on the environment in which the application is used, it can be difficult for the speaker to perform such an operation. For example, when the speaker is engaged in work that occupies both hands, it is difficult to perform an operation that instructs the apparatus to start and stop recording.
Meanwhile, a technique has been proposed for determining whether an audio signal is silent or voiced. In this technique, the power, a pitch parameter, and so forth of an input audio signal are calculated for each given section of the audio signal. When the pitch parameter of a second given section that follows a voiced first given section is lower than a predetermined threshold value, the second given section is determined to be a silent section.
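The section-by-section decision described above can be illustrated with a minimal sketch. It is not the method of the cited publication. Several choices here are assumptions: the "pitch parameter" is computed as the peak normalized autocorrelation over a range of candidate pitch lags, and a section that does not follow a voiced section is treated as voiced only when both its power and its pitch parameter reach their thresholds, because the text does not say how a section first becomes voiced. The function names `frame_power`, `pitch_parameter`, and `classify_sections` and all threshold values are hypothetical.

```python
import math
from typing import List, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude (power) of one given section."""
    return sum(x * x for x in frame) / len(frame)


def pitch_parameter(frame: Sequence[float], min_lag: int, max_lag: int) -> float:
    """Hypothetical pitch parameter: peak normalized autocorrelation over
    candidate pitch lags. Near 1 for strongly periodic (voiced) sections,
    near 0 for silence or aperiodic noise."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def classify_sections(signal: Sequence[float], frame_len: int,
                      min_lag: int, max_lag: int,
                      power_threshold: float,
                      pitch_threshold: float) -> List[bool]:
    """Label each non-overlapping section: True = voiced, False = silent."""
    labels: List[bool] = []
    previous_voiced = False
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        power = frame_power(frame)
        pitch = pitch_parameter(frame, min_lag, max_lag)
        if previous_voiced:
            # Rule from the text: a section following a voiced section is
            # silent when its pitch parameter is below the threshold.
            voiced = pitch >= pitch_threshold
        else:
            # Assumption (not stated in the text): entering the voiced state
            # requires both power and pitch parameter to reach their thresholds.
            voiced = power >= power_threshold and pitch >= pitch_threshold
        labels.append(voiced)
        previous_voiced = voiced
    return labels


if __name__ == "__main__":
    sr = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
    signal = tone + [0.0] * 800  # two tone sections, then two silent ones
    # 400-sample sections (50 ms); lags 20..160 cover roughly 50-400 Hz.
    print(classify_sections(signal, 400, 20, 160, 0.01, 0.5))
```

With a 200 Hz tone followed by silence, the two tone sections are labeled voiced and the two zero-valued sections silent. Autocorrelation is used only because it is a common, easily checked periodicity measure; any other pitch-strength estimate could be substituted without changing the decision rule.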
Examples of the related art include Japanese Laid-open Patent Publication No. 11-133997.