1. Technical Field
The present invention relates to a technology for processing a sound signal indicative of various types of audio, such as voice and musical sound, and particularly to a technology for identifying an interval in which a predetermined voice in a sound signal is actually pronounced (hereinafter referred to as “utterance interval”).
2. Background Art
Voice analysis, such as voice recognition and voice authentication (speaker authentication), uses a technology for segmenting a sound signal into an utterance interval and a non-utterance interval (a period containing only ambient noise). For example, a period in which the S/N ratio of the sound signal exceeds a predetermined threshold value is identified as the utterance interval. Patent Document JP-A-2001-265367 discloses a technology that segments a sound signal into periods and compares the S/N ratio in each period with the S/N ratio in a period previously judged to be a non-utterance interval, so as to determine whether the period is an utterance interval or a non-utterance interval.
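The threshold-based segmentation mentioned above can be illustrated with a minimal sketch. It is an assumption-laden illustration only, not the method of any cited document: the function names (`frame_snr_db`, `label_frames`), the fixed frame length, the known noise power, and the 10 dB threshold are all hypothetical choices made for the example.

```python
import math

def frame_snr_db(frame, noise_power):
    """S/N ratio of one frame in dB, relative to an assumed known noise power."""
    power = sum(x * x for x in frame) / len(frame)
    if power <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(power / noise_power)

def label_frames(signal, frame_len, noise_power, threshold_db):
    """Return one boolean per non-overlapping frame: True = utterance interval."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # A frame is labeled as utterance when its S/N exceeds the threshold.
        labels.append(frame_snr_db(frame, noise_power) > threshold_db)
    return labels

# Hypothetical toy signal: quiet frame, loud frame, quiet frame.
quiet = [0.01, -0.01] * 4   # mean power about 1e-4
loud = [0.5, -0.5] * 4      # mean power 0.25
signal = quiet + loud + quiet

labels = label_frames(signal, frame_len=8, noise_power=1e-4, threshold_db=10.0)
print(labels)
```

In this toy input only the middle frame exceeds the threshold, so only it is labeled as an utterance interval.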
However, the technology disclosed in Patent Document JP-A-2001-265367 determines whether a period is an utterance interval or a non-utterance interval solely by comparing the S/N ratio in each period of the sound signal with the S/N ratio in a past non-utterance interval. Consequently, a period containing instantaneous noise made by the speaker, such as a cough, lip noise, or sound produced in the mouth (a period that should normally be judged as a non-utterance interval), is likely to be misidentified as an utterance interval.
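The misidentification described above can be reproduced with a simplified stand-in for a level comparison against a past non-utterance period. This sketch is not the procedure of the cited document; the names `power_db` and `classify`, the 10 dB margin, and the sample values are hypothetical and chosen only to show the effect.

```python
import math

def power_db(frame):
    """Mean power of a frame in dB (floored to avoid log of zero)."""
    p = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(p, 1e-12))

def classify(frames, margin_db, initial_noise_db):
    """Label a frame as utterance when its level exceeds the level of the
    most recent frame judged to be non-utterance by more than margin_db."""
    noise_db = initial_noise_db
    labels = []
    for frame in frames:
        level = power_db(frame)
        is_utterance = (level - noise_db) > margin_db
        labels.append(is_utterance)
        if not is_utterance:
            # The latest non-utterance frame becomes the new reference.
            noise_db = level
    return labels

# Hypothetical frames: background noise, a cough-like burst, and speech.
background = [0.01, -0.01] * 4
cough = [0.01, -0.01, 0.8, -0.7, 0.01, -0.01, 0.01, -0.01]  # two-sample burst
speech = [0.3, -0.3] * 4

frames = [background, cough, background, speech, speech, background]
labels = classify(frames, margin_db=10.0, initial_noise_db=power_db(background))
print(labels)
```

The second frame contains only a brief burst, yet its level alone is enough to exceed the margin, so a purely level-based comparison labels it as an utterance interval just like the speech frames.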