With regard to AV content recorded by a user with a video recording device such as a digital camera, there is demand for a function that removes unnecessary sections from the AV content and extracts only the sections that are of interest to the user (referred to below as interesting sections).
In one example of conventional art, a device has been proposed that detects, from an audio signal of AV content, frequencies in a frequency band corresponding to human voices, and extracts as an interesting section a section in which voices continue for at least a predetermined amount of time (refer to Patent Literature 1). This device is able to extract the interesting section using a simple method of analyzing the frequency of the audio signal and monitoring a continuation time.
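As an illustration only, the kind of processing described for Patent Literature 1 can be sketched as follows. This is not the implementation disclosed in that literature. The 300 Hz to 3400 Hz voice band, the energy-ratio threshold, the frame length, and the function names band_energy_ratio and voice_sections are all assumptions introduced for this sketch.

```python
import cmath
import math


def band_energy_ratio(frame, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Fraction of a frame's spectral energy inside [low_hz, high_hz].

    Uses a naive DFT for clarity. The band limits are assumed values.
    """
    n = len(frame)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= k * sample_rate / n <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def voice_sections(samples, sample_rate, frame_len, min_duration_sec,
                   ratio_threshold=0.5):
    """Return (start_sec, end_sec) spans in which voiced frames continue
    for at least min_duration_sec."""
    frame_sec = frame_len / sample_rate
    n_frames = len(samples) // frame_len
    sections = []
    run_start = None
    # One extra iteration flushes a run that reaches the end of the signal.
    for f in range(n_frames + 1):
        voiced = False
        if f < n_frames:
            frame = samples[f * frame_len:(f + 1) * frame_len]
            voiced = band_energy_ratio(frame, sample_rate) >= ratio_threshold
        if voiced and run_start is None:
            run_start = f
        elif not voiced and run_start is not None:
            if (f - run_start) * frame_sec >= min_duration_sec:
                sections.append((run_start * frame_sec, f * frame_sec))
            run_start = None
    return sections
```

In this sketch, a frame counts as containing voice when most of its energy lies in the assumed band, and only runs of such frames that last at least the predetermined time are reported.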
In another example of conventional art, a method for extracting an interesting section has been proposed in which probability models are used to determine, for each unit section of an audio signal, whether “applause”, “cheering” or the like is included in the unit section, and a section composed of at least a predetermined number of consecutive unit sections determined to be of the same type is extracted as the interesting section (refer to Patent Literature 2).
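The run-based extraction described for Patent Literature 2 can be sketched in the same hedged way. The sketch assumes that the probability models have already produced, for each unit section, a likelihood for each sound type. The dictionary format, the label names, and the function names classify_units and extract_label_runs are hypothetical and are not taken from that literature.

```python
from itertools import groupby


def classify_units(unit_likelihoods):
    """Pick the most likely sound type for each unit section.

    unit_likelihoods is a list of dicts mapping a label to its likelihood
    (an assumed output format for the probability models).
    """
    return [max(scores, key=scores.get) for scores in unit_likelihoods]


def extract_label_runs(labels, min_units,
                       target_labels=("applause", "cheering")):
    """Return (start_index, end_index_exclusive, label) for every run of
    at least min_units consecutive unit sections sharing a target label."""
    sections = []
    pos = 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)
        if label in target_labels and length >= min_units:
            sections.append((pos, pos + length, label))
        pos += length
    return sections
```

For example, with min_units set to 3, a sequence of three consecutive "applause" unit sections is reported as an interesting section, while two consecutive "cheering" unit sections are not.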
In the methods disclosed in Patent Literature 1 and Patent Literature 2, the interesting section is detected by assessing continuity of audio features (features such as frequency of the audio signal).