1. Field of Invention
This invention relates to automatic extraction of audio excerpts or abstracts.
2. Description of Related Art
Practically all known techniques of audio segmentation are based on pause detection. Such techniques are not robust under noisy or reverberant conditions and are even less robust for music or other non-speech audio. Techniques based on speech recognition and/or speaker identification require trained statistical models and are not robust unless the audio data resembles the training domain. Further, the computational resources required for such techniques are significant and often impractical.
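By way of illustration only, the sensitivity of pause detection to noise can be shown with a minimal sketch. The sketch below is not taken from any particular prior system. It assumes a simple short-time energy threshold as the pause detector. The frame length, threshold, minimum pause length, synthetic test signal and the function names `frame_energies` and `pause_boundaries` are all hypothetical choices made for the example.

```python
import math
import random

RATE = 8000  # assumed sample rate in Hz (illustrative)


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_boundaries(samples, frame_len=160, threshold=0.01, min_pause_frames=3):
    """Return one sample index per interior pause (the pause midpoint).

    A pause is a run of at least min_pause_frames consecutive frames whose
    energy falls below threshold. Leading and trailing silence is ignored.
    All parameter defaults are assumptions chosen for this example.
    """
    boundaries = []
    run_start = None
    for idx, energy in enumerate(frame_energies(samples, frame_len)):
        if energy < threshold:
            if run_start is None:
                run_start = idx
        else:
            long_enough = run_start is not None and idx - run_start >= min_pause_frames
            if long_enough and run_start > 0:
                boundaries.append((run_start + idx) // 2 * frame_len)
            run_start = None
    return boundaries


def tone(n):
    """n samples of a 440 Hz sine at amplitude 0.5 (synthetic stand-in for audio)."""
    return [0.5 * math.sin(2 * math.pi * 440 * i / RATE) for i in range(n)]


# Two tone bursts separated by 100 ms of silence.
clean = tone(4000) + [0.0] * 800 + tone(4000)

# The same signal with additive Gaussian noise; the noise floor now
# exceeds the fixed energy threshold, so the pause is no longer detected.
rng = random.Random(0)
noisy = [x + rng.gauss(0.0, 0.2) for x in clean]

clean_boundaries = pause_boundaries(clean)
noisy_boundaries = pause_boundaries(noisy)
print(clean_boundaries, noisy_boundaries)
```

In this sketch the clean signal yields a single boundary inside the silent gap, while the noisy copy yields none, because the energy of the noise alone is above the fixed threshold. Raising the threshold to compensate would in turn risk treating quiet passages as pauses, which is the robustness problem noted above.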
In video summary generation, some conventional generators have attempted to use scene transition graphs to perform ad-hoc segmentation, followed by a hierarchical clustering process to create a summary. Other conventional video summary generators have used closed captions, acoustic silence and pitch to determine segment boundaries and to identify candidate segments for use in generating summaries. These conventional systems depend on detecting closed captions, acoustic silence and pitch within the work, and perform poorly when these features are difficult to detect or are missing from the work. Moreover, these conventional systems merely select a representative segment and therefore are not able to generate summaries of overly long candidate segments.