Segmentation of audio signals into meaningful regions is an essential aspect of many applications. For instance, segmentation plays an important role in speech/music discrimination for broadcast transcription, audio coding using transient detection and window switching, identification of suitable audio thumbnails, and demarcation of songs in continuous streams for database creation or smart transport. To perform effectively, such applications rely on a basic signal understanding provided by automatic segmentation. Segmentation approaches described in the literature generally represent the signal as a sequence of features in a meaningful feature space and then attempt to identify points of change in the feature sequence using statistical models or various distance metrics.
The distance metric approaches typically estimate segment boundaries by finding peaks in a novelty function, which are interpreted as points of change in the audio signal. However, typical novelty functions tend to exhibit peaks within the actual segments as well as at the segment boundaries. As a result, segmentation approaches based on such novelty functions are prone to identifying segment boundaries where none exist. It is therefore desirable to provide an improved signal segmentation method that is less prone to incorrect identification of segment boundaries.
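As an illustration of the peak-picking idea described above, the following Python sketch computes a simple novelty function and picks its peaks. The specific choices are assumptions made for illustration and are not taken from any particular prior approach: the novelty function is the Euclidean distance between consecutive feature frames, peaks are local maxima above a fixed threshold, and the names `novelty_curve`, `pick_peaks`, and `make_demo_features` are hypothetical. The synthetic feature sequence has one true boundary at frame 20 and a brief within-segment deviation at frame 8.

```python
import numpy as np


def novelty_curve(features):
    """Euclidean distance between each feature frame and the previous one.

    Assumed novelty definition for illustration only. The first frame has
    no predecessor, so its novelty is set to zero.
    """
    features = np.asarray(features, dtype=float)
    diffs = features[1:] - features[:-1]
    dist = np.linalg.norm(diffs, axis=1)
    return np.concatenate([np.zeros(1), dist])


def pick_peaks(novelty, threshold):
    """Return indices of local maxima of `novelty` that exceed `threshold`."""
    n = np.asarray(novelty, dtype=float)
    peaks = []
    for i in range(1, len(n) - 1):
        # Rising edge on the left, non-rising on the right, so a flat
        # two-sample plateau is reported once.
        if n[i] > threshold and n[i] > n[i - 1] and n[i] >= n[i + 1]:
            peaks.append(i)
    return peaks


def make_demo_features():
    """Synthetic 2-D features: two segments of 20 frames each."""
    features = np.zeros((40, 2))
    features[20:] = 5.0   # second segment starts at frame 20 (true boundary)
    features[8] = 3.0     # one-frame deviation inside the first segment
    return features


if __name__ == "__main__":
    novelty = novelty_curve(make_demo_features())
    print(pick_peaks(novelty, threshold=2.0))
```

On this synthetic input the sketch reports peaks at frames 8 and 20. Frame 20 is the true boundary, while frame 8 is a peak inside a segment, which is the kind of incorrect boundary determination described above.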