Segmenting speech and music signals from audio signals has become increasingly important in multimedia applications. At present, there are three kinds of audio signal segmentation algorithms. The first kind designs classifiers that directly extract features of the signals in the time domain or the frequency domain in order to discriminate, and then segment, the speech and music signals. The features used in these algorithms include zero-crossing information, energy, pitch, cepstral coefficients, line spectral frequencies, 4 Hz modulation energy, and some perceptual features such as tone and rhythm. These conventional techniques extract the features directly. However, the windows used to analyze the signals are relatively large, so the segment boundaries are not precise enough. Furthermore, most of these methods use fixed thresholds to determine the segmentation. Therefore, they cannot offer satisfactory results in noisy environments with a low signal-to-noise ratio (SNR).
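For illustration only, the following Python sketch computes two of the time-domain features named above, short-time energy and zero-crossing rate, frame by frame, and then applies a fixed energy threshold. The function names, the frame length, the hop size and the threshold value are assumptions chosen for this example and are not taken from any particular conventional method.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames; trailing samples that do not fill a frame are dropped."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    negative = np.signbit(frames)
    return np.mean(negative[:, 1:] != negative[:, :-1], axis=1)

def fixed_threshold_activity(x, frame_len=400, hop=400, energy_thr=0.01):
    """Flag frames whose energy exceeds a fixed threshold (illustrative values only)."""
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    return energy > energy_thr, energy, zcr
```

In this sketch, a threshold that isolates a tone burst in a clean recording flags every frame once background noise with power above the threshold is added, which is one way to see why fixed thresholds degrade at low SNR.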
The second kind of audio signal segmentation algorithm generates the features needed by the classifiers statistically; these are called posterior probability based features. Although statistically derived features can give better results, these conventional techniques require a large number of training data samples and are also not suitable for actual environments.
The third kind of audio signal segmentation algorithm emphasizes the design of the classifier model. The most commonly used methods are the Bayesian information criterion (BIC), the Gaussian likelihood ratio, and hidden Markov model (HMM) based classifiers. These conventional techniques focus on building effective classifiers. Although these methods work, some of them, such as the Bayesian information criterion, require a large amount of computation, and others, such as the Gaussian likelihood ratio and the HMM, require a large number of training data samples to be prepared in advance in order to build the models. They are therefore not good choices in practical applications.
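As a rough illustration of the computational cost mentioned above, the following Python sketch applies a commonly used Gaussian form of the Bayesian information criterion to search a feature sequence for a single change point. It is a minimal sketch under stated assumptions: the function names, the penalty weight `lam`, the `margin` parameter and the small regularization constant are hypothetical choices made for this example, not details of any specific conventional method.

```python
import numpy as np

def _logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    cov = cov + 1e-8 * np.eye(cov.shape[0])  # assumed regularization to avoid singular matrices
    _, logdet = np.linalg.slogdet(cov)
    return logdet

def delta_bic(x, i, lam=1.0):
    """BIC gain of modeling x[:i] and x[i:] with two Gaussians instead of one."""
    n, d = x.shape
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * _logdet_cov(x)
                  - i * _logdet_cov(x[:i])
                  - (n - i) * _logdet_cov(x[i:]))
    return gain - penalty

def find_change_point(x, margin=10, lam=1.0):
    """Scan every candidate index; report the best one only if its BIC gain is positive."""
    x = np.asarray(x, dtype=float)
    candidates = range(margin, len(x) - margin)
    scores = [delta_bic(x, i, lam) for i in candidates]
    best = int(np.argmax(scores))
    index = best + margin if scores[best] > 0 else None
    return index, scores[best]
```

Each candidate index requires two additional covariance estimates and log-determinants, so the cost of the scan grows with both the window length and the feature dimension, which is consistent with the computational burden noted above.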