1. Field of the Invention
The present invention relates to a method for identifying segment boundaries based on affinity or similarity matrices.
2. Description of the Related Art
Early techniques for automatically segmenting video focused on cut-boundary detection. The major techniques used have been detection of pixel differences, statistical differences, histogram comparisons, edge differences, compression differences and motion vectors. Histogram comparison is the most common method used to detect shot boundaries. The simplest method computes gray-level histograms of the images.
In similarity-matrix approaches, each element S(i, j) of the similarity matrix S holds the similarity between time samples i and j. Previously, a frame-indexed novelty score was typically computed by correlating a small kernel function along the main diagonal of the similarity matrix, and local maxima in the novelty score were taken to be the segment boundaries.
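By way of illustration only, the novelty-score computation described above can be sketched as follows. The sketch assumes NumPy, gray-level histograms as frame features, a similarity equal to one minus half the L1 histogram distance, and a checkerboard-style kernel. These choices, and all function names, are hypothetical and are not taken from any particular prior technique.

```python
import numpy as np

def gray_histogram(frame, bins=16):
    # Normalized gray-level histogram of one frame (pixel values assumed in 0..255).
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def similarity_matrix(features):
    # S[i, j] = 1 - 0.5 * L1 distance between normalized histograms i and j.
    diff = np.abs(features[:, None, :] - features[None, :, :]).sum(axis=2)
    return 1.0 - 0.5 * diff

def checkerboard_kernel(L):
    # 2L x 2L kernel: +1 on within-segment quadrants, -1 on cross-segment quadrants.
    sign = np.ones(2 * L)
    sign[:L] = -1
    return np.outer(sign, sign)

def novelty_score(S, K):
    # Correlate K along the main diagonal of S; the score is left at 0 near the ends.
    L = K.shape[0] // 2
    N = S.shape[0]
    score = np.zeros(N)
    for n in range(L, N - L + 1):
        score[n] = np.sum(K * S[n - L:n + L, n - L:n + L])
    return score

def local_maxima(score, threshold):
    # Indices whose score exceeds the threshold and is a local peak.
    return [n for n in range(1, len(score) - 1)
            if score[n] > threshold
            and score[n] > score[n - 1]
            and score[n] >= score[n + 1]]

if __name__ == "__main__":
    # Synthetic clip: 10 dark frames followed by 10 bright frames.
    frames = [np.full((8, 8), 20)] * 10 + [np.full((8, 8), 200)] * 10
    feats = np.array([gray_histogram(f) for f in frames])
    score = novelty_score(similarity_matrix(feats), checkerboard_kernel(2))
    print(local_maxima(score, threshold=4.0))
```

In this sketch the index n denotes a boundary between frames n-1 and n; the threshold value is arbitrary and would need to be chosen for real data.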
A number of kernel correlation functions have previously been described for segmentation in videos. Scale-space (SS) analysis compares adjacent time samples and corresponds to using a kernel with non-zero elements only in the first diagonal above or below the main diagonal, i.e. the elements S(n, n+1). Diagonal cross similarity (DCS) is an alternative detection approach. A DCS kernel (K_DCS), when centered on a segment boundary, weights only elements of S that compare time samples from different segments separated by a fixed interval (L). In the correlation calculation, the elements of S for which K_DCS>0 lie on the Lth diagonal above (and below) the main diagonal of S. A full similarity kernel (K_FS) and a cross similarity kernel (K_CS) have also been described.
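The SS and DCS kernels described above can be sketched, for illustration only, as small matrices with ones on the relevant off-diagonals. The sketch assumes NumPy, unit weights, a 2L x 2L kernel size, and a distance (dissimilarity) matrix so that boundaries appear as maxima; with similarity values the sign convention would be reversed. The function names are hypothetical.

```python
import numpy as np

def dcs_kernel(L):
    # Ones on the L-th diagonal above and below the main diagonal of a 2L x 2L kernel.
    # Centered on a boundary, these entries pair samples L apart on opposite sides.
    return np.eye(2 * L, k=L) + np.eye(2 * L, k=-L)

def ss_kernel():
    # Ones on the first diagonal above and below the main diagonal (adjacent samples).
    return np.eye(2, k=1) + np.eye(2, k=-1)

def correlate_along_diagonal(D, K):
    # Slide K along the main diagonal of D; index n marks a boundary
    # between samples n-1 and n (an indexing convention assumed here).
    L = K.shape[0] // 2
    N = D.shape[0]
    score = np.zeros(N)
    for n in range(L, N - L + 1):
        score[n] = np.sum(K * D[n - L:n + L, n - L:n + L])
    return score

if __name__ == "__main__":
    # One-dimensional toy features with a change after the sixth sample.
    f = np.array([0.0] * 6 + [1.0] * 6)
    D = np.abs(f[:, None] - f[None, :])
    print(np.argmax(correlate_along_diagonal(D, ss_kernel())))
    print(np.argmax(correlate_along_diagonal(D, dcs_kernel(3))))
```

Under these assumptions the SS kernel coincides with the DCS kernel for L = 1, which is consistent with SS comparing only adjacent time samples.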
Most media segmentation techniques apply a threshold to adjacent-frame similarity values rather than incorporating a classifier. Only recently have accelerated exact k-nearest-neighbor (kNN) classifiers been adopted for video shot segmentation analysis. However, the frame being evaluated was compared only with frames earlier in time to generate a similarity feature for use with the classifier, and the L1 measure was used to calculate the similarity values. In addition, the kNN classifier was not utilized directly with the similarity value, and temporal smoothing of the classification outputs was required.
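A rough sketch of the kind of classifier-based approach described above is given below for illustration. It assumes NumPy, a feature vector made of L1 distances between the evaluated frame and the L frames preceding it, and a brute-force exact kNN vote in place of an accelerated search. The function names, the value of k, and the toy data are hypothetical, and temporal smoothing of the outputs is omitted.

```python
import numpy as np

def past_l1_features(frame_feats, L):
    # Row n holds the L1 distances between frame n and each of the L earlier frames.
    # Lags that reach before the start of the sequence are left at 0.
    N = len(frame_feats)
    out = np.zeros((N, L))
    for n in range(N):
        for lag in range(1, L + 1):
            if n - lag >= 0:
                out[n, lag - 1] = np.abs(frame_feats[n] - frame_feats[n - lag]).sum()
    return out

def knn_predict(train_X, train_y, query_X, k=3):
    # Brute-force exact kNN under the L1 measure with a majority vote (labels 0 or 1).
    preds = []
    for q in query_X:
        dist = np.abs(train_X - q).sum(axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        preds.append(int(train_y[nearest].sum() * 2 > k))
    return np.array(preds)

if __name__ == "__main__":
    # Labeled toy sequence with boundaries at frames 4, 8 and 12.
    train = np.array([0.0] * 4 + [1.0] * 4 + [0.0] * 4 + [1.0] * 4).reshape(-1, 1)
    labels = np.zeros(16, dtype=int)
    labels[[4, 8, 12]] = 1
    # Unlabeled toy sequence with one boundary at frame 4.
    query = np.array([0.0] * 4 + [1.0] * 4).reshape(-1, 1)
    pred = knn_predict(past_l1_features(train, 2), labels,
                       past_l1_features(query, 2), k=3)
    print(np.flatnonzero(pred))
```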