Determining representative images for a video can serve various purposes. For example, representative images may help a user browse through a video to find a particular portion of interest. The portion of interest may relate to a specific event that the user wants to show to family or friends. Alternatively, it may correspond to the point at which the user was interrupted while viewing the video. An overview of the video can include a representative image for each of its portions. Such a visual overview generally allows the user to find the portion of interest conveniently. Representative images may also help a user find a particular video in a collection of videos. Various techniques for determining representative images for a video have been proposed. Some of these techniques select, as representative images, images from scenes that contain a relatively large amount of action.
The article entitled “Adaptive Key Frame Extraction Using Unsupervised Clustering”, by Zhuang Y. et al., published in the proceedings of the International Conference on Image Processing (ICIP'98), Volume 1, 1998, pp. 866, describes an algorithm for key frame extraction based on unsupervised clustering. A video shot comprising N frames, N being an integer, is obtained from a shot boundary detection algorithm. The N frames of the video shot are clustered into M clusters, M being an integer. Each cluster has a centroid, which needs to be recalculated whenever a new frame is added to the cluster. For the frame under consideration, a measure of similarity is calculated between that frame and the centroid of each existing cluster. If all the measures of similarity thus calculated for that frame are below a threshold, a new cluster is created for the frame. Otherwise, the frame is assigned to the existing cluster for which the measure of similarity has the highest value. The higher the threshold, the greater the number M of clusters obtained. Once the clusters have been formed, a key frame is extracted from each cluster whose size is larger than N/M, the average cluster size. The key frame of a cluster is the frame closest to the cluster centroid.
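The clustering and key frame selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each frame is assumed to be represented by a normalized color histogram (a list of floats summing to 1), and histogram intersection is assumed as the measure of similarity. The function names `similarity`, `cluster_frames` and `key_frames` and the parameter `threshold` are hypothetical. Frames are processed in order. Each frame either joins the most similar existing cluster or starts a new one, and each centroid is kept as the running mean of its members. Key frames are then taken only from clusters strictly larger than N/M, as stated in the text.

```python
from typing import List, Sequence, Tuple

Histogram = List[float]


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection (assumed measure): 1.0 for identical normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def cluster_frames(
    frames: Sequence[Histogram], threshold: float
) -> Tuple[List[List[int]], List[Histogram]]:
    """Sequentially cluster frames; a higher threshold yields more clusters."""
    clusters: List[List[int]] = []    # frame indices belonging to each cluster
    centroids: List[Histogram] = []   # mean histogram of each cluster
    for i, frame in enumerate(frames):
        sims = [similarity(frame, c) for c in centroids]
        if not sims or max(sims) < threshold:
            # All similarities are below the threshold: start a new cluster.
            clusters.append([i])
            centroids.append(list(frame))
        else:
            # Assign the frame to the cluster with the highest similarity.
            k = max(range(len(sims)), key=sims.__getitem__)
            clusters[k].append(i)
            n = len(clusters[k])
            # Recalculate the centroid incrementally as the mean of its members.
            centroids[k] = [c + (x - c) / n for c, x in zip(centroids[k], frame)]
    return clusters, centroids


def key_frames(frames: Sequence[Histogram], threshold: float) -> List[int]:
    """Return indices of key frames: one per cluster larger than N/M."""
    if not frames:
        return []
    clusters, centroids = cluster_frames(frames, threshold)
    average_size = len(frames) / len(clusters)  # N / M
    keys = []
    for members, centroid in zip(clusters, centroids):
        if len(members) > average_size:
            # The frame closest to the centroid is the one with the highest similarity.
            keys.append(max(members, key=lambda i: similarity(frames[i], centroid)))
    return sorted(keys)


if __name__ == "__main__":
    shot = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
    print(cluster_frames(shot, 0.5)[0])
    print(key_frames(shot, 0.5))
```

For the five two-bin histograms in the usage example, a threshold of 0.5 gives the clusters `[[0, 1, 2], [3, 4]]`. Only the first cluster exceeds the average size of 2.5, so frame 2, the frame nearest its centroid, is selected. Because the size test is strict, a shot in which every cluster has exactly the average size yields no key frame in this sketch.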