Camera phones and other personal digital-video capture devices have become nearly ubiquitous in the early 21st century. As a result, many individuals and entities have acquired sizable libraries of digital video footage, much of it recorded during vacations, parties, or other events.
However, while recording video footage is easy, editing and curating a digital-video library can be a tedious, difficult, and time-consuming chore. Consequently, several approaches to automatic video indexing and segmentation have been developed. Some of these approaches operate on decoded or decompressed image data, detecting scene changes by inspecting pixel values of successive frames of video. However, most digital video is stored in encoded or compressed format, and decoding compressed video to obtain image data is a relatively computationally expensive operation.
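The pixel-domain approach described above can be sketched as follows. This is a minimal illustration, not any particular patented method: it flags a scene change whenever the mean absolute pixel difference between consecutive decoded frames exceeds a threshold. The function name and the threshold value are illustrative assumptions.

```python
import numpy as np

def detect_scene_changes(frames, threshold=30.0):
    """Flag frame indices where the mean absolute pixel difference
    from the previous frame exceeds `threshold` (0-255 grayscale).
    `frames` is a sequence of 2-D uint8 arrays of equal shape."""
    changes = []
    for i in range(1, len(frames)):
        # Promote to a signed type so the subtraction cannot wrap around.
        diff = np.mean(np.abs(frames[i].astype(np.int16)
                              - frames[i - 1].astype(np.int16)))
        if diff > threshold:
            changes.append(i)
    return changes

# Synthetic example: 10 dark frames followed by 10 bright frames,
# so the only large inter-frame difference is at index 10.
dark = [np.full((8, 8), 10, dtype=np.uint8)] * 10
bright = [np.full((8, 8), 200, dtype=np.uint8)] * 10
print(detect_scene_changes(dark + bright))  # [10]
```

Note that this sketch presupposes fully decoded frames; as the passage above observes, obtaining those frames from compressed video is itself the expensive step.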
Other approaches operate on encoded or compressed video, analyzing information that is accessible without decoding the video, such as discrete cosine transform (“DCT”) values and motion vectors of successive inter-frames of encoded video.
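A compressed-domain approach of the kind just described can be sketched without decoding any pixels, using only statistics parsed from the bitstream. The sketch below assumes hypothetical per-frame fields (`intra_fraction`, the fraction of intra-coded macroblocks, and `mv_mean`, the mean motion-vector magnitude); the field names, the function, and the threshold are illustrative assumptions, not the format of any real codec's metadata.

```python
def detect_scene_changes_compressed(frame_stats, intra_threshold=0.5):
    """Flag inter-frames whose fraction of intra-coded macroblocks
    spikes above `intra_threshold`. A spike commonly indicates the
    encoder could not predict the frame from its predecessor,
    i.e. a likely scene cut."""
    return [i for i, s in enumerate(frame_stats)
            if s["intra_fraction"] > intra_threshold]

# Hypothetical statistics for four consecutive inter-frames.
stats = [
    {"intra_fraction": 0.05, "mv_mean": 1.2},
    {"intra_fraction": 0.04, "mv_mean": 1.1},
    {"intra_fraction": 0.92, "mv_mean": 6.8},  # likely scene cut
    {"intra_fraction": 0.06, "mv_mean": 1.3},
]
print(detect_scene_changes_compressed(stats))  # [2]
```

Because these statistics are available directly from the entropy-decoded bitstream, this style of analysis avoids the full inverse-DCT and motion-compensation work required to reconstruct image data.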
However, existing approaches tend to be computationally complex and may not scale well to large video libraries. Furthermore, existing approaches that merely identify scene changes within a video do not necessarily provide information about which of the identified scenes a human observer would find comparatively interesting.