Computer vision systems generally involve techniques that allow computers to process image data and derive meaning from that data. Computer vision is an aspect of artificial intelligence, a field concerned with developing artificial systems that perform cognitive tasks traditionally requiring a living actor, such as a person. Video is generally made up of a sequence of still images, or frames. Video summarization, as used herein, refers to selecting sub-sequences of a video to create sub-scenes of that video. These sub-scenes may be referred to as clips, highlights, and the like.