Digital cameras are increasingly being deployed to capture video data, while the cost of mobile and wearable digital cameras has simultaneously decreased. In combination, these trends have resulted in an ever-increasing number of such devices. Consequently, the amount of visual data being acquired grows continuously. By some estimates, it would take an individual over 5 million years to watch the amount of video that will cross global IP networks each month in 2019.
A large portion of the vast amount of video data being produced goes unprocessed. Existing prior art methods for extracting meaning from video are either application-specific or heuristic in nature. Therefore, to increase the efficiency of processes such as human review or machine analysis of video data, there is a need to automatically extract concise and meaningful representations of video.