The wide availability of portable cameras has led to an explosion of short self-made and professional videos. Many of these videos, especially those made with POV (Point Of View) action cameras, are related to action sports such as downhill skiing, snowboarding, surfing, and mountain biking. The YouTube web site, for example, contains thousands of such videos.
At the same time, the sheer number and popularity of these videos have created problems of their own. First, it has become very difficult to find a video of interest when the event of interest is not explicitly associated with the video by its creators. Second, since most of the videos are made by amateurs and are not edited, users have to watch an entire video even if they are interested only in a particular portion of it, e.g., when a snowboarder jumps, when a skier has a particularly fast portion of a run, or when any other particular event occurs within a larger video.
Meanwhile, the wide popularity of portable devices with GPS and other sensors allows accurate measurement, storage, and classification of action sport activities. Therefore, if video and performance data can be synchronized in time and space, then video footage can be annotated, edited, selected, and tagged based on the performance metrics of the particular activity that was filmed.
A person searching or viewing such video may desire to find a particular video or particular portions of a video. For example, such a person may want to search for video that shows snowboarding jumps with air time longer than one second. However, this would be impractical or impossible using the currently available means for video tagging, which typically use only semantic and text video descriptions.
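The kind of search described above can be illustrated with a minimal sketch. It assumes that jump events have already been detected from sensor data and that the sensor clock and the camera clock share a common time base. The names `Jump`, `Video`, and `find_jump_clips` are hypothetical and are introduced only for illustration. The sketch matches in time only; matching in space is omitted.

```python
from dataclasses import dataclass


@dataclass
class Jump:
    """A jump event derived from sensor data (hypothetical structure)."""
    start_utc: float  # moment of takeoff, in seconds on the shared time base
    air_time: float   # seconds spent airborne


@dataclass
class Video:
    """Metadata for one recorded video file (hypothetical structure)."""
    name: str
    start_utc: float  # moment recording began, on the same time base
    duration: float   # length of the recording in seconds


def find_jump_clips(videos, jumps, min_air_time=1.0):
    """Return (video name, clip start, clip end) for each qualifying jump.

    A jump qualifies when its air time exceeds min_air_time and the whole
    jump falls inside the time span of the video. Clip offsets are given
    in seconds from the start of the video.
    """
    clips = []
    for video in videos:
        video_end = video.start_utc + video.duration
        for jump in jumps:
            if jump.air_time <= min_air_time:
                continue  # too short to be of interest
            jump_end = jump.start_utc + jump.air_time
            if video.start_utc <= jump.start_utc and jump_end <= video_end:
                offset = jump.start_utc - video.start_utc
                clips.append((video.name, offset, offset + jump.air_time))
    return clips


# Illustrative data only: one 60-second video and three detected jumps.
videos = [Video("run1.mp4", 1000.0, 60.0)]
jumps = [Jump(1010.0, 1.5), Jump(1030.0, 0.5), Jump(1100.0, 2.0)]
clips = find_jump_clips(videos, jumps)
print(clips)
```

With this illustrative data, only the first jump is returned: the second is shorter than one second, and the third occurs after the video has ended.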
Another issue associated with many action videos is that they are filmed by one person, while performance data for the video “subject” are collected by sensors collocated with the subject, who is a different person.
Attempts have been made to mark video during capture for quick selection. However, such solutions typically use tags that are based on text that is created by others or found in the video.
U.S. Pat. No. 7,624,337 and U.S. Pat. No. 7,823,055 disclose a solution that uses text, including text in the video, to create tags and metadata for later use in video searching.
U.S. Pat. No. 5,832,171 to Heist, et al. describes synchronization of video and text where the text was created for the video.
U.S. Pat. No. 4,873,585 to Blanton et al. teaches a system that allows selection of images of particular motions from a video to allow easy access to these images. However, this requires operator intervention and decision making.
U.S. Pat. No. 7,483,049 to Aman et al. discloses creation of a database of videos of sport motions. However, the video has to be created in a very controlled environment by multiple cameras with athletes marked with visible or invisible markers that can be identified in the video.
There is also a body of work on triggering video by particular events, mostly traffic violations. U.S. Pat. No. 7,986,339 to Higgins describes a system capable of recording and analyzing still and video images of a traffic violation. However, the video recording is triggered by an outside physical signal that is generated by a vehicle, e.g., laser or Doppler radar. U.S. Pat. No. 6,919,823 to Lock and U.S. Pat. No. 7,633,433 to Behrens are similar, with a triggering signal generated by a red light change or by a laser beam interrupted after the red light change.
In addition, in the above cases, the relative positions of the camera and the video subject are known in advance, and so these solutions do not provide any time and space domain search to match the event and the video footage.