Video motion analysis, which includes camera motion analysis and object motion analysis, is an important part of understanding video content. Content understanding, in turn, plays a significant role in video browsing, retrieval, editing, printing, etc., in many multimedia systems, including personal computers (PCs), digital entertainment systems, cameras, and even printers.
Currently, printers are good at representing planar (two-dimensional) media content such as documents and images, but video printing is still a labor-intensive problem. As three-dimensional (3-D) signals (i.e., two spatial dimensions and one temporal dimension), videos contain much more information, with huge amounts of redundancy, and cannot be easily represented by a static medium such as paper.
One approach to video printing is to select key frames from a video clip and to print the selected frames. Unfortunately, the key-frame-extraction task is not trivial to automate because selecting key frames to maximize semantic meaning is a difficult computer vision and artificial intelligence problem. Solutions are further constrained because it is usually acceptable to print only a reasonable number of key frames. Key frames may be extracted by analyzing low-level content features such as color, texture, and motion.
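For illustration only, extraction of key frames from a low-level feature such as color can be sketched as follows. This is a minimal sketch, not a method described in this document: it assumes each frame is supplied as a flat sequence of 8-bit intensity values, and the names histogram and select_key_frames, the L1 histogram distance, and the threshold and max_keys parameters are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized intensity histogram of a frame of 0-255 pixel values."""
    counts = [0] * bins
    for pixel in frame:
        # Map the 0-255 value to a bin index, clamping to the last bin.
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1  # avoid division by zero for an empty frame
    return [c / total for c in counts]


def select_key_frames(
    frames: Sequence[Sequence[int]],
    threshold: float = 0.5,  # hypothetical cut-off on histogram change
    max_keys: int = 4,       # hypothetical cap on printed frames
    bins: int = 8,
) -> List[int]:
    """Return indices of frames whose histogram differs from the last key frame."""
    if not frames:
        return []
    keys = [0]  # the first frame is always taken as a key frame
    reference = histogram(frames[0], bins)
    for index in range(1, len(frames)):
        if len(keys) >= max_keys:
            break  # keep only a reasonable number of key frames
        current = histogram(frames[index], bins)
        # L1 distance between normalized histograms, in the range [0, 2].
        distance = sum(abs(a - b) for a, b in zip(current, reference))
        if distance > threshold:
            keys.append(index)
            reference = current
    return keys


if __name__ == "__main__":
    dark = [10] * 16
    bright = [240] * 16
    print(select_key_frames([dark, dark, bright, bright, dark]))
```

A sketch of this kind reacts only to global color change. It carries no notion of semantic meaning or of camera and object motion, which is why key-frame selection based on such features alone remains a difficult problem.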
Existing approaches to motion analysis and content understanding are either not general enough for all types of video (e.g., home video and professional video, short video clips and long video recordings) or too slow for common processing systems like PCs and embedded systems like cameras. Existing approaches are typically designed for specific tasks, e.g., tracking the movement of a person (with a known-face model) or a car (with a pre-defined car model), and therefore rely on task-specific simplifications that limit their applicability.