The present invention relates to video data processing, and more particularly to a method for classifying and searching video databases based on 3-D camera motion.
Video is becoming a central medium for the storage, transmission, and retrieval of dense audio-visual information. This trend has been accelerated by the advent of the Internet, networking technology, and video standardization by the MPEG group. To process and retrieve large amounts of video information efficiently, a video sequence has to be indexed and segmented appropriately according to different levels of its content. This disclosure deals with one method for video indexing based on (global) camera motion information.

As the camera captures a given scene, it moves around in 3-D space and consequently induces a corresponding 2-D image motion. For example, a forward-looking camera that moves forward induces in the image plane a dollying motion similar to an optical zoom-in: image regions increase in size and move out of view as they are approached. This kind of motion is very common in TV broadcast/cable news, sports, documentaries, etc., where the camera zooms in or out (optically) or dollies forward or backward (physically) with respect to a given scene spot, indicating an intention to focus the viewer's attention on particular parts of the scene. An equally common camera motion is panning, in which the camera rotates about a vertical axis, inducing an apparent horizontal movement of image features. In this case the camera shows different parts of a scene as seen from a distance. Panning is also very common in TV programs when the intention is to give the viewer a general view of a scene without pointing to any particular detail of it. In addition to dollying and panning, the camera may be tracking (horizontal translational motion), booming (vertical translational motion), tilting (rotation about the horizontal axis), and/or rolling (rotation about the forward axis).
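The six elementary camera motions named above can be turned into index labels. The following Python sketch assumes, hypothetically, that the 3-D motion estimation step yields per-shot translation components (tx, ty, tz) and rotation rates (wx, wy, wz) in a camera frame with x horizontal, y vertical and z pointing forward; the class name `CameraMotion`, the function `motion_labels` and the threshold values are illustrative assumptions, not part of this disclosure. Translations and rotations are compared against separate thresholds because they are measured in different units, and a shot may receive several labels at once.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CameraMotion:
    # Hypothetical per-shot summary of estimated 3-D camera motion.
    # Assumed camera frame: x = horizontal, y = vertical, z = forward.
    tx: float  # translation along x (tracking)
    ty: float  # translation along y (booming)
    tz: float  # translation along z (dollying; positive = forward, by assumption)
    wx: float  # rotation rate about the horizontal axis (tilting)
    wy: float  # rotation rate about the vertical axis (panning)
    wz: float  # rotation rate about the forward axis (rolling)


def motion_labels(m: CameraMotion,
                  t_thresh: float = 0.1,
                  r_thresh: float = 0.01) -> set[str]:
    """Return every camera-motion type whose component exceeds its threshold.

    The thresholds are illustrative placeholders; translations and rotations
    get separate thresholds because they have different units.
    """
    labels: set[str] = set()
    # Translational components
    if abs(m.tx) > t_thresh:
        labels.add("tracking")
    if abs(m.ty) > t_thresh:
        labels.add("booming")
    if abs(m.tz) > t_thresh:
        labels.add("dolly forward" if m.tz > 0 else "dolly backward")
    # Rotational components
    if abs(m.wx) > r_thresh:
        labels.add("tilting")
    if abs(m.wy) > r_thresh:
        labels.add("panning")
    if abs(m.wz) > r_thresh:
        labels.add("rolling")
    return labels
```

For example, `motion_labels(CameraMotion(0, 0, 0.5, 0, 0, 0))` returns `{'dolly forward'}`, and a pure rotation about the vertical axis such as `CameraMotion(0, 0, 0, 0, 0.05, 0)` yields `{'panning'}`.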
Taken together, these camera motions constitute a very general mode of communicating content information about video sequences, which may be analyzed at various levels of abstraction. This is important for the storage and retrieval of video content information, which is to be standardized by MPEG-7 by the year 2001.
What is desired is a general method of indexing and searching of video sequences according to camera motion which is based on full 3-D camera motion information estimated independently of the video contents, e.g., how the camera moves or how many objects there are in a given 3-D scene.
Accordingly, the present invention provides a method of classifying and searching video databases based on 3-D camera motion that is estimated independently of the video contents. Indexing and searching are performed on a video database made up of shots. Each video shot is assumed to have been extracted in advance from a long video sequence. For example, the MPEG-7 video test material is divided into CD-ROMs, each containing roughly 45 minutes of audio-video data (approximately 650 Mbytes). The shots are generated either manually or automatically, and a collection of these shots makes up a video database. Each shot is processed individually to determine its camera motion parameters and is then indexed according to different types of camera motion. Finally, the video database is searched according to user specifications of camera motion types.
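The indexing and search steps can be illustrated with an inverted index that maps each camera-motion type to the shots exhibiting it. This Python sketch assumes that each shot already carries a set of camera-motion labels (for instance "panning" or "dolly forward") produced by the per-shot motion estimation, which is not shown; the function names `build_index` and `search`, the shot ids, and the choice of an inverted index are assumptions for illustration rather than the structure prescribed by the invention. A query lists motion types that must all be present and, optionally, types that must be absent.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Collection


def build_index(shot_labels: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build an inverted index: camera-motion type -> ids of shots showing it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for shot_id, labels in shot_labels.items():
        for label in labels:
            index[label].add(shot_id)
    return dict(index)


def search(index: dict[str, set[str]],
           required: Collection[str],
           excluded: Collection[str] = ()) -> set[str]:
    """Return ids of shots that exhibit every required motion type
    and none of the excluded ones."""
    if not required:
        raise ValueError("query must name at least one camera-motion type")
    # set.intersection always returns a new set, so the index is not mutated.
    hits = set.intersection(*(index.get(label, set()) for label in required))
    for label in excluded:
        hits -= index.get(label, set())
    return hits
```

With `shots = {"shot01": {"panning"}, "shot02": {"panning", "dolly forward"}, "shot03": {"tilting"}}`, `search(build_index(shots), {"panning"})` returns `{'shot01', 'shot02'}`, while adding `excluded={"dolly forward"}` narrows the result to `{'shot01'}`.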