With the increasing capacity of video storage devices, the need emerges for structuring and summarization of video content for convenient browsing by the user. Video browsing is enabled by metadata (i.e. data about data), which is preferably extracted automatically.
FIG. 1 depicts the prior art of motion related metadata extraction from MPEG (Moving Picture Experts Group) compressed video in the pel domain. Full decoding of MPEG video into the pel domain is performed by an MPEG decoding unit 11. A motion estimation unit 12 (based on optical flow calculation or block matching, both of which are known to those skilled in the art) calculates motion vectors from the pel representation of the video stream. The parametric and camera motion calculation unit 13 calculates the motion related metadata from these motion vectors.
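As an illustration of the block matching mentioned for the motion estimation unit 12, the following is a minimal sketch of an exhaustive search. It assumes that frames are nested lists of luminance values and that the sum of absolute differences (SAD) is the matching criterion. The function names `sad` and `block_match`, the block size and the search range are hypothetical choices made for this example and are not taken from the cited prior art.

```python
def sad(prev, curr, bx, by, dx, dy, block):
    """Sum of absolute differences between the block at (bx, by) in curr
    and the block displaced by (dx, dy) in prev."""
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(curr[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
    return total


def block_match(prev, curr, bx, by, block=4, search=2):
    """Exhaustive block matching (assumed SAD criterion).

    Returns the displacement (dx, dy) that points from the block in curr
    to its best match in prev, searched within +/- search pels.
    """
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(prev, curr, bx, by, 0, 0, block)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the frame.
            inside = (0 <= by + dy and by + dy + block <= height
                      and 0 <= bx + dx and bx + dx + block <= width)
            if not inside:
                continue
            cost = sad(prev, curr, bx, by, dx, dy, block)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best
```

The cost of such a search grows with the square of the search range for every block, which is one reason why motion estimation in the pel domain is computationally demanding.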
For camera motion estimation in the pel domain, patents exist, e.g. “U.S. Pat. No. 5,751,838: 5/1998: Ingemar J. Cox, Sebastien Roy: Correction of camera motion between two image frames: 382/107”, as well as publications.
“Yi Tong Tse, Richard L. Baker: Global Zoom/Pan estimation and compensation for video compression: ICASSP 91, 1991, pp. 2725-2728” estimates camera zoom and pan for video encoding. However, this method may produce unreliable results for camera motion types other than the modeled ones.
“A. Akutsu, Y. Tonomura, H. Hashinoto, Y. Ohba: Video indexing using motion vectors: SPIE vol. 1818 Visual Communications and Image Processing, 1992, pp. 1522-1530” extracts camera motion in the pel domain using the Hough transformation, though the described method does not extract the amount of the camera motion.
“Jong-Il Park, Nobuyuki Yagi, Kazumasa Enami, Kiyoharu Aizawa, Mitsutoshi Hatori: Estimation of Camera Parameters from Image Sequence for model based video coding: IEEE Trans. CSVT, vol. 4, no. 3, June 1994, pp 288-296” and “Jong-Il Park, Choong Woong Lee: Robust estimation of camera parameters from image sequence for video composition: Signal Processing: Image Communication: vol. 9, 1996, pp 43-53” find feature points in the pel domain using a texture gradient and determine the camera motion from the motion of these feature points.
“Jong-Il Park, Choong Woong Lee: Robust estimation of camera parameters from image sequence for video composition: Signal Processing: Image Communication: vol. 9, 1996, pp 43-53” uses an outlier rejection method to make the camera motion estimation in the pel domain more robust.
“Y. P. Tan, S. R. Kulkarni, P. J. Ramadge: A new method for camera motion parameter estimation: Proc. ICIP, 1995, pp 406-409” describes a recursive least squares method for camera motion estimation in the pel domain, based on the assumption of a small amount of camera motion.
“Philippe Joly, Hae-Kwang Kim: Efficient automatic analysis of camera work and microsegmentation of video using spatiotemporal images: Signal Processing Image communication, vol. 8, 1996, pp. 295-307” describes a camera motion estimation algorithm in the pel domain based on the Sobel operator or a threshold edge detection unit and spatio-temporal projection of the edges into line patterns. The line patterns are analyzed using the Hough transform to extract edges in motion direction.
In “M. V. Srinivasan, S. Venkatesh, R. Hosi: Qualitative estimation of camera motion parameters from video sequence: Pattern recognition, Elsevier, vol. 30, no. 4, 1997, pp 593-606”, camera motion parameters are extracted from uncompressed video in the pel domain, where the amount of camera pan, tilt, rotation and zoom is provided separately.
“Richard R. Schultz, Mark G. Alford: Multiframe integration via the projective transform with automated block matching feature point selection: ICASSP 99, 1999” proposes a subpixel resolution image registration algorithm in the pel domain based on a nonlinear projective transform model to account for camera translation, rotation, zoom, pan and tilt.
“R. S. Jasinschi, T. Naveen, P. Babic-Vovk, A. J. Tabatabai: Apparent 3-D camera velocity extraction and its Applications: IEEE Picture Coding Symposium, PCS 99, 1999” describes a camera velocity estimation in the pel domain for the applications of database query and sprite (mosaic) generation.
Due to the huge storage size of video content, more and more video material is available in compressed MPEG-1/MPEG-2 or MPEG-4 format. However, the camera motion estimation algorithms developed for the pel domain (as listed above) are not directly applicable to the MPEG compressed domain. Therefore, time-consuming decoding of the MPEG compressed bitstream is required, followed by computationally demanding motion estimation in the pel domain and camera motion estimation (FIG. 1).
To circumvent the computational burden of MPEG video decompression and camera motion estimation in the pel domain, camera motion estimation performed in the compressed domain has been proposed. Previous work on camera motion estimation in the compressed domain is based on using MPEG motion vectors and fitting them to a parametric motion model describing camera motion.
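As an illustration of what fitting motion vectors to a parametric motion model can mean, assume a simple three-parameter zoom/pan model. This model and its symbols are chosen here for illustration only; the cited works use their own models, which may differ. For a macroblock centred at $(x_i, y_i)$ with motion vector $(u_i, v_i)$, the model and a least squares fit over $N$ motion vectors may be written as:

```latex
\begin{aligned}
\hat{u}_i &= t_x + z\,x_i, \qquad \hat{v}_i = t_y + z\,y_i,\\
(t_x, t_y, z) &= \arg\min_{t_x,\,t_y,\,z} \sum_{i=1}^{N}
  \left[ (u_i - \hat{u}_i)^2 + (v_i - \hat{v}_i)^2 \right].
\end{aligned}
```

Under this assumption, $t_x$ and $t_y$ describe the translational (pan and tilt) component and $z$ describes the zoom component of the motion vector field.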
FIG. 2 depicts the current state of the art of motion related metadata extraction from MPEG compressed video. Parsing of MPEG video is performed by an MPEG bitstream parsing unit 21. From this parsed bitstream the motion vectors are extracted 22 and passed to the parametric and camera motion calculation unit 23.
“V. Kobla, D. Doermann, K-I. Lin, C. Faloutsos: Compressed domain video indexing techniques using DCT and motion vector information in MPEG video: SPIE Conf on Storage and Retrieval for Image and Video Databases V: vol. 3022, February 1997, pp. 200-211” determines “flow-vectors” from MPEG compressed domain motion vectors by using a directional histogram to determine the overall translational motion direction. However, this basic model is not able to detect camera zoom and rotation.
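The idea of a directional histogram can be sketched as follows. This is a minimal illustration and not the method of the cited publication: the number of bins, the minimum vector length and the function name `dominant_direction` are assumptions made for this example.

```python
import math


def dominant_direction(vectors, bins=8, min_length=0.5):
    """Histogram the directions of motion vectors (assumed 8 bins).

    Returns (angle_in_radians, counts), where the angle is the centre of
    the most populated bin, or (None, counts) if no vector is long enough.
    """
    counts = [0] * bins
    width = 2 * math.pi / bins
    for dx, dy in vectors:
        # Very short vectors carry little directional information.
        if math.hypot(dx, dy) < min_length:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Shift by half a bin so that bin 0 is centred on angle 0.
        index = int((angle + width / 2) // width) % bins
        counts[index] += 1
    if not any(counts):
        return None, counts
    best = max(range(bins), key=counts.__getitem__)
    return best * width, counts
```

A histogram of this kind yields a single dominant direction for a translational field. A zoom or rotation field spreads its vectors over all directions, so that no single bin characterises it, which is consistent with the limitation noted above.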
“Roy Wang, Thomas Huang: Fast Camera Motion Analysis in MPEG domain: ICIP 99, Kobe, 1999” describes a fast camera motion analysis algorithm in the MPEG domain. The algorithm is based on using MPEG motion vectors from P-frames and B-frames and interpolating motion vectors from B-frames for I-frames. An outlier rejection least squares algorithm for parametric camera motion estimation is used to enhance the reliability of the camera motion parameter extraction from these motion vectors.
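The principle of least squares estimation with outlier rejection can be sketched as follows. This is not the algorithm of the cited publication. It assumes, for illustration only, a three-parameter zoom/pan model u = tx + z*x, v = ty + z*y, a fixed residual threshold and a simple refit loop; the names `fit_zoom_pan`, `residual` and `robust_fit` are hypothetical.

```python
import math


def fit_zoom_pan(samples):
    """Closed-form least squares fit of u = tx + z*x, v = ty + z*y
    to samples given as (x, y, u, v) tuples (assumed model)."""
    n = len(samples)
    sx = sum(s[0] for s in samples)
    sy = sum(s[1] for s in samples)
    su = sum(s[2] for s in samples)
    sv = sum(s[3] for s in samples)
    sxx_yy = sum(s[0] ** 2 + s[1] ** 2 for s in samples)
    sxu_yv = sum(s[0] * s[2] + s[1] * s[3] for s in samples)
    den = sxx_yy - (sx * sx + sy * sy) / n
    z = (sxu_yv - (sx * su + sy * sv) / n) / den if den else 0.0
    return (su - z * sx) / n, (sv - z * sy) / n, z


def residual(sample, params):
    """Distance between a measured vector and the model prediction."""
    x, y, u, v = sample
    tx, ty, z = params
    return math.hypot(u - (tx + z * x), v - (ty + z * y))


def robust_fit(samples, threshold=1.0, max_iterations=10):
    """Refit repeatedly, keeping only vectors that agree with the model."""
    inliers = list(samples)
    params = fit_zoom_pan(inliers)
    for _ in range(max_iterations):
        kept = [s for s in samples if residual(s, params) <= threshold]
        # Stop when too few vectors remain or the inlier set is stable.
        if len(kept) < 3 or kept == inliers:
            break
        inliers = kept
        params = fit_zoom_pan(inliers)
    return params, inliers
```

In each iteration the inliers are reselected from all samples rather than from the previous inlier set, so that vectors rejected because of an initially biased fit can be recovered once the outliers no longer influence the parameters.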
However, using MPEG motion vectors for camera motion estimation has several drawbacks.
First, motion vectors in a compressed MPEG stream do not represent the real motion; they are chosen for fast or bitrate efficient compression at the encoder. They depend on the encoder manufacturer's encoding strategy, which is not standardized by MPEG and can differ significantly. For example, fast MPEG encoding employs low complexity motion estimation algorithms, whereas high-bitrate, high quality MPEG encoding uses motion estimation algorithms with an increased search range, cf. “Peter Kuhn: Algorithms, Complexity Analysis and VLSI-Architectures for MPEG-4 Motion Estimation: Kluwer Academic Publishers, June 1999, ISBN 792385160”.
Further, the performance of using MPEG motion vectors for camera motion estimation depends significantly on MPEG's Group of Pictures (GOP) structure, the video sampling rate (e.g. 5 to 30 frames per second) and other factors, and is therefore not reliable for exact camera motion estimation. For example, some MPEG encoder implementations on the market modify the GOP structure dynamically for sequence parts with fast motion.
Moreover, MPEG motion vectors (especially small ones) are often significantly influenced by noise and may not be reliable.
Further, if the motion estimation search area is restricted, as in some fast motion estimation algorithms, long motion vectors may not exist.
Furthermore, I-frame only MPEG video contains no motion vectors at all. Therefore, the algorithms based on employing MPEG motion vectors are not applicable here. I-frame only MPEG video is a valid MPEG video format, which is used in video editing due to its capability of frame exact cutting. In this field, motion related metadata is very important, e.g. for determining the camera work.
Further, some compressed video formats like DV and MJPEG are based on a DCT (Discrete Cosine Transform) structure similar to that of the MPEG formats, but contain no motion information. Therefore, the camera motion estimation algorithms based on motion vectors contained in the compressed stream are not applicable to these cases.
Moreover, interpolation of motion vectors for I-frames from B-frames fails in case of rapid camera or object motion, where new image content occurs.