This application is related to the following applications, which are assigned to the same applicant as the present invention and filed on even date herewith, and the disclosures of which are hereby incorporated by reference:
Method and apparatus for compressing video sequences (Our file: IDT 018 WO). Method and apparatus for compression of video images and image residuals (Our file: IDT 018 WO).
The present invention relates to motion estimation in sequences of two-dimensional images with arbitrary shapes over several frames, where no restriction on the type of image data is given. Image sequences can be acquired, for instance, by video, X-ray, infrared or radar cameras, or by synthetic generation.
Motion estimation is a highly under-determined problem; additional constraints are therefore necessary in order to obtain a unique solution of the corresponding system of equations. Many approaches use isotropic or anisotropic spatial smoothing terms for this purpose, but this is still not sufficient to obtain satisfactory results for real sequences. Enhanced prediction and filtering methods have to be applied in order to track motion over several frames, detect motion vectors with high amplitudes, overcome the “aperture problem” and aliasing effects in time, stabilize the motion estimation against outliers and noise, and obtain motion estimates that are highly correlated in time and space. Although much work has been done on estimating dense motion fields, a conclusive, detailed treatment of arbitrarily shaped images is hardly described, especially for hierarchical motion estimation systems. For general reference see the following reference list:
1. Joachim Dengler. Local motion estimation with the dynamic pyramid. Pyramidal Systems for Computer Vision, F25:289-297, 1986. Comment: Presentation of a pyramidal approach.
2. Enkelmann. Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences. Computer Vision, Graphics and Image Processing, 43:150-177, March 1988. Comment: Applying multigrid methods to the estimation of optical flow fields by using oriented smoothness constraints.
3. Sugata Ghosal and Petr Vanek. A fast scalable algorithm for discontinuous optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(2), February 1996. Comment: Multigrid approach for solving the motion estimation problem by using anisotropic smoothness constraints.
4. R. C. Gonzalez and R. E. Woods. Digital Image Processing. Addison Wesley, 1992. Comment: General image processing book.
5. Sheila S. Hemami and Gregory U. Conklin. Multi-resolution motion estimation. In IEEE ICASSP München, pages 2873-2876, 1997. Comment: Coarse to fine propagation versus fine to coarse propagation.
6. B. K. P Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17:185-203, 1981. Comment: Basic article for gradient based approaches.
7. Bernd Jaehne. Digitale Bildverarbeitung. Springer-Verlag, 1993. Comment: General book about image processing. General description of pyramidal approaches.
8. P. Anandan, J. R. Bergen and K. J. Hanna. Hierarchical model-based motion estimation. In Reginald L. Lagendijk and M. Ibrahim Sezan, editors, Motion Analysis and Image Sequence Processing. Kluwer Academic Publishers, 1993. Comment: Introduction to the advantage of using pyramidal approaches for determining optical flow.
9. Hans-Helmut Nagel. Image sequences - ten (octal) years - from phenomenology towards a theoretical foundation. IEEE, pages 1174-1185, 1986. Comment: Overview article.
10. P. Anandan. A unified perspective on computational techniques for the measurement of visual motion. IEEE, Conference on Computer Vision, pages 219-230, 1987. Comment: Overview of the problems and possibilities of pyramidal approaches for motion estimation.
11. P. J. Burt and E. H. Adelson. The Laplacian pyramid as a compact image code. IEEE Trans. Communications, 31:532-540, 1983. Comment: Introduction to pyramids.
12. Singh. Optic Flow Computation, A Unified Perspective. IEEE Computer Society Press Monograph, 1991. Comment: General introduction and presentation of a framework for motion estimation.
13. T. Lin and J. L. Barron. Image reconstruction error for optical flow. from Internet, 1996. Comment: Comparison of different motion estimators.
14. Woods and J. Kim. Motion compensated spatial temporal Kalman filter. In Reginald L. Lagendijk and M. Ibrahim Sezan, editors, Motion Analysis and Image Sequence Processing. Kluwer Academic Publishers, 1993. Comment: Noise reduction in image sequences by using the time correlation between images. The method is a combination of motion compensation and spatial temporal Kalman filtering.
15. B. Chupeau, M. Pecot. Method for hierarchical estimation of the movement in a sequence of images, U.S. Pat. No. 5,278,915, issued Jan. 11, 1994, Thomson-CSF, Puteaux, France.
16. V. Markandey. System and method for determining optical flow, U.S. Pat. No. 5,680,487, issued Oct. 21, 1997, Texas Instruments Incorporated, Dallas, Tex.
It is an object of this invention to provide mechanisms for improving motion estimation between arbitrarily shaped images where large displacement amplitudes may occur. The improvements concern, for example, the quality of images predicted from the motion fields (i.e. a reduction of the displaced frame differences) and the temporal and spatial correlation of the motion fields when motion estimation is performed within a set of subsequent images. The improvement of temporal and spatial correlation can be useful in image analysis and in the compression of motion fields.
It is an object of the invention to provide hierarchical systems which are able to estimate dense motion fields between arbitrarily shaped images. The explicit treatment of the shapes as described in the present invention allows a natural consideration of invalid pixels which may occur during the estimation process.
It is an object of the invention to provide methods which are applicable in motion estimation schemes where an image is predicted by forward warping as well as for motion estimation schemes where an image is predicted by backward warping.
It is a further object of the present invention to provide a technique for motion estimation in a sequence of related images. The images can be related in any way, for instance temporal or spatial (i.e. in subsequent resolutions).
It is a further object of this invention to provide tracking of motion for several frames where large displacement amplitudes may occur.
It is a further object of this invention to provide a technique for combining motion fields obtained by different estimations.
It is a further object of this invention to provide a technique for propagating information in a subsequent estimation process.
It is a further object of this invention to provide a technique for a local adaptive filtering of motion fields in order to achieve a gain in quality.
It is a further object of this invention to provide a technique for using motion fields from former estimations as hypotheses for the following estimation.
Dv: Vertical component of the motion field.
Dh: Horizontal component of the motion field.
D: All components of the displacement field, i.e. the motion field.
D:=(Dv, Dh) for two dimensions.
Hv: Vertical component of a hypothesis for the motion field.
Hh: Horizontal component of a hypothesis for the motion field.
H: All components of the hypothesis for the motion field.
H:=(Hv, Hh) for two dimensions.
ID: Image in the coordinate system of the motion field D.
SD: Shape field in the coordinate system of the motion field D. It is a validity field which defines the valid pixels for all fields in the position (coordinate system) of D.
IT: Image in target position, i.e. the image “to” which the motion field points.
ST: Shape field in target position. It is a validity field which defines the valid pixels for all fields in the target position.
X̂: A field X which is created by forward warping, i.e. forward motion compensation, as for example described in “Method and apparatus for compressing video sequences”, already incorporated by reference.
X̃: A field X which is created by backward warping, i.e. backward motion compensation, as for example described in “Method and apparatus for compressing video sequences”, already incorporated by reference.
SProp: A validity field which defines pixels to be propagated.
Xk: A field or value X on pyramid level k. In general, pyramid level indices are written as superscripts, and the counting starts with the finest resolution level, k=0,1,2, ... If all fields are defined on the same pyramid level, the superscript k is omitted. The term ‘block of pixels’ is also used to describe an arbitrarily shaped group of pixels.
The subscripts (D, T) only indicate the coordinate system in which a field is defined. The image to be predicted may be the image in target position (IT) for a forward warping scheme or the image in the coordinate system of the motion field (ID) for a backward warping scheme. In both cases the motion field is estimated from the image ID with the corresponding shape SD to the image IT with the corresponding shape ST.
Images without shapes can be described as shaped images where the shapes consist merely of valid pixels.
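The notation above can be illustrated by a minimal sketch of backward warping with shape fields. The sketch is not taken from the applications incorporated by reference; it assumes integer-rounded displacements, nearest-neighbour fetching and fields stored as nested lists, and the function names `full_shape` and `backward_warp` are hypothetical.

```python
def full_shape(image):
    """Shape field for an image without a shape: every pixel is valid."""
    return [[True] * len(row) for row in image]


def backward_warp(i_t, s_t, d_v, d_h, s_d):
    """Fetch I_T at position p + D(p) for every valid pixel p of S_D.

    Assumption: displacements are rounded to whole pixels. The returned
    validity field marks pixels whose source position lies inside I_T
    and is valid in S_T.
    """
    rows, cols = len(s_d), len(s_d[0])
    t_rows, t_cols = len(i_t), len(i_t[0])
    warped = [[0] * cols for _ in range(rows)]
    valid = [[False] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            if not s_d[v][h]:
                continue  # pixel is outside the shape of the D coordinate system
            tv = v + round(d_v[v][h])
            th = h + round(d_h[v][h])
            if 0 <= tv < t_rows and 0 <= th < t_cols and s_t[tv][th]:
                warped[v][h] = i_t[tv][th]
                valid[v][h] = True
    return warped, valid
```

Pixels for which the motion vector points outside the target image or onto an invalid target pixel are reported as invalid rather than filled with a guess.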
The invention is based on a hierarchical motion estimation system which provides motion estimation between arbitrarily shaped images. Relations between the images and their shapes are used to stabilize the motion estimation, to detect large displacements and to track motion over several frames in a recursive scheme. Because the shape information can be used to distinguish either between the inside and the outside of a video object or between valid and invalid motion vectors, the shape field flow within the pyramidal motion estimation can take both features into consideration.
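One way in which a shape field may accompany an image through a pyramid is sketched below. The reduction rule is an assumption made for illustration only: each coarse pixel is the mean of the valid pixels in a 2x2 block of the finer level and is valid if at least one of them is valid. The function names `reduce_level` and `build_pyramid` are hypothetical.

```python
def reduce_level(image, shape):
    """Reduce an image and its shape field by a factor of two.

    Assumption: a coarse pixel averages only the valid fine pixels of its
    2x2 block and is marked valid if the block holds any valid pixel.
    """
    rows, cols = len(image), len(image[0])
    c_rows, c_cols = (rows + 1) // 2, (cols + 1) // 2
    c_image = [[0.0] * c_cols for _ in range(c_rows)]
    c_shape = [[False] * c_cols for _ in range(c_rows)]
    for v in range(c_rows):
        for h in range(c_cols):
            values = [
                image[a][b]
                for a in range(2 * v, min(rows, 2 * v + 2))
                for b in range(2 * h, min(cols, 2 * h + 2))
                if shape[a][b]
            ]
            if values:
                c_image[v][h] = sum(values) / len(values)
                c_shape[v][h] = True
    return c_image, c_shape


def build_pyramid(image, shape, levels):
    """Return [(image, shape)] with level k=0 as the finest resolution."""
    pyramid = [(image, shape)]
    for _ in range(levels - 1):
        image, shape = reduce_level(image, shape)
        pyramid.append((image, shape))
    return pyramid
```

With this rule invalid pixels never contribute intensity to a coarser level, so the coarse estimate is not contaminated by data from outside the shape.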
The present invention is applicable for estimating motion fields which are used for forward compensation as well as for estimating motion fields which are used for backward compensation.
According to one of its embodiments the present invention uses a propagation strength for the propagation of data from former estimation steps in order to avoid propagation of data with low confidence.
According to one of its embodiments the present invention further sets the propagation strength according to the shapes, the image intensity gradients and confidence measurements.
According to one of its embodiments the present invention comprises methods and/or an apparatus which use motion fields as hypotheses for motion estimation and thereby allow motion estimation between a reference frame and frames which are related to the reference frame by motion data with large amplitudes. The methods are not restricted to particular basic motion estimation methods; gradient based methods, matching methods, phase correlation and Markov random field approaches may, for instance, be used. Due to the restrictions of these basic motion estimation methods, higher level motion estimation methods are required in many applications.
According to one of its embodiments the present invention combines preliminary motion fields into a final field. The preliminary motion fields are obtained from former estimations and temporal extrapolations of them and/or from estimations in different resolutions within a pyramidal system. The combination is performed by selecting those motion vectors from the set of preliminary motion fields which yield the best local predictions. The selection is stored in a so-called choice field. Various enhancements to this basic approach are presented: the choice field is filtered using a median filter; the choice field is altered in order to minimize the number of bits needed to represent the final field; masking effects of the human visual system are considered. Furthermore, the usage of different color channels is described.
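The basic combination step may be sketched as follows. The sketch rests on several assumptions that are not stated in the text: the local prediction quality is measured per pixel as the absolute difference between ID and the backward-fetched IT, displacements are whole pixels, the median filter uses a 3x3 window, and shapes are omitted for brevity. All function names are hypothetical.

```python
from statistics import median_low


def prediction_error(i_d, i_t, v, h, dv, dh):
    """Absolute prediction error of one motion vector at pixel (v, h)."""
    tv, th = v + dv, h + dh
    if 0 <= tv < len(i_t) and 0 <= th < len(i_t[0]):
        return abs(i_d[v][h] - i_t[tv][th])
    return float("inf")  # vector points outside the target image


def choice_field(i_d, i_t, candidates):
    """For each pixel, store the index of the best preliminary motion field.

    candidates is a list of (d_v, d_h) pairs of integer displacement fields.
    """
    rows, cols = len(i_d), len(i_d[0])
    choice = [[0] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            errors = [
                prediction_error(i_d, i_t, v, h, d_v[v][h], d_h[v][h])
                for d_v, d_h in candidates
            ]
            choice[v][h] = errors.index(min(errors))
    return choice


def median_filter(choice):
    """3x3 median filter on the choice field (window clipped at the border)."""
    rows, cols = len(choice), len(choice[0])
    out = [[0] * cols for _ in range(rows)]
    for v in range(rows):
        for h in range(cols):
            window = [
                choice[a][b]
                for a in range(max(0, v - 1), min(rows, v + 2))
                for b in range(max(0, h - 1), min(cols, h + 2))
            ]
            out[v][h] = median_low(window)
    return out


def combine(candidates, choice):
    """Assemble the final motion field from the selected candidates."""
    rows, cols = len(choice), len(choice[0])
    final_v = [[candidates[choice[v][h]][0][v][h] for h in range(cols)] for v in range(rows)]
    final_h = [[candidates[choice[v][h]][1][v][h] for h in range(cols)] for v in range(rows)]
    return final_v, final_h
```

`median_low` is used so that the filtered value is always an index that actually occurs in the window. Isolated selections that disagree with their neighbourhood are removed by the filter, which tends to make the final field smoother.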
According to one of its embodiments the present invention applies local adaptive filtering in order to provide data dependent, spatially inhomogeneous filtering of motion fields. Image gradient fields, motion gradient fields, confidence measurement fields or system dependent requirements can be used to set the filter masks.
According to one of its embodiments the present invention sets filter masks for local adaptive filtering of motion fields.
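A data dependent filter mask may, for example, be derived from a confidence field. The following sketch is only one possible reading and is not the filter of any particular embodiment: it assumes a 3x3 neighbourhood in which each neighbour is weighted by its confidence and invalid pixels receive weight zero. The function name `adaptive_filter` is hypothetical.

```python
def adaptive_filter(field, confidence, shape):
    """Filter one motion field component with a per-pixel mask.

    Assumption: the mask weight of a neighbour equals its confidence,
    and pixels outside the shape are excluded from the mask.
    """
    rows, cols = len(field), len(field[0])
    out = [row[:] for row in field]
    for v in range(rows):
        for h in range(cols):
            if not shape[v][h]:
                continue  # invalid pixels are passed through unchanged
            numerator = 0.0
            denominator = 0.0
            for a in range(max(0, v - 1), min(rows, v + 2)):
                for b in range(max(0, h - 1), min(cols, h + 2)):
                    if shape[a][b]:
                        weight = confidence[a][b]
                        numerator += weight * field[a][b]
                        denominator += weight
            if denominator > 0.0:
                out[v][h] = numerator / denominator
    return out
```

Because the weights differ from pixel to pixel, the effective mask is spatially inhomogeneous: a vector with low confidence is replaced largely by its more reliable neighbours, while reliable regions are altered little.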
According to one of its embodiments the present invention comprises a hierarchical motion estimation apparatus which uses different combinations of the methods according to the embodiments of the invention.
According to one of its embodiments the present invention comprises a hierarchical motion estimation apparatus which performs motion estimation in a subsequent set of shaped images and uses motion fields from former estimations as hypotheses.
The aforementioned features also may be combined in an arbitrary manner to form another particular embodiment of the invention.