The invention relates to a camera motion parameters estimation method, said parameters being intended to become descriptors in the MPEG-7 video indexing framework.
The last decades have seen the development of large databases of information, accessible to many people. These databases are composed of several types of media such as text, images, sound, etc. The characterization, representation, indexing, storage, transmission and retrieval of such information constitute important issues in the usefulness of this technology. Whatever the level of sub-division at which video indexing can be contemplated, each information sub-division can then be indexed according to several criteria such as semantic information content, scene attributes, camera motion parameters, and so on. MPEG-7, also named "Multimedia Content Description Interface" and focussing on content-based retrieval problems, will standardize generic ways to describe such multimedia content, using descriptors and description schemes that would be associated with multimedia material, in order to allow fast and efficient retrieval based on various types of features, such as text, color, texture, motion and semantic content. This standard will address applications that can be either stored (on-line or off-line) or streamed (e.g. broadcast or video on the internet) and can operate in both real time and non-real time environments.
A schematic block diagram of a possible MPEG-7 processing chain, shown in FIG. 1 and provided for processing any multimedia content, includes at the coding side a feature extraction sub-assembly 11 operating on said content, a normative sub-assembly 12, including a module 121 for yielding the MPEG-7 definition language and a module 122 for defining the MPEG-7 descriptors and description schemes, a standard description sub-assembly 13, and a coding sub-assembly 14. The scope of the MPEG-7 standard is the sub-assembly 12, and the invention is located in the sub-assemblies 12 and 13. FIG. 1 also shows the decoding side, including a decoding sub-assembly 16 (just after a transmission of the coded data, or a reading operation of these stored coded data), and a search engine 17 working in reply to the actions controlled by the user.
In the MPEG-7 framework, efficient tools must be developed for many subjects like scene analysis or motion analysis, and particularly methods for camera motion feature extraction. For a motion representation, two solutions can be proposed as a possible basis for the extraction of global motion descriptors: the perspective model, and the block matching method. The former is well suited for camera global motion, but cannot represent tridimensional translations, which have to be described separately whenever possible.
Block matching motion compensation is used as a part of the predictive coding process that is widely used in video transmission for reducing the amount of information needed to encode a video sequence. Indeed, only a small fraction of an image changes from one frame to the next, allowing a straightforward prediction from the previous frame. More precisely, each frame (i+1) is divided into a fixed number of blocks (usually square). For each block (typically of 8×8 pixels), a search is made for the most similar block in a previous reference frame (i), over a predetermined area. The search criterion is generally the best matching block, i.e. the one giving the least prediction error, usually computed as the mean absolute difference (which is easier to compute than, for instance, the mean square difference). For each block (in the present example, of 8×8 pixels) located at (x,y), the predicted image (i+1) is then computed from image (i) according to the relation (1):
$$B(i+1)[x, y] = B(i)[x - dx,\; y - dy] \qquad (1)$$
with $(dx, dy) = \vec{v}$ = motion vector leading from B(i), in the image (i), to B(i+1), in the image (i+1).
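The search described above can be sketched as follows. This is a minimal illustration, assuming NumPy, an exhaustive search over a square window, and grey-level frames stored as 2-D arrays; the function name `best_match` and the parameters `block` and `radius` are hypothetical and do not come from the present description.

```python
import numpy as np

def best_match(prev, cur, x, y, block=8, radius=4):
    """Return ((dx, dy), mad) minimising the mean absolute difference
    between the block of `cur` at (x, y) and candidate blocks of `prev`."""
    target = cur[y:y + block, x:x + block].astype(float)
    best, best_mad = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Relation (1): B(i+1)[x, y] is predicted by B(i)[x - dx, y - dy].
            px, py = x - dx, y - dy
            inside = (px >= 0 and py >= 0
                      and px + block <= prev.shape[1]
                      and py + block <= prev.shape[0])
            if not inside:
                continue  # candidate block falls outside the reference frame
            cand = prev[py:py + block, px:px + block].astype(float)
            mad = np.mean(np.abs(target - cand))
            if mad < best_mad:
                best, best_mad = (dx, dy), mad
    return best, best_mad

# Synthetic test: frame (i+1) is frame (i) shifted 2 pixels right, 1 pixel down.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(32, 32))
cur = np.roll(prev, shift=(1, 2), axis=(0, 1))
motion, mad = best_match(prev, cur, x=12, y=12)
```

On this synthetic pair the recovered vector is the imposed shift, because the random texture makes the match unambiguous. On a homogeneous texture several candidates would give nearly the same error, and the retained vector need not reflect the real motion.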
When starting from block matching motion vectors in order to estimate camera movements, the main problem is that the efficiency of the estimator of these vectors is only measured in terms of a coding criterion. Motion vectors do not necessarily correspond to the real motion of the scene. For example, in an area of homogeneous texture in the scene, the estimator could choose any of the blocks inside the texture, even if the resulting motion vector is not representative of the global motion. However, although block matching represents a motion that is not always consistent, this method will be preferred, because translations have to be described separately whenever possible and the perspective model is not able to do so. Starting from motion vectors thus determined, some camera parameters will then be defined. Before describing the corresponding definition method, the camera model used in the present description is first presented.
A monocular camera moving through a static environment is considered. As can be seen in FIG. 2, let O be the optical centre of the camera and OXYZ an external coordinate system that is fixed with respect to the camera, OZ being the optical axis. Let Tx, Ty, Tz be the translational velocity of OXYZ relative to the scene and Rx, Ry, Rz its angular velocity. If (X,Y,Z) are the instantaneous coordinates of a point P in the tridimensional scene, the velocity components of P will be:
$$\bar{X} = -T_x - R_y \cdot Z + R_z \cdot Y \qquad (2)$$
$$\bar{Y} = -T_y - R_z \cdot X + R_x \cdot Z \qquad (3)$$
$$\bar{Z} = -T_z - R_x \cdot Y + R_y \cdot X \qquad (4)$$
The image position of P, namely p, is given in the image plane by the relation (5):
$$(x, y) = \text{internal coordinates} = \left(f\,\frac{X}{Z},\; f\,\frac{Y}{Z}\right) \qquad (5)$$
(where f is the focal length of the camera), and will move across the image plane with an induced velocity:
$$(u_x, u_y) = (\bar{x}, \bar{y}) \qquad (6)$$
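After some computations and substitutions, the following relations are obtained:
$$u_x = f \cdot \frac{\bar{X}}{Z} - f \cdot \frac{X \cdot \bar{Z}}{Z^2} \qquad (7)$$
$$u_x = \frac{f}{Z}\left(-T_x - R_y \cdot Z + R_z \cdot Y\right) - \frac{f \cdot X}{Z^2}\left(-T_z - R_x \cdot Y + R_y \cdot X\right) \qquad (8)$$
and:
$$u_y = f \cdot \frac{\bar{Y}}{Z} - f \cdot \frac{Y \cdot \bar{Z}}{Z^2} \qquad (9)$$
$$u_y = \frac{f}{Z}\left(-T_y - R_z \cdot X + R_x \cdot Z\right) - \frac{f \cdot Y}{Z^2}\left(-T_z - R_x \cdot Y + R_y \cdot X\right) \qquad (10)$$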
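which can also be written:
$$u_x(x, y) = -\frac{1}{Z}\left(f \cdot T_x - x \cdot T_z\right) + \frac{x \cdot y}{f} \cdot R_x - f\left(1 + \frac{x^2}{f^2}\right) R_y + y \cdot R_z \qquad (11)$$
$$u_y(x, y) = -\frac{1}{Z}\left(f \cdot T_y - y \cdot T_z\right) - \frac{x \cdot y}{f} \cdot R_y + f\left(1 + \frac{y^2}{f^2}\right) R_x - x \cdot R_z \qquad (12)$$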
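The chain from the scene velocity of relations (2) to (4) to the image velocity of relations (7) and (9) can be checked numerically. The sketch below assumes NumPy; the function names and the numerical values are hypothetical and chosen only for illustration. It compares the analytic image velocity with a finite difference of the projection (5).

```python
import numpy as np

def point_velocity(P, T, R):
    """Relations (2)-(4): velocity of the scene point P = (X, Y, Z)."""
    X, Y, Z = P
    Tx, Ty, Tz = T
    Rx, Ry, Rz = R
    return np.array([-Tx - Ry * Z + Rz * Y,
                     -Ty - Rz * X + Rx * Z,
                     -Tz - Rx * Y + Ry * X])

def project(P, f):
    """Relation (5): perspective projection onto the image plane."""
    X, Y, Z = P
    return np.array([f * X / Z, f * Y / Z])

def image_velocity(P, T, R, f):
    """Relations (7) and (9): time derivative of the projection (5)."""
    X, Y, Z = P
    Xd, Yd, Zd = point_velocity(P, T, R)
    return np.array([f * Xd / Z - f * X * Zd / Z**2,
                     f * Yd / Z - f * Y * Zd / Z**2])

# Arbitrary illustrative values (not taken from the description).
P = np.array([0.4, -0.3, 5.0])
T = (0.1, 0.0, 0.2)
R = (0.01, 0.02, -0.03)
f = 1.0

u = image_velocity(P, T, R, f)

# Finite-difference check: move P for a short time dt and re-project.
dt = 1e-6
u_fd = (project(P + dt * point_velocity(P, T, R), f) - project(P, f)) / dt
```

The two estimates agree up to the discretisation error of the finite difference, which supports the sign conventions used in relations (2) to (10).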
Moreover, in order to include the zoom in the camera model, it is assumed that a zoom can be approximated by a single magnification in the angular domain. Such a hypothesis is valid if the distance of the nearest object in the scene is large compared to the change of focal length used to produce the zoom, which is usually the case.
A pure zoom is considered in FIG. 3. Given a point located in the image plane, at (x, y) at a time t and at (x′, y′) at the next time t′, the image velocity ux = x′ − x along x induced by the zoom can be expressed as a function of Rzoom (Rzoom being defined by the relation (θ′ − θ)/θ, as indicated in FIG. 3), as shown below.
One has indeed tan(θ′) = x′/f and tan(θ) = x/f, which leads to:
$$u_x = x' - x = \left[\tan(\theta') - \tan(\theta)\right] \cdot f \qquad (13)$$
The expression of tan(θ′) can be written:
$$\tan(\theta') = \tan\left[(\theta' - \theta) + \theta\right] = \frac{\tan(\theta' - \theta) + \tan(\theta)}{1 - \tan(\theta) \cdot \tan(\theta' - \theta)} \qquad (14)$$
Assuming then that the angular difference (θ′ − θ) is small, i.e. that tan(θ′ − θ) can be approximated by (θ′ − θ), and that (θ′ − θ)·tan θ << 1, one obtains:
$$u_x = x' - x = f \cdot \left[\frac{(\theta' - \theta) + \tan(\theta)}{1 - (\theta' - \theta) \cdot \tan\theta} - \tan\theta\right] \qquad (15)$$
$$u_x = f \cdot \frac{(\theta' - \theta) \cdot \left(1 + \tan^2(\theta)\right)}{1 - (\theta' - \theta) \cdot \tan\theta} \qquad (16)$$
$$u_x = f \cdot \theta \cdot R_{zoom} \cdot \frac{1 + \tan^2(\theta)}{1 - (\theta' - \theta) \cdot \tan\theta} \qquad (17)$$
which is practically equivalent to:
$$u_x = x' - x = f \cdot \theta \cdot R_{zoom} \cdot \left(1 + \tan^2\theta\right) \qquad (18)$$
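This result can be rewritten:
$$u_x = f \cdot \tan^{-1}\left(\frac{x}{f}\right) \cdot R_{zoom} \cdot \left(1 + \frac{x^2}{f^2}\right) \qquad (19)$$
and, similarly, uy is given by:
$$u_y = f \cdot \tan^{-1}\left(\frac{y}{f}\right) \cdot R_{zoom} \cdot \left(1 + \frac{y^2}{f^2}\right) \qquad (20)$$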
The velocity u=(ux, uy) corresponds to the motion induced in the image plane by a single zoom. A general model in which all the rotations, translations (along the X and Y axes) and zoom are taken into account can then logically be defined.
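The quality of the small-angle approximation leading to relation (19) can be checked numerically. The sketch below is only an illustration: it uses the definition Rzoom = (θ′ − θ)/θ, i.e. θ′ = θ·(1 + Rzoom), to compute the displacement of relation (13) without approximation, and compares it with relation (19). The function names and the numerical values are hypothetical.

```python
import math

def zoom_flow_exact(x, f, r_zoom):
    """Relation (13) with theta' = theta * (1 + r_zoom), no approximation."""
    theta = math.atan(x / f)
    return f * (math.tan(theta * (1.0 + r_zoom)) - math.tan(theta))

def zoom_flow_approx(x, f, r_zoom):
    """Relation (19): first-order approximation in (theta' - theta)."""
    return f * math.atan(x / f) * r_zoom * (1.0 + (x / f) ** 2)

# Illustrative values: unit focal length, point at x = 0.3, 1 % zoom.
f, x, r_zoom = 1.0, 0.3, 0.01
exact = zoom_flow_exact(x, f, r_zoom)
approx = zoom_flow_approx(x, f, r_zoom)
gap = abs(exact - approx)
```

For a small zoom factor the gap is second order in (θ′ − θ), which is why the denominator of relation (17) can be dropped in practice.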
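This general model can be written as the sum of a rotational velocity, representing rotational and zoom motions, and a translational velocity, representing the X and Y translations (i.e. tracking and booming respectively):
$$\begin{cases} u_x = u_x^{trans} + u_x^{rot} \\ u_y = u_y^{trans} + u_y^{rot} \end{cases} \qquad (21)$$
with:
$$\begin{cases} u_x^{trans} = -\dfrac{f}{Z} \cdot T_x \\[2mm] u_y^{trans} = -\dfrac{f}{Z} \cdot T_y \end{cases} \qquad (22)$$
$$\begin{cases} u_x^{rot} = \dfrac{x \cdot y}{f} \cdot R_x - f \cdot \left(1 + \dfrac{x^2}{f^2}\right) \cdot R_y + y \cdot R_z + f \cdot \tan^{-1}\left(\dfrac{x}{f}\right) \cdot \left(1 + \dfrac{x^2}{f^2}\right) \cdot R_{zoom} \\[2mm] u_y^{rot} = -\dfrac{x \cdot y}{f} \cdot R_y + f \cdot \left(1 + \dfrac{y^2}{f^2}\right) \cdot R_x - x \cdot R_z + f \cdot \tan^{-1}\left(\dfrac{y}{f}\right) \cdot \left(1 + \dfrac{y^2}{f^2}\right) \cdot R_{zoom} \end{cases} \qquad (23)$$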
equations in which only translational terms depend on the object distance Z.
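The general model can be evaluated on a grid of image points as sketched below. This is an illustration under stated assumptions: NumPy is used, the function and parameter names are hypothetical, and the signs of the rotational terms are those obtained by substituting the velocity relations (2) to (4) into the projection (5).

```python
import numpy as np

def rotational_flow(x, y, f, rx, ry, rz, r_zoom):
    """Rotation and zoom part of the model; it does not depend on depth Z."""
    qx = 1.0 + (x / f) ** 2
    qy = 1.0 + (y / f) ** 2
    ux = ((x * y / f) * rx - f * qx * ry + y * rz
          + f * np.arctan(x / f) * qx * r_zoom)
    uy = (-(x * y / f) * ry + f * qy * rx - x * rz
          + f * np.arctan(y / f) * qy * r_zoom)
    return ux, uy

def translational_flow(f, z, tx, ty):
    """Tracking (X) and booming (Y) part; the only part that depends on Z."""
    return -(f / z) * tx, -(f / z) * ty

def camera_flow(x, y, f, z, tx, ty, rx, ry, rz, r_zoom):
    """Sum of the translational and rotational velocities."""
    urx, ury = rotational_flow(x, y, f, rx, ry, rz, r_zoom)
    utx, uty = translational_flow(f, z, tx, ty)
    return urx + utx, ury + uty

# Illustrative use: a 5 x 5 grid of image points, tracking plus pan.
xs, ys = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-1.0, 1.0, 5))
ux, uy = camera_flow(xs, ys, f=2.0, z=10.0, tx=0.5, ty=0.0,
                     rx=0.0, ry=0.01, rz=0.0, r_zoom=0.0)
```

Keeping the two parts in separate functions mirrors the remark that only the translational terms involve the object distance Z.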
The article "Qualitative estimation of camera motion parameters from video sequences", by M. V. Srinivasan et al., Pattern Recognition, vol. 30, no. 4, 1997, pp. 593-605, describes a technique for extracting camera motion parameters from a sequence of images using the camera equations (21) to (23). More precisely, the basic principle of this technique is explained in part 3 (pp. 595-597) of said article. The technique finds the best values of Rx, Ry, Rz and Rzoom that create a flow field which, when subtracted from the original optic flow field, results in a residual flow field wherein all the vectors are parallel. It uses an iterative method minimizing deviations from parallelism of the residual flow vectors, by means of an advantageous sector-based criterion.
At each step of this iterative method, the optic flow due to the current camera motion parameters is calculated according to one of two different camera models. A first model assumes that the angular size of the visual field (or the focal length f) is known: this means that the ratios x/f and y/f in the equations (23) can be calculated for each point in the image, so that said equations give the optic flow exactly.
A second model assumes that the visual field of the camera is not known. Small field approximations (x/f and y/f much smaller than 1) are then necessary before applying the equations (23), which leads to the equations (24) and (25):
$$u_x^{rot} \approx -f \cdot R_y + y \cdot R_z + x \cdot R_{zoom} \qquad (24)$$
$$u_y^{rot} \approx f \cdot R_x - x \cdot R_z + y \cdot R_{zoom} \qquad (25)$$
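How far the small field approximation departs from the first model can be illustrated on the horizontal component. The sketch below assumes NumPy; the function names, the grid size and the parameter values are hypothetical. It compares relation (24) with the exact horizontal rotational velocity for a narrow and for a wide visual field.

```python
import numpy as np

def ux_rot_exact(x, y, f, rx, ry, rz, r_zoom):
    """Horizontal rotational velocity of the first model (focal length known)."""
    q = 1.0 + (x / f) ** 2
    return ((x * y / f) * rx - f * q * ry + y * rz
            + f * np.arctan(x / f) * q * r_zoom)

def ux_rot_small_field(x, y, f, ry, rz, r_zoom):
    """Relation (24): small field approximation (x/f and y/f << 1)."""
    return -f * ry + y * rz + x * r_zoom

def max_relative_gap(half_field_deg, f=1.0, ry=0.02, rz=0.01, r_zoom=0.03):
    """Largest gap between the two models, relative to the largest velocity."""
    h = f * np.tan(np.radians(half_field_deg))
    x, y = np.meshgrid(np.linspace(-h, h, 21), np.linspace(-h, h, 21))
    exact = ux_rot_exact(x, y, f, 0.0, ry, rz, r_zoom)
    approx = ux_rot_small_field(x, y, f, ry, rz, r_zoom)
    return float(np.max(np.abs(exact - approx)) / np.max(np.abs(exact)))

narrow = max_relative_gap(10.0)   # 20 x 20 degree visual field
wide = max_relative_gap(40.0)     # 80 x 80 degree visual field
```

With these illustrative values the gap stays at a few percent for the narrow field and reaches a substantial fraction of the velocity for the wide field.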
It appears that the first model, which is the one taking into account panning or tilting distortions, produces more accurate results when the visual field of the camera is large and known. Unfortunately, the focal length is generally not known, which leads to using the second model, and only on a restricted area of the image when the visual field is suspected to be large. However, this second model is not a satisfactory solution since it does not make it possible to distinguish panning from tracking.
While horizontal and vertical trackings produce flow fields in which all vectors are truly parallel, this is not the case with pan and tilt unless the visual field of the camera is small, for instance 20°×20° (large visual fields lead to distortions in the velocity fields arising from the planar geometry of the imaging surface). The flow field produced by a zoom is also distorted (far away from the centre, motion vectors are no longer radially oriented). If the visual field is large, the resulting distortions can be used to enable translational motions to be uniquely distinguished from pan and tilt.
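The difference in parallelism between tracking and panning can be made concrete with a small numerical sketch. It assumes NumPy, a scene at constant depth for the tracking case, and the pan terms of the rotational model; the function names and values are hypothetical. The measure used here, the largest angle between any flow vector and the vector at the image centre, is a simple stand-in chosen for illustration and is not the sector-based criterion of the cited article.

```python
import numpy as np

def max_direction_spread_deg(ux, uy, ref):
    """Largest angle, in degrees, between any flow vector and `ref`."""
    cross = ref[0] * uy - ref[1] * ux
    dot = ref[0] * ux + ref[1] * uy
    return float(np.max(np.abs(np.degrees(np.arctan2(cross, dot)))))

def pan_flow(x, y, f, ry):
    """Pan-only terms of the rotational model (other parameters are zero)."""
    return -f * (1.0 + (x / f) ** 2) * ry, -(x * y / f) * ry

def tracking_flow(x, y, f, z, tx):
    """Horizontal tracking at constant depth z: the same vector everywhere."""
    return np.full_like(x, -(f / z) * tx), np.zeros_like(y)

def grid(field_deg, f, n=9):
    """Image points covering a square visual field of `field_deg` degrees."""
    h = f * np.tan(np.radians(field_deg / 2.0))
    return np.meshgrid(np.linspace(-h, h, n), np.linspace(-h, h, n))

f, ry = 1.0, 0.02
ref = pan_flow(0.0, 0.0, f, ry)          # flow vector at the image centre

spread = {}
for field in (20.0, 80.0):
    x, y = grid(field, f)
    spread[field] = max_direction_spread_deg(*pan_flow(x, y, f, ry), ref)

x, y = grid(80.0, f)
tracking_spread = max_direction_spread_deg(
    *tracking_flow(x, y, f, z=5.0, tx=0.1), ref)
```

For the 20°×20° field the pan vectors deviate by less than two degrees from a common direction, so pan is hard to tell from tracking, whereas for the 80°×80° field the deviation exceeds twenty degrees while the tracking field stays strictly parallel.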
It is therefore an object of the invention to improve the scheme for camera motion features estimation from motion vectors by proposing a camera motion parameters estimation method that is able, whenever it is physically possible, to distinguish tracking from panning and to handle them separately when the visual field is large, even if it is unknown, without attempting this distinction when the visual field is small.
To this end, the invention relates to an estimation method provided for considering a sequence of successive video frames subdivided into blocks and processing this sequence, wherein said processing operation comprises the successive steps of:
extracting from said video sequence vectors corresponding to the motion between two successive frames, said motion vectors forming the camera velocity field;
preprocessing the camera velocity field, in order to reduce the amount of data and the heterogeneity of said extracted motion vectors;
estimating for each pair of frames, from said preprocessed field, camera features between the two considered frames;
undertaking on the basis of said estimation a long term motion analysis to obtain motion descriptors corresponding to the estimated camera motion parameters.
The main idea is that if a large visual field generates distortions in the velocity field in most cases, the same distortions should also be useful. In other words, if the focal length (or the visual field, which is the same information up to an image size scale factor) is included in the minimization process that uses the first model mentioned hereinabove, it should be correctly estimated when the visual field is not too small and when at least one of the zoom, pan, tilt or roll motion components is actually present (which represents an important part of real cases; this focal length estimation would not be meaningful if the visual field is too small or if there is only a tracking motion).
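The idea of including the focal length in the minimization can be illustrated on synthetic data. The sketch below is not the minimization of the cited article: it replaces the sector-based parallelism criterion with an ordinary least-squares fit, assumes a purely rotational field (pan and zoom, no translation), assumes NumPy, and scans a short list of candidate focal lengths. All names and values are hypothetical, and the signs of the rotational terms are those obtained from the velocity relations (2) to (4).

```python
import numpy as np

def design_matrix(x, y, f):
    """Rotational model written as A(f) @ (Rx, Ry, Rz, Rzoom).
    The first half of the rows gives u_x, the second half gives u_y."""
    qx = 1.0 + (x / f) ** 2
    qy = 1.0 + (y / f) ** 2
    ax = np.stack([x * y / f, -f * qx, y, f * np.arctan(x / f) * qx], axis=1)
    ay = np.stack([f * qy, -x * y / f, -x, f * np.arctan(y / f) * qy], axis=1)
    return np.vstack([ax, ay])

def estimate_focal(x, y, ux, uy, candidates):
    """Return the candidate focal length giving the smallest fitting error."""
    u = np.concatenate([ux, uy])
    errors = []
    for f in candidates:
        a = design_matrix(x, y, f)
        # For a fixed f the model is linear in the four motion parameters.
        params, *_ = np.linalg.lstsq(a, u, rcond=None)
        errors.append(np.sum((a @ params - u) ** 2))
    return candidates[int(np.argmin(errors))]

# Synthetic pan + zoom field seen through a wide (80 degree) visual field.
f_true = 1.0
h = f_true * np.tan(np.radians(40.0))
gx, gy = np.meshgrid(np.linspace(-h, h, 11), np.linspace(-h, h, 11))
x, y = gx.ravel(), gy.ravel()
true_params = np.array([0.0, 0.02, 0.0, 0.03])   # (Rx, Ry, Rz, Rzoom)
u = design_matrix(x, y, f_true) @ true_params
ux, uy = u[:x.size], u[x.size:]

candidates = np.array([0.5, 0.75, 1.0, 1.5, 2.0])
f_hat = estimate_focal(x, y, ux, uy, candidates)
```

On this noise-free field the distortions produced by the wide visual field single out the true focal length among the candidates. With a purely translational field at constant depth every candidate would fit equally poorly, which is consistent with the remark that the estimate is not meaningful for a tracking-only motion.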