The invention relates to an improved method and apparatus for representing moving objects appearing in a sequence of images.
There are various known techniques for deriving representations of objects in an image or sequence of images. Such representations are useful, for example, for indexing images for search and retrieval purposes. Text-based indexing, which is subjective and labour-intensive, was followed by indexing on the basis of intrinsic properties of objects such as colour, shape and outline. A further development has been the representation of motion of an object in a sequence of images. A very important aspect of such a representation is an efficient encoding that minimises the bandwidth or storage required while introducing minimal approximation error.
A first known technique of representing motion is known as Parameter Trajectory. In this technique, motion of the entire object is represented using one or more parameters, such as rotation and translation. For example, displacement vectors between vertices in a frame and a reference frame are derived, and used to solve equations for the known motion model, such as an affine transformation. This technique results in a relatively compact representation, but is limited to rigid objects and to cases where the motion model (e.g. affine transformation) is known.
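As an illustration of solving for a known motion model, the following minimal sketch recovers the six parameters of an affine transformation from three vertex correspondences between a reference frame and a current frame. It is not taken from the cited techniques: the function names (`det3`, `solve3`, `estimate_affine`) are hypothetical, vertex positions are used directly rather than displacement vectors (the two differ only by subtracting the reference positions), and exactly three non-collinear vertices are assumed.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("reference vertices are collinear; model is underdetermined")
    solution = []
    for col in range(3):
        # Replace column `col` of m by the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution


def estimate_affine(ref_pts, cur_pts):
    """Return (a, b, c, d, e, f) such that x' = a*x + b*y + c and y' = d*x + e*y + f.

    ref_pts and cur_pts are lists of three (x, y) tuples (an assumption of this sketch).
    """
    m = [[x, y, 1.0] for x, y in ref_pts]
    a, b, c = solve3(m, [x for x, _ in cur_pts])
    d, e, f = solve3(m, [y for _, y in cur_pts])
    return a, b, c, d, e, f
```

With more than three vertices the same model would typically be fitted by least squares instead of solved exactly; that variant is omitted here for brevity.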
Another known technique of representing motion is known as Figure Trajectory. In this technique, motion of an object is represented by the trajectories of each of a number of representative points independently. More specifically, the co-ordinates of each representative point are tracked through a sequence of frames, and a function approximating the trajectory of each coordinate for each representative point is derived. The function approximation is performed using known techniques, such as linear approximation, quadratic approximation or spline function approximation. Figure Trajectory can also be used for various types of motion including motion of rigid objects with an unknown and/or complex motion.
In both techniques, if an object has a relatively complex outline, it can be represented by a reference region, such as a rectangle, ellipse or polygon, which is an approximation of the more complex object outline. For a polygon, the representative points may be the vertices. Similarly, for a rectangle, the representative points may be limited to three of the four vertices, because the position of the fourth vertex can be calculated from the position of the other three vertices. For an ellipse, three vertices of the circumscribing rectangle may be selected as representative points.
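The remark that the fourth vertex of a rectangle follows from the other three can be made concrete with a short sketch. The function name `fourth_vertex` is hypothetical, and the sketch assumes the three stored vertices are given in consecutive order around the rectangle.

```python
def fourth_vertex(a, b, c):
    """Return the vertex D that completes rectangle A-B-C-D.

    Assumes a, b, c are consecutive vertices, so b is the corner between a and c.
    The diagonals of a rectangle bisect each other, hence D = A + C - B.
    """
    return (a[0] + c[0] - b[0], a[1] + c[1] - b[1])
```

The same relation holds for any parallelogram, so it also applies when the rectangle has been rotated or sheared by an affine motion.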
Published applications U.S. 2001/0040924 and U.S. 2001/0025298 disclose methods of representing motion of an object in a sequence of images similar to Figure Trajectory and Parameter Trajectory as outlined above. Each of U.S. 2001/0025298 and U.S. 2001/0040924 also discloses in more detail how a representative region of an object is derived (polygon approximation algorithm) and how representative points in a frame are associated with representative points in another frame (correspondence determination algorithm).
In Figure Trajectory, the function approximation, known as the temporal interpolation algorithm, involves extending the interval of an approximation function until an extraction error threshold (EET) is greater than a predetermined threshold.
In the algorithm, the interpolation is initialised with the first two points. The interpolation is then widened to add one point at a time until the EET of the interpolation becomes larger than a predefined threshold. At that point a new interpolation interval is started. This procedure is iterated until all points have been processed. When the interpolation model matches the variable values well, this algorithm results in a small number of long interpolation intervals. Conversely, when the match is poor, a large number of short interpolation intervals will result.
To describe the algorithm formally, define the following notation. Let $d$ ($> 0$) be the number of variables (dimension) and denote the value of the j-th variable at time $t$ by $v_t^{(j)}$ ($j = 1, 2, \ldots, d$). A series of time points is denoted by $t_i$ ($i = 0, 1, \ldots$). For a (candidate) interpolation interval, the starting time and ending time are denoted by $t_{\mathrm{START}}$ and $t_{\mathrm{END}}$ respectively. Let $f_{t_{\mathrm{START}},t_{\mathrm{END}}}^{(j)}(t)$ ($j = 1, 2, \ldots, d$) be a candidate interpolation function for the j-th variable. This can be calculated using least squares. In this case, $f_{t_{\mathrm{START}},t_{\mathrm{END}}}^{(j)}(t)$ is calculated to minimize
$$\sum_{i=\mathrm{START}}^{\mathrm{END}} \left( v_{t_i}^{(j)} - f_{t_{\mathrm{START}},t_{\mathrm{END}}}^{(j)}(t_i) \right)^2,$$
where the interpolation function is a first or second order polynomial. To evaluate the derived candidate function, let $T_j$ be a threshold for the j-th variable and define the error $e^{(j)}$ ($j = 1, 2, \ldots, d$) as
$$e^{(j)} = \max_{\mathrm{START} \le i \le \mathrm{END}} \left| v_{t_i}^{(j)} - f_{t_{\mathrm{START}},t_{\mathrm{END}}}^{(j)}(t_i) \right|.$$
$e^{(j)}$ is the maximum approximation error of the j-th variable in the range $t_{\mathrm{START}} \le t_i \le t_{\mathrm{END}}$. If $e^{(j)} < T_j$ holds for all $j$, $f_{t_{\mathrm{START}},t_{\mathrm{END}}}^{(j)}(t)$ is an acceptable candidate. However, an acceptable interpolation function over a wider interval may exist. To test this possibility, the interval is widened by incrementing END, and a new function is derived and tested. This procedure is repeated until a newly derived function fails to meet the acceptance condition. Then, the last acceptable candidate function is accepted.
A similar technique is disclosed in U.S. 2001/0040924.
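The temporal interpolation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of either cited application: the function names (`fit_line`, `max_error`, `segment`) are hypothetical, only the first-order polynomial model is fitted, time points are assumed distinct and increasing, and consecutive intervals are assumed to share their boundary point.

```python
def fit_line(ts, vs):
    """Least-squares first-order polynomial v ~ slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)  # non-zero for distinct times
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var_t
    return slope, mean_v - slope * mean_t


def max_error(ts, vs, coeffs):
    """Maximum absolute approximation error e over the interval."""
    slope, intercept = coeffs
    return max(abs(v - (slope * t + intercept)) for t, v in zip(ts, vs))


def segment(ts, values, thresholds):
    """Greedy interval widening.

    ts: list of time points; values: list of d lists, one per variable;
    thresholds: list of d thresholds T_j.
    Returns a list of (start_index, end_index, [coeffs per variable]).
    """
    intervals = []
    start, last = 0, len(ts) - 1
    while start < last:
        end = start + 1  # initialise the interval with two points
        best = [fit_line(ts[start:end + 1], v[start:end + 1]) for v in values]
        while end < last:
            window = slice(start, end + 2)  # try one more point
            cand = [fit_line(ts[window], v[window]) for v in values]
            accepted = all(max_error(ts[window], v[window], c) < limit
                           for v, c, limit in zip(values, cand, thresholds))
            if not accepted:
                break  # keep the last acceptable candidate
            best, end = cand, end + 1
        intervals.append((start, end, best))
        start = end  # assumption: the next interval starts at the shared point
    return intervals
```

A well-matched model yields few long intervals: a variable that rises linearly and then falls linearly is covered by two intervals, whereas a noisy variable with a tight threshold fragments into many short ones.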