Conventionally, there are methods and devices which automatically track the motion of a moving object, particularly the motion of a human body, on the basis of image data, and which are used, for example, in the music field and the sports field for ability development based on the analysis and evaluation of motion, and for various other purposes. For example, a body motion analysis device is known which: extracts human silhouette images from a video image of a dancing examinee input by a computer; detects respective parts of the examinee from the human silhouette images based on color processing; converts the human silhouette images to skeleton images; subjects the skeleton images to Hough transform to approximate the respective parts by lines; and tracks the respective parts in time using a Kalman filter (refer to e.g. Japanese Laid-open Patent Publication 2005-339100).
The above-described body motion analysis device is a device which subjects Hough parameters having been tracked in time to SVD (Singular Value Decomposition) to detect a motion feature of the body motion, and thereafter Fourier-transforms time-series data of the extracted motion feature for frequency analysis so as to extract and evaluate rhythmic elements of the entire body motion of the examinee.
Further, in pedestrian tracking methods and pedestrian tracking devices for automatically tracking a pedestrian based on image data, attempts have been made to increase accuracy and efficiency to suit uses such as accident prevention and surveillance. More specifically, a more reliable and faster pedestrian tracking method or device free from malfunction (mistracking) is required.
Now, various filter technologies are used as means to process images for estimating the motion of a pedestrian, and associating it with the time direction. A filter is a method or device for outputting an estimate of a desired response to an input signal such as image data. A filter used for pedestrian tracking removes noise from the current input signal with added noise, and outputs a future signal value as the estimate of the desired response. Such future estimation using a filter is called filter prediction.
For example, a Kalman filter is widely used in the field of object tracking to perform tracking of moving objects in general as used in the above-described patent document (Japanese Laid-open Patent Publication 2005-339100), and is also applied to the pedestrian tracking.
An outline of tracking using a Kalman filter will be described. A Kalman filter estimates state vector xt from observation vector yt sequentially at each time. Here, the subscript t in yt, xt and later-described Ft indicates a certain time, while a time one step before the time is indicated by t−1. In other words, observation (e.g. capture of time-series images) is made at respective time intervals (steps). The time notation using these subscripts will be similarly used hereinafter. The observation vector yt is a vector in observation space which is mathematically defined by observable time-series data. The state vector xt is a vector in space, called state space, representing the state of a system to essentially determine the observation vector, and is assumed to follow a Gauss-Markov process (refer to e.g. “Applied Kalman Filter” by Toru Katayama, Asakura Publishing Co., 1983).
Further, a Kalman filter assumes linearity and Gaussianity in both the system model equation xt=Ftxt−1+Gtvt, which characterizes the transition of the state vector in the time direction, and the observation model equation yt=Htxt+wt, which characterizes the mapping from the state vector to the observation vector. Here, vt and wt are Gaussian white noises, called plant noise and observation noise, respectively, while Ft, Gt and Ht are matrices, called state transition matrix, driving matrix and observation matrix, respectively. It is seen from the form of these equations that the observation vector yt is a linear function of the state vector xt, and that the state vector xt is a linear function of the previous state vector xt−1.
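The two linear model equations above can be illustrated by a minimal scalar sketch. It assumes one-dimensional state and observation, so that Ft, Gt and Ht reduce to the constants `F`, `G` and `H`; the noise variances `q` and `r`, the default values and the function names are hypothetical and chosen only for illustration.

```python
def kalman_step(x_est, p_est, y, F=1.0, G=1.0, H=1.0, q=0.01, r=0.25):
    """One scalar Kalman filter step.

    x_est, p_est: previous state estimate and its variance.
    y: current observation.
    q, r: assumed variances of the plant noise v_t and observation noise w_t.
    """
    # Prediction through the system model x_t = F x_{t-1} + G v_t.
    x_pred = F * x_est
    p_pred = F * p_est * F + G * q * G
    # Correction through the observation model y_t = H x_t + w_t.
    gain = p_pred * H / (H * p_pred * H + r)
    x_new = x_pred + gain * (y - H * x_pred)
    p_new = (1.0 - gain * H) * p_pred
    return x_new, p_new


def run_kalman(observations, x0=0.0, p0=1.0):
    """Filter a sequence of observations; return the final (estimate, variance)."""
    x, p = x0, p0
    for y in observations:
        x, p = kalman_step(x, p, y)
    return x, p
```

Because the estimate is summarized by a single mean and variance, the sketch can only represent a single-peaked Gaussian belief about the state.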
The assumption of Gaussianity in each of the above-described models corresponds to assuming a Gaussian distribution for the probability distribution of the state vector xt in the state space, namely the state probability distribution p(x). In pedestrian tracking, the state probability distribution deviates in some cases from a Gaussian distribution, for example when occlusion causes a pedestrian to be temporarily hidden behind another object, when the velocity of the tracking target (pedestrian) changes suddenly, or when multiple objects similar to the tracking target are present. If a Kalman filter is applied in such cases, the state is estimated using a Gaussian distribution as shown in FIG. 36B although the actual state probability distribution is assumed to be a distribution as shown in FIG. 36A, which differs from a Gaussian distribution. Thus, because of this application limit of a Kalman filter, which assumes a Gaussian distribution, the state cannot be estimated with sufficient accuracy.
Thus, there has been proposed a tracking method, called CONDENSATION (conditional density propagation), using a Monte Carlo filter which does not assume Gaussianity or linearity (refer to e.g. “Conditional Density Propagation for Visual Tracking” by Michael Isard and Andrew Blake, International Journal of Computer Vision, Vol. 29, pp. 5-28 (1998)).
If a Monte Carlo filter is used, a state vector at each time is sequentially estimated based on an observation vector, as in the case where a Kalman filter is used. In the estimation using a Monte Carlo filter, a state probability distribution is generated based on the distribution of particles, each having a vector pointing to a point in the state space. Thus, a Monte Carlo filter can handle nonlinear and non-Gaussian models obtained by generalizing the system model and the observation model of the above-described Kalman filter (refer, for example, to “Introduction to Time Series Analysis” by Genshiro Kitagawa, Iwanami Publishing Company, 2005).
Thus, it is considered that CONDENSATION can achieve highly accurate probabilistic state estimation, namely tracking with less malfunction, even in situations involving occlusion or sudden velocity change, where conventional methods, e.g. those assuming Gaussianity, may fail.
(Outline of Monte Carlo Filter)
Here, an outline of a Monte Carlo filter will be described. The system model and the observation model in a Monte Carlo filter are expressed by the following equations (1) and (2):
System Model:xt=F(xt−1,vt)  (1)
Observation Model:yt=H(xt,wt)  (2)
The state probability distribution p(xt) of the state vector xt in the state space can be expressed by a set of N particles {st(n), n=1, . . . , N} as in the following equations (3) and (4), where st(n) is a vector which an n-th particle has and which points to a point in the state space X, while δ(x) is a delta function:
$$p(x_t) \cong \frac{1}{N}\sum_{n=1}^{N}\delta\left(x_t - s_t^{(n)}\right) \tag{3}$$

$$\delta(x) = \begin{cases} +\infty & x = 0 \\ 0 & x \neq 0 \end{cases} \tag{4}$$
The state probability distribution of a Monte Carlo filter is represented by a discrete density of particles. For example, in the case where the distribution shown in FIG. 37A is a true probability distribution, the probability distribution in a Monte Carlo filter is expressed by the discrete density of particles as shown in FIG. 37B. Thus, a higher number of particles leads to a more accurate representation of the state probability distribution. Any state probability distribution can be represented by such a representation using particles.
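As a rough illustration of the particle representation, the following sketch draws particles from an assumed two-peak (non-Gaussian) density and estimates the probability of an interval as the fraction of particles falling inside it, which is what integrating the particle approximation over that interval yields. The mixture parameters and the names `sample_bimodal`, `make_particles` and `particle_probability` are hypothetical.

```python
import random


def sample_bimodal(rng):
    """Draw one sample from an assumed two-peak density (a mixture of two Gaussians)."""
    if rng.random() < 0.3:
        return rng.gauss(-2.0, 0.5)
    return rng.gauss(3.0, 1.0)


def make_particles(n, seed=0):
    """Return n particles, each a point in a one-dimensional state space."""
    rng = random.Random(seed)
    return [sample_bimodal(rng) for _ in range(n)]


def particle_probability(particles, lo, hi):
    """Probability of lo <= x <= hi under the particle approximation:
    each particle carries mass 1/N, so the result is the fraction inside."""
    inside = sum(1 for s in particles if lo <= s <= hi)
    return inside / len(particles)
```

Calling `particle_probability(make_particles(n), -100.0, 0.0)` for increasing `n` gives estimates that settle near the mass of the left peak, in line with the statement that more particles give a more accurate representation.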
(State Estimation Algorithm Using Monte Carlo Filter)
Next, a state estimation algorithm using the above-described Monte Carlo filter will be described. FIG. 38 shows a process performed at time steps for N particles s(1),s(2), . . . ,s(N). In this Figure, the horizontal axis is a time axis while the vertical axis represents state space (represented by one dimension). The size of each particle shown by a black circle or a white dashed circle indicates the likelihood (likeliness or possibility of occurrence) of the state. As shown in this Figure, the process using a Monte Carlo filter is a repetition of a three-step process of prediction, measurement and resampling (resetting).
Through the above-described repetition of the three-step process, the state probability distribution p(xt) at time t is obtained from the observed data and the state probability distribution p(xt−1) at the previous time t−1, so that the state probability distribution at each time is sequentially estimated. Further, the state probability distribution is flexibly determined without assuming Gaussianity. Thus, the state probability distribution is corrected by the observed data, and the next state probability distribution is obtained by using the corrected state probability distribution, so that the trajectory of a particle in the state space representing a tracking result comes closer to the true trajectory.
For the respective particles (n=1, . . . , N), the prediction step predicts the following state s′t(n) according to the process probability density p(xt|xt−1=st−1(n)) (hereafter refer to the above-described “Introduction to Time Series Analysis” by Genshiro Kitagawa).
For the respective particles, the measurement step calculates the likelihood πt(n) in the predicted state according to the observation probability density p(yt|xt). In other words, this step obtains the similarity (likelihood) between the state of a tracking target model corresponding to each particle and the observed data (image of the tracking target) by making a comparison based on a suitably chosen comparison method. Here, yt is an observation vector (observed data) at time t.
The resampling step repeats the following process (i), (ii) and (iii) N times according to the number of particles N so as to sample a set of particles {st(n), n=1, . . . , N} at time t. In other words, this step redistributes (resets) the N particles in the state space by using the likelihood of each particle representing the predicted state to allocate a larger number of particles at locations of particles with a higher likelihood, and allocate a smaller number, or none, of particles at locations of particles with a lower likelihood, so as to determine the state probability distribution at time t which reflects the correction by the observed data.
(i) Generate a random number ut(n) ∈[0,1] following uniform distribution;
(ii) Obtain a natural number i satisfying the following inequality and equation;
$$\frac{1}{C}\sum_{l=1}^{i-1}\pi_t^{(l)} < u_t^{(n)} \le \frac{1}{C}\sum_{l=1}^{i}\pi_t^{(l)} \tag{5}$$

where

$$C = \sum_{l=1}^{N}\pi_t^{(l)} \tag{6}$$

(iii) $s_t^{(n)} = s_t^{\prime(i)}$ is set.
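Steps (i) to (iii) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `resample` and its arguments are hypothetical, the index i of inequality (5) is found by binary search on the cumulative likelihoods, and the random number is drawn from (0, 1] rather than [0, 1] so that a particle with zero likelihood is never selected.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def resample(predicted, likelihoods, rng):
    """Draw N new particles from the predicted particles s'_t according to
    their likelihoods pi_t, following steps (i)-(iii)."""
    cumulative = list(accumulate(likelihoods))  # partial sums of pi_t
    total = cumulative[-1]                      # C of equation (6)
    new_particles = []
    for _ in range(len(predicted)):
        # (i) uniform random number; 1 - random() lies in (0, 1] (assumption).
        u = 1.0 - rng.random()
        # (ii) first index whose cumulative likelihood reaches u * C,
        #      i.e. the i satisfying inequality (5), here 0-based.
        i = bisect_left(cumulative, u * total)
        # (iii) the new particle takes over the selected predicted state.
        new_particles.append(predicted[i])
    return new_particles
```

A particle with a high likelihood occupies a wide interval of the cumulative sum and is therefore copied several times, while a particle with a low likelihood is likely to be dropped.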
The state probability distribution p(xt) of particles at time t is obtained by the above-described three-step process of prediction, measurement and resampling (resetting). When using a Monte Carlo filter, it is necessary, depending on applications, to properly set conditions such as: how to form a state space X, i.e. a model of target and so on; how to make a state transition in the prediction step such as, inter alia, constraint conditions for the transition; what to use as a calculation method, i.e. comparison method, of the likelihood of particles in the measurement step; and so on.
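The three-step cycle as a whole can be sketched for a one-dimensional state space. The settings that the text says must be chosen per application are filled in here with assumptions made only for illustration: a random-walk system model, a Gaussian observation density, and the hypothetical parameters `plant_sigma` and `obs_sigma`. Resampling uses the standard-library `random.Random.choices`, which draws with replacement in proportion to the weights.

```python
import math
import random


def monte_carlo_filter_step(particles, y, rng, plant_sigma=0.5, obs_sigma=1.0):
    """One prediction / measurement / resampling cycle for scalar particles."""
    # Prediction: s'_t = F(s_{t-1}, v_t), assumed here to be a random walk.
    predicted = [s + rng.gauss(0.0, plant_sigma) for s in particles]
    # Measurement: likelihood of each predicted state given observation y,
    # assumed here to be an unnormalized Gaussian observation density.
    likelihoods = [math.exp(-0.5 * ((y - s) / obs_sigma) ** 2) for s in predicted]
    # Guard: if every likelihood underflows to zero, keep the prediction as is.
    if sum(likelihoods) == 0.0:
        return predicted
    # Resampling: redistribute N particles in proportion to their likelihood.
    return rng.choices(predicted, weights=likelihoods, k=len(predicted))


def track(observations, n_particles=500, seed=0):
    """Run the filter over a sequence of observations; return the mean of the
    particles after each step as a point estimate of the state."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        particles = monte_carlo_filter_step(particles, y, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

The particle mean is only one possible point estimate; for a multi-peaked distribution another summary, such as the particle with the highest likelihood, may be more appropriate.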
Next, contour tracking, which tracks a target using the contour(s) of the target, will be described as an example of using CONDENSATION. The contour tracking method models the contour of the tracking target by a B-spline curve, and defines a space composed e.g. of the coordinate values of control points of the spline curve as a state space. The motion (transition) of a state vector pointing to a point in the state space is estimated (predicted) using a Monte Carlo filter. In other words, a point in the state space is in one-to-one correspondence with a state of the contour, so that in the state space, the current state moves, i.e. transitions, from one point (state) to another point (state) as time passes. The transition is considered to be probabilistically achieved under certain constraint conditions.
When predicting the state transition, it is possible to increase the accuracy of prediction by restricting transitionable states in advance, i.e. by constraining the state transition. In the conventional contour tracking using CONDENSATION, the state transition is constrained by pre-learning using principal component analysis. In the following, the state space, state transition and calculation of likelihood in the contour tracking using CONDENSATION will be shown.
(State Space)
Approximate the contour of a tracking target by a B-spline curve, and define the space of the positions and velocities of the control points of the B-spline curve as the state space X.
(State Transition)
Use principal component analysis to pre-learn supervised data. When the state space has M dimensions, determine the state transition (s′t−st−1) based on a linear combination of the first principal component vector to the L-th principal component vector (L<M) to reduce the degree of freedom from M to L. This constrains the state transition so that the state scatters in the direction in which the supervised data is localized, i.e. so that it follows the characteristics of the supervised data.
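The constraint described above can be sketched as follows. The sketch makes several assumptions that are not in the text: the supervised data is given as a list of M-dimensional transition vectors, the principal components are computed by power iteration with deflation (a simple technique substituted here for a library eigen-solver), the mean transition is dropped, and the coefficients of the linear combination are drawn from a Gaussian with the hypothetical spread `sigma`.

```python
import random


def principal_components(samples, L, iters=200):
    """Return the first L principal directions of the centered samples."""
    n, M = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(M)]
    centered = [[x[j] - mean[j] for j in range(M)] for x in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(M)]
           for a in range(M)]
    components = []
    for _component in range(L):
        # Deterministic non-uniform start vector (assumed not to be
        # orthogonal to the dominant direction).
        v = [1.0 / (j + 1) for j in range(M)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _iteration in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(M)) for a in range(M)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Eigenvalue estimate, then deflation to expose the next component.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(M))
                  for a in range(M))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(M)]
               for a in range(M)]
        components.append(v)
    return components


def constrained_transition(s_prev, components, rng, sigma=1.0):
    """Predict s'_t = s_{t-1} + sum_l a_l e_l, so that the transition has
    only L degrees of freedom instead of M."""
    coeffs = [rng.gauss(0.0, sigma) for _ in components]
    return [s_prev[j] + sum(a * e[j] for a, e in zip(coeffs, components))
            for j in range(len(s_prev))]
```

If the supervised transitions are concentrated along a few directions, the predicted states stay close to those directions; if they are scattered in all directions, restricting the transition to L components discards much of the actual motion.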
(Calculation of Likelihood)
The likelihood π of particles is calculated by comparing the B-spline curve with the input image according to the following procedure. First, set K base points on the predicted B-spline curve, and set lines extending from the base points in the normal direction, each having a length of μ. Next, detect, on each of these lines, an edge (image contour of the target portion) which is a feature point of the image, and let δk denote the distance from the k-th base point to the detected edge. The likelihood π of particles in the state space representing the state of the contour is calculated using this δk according to the following formula (7):
$$\pi \propto \exp\left\{-\sum_{k=1}^{K}\frac{1}{2rK}\left(\min(\delta_k,\mu)\right)^2\right\} \tag{7}$$
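Formula (7) can be evaluated as in the following sketch. The function name `contour_likelihood` is hypothetical, the constant `r` is taken as a given positive parameter because the text does not define it further, and the treatment of a line on which no edge is detected (its distance is set to μ) is an assumption made for illustration.

```python
import math


def contour_likelihood(distances, mu, r):
    """Unnormalized likelihood of formula (7).

    distances: the K distances delta_k from each base point to the detected
               edge; None marks a line on which no edge was found (assumption).
    mu:        length of the normal search lines, used to clip each distance.
    r:         positive scale constant of formula (7).
    """
    K = len(distances)
    total = 0.0
    for d in distances:
        clipped = mu if d is None else min(d, mu)
        total += clipped ** 2 / (2.0 * r * K)
    return math.exp(-total)
```

Clipping at μ bounds the penalty contributed by any single base point, so that one missing or distant edge does not drive the likelihood of an otherwise well-fitting contour to zero.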
The above-described contour tracking using CONDENSATION is applied to the tracking of hand palms and leaves, achieving excellent results. Further, a pedestrian tracking method using a framework of CONDENSATION has been proposed (refer to e.g. “A Bayesian Multiple-Blob Tracker” by Isard and MacCormick, IEEE International Conference on Computer Vision, pp 34-41, 2001).
However, the application of the contour tracking using CONDENSATION to pedestrian tracking has the following problem. An object such as a pedestrian considerably changes with time in the direction and magnitude of contour transition. It is generally difficult to properly constrain the state transition for the contour of such object, so that the contour tracking using CONDENSATION is not suitable for pedestrian tracking.
Referring to FIGS. 39A, 39B and 39C, the contour transition of a pedestrian will be described. These Figures show regions of a pedestrian detected in three successive frames of an actual sequence. Between the two pedestrian regions of FIGS. 39A and 39B, there is a significant change in the lower body contour although there is no significant change in the upper body contour. On the other hand, between FIGS. 39B and 39C, there is no significant change in the lower body contour although there is a significant change in the upper body contour. This shows that the contour of a pedestrian changes with time, i.e. transitions in state, and that both the parts which change and the magnitude of the change also vary with time.
Further, as described above, the contour tracking using CONDENSATION constrains the state transition based on principal component analysis. Thus, significant effects of constraint can be obtained when the supervised data in the state space is localized in a certain direction. However, there are various contour transitions of a pedestrian, and the supervised data is widely scattered without a pronounced directional tendency. It is impossible in principle to properly constrain the state transition for such a state space based on principal component analysis. The constraint of the state transition is for the purpose of increasing the accuracy of predicting the state transition. Thus, if the constraint is not possible, an increase in the accuracy of prediction cannot be expected, making it impossible to achieve tracking without malfunction.
Thus, in order to achieve robust and highly stable pedestrian tracking without malfunction, it is necessary to use, instead of the contour, a feature whose state transition is more stable in direction and magnitude, so as to increase the accuracy of prediction based on that feature.
Further, the use of a Monte Carlo filter eliminates the need to assume Gaussianity, which is required when using a Kalman filter, making it possible to achieve more robust tracking, so that various tracking methods using this framework have been proposed. However, no practical method suitable for pedestrian tracking has been proposed. For example, the pedestrian tracking using CONDENSATION disclosed in the above-described paper by Isard and MacCormick is intended for images having a large pedestrian region and performs tracking based on accurate pedestrian models, e.g. using three-dimensional information, and is therefore not suitable for practical use, which requires adaptation to images with a small pedestrian region.
Further, in pedestrian tracking for surveillance applications, a grey scale image with a large dynamic range is used, and the use of an infrared camera is also contemplated. Thus, a tracking method which does not use color information is desired in order to enable pedestrian tracking adapted to such situations.