(1) Field of the Invention
The present invention relates to apparatuses that detect a moving object in an image. In particular, it relates to an apparatus that detects a moving object in a video sequence made up of images by performing region extraction, on the basis of motion information on the moving object, to determine a region showing the whole or a part of the moving object, such as a person, that moves and changes shape in the video sequence.
(2) Description of the Related Art
One known method of detecting a moving object, such as a person, that moves and changes shape, or of extracting an image region including the moving object, combines a technique for extracting from an image a candidate region showing a target moving object with a technique for applying a previously prepared object model to the extracted candidate region. For example, Japanese Unexamined Patent Application Publication No. 8-214289 (referred to as Patent Reference 1 hereafter) discloses a method whereby a silhouette image of a target moving object such as a person is extracted from an image as a candidate region, and a model corresponding to the target moving object is applied to the extracted silhouette image. In the model used here, parts of the target moving object, such as body parts, are parameterized in advance based on knowledge about such a target moving object. With this method, the parameterized model is applied to the image of the target moving object, such as a person, that moves and changes shape, so that the target moving object can be detected and the corresponding image region can also be extracted.
Moreover, the following method is disclosed by Joshua Tenenbaum, Vin de Silva, and John Langford in “A Global Geometric Framework for Nonlinear Dimensionality Reduction”, Science, Vol. 290, pp. 2319-2322, 22 Dec. 2000 (referred to as Non-Patent Reference 1 hereafter). Using input images obtained by capturing one fixed object from different viewpoints, Euclidean distances indicating similarity between the images are calculated based on pixel values of the images. Then, geodesic distance transformation and dimensionality reduction are performed, in that order, on the Euclidean distances. As a result, images captured from similar viewpoints are projected close to one another in a two-dimensional space. Here, Non-Patent Reference 1 discloses that, as compared to conventional linear dimensionality reduction methods such as Principal Component Analysis (PCA), lower dimensionality can be achieved through the geodesic distance transformation, and that nonlinearly-distributed data can also be processed.
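The three steps described above (Euclidean distances, geodesic distance transformation, dimensionality reduction) can be illustrated with the following minimal sketch. It is not the implementation of Non-Patent Reference 1. The function names, the k-nearest-neighbour graph, the Floyd-Warshall shortest-path computation, and the use of classical multidimensional scaling for the final step are all assumptions chosen for illustration, and each row of X stands in for one flattened image.

```python
import numpy as np

def pairwise_euclidean(X):
    """Euclidean distance between every pair of rows of X (shape N x D)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    D = np.sqrt(np.maximum(d2, 0.0))   # clip tiny negative round-off
    np.fill_diagonal(D, 0.0)
    return D

def geodesic_distances(D, k):
    """Approximate geodesic distances as shortest paths in a k-NN graph.

    Assumption: each point is linked to its k nearest neighbours and
    the graph is connected; otherwise some entries remain infinite.
    """
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nearest = np.argsort(D, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nearest.ravel()
    G[rows, cols] = D[rows, cols]
    G[cols, rows] = D[cols, rows]                 # keep the graph symmetric
    for m in range(n):                            # Floyd-Warshall relaxation
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def classical_mds(G, dim):
    """Embed points in `dim` dimensions from an N x N distance matrix."""
    n = G.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (G ** 2) @ J
    w, v = np.linalg.eigh(B)                      # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:dim]
    return v[:, order] * np.sqrt(np.maximum(w[order], 0.0))
```

For points lying on a curved arc, the geodesic distance between the two ends follows the arc and is therefore longer than the straight-line Euclidean distance, which is what allows nonlinearly-distributed data to be unrolled into fewer dimensions. Note that every function here operates on a full N x N matrix.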
Here, suppose that “N” data pieces are to be processed according to the method disclosed in Non-Patent Reference 1. In this case, the aforementioned geodesic distance transformation and dimensionality reduction need to be performed using a matrix having “N²” elements. It is therefore a known problem that, when the number “N” is large, an enormous amount of calculation is required.
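To give a sense of scale, the following small sketch computes the storage needed for a dense N-by-N distance matrix. The function name and the choice of 8 bytes per element (double precision) are assumptions made for illustration only; the source does not state a numeric format.

```python
def dense_matrix_bytes(n, bytes_per_element=8):
    """Bytes needed to hold a dense n x n matrix (assumed 8-byte elements)."""
    return n * n * bytes_per_element

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"N = {n}: {dense_matrix_bytes(n) / 1e9:.3f} GB")
```

Under these assumptions the matrix alone occupies 0.8 GB for N = 10,000 and 80 GB for N = 100,000, before any calculation is performed on it.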
To address this problem, methods of reducing the amount of calculation are disclosed by Vin de Silva and Joshua B. Tenenbaum in “Global Versus Local Methods in Nonlinear Dimensionality Reduction”, Neural Information Processing Systems 15, pp. 705-712, 2002 (referred to as Non-Patent Reference 2 hereafter) and by Vin de Silva and Joshua B. Tenenbaum in “Sparse Multidimensional Scaling using Landmark Points”, Technical Report, Stanford University, June 2004 (referred to as Non-Patent Reference 3 hereafter). To be more specific, landmark points, fewer in number than the data points, are selected from among the data points, and the geodesic distance transformation and dimensionality reduction are performed using a matrix generated from the selected landmark points.
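The landmark idea can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Non-Patent References 2 and 3 as such: the function name landmark_mds, its arguments, and the particular triangulation formula used to place the non-landmark points are choices made here for the example. The input is an n-by-N matrix of distances from n landmarks to all N points, which may hold geodesic distances, so no N-by-N matrix is ever formed.

```python
import numpy as np

def landmark_mds(dist_landmark_to_all, landmark_idx, dim):
    """Embed N points using only distances from n landmarks (n << N).

    dist_landmark_to_all : (n, N) array, distance from each landmark to each point
    landmark_idx         : length-n indices of the landmarks among the N points
    dim                  : target dimensionality
    """
    sq = dist_landmark_to_all ** 2              # squared distances, (n, N)
    sq_ll = sq[:, landmark_idx]                 # landmark-to-landmark block, (n, n)
    n = sq_ll.shape[0]

    # Classical MDS on the small n x n landmark matrix only.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq_ll @ J
    w, v = np.linalg.eigh(B)
    order = np.argsort(w)[::-1][:dim]
    w_top, v_top = w[order], v[:, order]
    if np.any(w_top <= 0.0):
        raise ValueError("landmarks do not span the requested dimensionality")

    # Place every point from its squared distances to the landmarks.
    pinv = v_top.T / np.sqrt(w_top)[:, None]    # (dim, n)
    mean_sq = sq_ll.mean(axis=1, keepdims=True) # mean squared distance per landmark
    return (-0.5 * pinv @ (sq - mean_sq)).T     # (N, dim)
```

In this sketch the eigen decomposition is performed on an n-by-n matrix and the remaining work is a single (dim x n) by (n x N) product, so the cost grows with n and N rather than with N squared. How the landmarks are chosen is left open; random selection is one simple possibility.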