Techniques for detecting a moving object such as a pedestrian from a video have conventionally been developed. Moving object detection is an important technique that can be applied to surveillance, to the detection of pedestrians from a moving vehicle, and to intelligent robots.
Many techniques have been developed so far. However, it is difficult to detect a pedestrian because the pedestrian is a multi-joint object that can take various poses, and because the pedestrian's appearance varies widely with clothing, external light, a complicated background, and so on.
As a technique for detecting a person from a static image, a technique is known that combines the Histogram of Oriented Gradients (HOG) feature with an Adaboost classifier (for example, refer to Non Patent Literature 1).
FIG. 32 is a diagram for explaining a method described in Non Patent Literature 1. As shown in the part (a) of FIG. 32, a window 12 of a predetermined size is set in an input image 10. The window 12 is divided into a plurality of patches (patch 14a, patch 14b, and so on), and the HOG feature is calculated for each of the patches.
For example, a method for calculating the HOG feature of the patch 14a is described with reference to the part (b) of FIG. 32. When the intensity of each of the pixels in the patch 14a is I(x, y), the spatial gradient ∇I = [Ix, Iy] is calculated for each of the pixels. Moreover, the orientation of the spatial intensity gradient, φ = tan⁻¹(Iy/Ix), is calculated for each of the pixels. The relationship between the spatial gradient ∇I and the orientation of spatial intensity gradient φ is shown in the part (c) of FIG. 32. In other words, when the horizontal axis is Ix and the vertical axis is Iy, the angle formed by the vector of the spatial gradient ∇I and the horizontal axis Ix is the orientation of spatial intensity gradient φ. By counting the frequency of the orientation of spatial intensity gradient φ over the pixels in the patch 14a, the histogram shown in the part (d) of FIG. 32 is obtained. This histogram is the HOG feature. Similarly, the HOG feature of the patch 14b is obtained as shown in the part (e) of FIG. 32. The HOG features calculated for all the patches within the window 12 are input to the Adaboost classifier, which determines whether or not a person is included within the window 12. By raster-scanning the window 12 from the top left corner to the bottom right corner of the input image 10 and determining at each position whether or not a person exists, a person included in the input image 10 can be detected.
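The raster scan of the window described above can be sketched as follows. This is a minimal illustration only, not the implementation of Non Patent Literature 1: the function names `raster_scan` and `detect`, the `stride` parameter, and the `classify` callback (a stand-in for the trained Adaboost classifier) are assumptions introduced for this example.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image stored row by row: image[y][x]


def raster_scan(img_w: int, img_h: int, win_w: int, win_h: int,
                stride: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (x, y) of each window position, scanning
    left to right and then top to bottom."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y)


def detect(image: Image, win_w: int, win_h: int,
           classify: Callable[[Image], bool],
           stride: int = 1) -> List[Tuple[int, int]]:
    """Return the window positions at which the classifier reports a person.

    `classify` is a hypothetical stand-in for the trained classifier; it
    receives the pixels of one window and returns True or False.
    """
    img_h, img_w = len(image), len(image[0])
    hits = []
    for x, y in raster_scan(img_w, img_h, win_w, win_h, stride):
        window = [row[x:x + win_w] for row in image[y:y + win_h]]
        if classify(window):
            hits.append((x, y))
    return hits
```

In practice the classifier would receive the HOG features of the patches in the window rather than raw pixels; raw pixels are passed here only to keep the sketch short.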
It should be noted that ∇I = [Ix, Iy] can be calculated with a general first derivative operator (Sobel, Roberts, Rosenfeld, and so on). A detailed description of these operators is therefore omitted.
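The calculation of the HOG feature for one patch can be sketched as follows. This is a minimal sketch under stated assumptions: the Sobel operator is chosen as the first derivative operator, the orientation is taken over the full range of 0 to 2π with `atan2`, eight bins are used, each pixel casts one unweighted vote, and pixels with zero gradient are skipped. None of these choices is specified in the description above, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image stored row by row: image[y][x]


def sobel_gradients(img: Image) -> Tuple[Image, Image]:
    """Return (Ix, Iy) computed with 3x3 Sobel kernels.
    Border pixels are left at zero (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                        - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            iy[y][x] = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                        - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
    return ix, iy


def orientation_bin(gx: float, gy: float, n_bins: int) -> int:
    """Quantize the orientation phi of the vector (gx, gy) into n_bins bins."""
    phi = math.atan2(gy, gx) % (2 * math.pi)  # map to [0, 2*pi)
    return min(int(phi / (2 * math.pi) * n_bins), n_bins - 1)


def hog_patch(ix: Image, iy: Image, x0: int, y0: int, size: int,
              n_bins: int = 8) -> List[int]:
    """Histogram of gradient orientations over a size x size patch
    whose top-left pixel is (x0, y0)."""
    hist = [0] * n_bins
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            gx, gy = ix[y][x], iy[y][x]
            if gx == 0 and gy == 0:
                continue  # orientation is undefined for a zero gradient
            hist[orientation_bin(gx, gy, n_bins)] += 1
    return hist
```

For an image whose intensity increases only from left to right, every interior pixel has Iy = 0 and Ix > 0, so all votes fall into the first bin.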
The CoHOG feature, which is obtained by extending the HOG feature, is also known (for example, refer to Non Patent Literature 2). FIG. 33 is a diagram for explaining the CoHOG feature. As shown in the part (a) of FIG. 33, when attention is paid to the window 12 in the same manner as in the part (a) of FIG. 32, the window 12 is divided into a plurality of patches (patch 14b and so on), and the CoHOG feature is calculated for each of the patches.
For example, a method for calculating the CoHOG feature of the patch 14b is described. The part (b) of FIG. 33 is an enlarged view of the patch 14b. First, the orientation of spatial intensity gradient φ = tan⁻¹(Iy/Ix) is calculated for each of the pixels within the patch 14b in the same manner as shown in the part (b) of FIG. 32. Next, the pixel to be focused on within the patch 14b is determined as P0. The adjacent pixel on the diagonal bottom left of the pixel P0, the adjacent pixel below the pixel P0, the adjacent pixel on the diagonal bottom right of the pixel P0, and the adjacent pixel on the right of the pixel P0 are determined as pixels for co-occurrence P1, P2, P3, and P4, respectively. Moreover, the orientation of spatial intensity gradient of the pixel P0 is determined as φ0, and the orientation of spatial intensity gradient of the pixel for co-occurrence Pi (i=1 to 4) is determined as φi (i=1 to 4). With each of the pixels within the patch 14b taken in turn as the pixel P0, a two-dimensional histogram having φ0 and φi as a pair of variables is generated. This histogram is the CoHOG feature. The part (c) of FIG. 33 shows an example of a two-dimensional histogram having φ0 and φ1 as a pair of variables. The part (d) of FIG. 33 shows an example of a two-dimensional histogram having φ0 and φ2 as a pair of variables. The part (e) of FIG. 33 shows an example of a two-dimensional histogram having φ0 and φ3 as a pair of variables. In the examples shown in FIG. 33, since there are four combinations of the pixel P0 and the pixel for co-occurrence Pi (i=1 to 4), four CoHOG features are obtained from the patch 14b. By calculating the four CoHOG features for every patch and inputting them to a classifier such as the Adaboost classifier, it is possible to determine whether or not a person exists within the window 12.
It should be noted that the pixel for co-occurrence Pi is not limited to a pixel adjacent to the target pixel P0; any pixel having a predetermined positional relationship with the pixel P0 is acceptable. Moreover, the number of pixels for co-occurrence Pi is not limited to four and can be selected as appropriate. A technique using the CoHOG feature is known to have higher accuracy than a technique using the HOG feature.
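The generation of the two-dimensional co-occurrence histograms for one patch can be sketched as follows. This is a minimal sketch, assuming that the orientation of each pixel has already been quantized into a bin index, that the four offsets are those of P1 to P4 described above with y increasing downward, and that pairs whose pixel for co-occurrence falls outside the patch are skipped. The names `OFFSETS` and `cohog_patch` are hypothetical.

```python
from typing import Dict, List, Tuple

# (dx, dy) offsets of the pixels for co-occurrence relative to P0,
# with x increasing to the right and y increasing downward.
OFFSETS: Dict[str, Tuple[int, int]] = {
    "P1": (-1, 1),  # diagonal bottom left
    "P2": (0, 1),   # below
    "P3": (1, 1),   # diagonal bottom right
    "P4": (1, 0),   # right
}


def cohog_patch(bins: List[List[int]], n_bins: int) -> Dict[str, List[List[int]]]:
    """Build one n_bins x n_bins histogram per offset.

    bins[y][x] is the quantized orientation of pixel (x, y) in the patch.
    hists[name][b0][bi] counts the pixel pairs in which P0 has orientation
    bin b0 and the pixel for co-occurrence has orientation bin bi.
    """
    h, w = len(bins), len(bins[0])
    hists = {name: [[0] * n_bins for _ in range(n_bins)] for name in OFFSETS}
    for y in range(h):
        for x in range(w):
            b0 = bins[y][x]
            for name, (dx, dy) in OFFSETS.items():
                xi, yi = x + dx, y + dy
                # Assumption: skip pairs that leave the patch.
                if 0 <= xi < w and 0 <= yi < h:
                    hists[name][b0][bins[yi][xi]] += 1
    return hists
```

Because the offsets are held in a table, a different number of pixels for co-occurrence or a different positional relationship can be tried by editing `OFFSETS` alone.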
A HOGHOF feature is known as another feature obtained by extending the HOG feature (for example, refer to Non Patent Literatures 3 and 4). Here, HOF represents a histogram of optical flow directions. For example, when the optical flow of each of the pixels within the patch 14a shown in the part (a) of FIG. 32 is u = [ux, uy], the optical flow direction ψ is calculated as ψ = tan⁻¹(uy/ux). A histogram is generated for ψ in the same manner as for φ, and the HOF feature is thereby calculated. By using the HOGHOF feature, which is a combination of the HOG feature and the HOF feature, human action analysis can be performed. As a method for calculating an optical flow, a differential method, a template matching method, and so on can be used. A detailed description of these methods is therefore omitted.
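The HOF histogram and its combination with the HOG histogram can be sketched as follows. This is a minimal sketch, assuming that the optical flow vectors have already been obtained by some other method, that eight bins over 0 to 2π are used, that zero vectors cast no vote, and that the combination is a simple concatenation of the two histograms. The description above does not state how the two features are combined, so the concatenation and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, float]


def direction_histogram(vectors: Sequence[Vector], n_bins: int = 8) -> List[int]:
    """Histogram of the directions atan2(vy, vx) of the given 2-D vectors."""
    hist = [0] * n_bins
    for vx, vy in vectors:
        if vx == 0 and vy == 0:
            continue  # direction is undefined for a zero vector
        angle = math.atan2(vy, vx) % (2 * math.pi)  # map to [0, 2*pi)
        hist[min(int(angle / (2 * math.pi) * n_bins), n_bins - 1)] += 1
    return hist


def hoghof(gradients: Sequence[Vector], flows: Sequence[Vector],
           n_bins: int = 8) -> List[int]:
    """Concatenate the HOG histogram (from the spatial gradients [Ix, Iy])
    and the HOF histogram (from the optical flows [ux, uy]) of one patch."""
    return direction_histogram(gradients, n_bins) + direction_histogram(flows, n_bins)
```

The same histogram routine serves both features because φ and ψ are both directions of two-dimensional vectors; only the input vectors differ.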
Moreover, a 3DHOG feature is known as another feature obtained by extending the HOG feature (for example, refer to Non Patent Literature 5). In the 3DHOG feature, a window having a predetermined volume is set in a three-dimensional video in which static images are arranged along the temporal axis. The spatial shape feature and the temporal movement feature of each of the pixels within the window are represented by a single vector. By comparing this vector with the normal vector of each of the surfaces of a virtual polyhedron within the window and casting a vote for the surface having the closest normal vector, a histogram is generated in which each surface of the polyhedron is designated as a bin (class). This histogram is the 3DHOG feature. By using the 3DHOG feature, human action analysis can be performed.
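The voting step of the 3DHOG feature can be sketched as follows. This is a minimal sketch under stated assumptions: a cube with six surfaces is used as the virtual polyhedron purely for simplicity, the "closest" normal vector is taken to be the unit normal with the largest dot product with the pixel's vector, and zero vectors cast no vote. The polyhedron and the closeness measure actually used in Non Patent Literature 5 are not stated above, and the names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector3 = Tuple[float, float, float]

# Outward unit normals of the six surfaces of a cube (assumed polyhedron).
CUBE_NORMALS: List[Vector3] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def closest_face(v: Vector3, normals: Sequence[Vector3]) -> int:
    """Index of the unit normal that has the largest dot product with v,
    i.e. the normal forming the smallest angle with v."""
    return max(range(len(normals)),
               key=lambda i: sum(a * b for a, b in zip(v, normals[i])))


def hog3d(vectors: Sequence[Vector3],
          normals: Sequence[Vector3] = CUBE_NORMALS) -> List[int]:
    """Histogram with one bin per surface; each nonzero (x, y, t) vector
    votes for the surface whose normal is closest to it."""
    hist = [0] * len(normals)
    for v in vectors:
        if any(v):  # skip the zero vector, which has no direction
            hist[closest_face(v, normals)] += 1
    return hist
```

A polyhedron with more surfaces gives a finer quantization of the direction; it can be substituted by passing a different list of unit normals as `normals`.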