The present invention relates to a human tracking device and related techniques which are incorporated into a monitoring system that monitors intruders.
Recently, many studies have addressed the processing of human images contained in camera images. Some of these studies concern a method for detecting moving objects from temporal differences between sequential camera images. This method, however, does not perform well enough to track humans.
Studies that are practically applicable to tracking humans are disclosed in the following two published papers.
The first paper, published in the Institute of Electronics, Information and Communication Engineers papers, PRMU99 No. 67, pp. 23-30 by Haga et al., concerns the combination of temporal differencing of camera images with template matching. This technique relates to an intruder monitoring system which automatically finds an intruder and tracks him (or her) while zooming and panning. This monitoring system has two tracking algorithms, i.e., template matching and interframe difference. Errors in the tracking point obtained by template matching are corrected using the interframe differences, thereby realizing stable tracking.
However, because it relies on template matching, this technique has the drawback that tracking tends to fail when a human changes his (or her) posture. This problem arises because the template region is defined as a rectangle and matching is carried out using the information within the entire template.
In the second technique, published in the Institute of Electronics, Information and Communication Engineers papers, PRMU99 No. 119, pp. 25-32 by Takashima et al., a human is represented in a simple way with three blob models (near-spherical blocks, each represented by its coordinate position and its color similarity) so as to track him (or her). This concept of the blob models is based on PFinder developed by MIT Media Lab.
According to PFinder, when a human is situated near the camera and therefore is represented by a relatively large image in a camera image, the human can be stably tracked. However, when the human is situated far from the camera and therefore is represented by a small image, the human image is difficult to distinguish from the noise because the small image of the human is still treated as a three-blob model. As a result, the likelihood of tracking failure is increased. Moreover, a non-human object may possibly be regarded as a human. Furthermore, if false modeling is performed at the initial step of processing, this processing is likely to be inconsistent, which may result in the failure of successive processing.
The present invention is devised in view of the above-described problems. The present invention has an object of providing a human tracking device which is capable of stably tracking a human, independently of the distance between the human and the camera, and related techniques thereof.
According to the first aspect of the present invention, a camera image is divided into at least a human region and a background region. It is judged whether or not it is possible to divide the human region into a plurality of blob models corresponding to parts of a human body. When the result of the judgment is YES, a plurality of human blob models are produced based on the human region. When the result of the judgment is NO, a single human blob model is produced based on the human region. Human tracking is performed based on the resulting human blob models.
With this structure, the human tracking device has greater resistance to disturbing factors such as noise, thereby improving stability. Although only rough tracking is possible with a single human blob model, such tracking is considered sufficient for a human appearing in a distant region of the image, because a human at a great distance from the object or area being protected by human tracking is generally considered to pose only a small threat. On the other hand, a human who is in a close-by region of the image, and is therefore represented by a large image, poses a much greater potential threat. The larger image enables much more accurate tracking with multiple blob models.
According to the second aspect of the present invention, the plurality of blob models are three blob models, that is, those of the head, the trunk and the legs.
With this structure, the human can be represented in a simple way with three blob models of the head, the trunk and the legs, whereby stable tracking can be performed. Moreover, the posture of the human, such as standing, sitting or lying, can be derived in a simple way from the positional relationship of these three human blob models alone.
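The production of one or three human blob models described in the first and second aspects can be sketched as follows. This is only an illustrative sketch: the pixel representation, the names Blob, make_blob and human_blobs, and the vertical proportions in BANDS are assumptions introduced here and are not taken from the specification. The result of the divisional judgment is passed in as a flag, since how that judgment is made is the subject of the later aspects.

```python
from dataclasses import dataclass
from statistics import fmean

# A pixel is a tuple (x, y, r, g, b); y grows downward as in image coordinates.

@dataclass
class Blob:
    label: str
    mean_xy: tuple
    mean_rgb: tuple

def make_blob(label, pixels):
    """Summarise a set of pixels by average position and average colour."""
    return Blob(
        label,
        (fmean(p[0] for p in pixels), fmean(p[1] for p in pixels)),
        (fmean(p[2] for p in pixels),
         fmean(p[3] for p in pixels),
         fmean(p[4] for p in pixels)),
    )

# Hypothetical vertical proportions for head / trunk / legs (an assumption).
BANDS = (("head", 0.0, 0.2), ("trunk", 0.2, 0.55), ("legs", 0.55, 1.0))

def human_blobs(region, divisible):
    """Return three blobs when the region was judged divisible, else one."""
    if not divisible:
        return [make_blob("human", region)]
    top = min(p[1] for p in region)
    height = max(p[1] for p in region) - top + 1
    blobs = []
    for label, lo, hi in BANDS:
        # Assign each pixel to a band by its relative height in the region.
        part = [p for p in region if lo <= (p[1] - top) / height < hi]
        if part:
            blobs.append(make_blob(label, part))
    return blobs

# Toy human region: 2 pixels wide, 10 pixels tall, uniform colour.
region = [(x, y, 10, 20, 30) for y in range(10) for x in range(2)]
single = human_blobs(region, divisible=False)
triple = human_blobs(region, divisible=True)
```

In this sketch the vertical order of the three blob centres (head above trunk above legs) is what a posture estimate would start from; how posture is actually derived is left open here.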
According to the third aspect of the present invention, a divisional condition judgment means gives the result of the judgment with reference to the distance information of the human region.
With this structure, divisional condition judgments can be appropriately carried out using the distance information.
According to the fourth aspect of the present invention, the divisional condition judgment means gives the result of the judgment with reference to the size of the human region.
With this structure, a human image can be flexibly treated with reference to the size of the human region. For example, when a human having a large physique appears in a rather distant region of the image, a plurality of human blob models are produced, whereas when a small baby appears in a close-by region of the image, a single blob model is produced.
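A divisional judgment based on the size of the human region, as in the fourth aspect, might look like the following sketch. The function name is_divisible and both thresholds are hypothetical; the specification does not state which size measure or which values are used, so pixel count and bounding-box height are assumed here purely for illustration.

```python
def is_divisible(region, min_pixels=400, min_height=30):
    """Judge from region size alone whether multiple blob models are worthwhile.

    region: iterable of (x, y) pixel coordinates belonging to the human region.
    min_pixels, min_height: hypothetical thresholds, to be tuned per camera.
    """
    pts = list(region)
    if not pts:
        return False
    ys = [y for _, y in pts]
    height = max(ys) - min(ys) + 1  # bounding-box height in pixels
    return len(pts) >= min_pixels and height >= min_height

# A large person far away can still cover many pixels (1200 px, 60 tall),
# while a small baby close to the camera may cover few (150 px, 15 tall).
large_adult = [(x, y) for x in range(20) for y in range(60)]
small_baby = [(x, y) for x in range(10) for y in range(15)]
```

Because the decision depends only on the apparent size, the same rule covers both examples in the text without needing the actual distance to the camera.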
According to the fifth aspect of the present invention, a plurality of background blob models are produced based on the background region.
With this structure, blob models are also applied to the background region, thereby making it possible to treat the background in a simpler way.
According to the sixth aspect of the present invention, in addition to the fifth aspect of the present invention, a region division means obtains, for a pixel, the minimum value of similarity between the pixel and the background blob models. When this minimum value is above a threshold value, the pixel is judged not to correspond to the background region.
With this structure, the comparison with the minimum value makes it possible to appropriately identify a pixel which does not correspond to the background region (that is, one which possibly corresponds to a human region).
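The minimum-value test of the sixth aspect can be illustrated with the sketch below. It assumes that the "similarity" value behaves like a distance, so that a smaller value means a closer match; this reading is consistent with a pixel being rejected as background when the minimum exceeds the threshold, but the actual measure is not given in the text. The Euclidean distance over (x, y, r, g, b), the function names and the threshold are all assumptions.

```python
import math

def dissimilarity(pixel, blob):
    """Assumed measure: Euclidean distance in (x, y, r, g, b) space."""
    return math.dist(pixel, blob)

def is_not_background(pixel, background_blobs, threshold):
    """True when even the best-matching background blob is too far away."""
    best = min(dissimilarity(pixel, b) for b in background_blobs)
    return best > threshold

# Two hypothetical background blobs: (mean_x, mean_y, mean_r, mean_g, mean_b).
blobs = [(10, 10, 200, 200, 200), (50, 10, 0, 100, 0)]

wall_pixel = (11, 10, 198, 201, 200)   # close to the first blob
red_pixel = (30, 10, 255, 0, 0)        # far from every background blob
```

With a threshold of 30, wall_pixel is kept as background and red_pixel is passed on as a candidate for the human region.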
According to the seventh aspect of the present invention, in addition to the fifth aspect of the present invention, the background blob model is expressed to include X-Y coordinate average values and RGB average values of the region.
With this structure, blob models which faithfully and concisely reflect the features of the background region can be obtained.
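A background blob model holding X-Y coordinate average values and RGB average values can be computed as in the following sketch. How the background region is partitioned into blobs is not stated in this passage, so a regular grid of square cells is assumed here for illustration; the function name background_blobs and the cell parameter are likewise hypothetical.

```python
from statistics import fmean

def background_blobs(image, cell=2):
    """Build one blob per grid cell of the background image.

    image: list of rows, each row a list of (r, g, b) tuples.
    Returns tuples (mean_x, mean_y, mean_r, mean_g, mean_b).
    """
    height, width = len(image), len(image[0])
    blobs = []
    for y0 in range(0, height, cell):
        for x0 in range(0, width, cell):
            # Collect (x, y, r, g, b) for every pixel inside this cell.
            pts = [(x, y, *image[y][x])
                   for y in range(y0, min(y0 + cell, height))
                   for x in range(x0, min(x0 + cell, width))]
            blobs.append(tuple(fmean(p[i] for p in pts) for i in range(5)))
    return blobs

# Toy 4x2 background: left half one colour, right half another.
image = [[(10, 20, 30)] * 2 + [(200, 100, 0)] * 2 for _ in range(2)]
blobs = background_blobs(image, cell=2)
```

Each blob is thus five numbers, which is what allows a whole background region to be summarised concisely.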