1. Field of the Invention
The present invention generally relates to the field of image and video processing, and more particularly relates to a method and device for detecting in real time, based on stereo vision, whether there is a gathering of objects (hereinafter sometimes called an “object gathering” for short) in a target scene.
2. Description of the Related Art
In the field of video monitoring, the analysis of the stream of people in a public area is one of the important research directions, and has very wide application prospects. How to manage a high-density stream of people more efficiently so as to avoid accidents is a social problem that attracts public attention. One of the important aspects of the analysis of the stream of people in a public area is the real-time detection and early warning of a gathering of people in order to avoid an accident such as a stampede. For example, generally speaking, a medium or large-scale gathering of people in a security-sensitive area such as a public square means that an abnormal event may occur, and this needs to be promptly reported to the security guards, etc.
However, there are still many challenges to achieving the real-time and accurate detection of the gathering of people in a real scene. For example, FIGS. 1A and 1B illustrate two examples of the gathering of people in a real scene, respectively. In the drawings, the high-density crowds of people, the overlaps between persons, the irregular lighting conditions, etc., are unfavorable factors which may negatively affect the relevant detection. At present, the conventional methods of detecting the gathering of people based on video vision are mainly divided into the following two classes.
(1) Method Based on Person Detection and Tracking
In this method, the detection of the gathering of people is realized by detecting and tracking individuals. The number of persons is counted according to the detection result, and the states (e.g., standing or moving) of the persons are recognized by a detection and tracking algorithm. However, this kind of method is usually only suitable for detecting a low-density crowd of people. That is, in a real scene, the complicated background, the overlaps between persons, the lighting conditions, etc., may cause the detection and tracking algorithm to fail, so that it is impossible to obtain an accurate result.
(2) Method Based on Low-Level Image Features
In this method, a background model of a scene is first established, and then the foreground (i.e., persons) of the scene is acquired by utilizing background subtraction. After that, by adopting a regression algorithm whose input may be features extracted from the foreground, such as the number of pixels therein, the length thereof, and the texture therein, it is possible to estimate the number of persons in the foreground. Additionally, in order to distinguish between motion and stillness of the persons, optical flow is usually used for estimation. However, when estimating the number of persons, since this kind of method is sensitive to the influence of the complicated background, the overlaps between persons, the perspective projection distortion of the camera used, etc., it is difficult to get an accurate result. Moreover, the motion estimation based on optical flow is a very time-consuming process; as such, without an additional hardware device for speeding up the relevant calculation, it is difficult to satisfy the demand for timeliness. In addition, the accuracy of the motion estimation is also subject to the lighting conditions, the image resolution, the distance to the camera used, etc.
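By way of illustration only, the conventional pipeline described above (background subtraction, followed by regression on foreground features) may be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any particular prior-art implementation: the function names, the fixed difference threshold, the choice of foreground pixel count and boundary length as the features, and the use of ordinary least squares as the regression algorithm are all hypothetical. Texture features and the optical-flow step are omitted.

```python
import numpy as np

def foreground_mask(frame, background, threshold=25):
    """Background subtraction: flag pixels that differ from the background model.
    The static background image and the threshold value are assumptions."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > threshold

def foreground_features(mask):
    """Hypothetical feature vector: [foreground pixel count, boundary length, bias]."""
    area = int(mask.sum())
    # A pixel is interior if all four of its 4-neighbours are foreground.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    # Boundary pixels are foreground pixels that are not interior.
    boundary = int((mask & ~interior).sum())
    return np.array([area, boundary, 1.0])

def fit_count_regressor(feature_rows, counts):
    """Least-squares fit of a linear map from features to person counts
    (an assumed stand-in for 'a regression algorithm')."""
    X = np.asarray(feature_rows, dtype=float)
    y = np.asarray(counts, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def estimate_count(weights, features):
    """Estimate the number of persons from one feature vector."""
    return float(features @ weights)
```

In such a sketch, the regressor would be fitted on frames with manually counted persons and then applied to the features of each new frame. Because the features are computed from the foreground as a whole, overlapping persons and perspective distortion change the features without changing the true count, which is consistent with the sensitivity noted above.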