1. Field of the Invention
The present invention generally relates to the field of object detection, tracking, and counting. Specifically, the present invention is a computer-implemented detection and tracking system and process for detecting and tracking human objects of interest that appear in camera images taken, for example, at an entrance or entrances to a facility, and for counting the number of human objects of interest entering or exiting the facility during a given time period.
2. Related Prior Art
Various methods for detecting and counting the passing of an object have been proposed. U.S. Pat. No. 7,161,482 describes an integrated electronic article surveillance (EAS) and people counting system. The EAS component establishes an interrogation zone by means of an antenna positioned adjacent to the interrogation zone at an exit point of a protected area. The people counting component includes one people detection device that detects the passage of people through an associated passageway and provides a people detection signal, and another people detection device placed at a predefined distance from the first device and configured to provide another people detection signal. The two signals are then processed into an output representative of a direction of travel.
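For illustration only, the two-detector arrangement described above can be sketched as a comparison of trigger times. The cited patent does not specify how the two signals are processed; the function name `direction_of_travel`, the labels "A_to_B" and "B_to_A", and the `min_gap` parameter are hypothetical and introduced here only to make the idea concrete.

```python
from typing import Optional


def direction_of_travel(t_a: float, t_b: float, min_gap: float = 1e-3) -> Optional[str]:
    """Infer a direction from the trigger times (in seconds) of two detectors.

    Hypothetical sketch: detector A and detector B are assumed to be placed
    a known distance apart along the passageway. Whichever detector fires
    first is taken to be the side the person came from.
    """
    dt = t_b - t_a
    if abs(dt) < min_gap:
        # Triggers are too close together to order reliably.
        return None
    return "A_to_B" if dt > 0 else "B_to_A"


# Detector A fires at 1.0 s and detector B at 1.4 s: travel from A toward B.
print(direction_of_travel(1.0, 1.4))
```

A sketch of this kind yields only a direction per passage event; it does not by itself separate two people who pass the detectors side by side.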
There are two classes of systems employing video images for locating and tracking human objects of interest. One class uses monocular video streams or image sequences to extract, recognize, and track objects of interest [1][2][3][4]. The other class makes use of two or more video sensors to derive range or height maps from multiple intensity images and uses the range or height maps as a major data source [5][6][7].
In monocular systems, objects of interest are detected and tracked by applying background differencing [1], adaptive template matching [4], or contour tracking [2][3]. The major problem with approaches using background differencing is the presence of background clutter, which negatively affects the robustness and reliability of system performance. Another problem is that the background updating rate is hard to adjust in real applications. The problems with approaches using adaptive template matching are: (1) object detections tend to drift from the true locations of the objects, or become fixed to strong features in the background; and (2) the detections are prone to occlusion. Approaches using contour tracking have difficulty overcoming degradation caused by intensity gradients in the background near the contours of the objects. In addition, all of the previously mentioned methods are susceptible to changes in lighting conditions, shadows, and sunlight.
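The background differencing approach and its update-rate problem can be illustrated with a minimal sketch. It assumes a running-average background model over grayscale images stored as nested lists; the function names and the parameters `alpha` and `threshold` are hypothetical and are not taken from the cited references.

```python
def foreground_mask(background, frame, threshold):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def update_background(background, frame, alpha):
    """Blend the new frame into the background model (running average).

    alpha is the update rate: 0.0 never adapts, 1.0 replaces the model
    with the current frame.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


background = [[10.0, 10.0], [10.0, 10.0]]
frame = [[10.0, 200.0], [12.0, 10.0]]

mask = foreground_mask(background, frame, threshold=30.0)
background = update_background(background, frame, alpha=0.5)
```

In this sketch, a large `alpha` quickly absorbs a person who stands still into the background model, while a small `alpha` is slow to follow lighting changes, which is one way to see why the updating rate is hard to tune.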
In stereo or multi-sensor systems, intensity images taken by sensors are converted to range or height maps, and the conversion is not affected by adverse factors such as lighting condition changes, strong shadows, or sunlight [5][6][7]. Therefore, the performance of stereo systems remains robust and reliable in the presence of adverse factors such as hostile lighting conditions. In addition, it is easier to use range or height information than intensity information for segmenting, detecting, and tracking objects.
Most state-of-the-art stereo systems use range background differencing to detect objects of interest. Range background differencing suffers from the same problems as the monocular background differencing approaches, such as background clutter, and has difficulty differentiating between multiple closely positioned objects.
U.S. Pat. No. 6,771,818 describes a system and process for identifying and locating people and objects of interest in a scene by selectively clustering blobs to generate “candidate blob clusters” within the scene and comparing the blob clusters to a model representing the people or objects of interest. The comparison of candidate blob clusters to the model identifies the blob cluster or clusters that most closely match the model. Sequential live depth images may be captured and analyzed in real time to provide continuous identification and location of people or objects as a function of time.
U.S. Pat. Nos. 6,952,496 and 7,092,566 are directed to a system and process employing color images, color histograms, techniques for compensating for variations, and a sum of match qualities approach to best identify each of a group of people and objects in the image of a scene. An image is segmented to extract regions that likely correspond to people and objects of interest, and a histogram is computed for each of the extracted regions. The histogram is compared with pre-computed model histograms and is designated as corresponding to a person or object if the degree of similarity exceeds a prescribed threshold. The designated histogram can also be stored as an additional model histogram.
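The histogram comparison step described above can be sketched as follows. The similarity measure used here, histogram intersection on normalized histograms, is an assumption chosen for illustration; the cited patents describe a sum of match qualities approach that this sketch does not reproduce. The function names, model labels, and threshold value are hypothetical.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no counts")
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


def best_match(region_hist, models, threshold):
    """Return the label of the most similar model, or None if no model
    exceeds the prescribed similarity threshold."""
    best_label, best_score = None, 0.0
    for label, model_hist in models.items():
        score = histogram_intersection(region_hist, model_hist)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score > threshold else None


# Hypothetical three-bin color histograms for two known people.
models = {"person_a": [8, 1, 1], "person_b": [1, 1, 8]}
print(best_match([7, 2, 1], models, threshold=0.8))
```

A region whose histogram matches no model above the threshold is left unlabeled, which mirrors the thresholded designation described in the text.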
U.S. Pat. No. 7,176,441 describes a counting system for counting the number of persons passing a monitor line set in the width direction of a path. A laser is installed for irradiating the monitor line with a slit ray, and an image capturing device is deployed for photographing an area including the monitor line. The number of passing persons is counted on the basis of one-dimensional data generated from an image captured when the slit ray is interrupted as a person passes the monitor line.
Despite all the prior art in this field, no existing technology enables unobtrusive detection and tracking of moving human objects at low cost and with low maintenance while providing precise traffic counting results with the ability to distinguish between incoming and outgoing traffic, between moving and static objects, and between objects of different heights. Thus, it is a primary objective of this invention to provide an unobtrusive traffic detection, tracking, and counting system that is low in cost, easy to maintain, and fast in processing, and that is capable of providing time-stamped results that can be further analyzed.