There is an ever-increasing demand for automatic, efficient, and reliable means of tracking and counting entities, particularly people, that enter or exit a specified region. Shopping malls and supermarkets may utilize this information to examine hourly people traffic patterns and distribution. This examination allows for an optimization of labor scheduling, as well as a determination of the effectiveness of promotional events and store displays. In addition, the traffic and distribution information is very important for security purposes, as it assists in assigning an appropriate number of security workers to key areas, as well as in designing efficient evacuation plans.
The existing approaches for tracking and counting people may be classified into three broad categories: systems using contact-type counters, such as, for example, turnstiles at a gate; systems using sensors, such as, for example, infrared beams and heat sensors; and vision-based systems using cameras.
Systems using contact-type counters, such as turnstiles, can count people only one at a time. Turnstiles obstruct the passageways and cause congestion if there is high-density traffic. In addition, they have the limitation of possible undercounting, since it is possible for two people to pass through a turnstile in a single rotation of the bar.
Systems using infrared beams or heat sensors do not block passageways and do not affect the passing people to the extent that contact-type counters do. However, they suffer from the same limitation of undercounting, since it is difficult for these systems to successfully resolve multiple people. For example, when an infrared beam is interrupted, multiple people may be entering the region simultaneously. Counting systems have been introduced that use thermal images obtained by multi-infrared sensors; see, for example, K. Hashimoto et al., “People Count System Using Multi-Sensing Application,” Int'l Conf. on Solid-State Sensors and Actuators, Vol. 2, pp. 1291-1294, June 1997. People are detected by taking the difference between the measured output and the mean output for the floor. The sensing system is valid if the temperature difference is more than 0.4° C. However, errors may be caused in these thermal image systems simply by large movements of a person's arms or legs.
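The differencing step described above can be illustrated with a minimal sketch. It assumes the sensor output is available as a small grid of temperature readings in degrees Celsius and that a per-cell mean reading for the empty floor has been recorded beforehand. The function name, the grid layout, and the sample values are hypothetical and are not taken from the cited system; only the 0.4° C. difference comes from the description above.

```python
def detect_warm_cells(frame, floor_mean, threshold_c=0.4):
    """Return (row, col) indices of cells that read warmer than the floor.

    frame and floor_mean are equally sized lists of rows (degrees Celsius).
    A cell is reported when its reading exceeds the floor mean by more than
    threshold_c. The grid representation is an assumption of this sketch.
    """
    hits = []
    for r, (row, mean_row) in enumerate(zip(frame, floor_mean)):
        for c, (value, mean) in enumerate(zip(row, mean_row)):
            if value - mean > threshold_c:
                hits.append((r, c))
    return hits


# Hypothetical 2x2 readings: only the lower-left cell is clearly warmer.
floor = [[22.0, 22.0], [22.0, 22.0]]
frame = [[22.0, 22.1], [25.0, 22.2]]
warm = detect_warm_cells(frame, floor)
```

A sketch of this kind reports warm cells only. It does not group cells into individual people, which is where the undercounting and limb-movement errors noted above would arise.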
Image-based systems have been introduced as an alternative to the systems mentioned above. Earlier attempts at image-based systems were successful for situations having sparse people traffic, but encountered limitations when the traffic density became high or when a large number of image mergers occurred, as it was difficult to resolve groups of people.
In order to deal with multiple people in the specified region, a method using an overhead stereo camera has been introduced; see, for example, K. Terada et al., “A Method of Counting the Passing People by Using the Stereo Images,” Proc. of IEEE Int'l Conf. on Image Proc., Vol. 2, pp. 338-42, October 1999. Two measurement lines are set on the floor to detect the direction of the movement. Space-time images at the measurement lines are generated for images, and template matching is applied to these images in order to find the corresponding image coordinates of the same point, thereby obtaining a 3D location of the point. The number of people is determined by counting the number of groups of points in the space-time image. The method performs adequately with multiple people in the camera view as long as people move separately. However, it does not perform adequately with groups of people moving together or with images of people merging into a single group of points.
Additional approaches using an overhead stereo camera have also been introduced; see, for example, D. Beymer, “Person Counting Using Stereo,” Proc. of Workshop on Human Motion, pp. 127-133, December 2000. After applying real-time stereo and 3D reconstruction, the scene is segmented by selecting stereo pixels falling in a 3D volume of interest (VOI). These pixels are then remapped to an orthographic view termed an occupancy map, and people in this map are tracked using a Gaussian mixture model and Kalman filtering. By selecting the VOI, only heads and torsos of people are tracked, which avoids the counting of shopping carts and small children, for example. Although the system has the advantage of estimating object heights, this requires a calibrated stereo camera head and eliminates the flexibility of using a single ordinary camera. Moreover, the calibration of the stereo head requires some expert intervention and can make the system installation cumbersome.
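The VOI selection and remapping to an occupancy map can be sketched as follows. This is an illustration under stated assumptions rather than the cited implementation: the reconstructed points are assumed to be (x, y, z) tuples in meters with z as height above the floor, the VOI is assumed to be an axis-aligned box, and the function name, cell size, and sample points are hypothetical. The Gaussian mixture model and Kalman filtering stages are not shown.

```python
def occupancy_map(points, voi, cell_size, grid_shape):
    """Bin 3D points that fall inside the VOI into a top-down grid.

    points:     iterable of (x, y, z) tuples, z being height above the floor
    voi:        ((xmin, xmax), (ymin, ymax), (zmin, zmax)), an axis-aligned box
    cell_size:  edge length of one grid cell on the floor plane
    grid_shape: (rows, cols) of the returned grid
    Each cell holds the number of in-VOI points projected onto it.
    """
    (xmin, xmax), (ymin, ymax), (zmin, zmax) = voi
    rows, cols = grid_shape
    grid = [[0] * cols for _ in range(rows)]
    for x, y, z in points:
        inside = xmin <= x < xmax and ymin <= y < ymax and zmin <= z < zmax
        if not inside:
            continue  # e.g. points below head-and-torso height are dropped
        row = int((y - ymin) // cell_size)
        col = int((x - xmin) // cell_size)
        if row < rows and col < cols:
            grid[row][col] += 1
    return grid


# Hypothetical points: three at head/torso height, one near the floor.
pts = [(0.1, 0.1, 1.5), (0.2, 0.1, 1.6), (1.2, 0.4, 1.7), (0.1, 0.1, 0.5)]
grid = occupancy_map(pts, voi=((0, 2), (0, 1), (1.0, 2.0)),
                     cell_size=1.0, grid_shape=(1, 2))
```

Setting the lower height bound of the VOI above cart and small-child height is what excludes those objects in a scheme of this kind, at the cost of needing calibrated 3D data.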
Cameras with non-overlapping fields of view have also been used; see, for example, V. Kettnaker et al., “Counting People from Multiple Cameras,” IEEE Int'l Conf. on Multimedia Computing and Systems, Vol. 2, pp. 267-271, June 1999. Observations of people by different cameras are linked when they show the same person; this is achieved by combining visual appearance matching with mutual content constraint between cameras. However, the system assumes that the floor topology is known, that people walk with a steady speed, and that there is only one person per observation interval. Two people walking together can cause problems for the system, since they cannot be resolved.
A network of cameras may also be utilized in the tracking and counting of people; see, for example, D. B. Yang et al., “Counting People in Crowds with a Real-Time Network of Simple Image Sensors,” Proc. of IEEE Int'l Conf. on Computer Vision, pp. 122-129, October 2003. In that approach, the projection of a visual hull, which is a set of polygons, is computed from silhouettes of foreground objects. Upper and lower bounds are placed on the number of objects in each polygon. These bounds are updated as objects move, and their history is recorded in a tree. While the method provides an idea about the number of people in the scene and their possible locations, it does not track the people individually.
A people counting system using a single camera has been introduced, which tracks people by analyzing their HSI histogram and uses a box-based corner coordinate checking process to manage image mergers and splits; see, for example, T-H. Chen et al., “An Automatic Bi-Directional Passing-People Counting Method Based on Color Image Processing,” IEEE Int'l Carnahan Conf. on Security Technology, pp. 200-207, October 2003. Two virtual base lines are utilized to determine the direction of the movement of the people. Thus, people must be tracked from the first line to the second, and these lines cannot be too close together in the proposed scheme. However, image mergers and splits that occur on or around the virtual lines can cause tracking and counting difficulties.
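The use of two virtual base lines to determine direction can be sketched as follows. The sketch assumes that tracking has already produced, for each person, a sequence of positions along the axis perpendicular to the lines, and that no single step crosses both lines (consistent with the lines not being too close). The function names, line positions, and tracks are hypothetical; the HSI histogram tracking and the box-based merge and split handling of the cited method are not shown.

```python
def crossed(prev, cur, line):
    """True if a step from prev to cur moves to the other side of line."""
    return (prev < line) != (cur < line)


def count_directions(tracks, line_a, line_b):
    """Count tracks that cross line A then line B, and B then A.

    tracks is a list of position sequences, one per tracked person, measured
    along the axis perpendicular to the two lines. A count is registered only
    when both lines have been crossed, in order. Assumes one step never
    crosses both lines.
    """
    counts = {"a_to_b": 0, "b_to_a": 0}
    for positions in tracks:
        last_line = None  # most recently crossed line awaiting its partner
        for prev, cur in zip(positions, positions[1:]):
            if crossed(prev, cur, line_a):
                if last_line == "b":
                    counts["b_to_a"] += 1
                    last_line = None
                else:
                    last_line = "a"
            elif crossed(prev, cur, line_b):
                if last_line == "a":
                    counts["a_to_b"] += 1
                    last_line = None
                else:
                    last_line = "b"
    return counts


# Hypothetical tracks: one passes A then B, one passes B then A,
# and one crosses A and turns back without reaching B.
tracks = [[5, 12, 18, 25], [30, 22, 15, 8], [5, 12, 7]]
result = count_directions(tracks, line_a=10, line_b=20)
```

Because a count depends on following the same track across both lines, a merger or split occurring between or near the lines can break the track and lose the count, which is the difficulty noted above.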
Although using stereo or multiple cameras can provide additional information about object heights and the structure of the environment, it does not provide a solution to the people counting problems that arise using only a single camera.