1. Field of the Invention
The present invention is a method for detecting events in an imaged scene by spatial and temporal analysis of the occlusion of linear features in the scene.
2. Background of the Invention
Automatic detection of objects in a scene is an important problem with many applications. For example, in outdoor and indoor environments, devices that detect vehicles and people are used to detect events such as a vehicle arriving or departing. These devices are strategically installed at locations in the scene where the objects of interest are expected to pass by. Some methods use special sensors.
U.S. Pat. No. 6,538,579 by K. Yoshikawa and S. Sunahara disclosed a method and apparatus for detecting moving vehicles. An array of distance sensors, arranged to face a single direction, is installed above a detection line. Each sensor detects the vehicle at a single location, and the array of sensors is linked to a detection processor that detects events such as vehicle entry and vehicle exit.
U.S. Pat. No. 4,392,119 by R. Price et al. disclosed an apparatus for detecting the arrival of a vehicle at a drive-in window. A loop detector, installed on the vertical wall below the drive-in window, senses the metallic side of a vehicle at the window. The detector sends an electrical signal whenever a vehicle is present in front of the window, enabling the apparatus to detect events such as a vehicle arriving or a vehicle waiting for a certain length of time.
U.S. Pat. No. 5,173,692 by B. Shapiro and Y. Rosenstock disclosed the use of two overhead ultrasonic detectors attached to a computer to count and classify vehicles. The two detectors, installed along a vehicle's path of travel, can be used to operate devices such as traffic lights and gates.
The popularity of image-capturing devices attached to computers has made it possible to use image analysis to detect objects in a scene. One approach is to model the transitions between the background scene and foreground objects. U.S. Pat. No. 5,465,115 by G. Conrad, B. Denenberg and G. Kramerich disclosed a method for counting people walking through a restricted passageway, such as a door. Using the image taken from an overhead video camera, a narrow image window perpendicular to the traffic flow is analyzed. This narrow image is divided into short image segments, called “gates”, which serve as points of detection. The thresholded differences of these gate images across consecutive frames are used to detect persons crossing the gates. Since the gates are narrow, one passing person occludes several contiguous gates. The method depends heavily on the speed of the moving objects, which must move fast enough to produce significant frame-to-frame differences; a single passing person may even register several significant differences. Moreover, the method would fail if a person stopped within the window, since a stationary person produces no frame-to-frame difference.
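The gate-based scheme described above can be illustrated with a minimal Python sketch. It assumes one grayscale scanline per frame for the narrow window, equal-width gates, and a mean-absolute-difference test per gate; the function names and the `gate_width` and `threshold` parameters are hypothetical and are not taken from the cited patent.

```python
from typing import List, Sequence

Row = Sequence[float]  # one scanline of gray values across the narrow window


def gate_activity(prev: Row, curr: Row, gate_width: int, threshold: float) -> List[bool]:
    """Flag each gate whose mean absolute frame-to-frame difference exceeds threshold.

    gate_width (pixels per gate) and threshold are illustrative assumptions.
    """
    if len(prev) != len(curr):
        raise ValueError("frames must have equal width")
    flags = []
    for start in range(0, len(curr), gate_width):
        a = prev[start:start + gate_width]
        b = curr[start:start + gate_width]
        mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        flags.append(mean_diff > threshold)
    return flags


def count_contiguous_runs(flags: List[bool]) -> int:
    """Count runs of adjacent active gates; one person typically activates several gates."""
    runs = 0
    previous = False
    for f in flags:
        if f and not previous:
            runs += 1
        previous = f
    return runs
```

Because the test only compares consecutive frames, a person standing still within the window yields two identical rows and no active gate, which is the weakness noted above.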
The more popular approach using image analysis is to create a model of the image of the background scene and to classify regions of an input image as either background or foreground, the latter being the object region. Many computer vision systems use a background image model to detect and track objects in an imaged scene. Such a system maintains a model of the background, which is the scene without any objects in it. This model can be an image of the background or statistics of its sub-images. Whenever a small region of an input image differs significantly from the corresponding region in the background model, the location of that region is tagged as foreground, meaning it is part of an object region. Spatial and temporal post-processing techniques then refine the localization of the object region.
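A generic version of this background-subtraction loop can be sketched in Python as follows. The per-pixel absolute-difference test, the running-average update, the `alpha` learning rate, and the fixed `threshold` are illustrative assumptions rather than features of any particular cited system.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def classify_foreground(frame: Image, background: Image, threshold: float) -> Mask:
    """Tag a pixel as foreground when it differs from the background model by more than threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def update_background(frame: Image, background: Image, mask: Mask, alpha: float) -> Image:
    """Blend the frame into the model (running average) only where the pixel is background."""
    return [
        [b if fg else (1 - alpha) * b + alpha * p for p, b, fg in zip(frow, brow, mrow)]
        for frow, brow, mrow in zip(frame, background, mask)
    ]
```

Skipping the update at foreground pixels keeps objects out of the model, but only as long as they keep moving, which is the limitation discussed later in this section.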
In the work of S. McKenna, S. Jabri, Z. Duric, and H. Wechsler, “Tracking interacting people”, Proceedings of 4th IEEE Int'l Conference on Automatic Face and Gesture Recognition, pp. 348-353, March 2000, each color pixel in the background image is modeled by three separate Gaussians, one each for the red, green, and blue channels. Each Gaussian has a mean and variance that are continuously adapted to account for slow changes in outdoor illumination. The corresponding pixel in an input image is considered foreground if its value is a few standard deviations away from the mean. Gaussian models for chromaticity and gradient are added to classify object shadows as part of the background.
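The per-channel Gaussian test can be sketched as below. The threshold of `k = 3` standard deviations and the exponential update with rate `rho` are assumptions standing in for "a few standard deviations" and "continuously adapted"; the chromaticity and gradient models used for shadows are omitted from this sketch.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ChannelGaussian:
    mean: float
    var: float

    def is_foreground(self, value: float, k: float = 3.0) -> bool:
        # Foreground if the value lies more than k standard deviations from the mean.
        return abs(value - self.mean) > k * math.sqrt(self.var)

    def adapt(self, value: float, rho: float = 0.05) -> None:
        # Exponential update of mean and variance to follow slow illumination drift.
        self.mean = (1 - rho) * self.mean + rho * value
        self.var = (1 - rho) * self.var + rho * (value - self.mean) ** 2


def pixel_is_foreground(rgb: Sequence[float], models: Sequence[ChannelGaussian],
                        k: float = 3.0) -> bool:
    """A color pixel is foreground if any of its R, G, B channels deviates from its Gaussian."""
    return any(m.is_foreground(v, k) for v, m in zip(rgb, models))
```

With a mean of 100 and a variance of 25 (standard deviation 5), a channel value of 110 is still background, while 130 is flagged as foreground.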
Instead of maintaining distribution parameters for each pixel, K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, “Wallflower: Principles and Practice of Background Maintenance”, Proceedings of 7th IEEE Int'l Conference on Computer Vision, pp. 255-261, September 1999, maintain the past values of each pixel and use a linear function of them to predict the value of the background pixel in the next frame. If the pixel's value in the next frame deviates significantly from its predicted value, the pixel is considered foreground.
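One simple way to obtain such a linear predictor is a least-squares fit over the pixel's recent history, sketched below in Python. The predictor order, the least-squares fitting, and the deviation `threshold` are assumptions for illustration and are not claimed to match the filter used by Toyama et al.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_predictor(history: Sequence[float], order: int) -> List[float]:
    """Least-squares weights w such that x[t] is approximated by sum_i w[i] * x[t-1-i]."""
    rows = [list(reversed(history[t - order:t])) for t in range(order, len(history))]
    targets = list(history[order:])
    # Normal equations: (R^T R) w = R^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(order)] for i in range(order)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(order)]
    return solve(ata, aty)


def is_foreground(history: Sequence[float], value: float,
                  weights: Sequence[float], threshold: float) -> bool:
    """Foreground if the observed value deviates from the one-step prediction by more than threshold."""
    recent = list(reversed(history[-len(weights):]))  # most recent value first
    predicted = sum(w * x for w, x in zip(weights, recent))
    return abs(value - predicted) > threshold
```

For a pixel that brightens steadily (10, 12, ..., 22), a second-order fit recovers the weights (2, -1) and predicts 24, so a value of 24.5 is background and a value of 60 is foreground.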
The gradual and sometimes sudden changes in scene illumination, plus the problem of slow-moving objects, require a more complex model for each background pixel value. C. Stauffer and W. Grimson, “Learning Patterns of Activity Using Real-Time Tracking”, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 747-757, Vol. 22, August 2000, proposed using multiple adaptive Gaussians to model each pixel. Multiple Gaussians are maintained for the background, and pixels that do not match any one of these Gaussians are considered foreground.
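A heavily simplified, single-pixel version of an adaptive mixture model is sketched below. It assumes grayscale values, a fixed learning rate `alpha`, a match test of `k` standard deviations, and a background weight fraction `bg_fraction`; it also replaces the density-weighted update rate of the original formulation with `alpha` itself. All class names and parameter values are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Mode:
    weight: float
    mean: float
    var: float


@dataclass
class PixelMixture:
    modes: List[Mode]
    alpha: float = 0.05        # learning rate (assumed value)
    k: float = 2.5             # a value matches a mode within k standard deviations
    bg_fraction: float = 0.7   # cumulative weight treated as background (assumed)
    init_var: float = 225.0    # variance given to a newly created mode (assumed)

    def _ranked(self) -> List[Mode]:
        # Most stable modes first: high weight and low variance.
        return sorted(self.modes, key=lambda m: m.weight / math.sqrt(m.var), reverse=True)

    def _background_modes(self) -> List[Mode]:
        chosen, total = [], 0.0
        for m in self._ranked():
            chosen.append(m)
            total += m.weight
            if total > self.bg_fraction:
                break
        return chosen

    def observe(self, x: float) -> bool:
        """Classify x against the current model, then update it; True means foreground."""
        background = self._background_modes()
        match = next((m for m in self._ranked()
                      if abs(x - m.mean) <= self.k * math.sqrt(m.var)), None)
        for m in self.modes:
            m.weight *= 1 - self.alpha
        if match is None:
            # No mode explains x: replace the weakest mode with one centred on x.
            weakest = min(self.modes, key=lambda m: m.weight)
            weakest.weight, weakest.mean, weakest.var = self.alpha, x, self.init_var
            is_foreground = True
        else:
            match.weight += self.alpha
            rho = self.alpha  # simplification of alpha * N(x | mean, var)
            match.mean = (1 - rho) * match.mean + rho * x
            match.var = (1 - rho) * match.var + rho * (x - match.mean) ** 2
            is_foreground = not any(match is b for b in background)
        total = sum(m.weight for m in self.modes)
        for m in self.modes:
            m.weight /= total
        return is_foreground
```

A value that matches only a low-weight mode is still reported as foreground, so a stopped object is classified as foreground until its mode accumulates enough weight to become part of the background.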
In the work of I. Haritaoglu, D. Harwood and L. Davis, “W4: Real-Time Surveillance of People and Their Activities”, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 809-830, Vol. 22, August 2000, each background pixel is represented by three values: its minimum, its maximum, and its maximum frame-to-frame intensity difference during a training period. The background model is learned during this initial training period of about 20 to 40 seconds, which can take place even while objects are moving around the scene. The method is designed for scenes where an object cannot stay in one place very long; otherwise, the object will be included as part of the background. One such situation is when a car stops in the middle of the scene and stays there for minutes.
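The three-value pixel model can be sketched as follows. The training statistics follow the description above, but the decision rule used here (foreground when a value falls more than the maximum step outside the trained minimum-to-maximum range) is an assumption and not a quotation of the W4 paper; the function names are hypothetical.

```python
from typing import Sequence, Tuple

PixelModel = Tuple[float, float, float]  # (minimum, maximum, largest frame-to-frame step)


def train_w4_pixel(samples: Sequence[float]) -> PixelModel:
    """Return (minimum, maximum, largest frame-to-frame difference) over the training frames."""
    if len(samples) < 2:
        raise ValueError("need at least two training frames")
    max_step = max(abs(b - a) for a, b in zip(samples, samples[1:]))
    return min(samples), max(samples), max_step


def w4_is_foreground(value: float, model: PixelModel) -> bool:
    """Assumed rule: foreground if the value lies more than max_step outside [min, max]."""
    lo, hi, step = model
    return value < lo - step or value > hi + step
```

If a car is parked over a pixel during training, the car's intensities fall inside the learned range, so the car is later classified as background, which is the failure case described above.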
U.S. Pat. No. 5,748,775 by M. Tsuchikawa, A. Sato, A. Tomono and K. Ishii, disclosed a method for continuous reconstruction of the background image for background subtraction. A temporal histogram is maintained for each pixel and a statistical method is applied to determine whether the temporal changes are due to abrupt or gradual illumination changes. Background pixels are updated using statistics of the past values, computed from the temporal histogram.
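A per-pixel temporal histogram can be sketched in Python as below. The bin width, the sliding-window `capacity`, and the choice of the most frequent bin as the background estimate are assumptions; the statistical test that distinguishes abrupt from gradual illumination changes is not modeled, and the class name is hypothetical.

```python
from collections import Counter, deque


class TemporalHistogram:
    """Sliding-window histogram of one pixel's quantized values (parameters are assumptions)."""

    def __init__(self, bin_width: int = 8, capacity: int = 100):
        self.bin_width = bin_width
        self.capacity = capacity
        self.values = deque()    # bin indices in arrival order
        self.counts = Counter()  # bin index -> number of samples in the window

    def add(self, value: int) -> None:
        b = value // self.bin_width
        self.values.append(b)
        self.counts[b] += 1
        if len(self.values) > self.capacity:
            old = self.values.popleft()  # forget the oldest sample
            self.counts[old] -= 1

    def background_estimate(self) -> float:
        """Centre of the most frequent bin, used here as the background value."""
        if not self.values:
            raise ValueError("no samples yet")
        b, _ = self.counts.most_common(1)[0]
        return (b + 0.5) * self.bin_width
```

An isolated outlier does not move the estimate, but once a new value dominates the window, whether from an illumination change or a stopped object, the estimate switches to it.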
Instead of working with the whole image of the background, U.S. Pat. No. 6,546,115 by W. Ito, H. Yamada and H. Ueda disclosed a method wherein the image view is divided into several sub-views, each with its own maintained background image. The system is used for detecting objects entering the scene. Each sub-view is processed independently: the system detects movement in a sub-view by computing the intensity differences between the input sub-image and the corresponding background sub-image. If an object is detected in a sub-view, the corresponding background sub-image is not updated; otherwise, it is updated.
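The per-sub-view update policy can be sketched as follows. The mean-absolute-difference movement test, the `threshold`, the dictionary keyed by sub-view name, and replacing the background sub-image outright with the current one are all assumptions made for illustration.

```python
from typing import Dict, List

SubImage = List[List[float]]


def mean_abs_diff(a: SubImage, b: SubImage) -> float:
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(r) for r in a)
    return total / count


def process_subviews(inputs: Dict[str, SubImage], backgrounds: Dict[str, SubImage],
                     threshold: float) -> Dict[str, bool]:
    """Detect movement per sub-view; refresh only the sub-views where nothing was detected."""
    detected = {}
    for name, sub in inputs.items():
        moving = mean_abs_diff(sub, backgrounds[name]) > threshold
        detected[name] = moving
        if not moving:
            backgrounds[name] = [row[:] for row in sub]  # replace with a copy of the current sub-image
    return detected
```

Sub-views are independent, so a detection in one sub-view freezes only that sub-view's background while the others continue to adapt.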
Dividing the background image into several sub-images is also employed by U.S. Pat. No. 5,684,898 by M. Brady and D. Cerny, which disclosed a method for distinguishing foreground from background pixels in an input image for background subtraction. A weighting function takes the difference between the pixel intensities of the background sub-image and those of the corresponding pixels in the input sub-image, and the weights are used to classify pixels in the input sub-image. If a background sub-image is not significantly different from the current input sub-image, it is updated. The method works well when objects are constantly moving in the scene.
The main idea behind the image-based methods of background subtraction is that a point location in the scene is occluded by a foreground object if its pixel value in the input frame is significantly different from the expected value in the background model. The model is initially learned from an image sequence and continuously updated in order to adapt to changes in scene illumination.
These methods suffer from problems that corrupt the background model. They require objects to keep moving so as not to occlude parts of the background for long periods of time. When an object stays too long in one location, the history of past values, or their distribution, becomes significantly different from that of the true background. Furthermore, when an object is in one location and the ambient illumination changes drastically, the background model at that location could be permanently corrupted, even if the object later moves and exposes the background.
The root of these problems lies in the difficulty of modeling the value of a background pixel over time. The value of a pixel is not stable over a wide range of imaging conditions, especially outdoors where intensity values can gradually or drastically change and have a wide range across different times of the day and varying weather conditions. Image-capturing devices, particularly those that employ auto-exposure, also contribute to the variability of the pixel value.
Image gradients, or image edges, however, are much more stable under varying imaging conditions. The presence of an edge in the scene causes significant differences between the values of adjacent pixels at the image location of the edge. At a particular point in the image, the pixel values could vary, but the presence of edges can still be easily determined. If a point in the scene is occluded, its pixel value may or may not change, and when a change is observed, it could be due to an occluding object or to other factors such as weather, auto-exposure, and time of day. An edge point in the scene, on the other hand, would almost certainly disappear if occluded by an object.
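The contrast between pixel values and edges can be illustrated with a one-dimensional intensity profile taken across an edge. In this hypothetical sketch, a global darkening (as from auto-exposure) changes every pixel value yet leaves a clear gradient, while a uniform occluding object removes the gradient entirely; the function names and the `min_gradient` and `tolerance` thresholds are illustrative assumptions.

```python
from typing import Sequence


def edge_present(profile: Sequence[float], index: int, min_gradient: float) -> bool:
    """Central-difference gradient at `index` of a 1-D intensity profile across the edge."""
    gradient = (profile[index + 1] - profile[index - 1]) / 2.0
    return abs(gradient) >= min_gradient


def pixel_changed(before: float, after: float, tolerance: float) -> bool:
    """Naive intensity test of the kind used by background-model methods."""
    return abs(after - before) > tolerance


reference = [50, 50, 50, 150, 150, 150]   # edge between positions 2 and 3
darker = [v / 2 for v in reference]       # same scene, halved exposure
occluded = [90] * 6                       # uniform object covering the edge
```

Here the intensity test reports a change at position 2 in the darker image although nothing occludes it, whereas the edge test still finds the edge there and fails only for the occluded profile.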
The idea of detecting the occlusion of edge points is extended to linear features. A linear feature is a group of edge points in a continuous and contiguous arrangement. Lines are found in many indoor and outdoor scenes: roads, sidewalks, buildings, grocery aisles, doors, windows, fences, and outdoor fixtures all have lines in them. Most of these lines are fixed and remain visible across varying lighting and weather conditions, and when viewed from certain angles, many of them are occluded by passing people or vehicles. By positioning image-capturing devices so that they capture objects occluding linear features, the fixed linear features can serve as detection points, that is, locations where objects can be detected. Multiple linear features in the scene allow “occlusion events” to be defined. For example, if two parts of a road curb's line are occluded one after another, this could indicate a car passing by. Another example is an indoor event of a person going through a door, indicated when the lines on the doorframe and certain lines on the floor are occluded in succession.
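An occlusion event defined by two linear features occluded in succession can be expressed as a small state machine over per-frame occlusion flags. The sketch below assumes those flags are already computed for each frame; the function name and the `max_gap` frame limit are hypothetical.

```python
from typing import Iterable, List, Optional, Sequence


def detect_sequence_event(occlusions: Iterable[Sequence[bool]], max_gap: int) -> List[int]:
    """Report frames where feature 1 becomes occluded within max_gap frames after feature 0 did.

    Each item of `occlusions` is a per-frame pair (feature0_occluded, feature1_occluded).
    """
    events = []
    armed_at: Optional[int] = None
    prev = (False, False)
    for t, (f0, f1) in enumerate(occlusions):
        if f0 and not prev[0]:
            armed_at = t                    # feature 0 has just become occluded
        if f1 and not prev[1] and armed_at is not None:
            if t - armed_at <= max_gap:
                events.append(t)            # succession observed, e.g. a car passing
            armed_at = None
        prev = (f0, f1)
    return events
```

Because the state machine is armed only by feature 0, the order matters: an object crossing the features in the opposite direction does not produce this event, so direction of travel can be distinguished.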
For some applications, a few detection points in the scene are sufficient to describe events and to have a system detect them automatically. With only a few detection points, it is not necessary to process the entire image, unlike many of the previous image-based methods. To determine line occlusions, the system needs to process only the pixels along the locations of the linear features, making it computationally inexpensive.
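The per-image computation can be sketched in Python: only the pixels on a known line segment are visited (enumerated here with Bresenham's line algorithm), and the feature is scored by the fraction of those pixels whose gradient magnitude falls below a threshold. The segment coordinates, `min_gradient`, the central-difference gradient, and the function names are assumptions; the sketch also assumes the segment lies at least one pixel inside the image border.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]


def line_pixels(x0: int, y0: int, x1: int, y1: int) -> List[Tuple[int, int]]:
    """Integer pixel coordinates on the segment, via Bresenham's line algorithm."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def gradient_magnitude(img: Image, x: int, y: int) -> float:
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return (gx * gx + gy * gy) ** 0.5


def occluded_fraction(img: Image, segment: Tuple[int, int, int, int], min_gradient: float) -> float:
    """Fraction of the feature's pixels where the expected edge is missing in this single image."""
    pts = line_pixels(*segment)
    missing = sum(1 for x, y in pts if gradient_magnitude(img, x, y) < min_gradient)
    return missing / len(pts)
```

No background model or earlier frame is consulted: each call depends only on the current image and the fixed segment location, and the work is proportional to the segment length rather than to the image size.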
The biggest advantage of a system for detecting line occlusions in a scene is that detection can be done per image, independent of the information found in previous images of the scene. Given a fixed image-capturing device and the image locations of the scene's fixed linear features, no prior information is necessary to compute occlusions. This is in contrast to the image-based techniques described in the prior art, in which information from previous images is used to build and update the background image model.
The present invention is described in the following section and illustrated by two exemplary embodiments together with their accompanying drawings.