It is common to use computer vision techniques to analyze images of a scene. A frequent requirement is to analyze images that vary dynamically over time. For example, in many applications, it is desired to determine whether an object such as a person has appeared in the scene.
Computer vision analysis of an object generally requires multiple processing stages. First, the object is segmented from the background. Attributes such as the shape, 3D motion and location of the object can then be determined. Finally, the object can be analyzed for the purpose of classification or recognition.
Background subtraction is frequently used to perform segmentation when a fixed camera observes an object appearing in front of a static background. Conventional background subtraction methods are based on per-pixel intensity values. Usually, pixel intensities in a live image are subtracted from corresponding pixels in a reference image of the static scene to construct a difference image. The reference image can be acquired ahead of time when it is known that there are no moving objects in the scene. Any pixels with a low value in the difference image are considered to be part of the static background, and pixels with higher values are presumed to be part of the object. For a survey of background subtraction methods, see Toyama et al., “Wallflower: Principles and Practice of Background Maintenance,” Proceedings of the International Conference on Computer Vision, pp. 255–261, 1999.
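The per-pixel procedure described above can be illustrated with a minimal sketch. It assumes gray-scale images stored as nested lists of integer intensities; the function name `background_subtract`, the threshold of 30, and the sample pixel values are hypothetical choices made for illustration only and are not taken from any of the cited systems.

```python
def background_subtract(live, reference, threshold=30):
    """Return a binary mask for two equally sized gray-scale images.

    A pixel is marked 1 (presumed object) when the absolute intensity
    difference exceeds the threshold, and 0 (presumed background) otherwise.
    The threshold value is an assumption for illustration.
    """
    mask = []
    for live_row, ref_row in zip(live, reference):
        mask.append([1 if abs(l - r) > threshold else 0
                     for l, r in zip(live_row, ref_row)])
    return mask


# Hypothetical reference image of the empty, static scene.
reference = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]

# Hypothetical live image: a bright two-pixel object plus one noisy pixel.
live = [
    [10, 12, 10, 10],
    [10, 200, 210, 10],
    [10, 10, 10, 90],
]

mask = background_subtract(live, reference)
for row in mask:
    print(row)
```

In this example the isolated pixel in the last row is flagged along with the object, which is the kind of erroneous local result that a purely per-pixel decision can produce.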
There are problems with conventional background subtraction techniques. First, the difference image is determined on an individual pixel basis, so noise or artifacts can give erroneous local results. Connected-component analysis can be used to eliminate small spurious responses, but this requires extra processing. In addition, any portion of the object that has the same color (or intensity in gray-scale images) as the corresponding portion of the reference image is difficult to detect. In this case, color or intensity gradients at the silhouette boundary or at internal shadows may still be visible. This indicates that only those parts of the image with color or intensity gradients (edges) are truly reliable for distinguishing the object from the background.
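The connected-component clean-up mentioned above can be sketched as follows. This is an illustrative example only: it assumes a binary mask stored as nested lists, 4-connectivity, and a hypothetical `min_size` parameter; the function name `remove_small_components` is likewise an assumption and does not correspond to any cited system.

```python
from collections import deque


def remove_small_components(mask, min_size=2):
    """Keep only 4-connected foreground regions with at least min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the size limit are treated as spurious.
            if len(component) >= min_size:
                for y, x in component:
                    cleaned[y][x] = 1
    return cleaned


# Hypothetical thresholded difference image with one isolated noisy pixel.
noisy_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

cleaned = remove_small_components(noisy_mask, min_size=2)
for row in cleaned:
    print(row)
```

The additional pass over every foreground pixel is the extra processing referred to above, and it does nothing to recover object regions that were never flagged because their intensity matched the reference image.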
Therefore it is desired to compute the edges for the object in a direct fashion, unlike conventional background subtraction.
Segmentation of the object from the static background can be followed by further analysis. Many prior art computer vision systems use a single camera. It is well known that extracting information such as shape, 3D motion and location from images acquired by a single stationary camera is difficult. As cameras are becoming relatively inexpensive, stereo analysis of multiple camera images will become more prevalent. Stereo analysis provides a more robust measurement of the shape, 3D motion and location of the object than is possible with a single camera.
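As a brief illustration of why a second camera helps with location, the standard triangulation relation for a rectified stereo pair gives depth as focal length times baseline divided by disparity. The sketch below assumes an idealized, calibrated and rectified pair; the function name `depth_from_disparity` and the numeric values are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point matched in both images of a rectified pair.

    focal_px:     focal length in pixels (assumed equal for both cameras)
    baseline_m:   distance between the camera centers in meters
    disparity_px: horizontal offset of the matched feature in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical values: 800-pixel focal length, 0.5 m baseline, 40-pixel disparity.
depth = depth_from_disparity(800, 0.5, 40)
print(depth)
```

A single stationary camera provides no such disparity, so depth must be inferred indirectly.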
Stereo processing requires a choice of what to match between images: individual pixels, or features such as points or edges. In the case of edges, the edges are usually linked into edge chains as part of the edge detection process before subsequent processing. Stereo matching of edge chains is complicated by the fact that the edge chains for a given object may be computed differently in different images. For example, a given physical contour of the object, i.e., a silhouette boundary or a surface texture boundary, may be detected as one or more edge chains in each stereo image. The chains may have different termination points in different images, and may connect the object to the background texture in arbitrary ways in the different images.
Therefore it is desired to chain only those edges that correspond to physical contours of the object.
There are a large number of image-based systems for controlling the operation of elevator doors; see, for example, U.S. Patent Application No. 2001/0045327, “Elevator Door Control Device,” filed on Apr. 4, 2001. However, those systems do not discriminate types of passengers. Also see U.S. Pat. No. 6,339,375, issued to Hirata et al. on Jan. 15, 2002, “Image monitoring apparatus and image monitoring method,” which describes a system for detecting whether a passenger is in an elevator doorway. The doorway is determined by pattern matching to the static, straight horizontal and vertical lines that form the doorway. The two-dimensional line information is reduced to one-dimensional information. Because that method relies on static straight lines, it is unsuited for detecting irregularly shaped moving objects.
Japanese Patent Publication No. 11–268879 describes an elevator control system where two cameras are mounted on a ceiling, and acquired images are analyzed to discriminate types of waiting passengers based on top planar shapes and heights of the passengers. That system requires ceiling-mounted cameras and manual activation by a passenger pushing a call button.
U.S. Pat. No. 6,386,325, issued to Fujita on May 14, 2002, describes an “Elevator system with a hall scanner for distinguishing between standing and sitting elevator passengers.” That system also requires manual activation by having the passenger push a call button, and is only able to monitor passengers who have operated the hall call button. That system uses conventional background subtraction to generate a 2D difference image. The 2D difference image is compared with prestored models of wheelchair configurations. The only configurations shown are a direct frontal view and a side view of a wheelchair user.
There are major problems with that system. First, because the system uses conventional background subtraction, it has the inherent problems of generating a useful difference image, as described above. Second, it is unlikely that any view acquired by the cameras will ever resemble the configurations shown. In fact, the number of different configurations of ambulatory persons and wheelchair users is innumerable. Fujita does not disclose how the basic configuration patterns are matched to an arbitrary view. It is well known in vision systems that pattern matching is an extremely difficult problem, with solutions only in constrained configurations. Third, the background subtraction supplies only a 2D model; neither depth information nor movement within the object can be determined.
Therefore it is desired to provide a wheelchair detection system that does not use background subtraction, does not require prestored models, and is based on 3D information.