It is well known, as evidenced, e.g., by products offered by companies such as PacketVideo, that video can be optimized for transmission over wireless networks and compressed in a manner compatible with an object-based video compression algorithm, e.g., MPEG-4. Existing systems, e.g., Internet video security/monitoring services such as Xanboo, provide neither object-based compression nor automatic video recognition capabilities. Current security systems use infrared or ultrasound motion sensors to detect intruders. These sensors are often triggered by pets, wind, or the like, resulting in high false alarm rates. In existing systems the video cameras often do not capture the source of an intruder alarm, leaving the user unable to tell whether the alarm is a false alarm (and, if so, what caused it) or was caused by an intruder outside the camera's field of view. Changes in the background due to lighting or uninteresting objects often cause additional segmented objects to be sent to the object-based video compression module employed in prior art systems, which lowers the compression ratio, degrades video quality, or both. Prior art systems are also deficient in their ability to label intruder objects and/or track an intruder's movements.
While the preferred embodiment of the present invention is discussed in the context of a home and/or business security system, the system can also be used for various other applications, e.g., a wide variety of video detection/monitoring applications. Opportunities exist for video-based monitoring systems, e.g., in streaming web video over both wired and wireless networks. Real-time video streaming, in which the user sees events as they occur with full motion, can be useful in many applications, including those of the present invention, and can give rise to new and compelling applications. This is especially true for video streaming to wireless devices such as PDAs or other handheld or otherwise highly portable personal computing devices.
Visual recognition/detection/monitoring technology is applicable to a wide variety of applications, e.g., to the infrastructure of webcams, PDAs, cell phone video cameras, and other types of digital video cameras now being built and/or developed. The present invention can be used as the basis for a number of applications, including: indoor/outdoor home and business security products and services; employee monitoring and tracking; traffic monitoring; parking lot surveillance; parking assistance at parking facilities; information/advertisement video streams for malls, theatres, parks, restaurants, etc.; computer, video-game, kiosk, and television interaction and control (insertion of users into a video or computer environment, e.g., an inexpensive camera attached to or installed in a television set-top box could be used to insert users into digital environments transmitted over cable or satellite links, and the users could then interact with the environment through various gestures or movements); semi-automatic object segmentation of video for the entertainment industry; video peer-to-peer applications in which users can share or swap efficiently compressed video streams; and video conferencing using portable devices. In each of these cases, existing systems suffer from defects that can be corrected by utilizing the concepts of the present invention.
Current security systems use infrared or ultrasound motion sensors to detect intruders. These sensors are often triggered by pets or wind, resulting in high false alarm rates. Existing computer-vision-based video surveillance systems such as Pfinder suffer from a variety of drawbacks. For example, they often segment objects in the scene incorrectly when a drastic change occurs in illumination or in the environment/scene content, including changes in the position or movement of the object being detected, e.g., an intruder in the field of view of a security camera, or the occlusion of part or all of that object.
MPEG-4, by way of example, introduces the notion of object-based compression, which can greatly increase compression efficiency. The MPEG-4 standard does not, however, specify object extraction methods. Although many video object plane (VOP) extraction methods exist, it is common knowledge that segmenting a variety of video streams is non-trivial. Most current video streaming systems do not even attempt to use object-based compression schemes, e.g., those found in MPEG-4.
Prior attempts to recognize objects using video recognition/detection have therefore not been successful in effectively reducing false alarms. Statistics in the security industry clearly suggest that the frequent occurrence of false alarms has been the bane of the industry, and lower false alarm rates would help security systems gain wider acceptance. In addition, bandwidth constraints and the like currently limit the amount of video object data indicative of a detection, e.g., of an object of interest such as an intruder, that can be processed and sent for remote receipt and use, e.g., over wired or wireless telephone lines or other forms of networked communication, including LANs, WANs, the Internet and/or the World Wide Web, or combinations thereof.
A number of recent market and technological trends are now converging to form new applications and opportunities for streaming video, e.g., over both wired and wireless networks. Streaming video standards have become very popular since they eliminate the need to download an entire video file before the video can start. Streaming video, e.g., over the Internet, allows real-time viewing of events by remote users. Inexpensive and compact streaming video sensors such as video cameras on computers, Internet appliances, PDAs, and cell phones are becoming available. Many types of inexpensive webcams are already widely available for PCs. Video sensors are also becoming available on less expensive and more mobile platforms. For example, Kyocera Corp. has already introduced its Visual Phone VP-210 cell phone, which contains a built-in video camera, while a video camera add-on is available for the Visor PDA. Even the Nintendo Game Boy has a video camera accessory.
The Internet infrastructure for streaming video over both wired and wireless links is also being constructed rapidly. A number of recent startups such as iClips and Earthnoise already have web sites on-line for sharing user-supplied streaming video clips. Companies such as Akamai are creating technology that brings video content out to the “edges” of the Internet, away from congestion and closer to the end-user. To facilitate the “edge-based” delivery of content, the Internet Content Adaptation Protocol (ICAP) is now being formulated by a large group of Internet companies (see www.i-cap.org). ICAP allows services such as streaming video to be adapted to the needs of the client device. In the wireless area, 2.5G and third-generation (3G) standards are emerging with higher bandwidths and capabilities for transmitting video information to cell phones and PDAs. PacketVideo is already demonstrating delivery of compressed video to various wireless devices, as is DoCoMo in Japan using 3G cell phones. Geolocation will become ubiquitous through the proliferation of GPS as well as cellular geolocation driven by the forthcoming E911 position-location regulations, and will enable many new location-specific services for mobile streaming video devices.
A wide variety of attractive applications are being considered and evaluated in the area of streaming web video over both wired and wireless networks. Real-time video streaming, in which the user sees events as they occur with full motion, will generate even more new and compelling applications. This can be especially true, e.g., for video streaming to wireless devices such as PDAs.
Existing Internet video security/monitoring services such as Xanboo provide neither object-based compression nor automatic video recognition capabilities. Current security systems that rely on non-video detection, e.g., infrared, ultrasound, or similar motion sensors, suffer from the defects noted above, among others, leading to high false alarm rates.
Classical motion segmentation algorithms attempt to partition frames into regions of similar intensity, color, and/or motion characteristics. Object segmentation approaches can be broadly divided into three main types: direct intensity or color based methods, motion vector based methods, and hybrid methods. The direct intensity or color-based methods are usually based on a change detection mask (CDM) that separates moving and stationary regions. Change detection algorithms mostly rely on either a background subtraction method or a temporal difference method. They are suitable for real-time segmentation of moving regions in image sequences because of low computational complexity.
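By way of illustration only, the two change-detection variants can be sketched in a few lines of Python. The sketch assumes grayscale frames stored as nested lists and a single global threshold; the function names are hypothetical and are not taken from any of the cited works. The only difference between the two methods is the reference image against which the current frame is compared.

```python
from typing import List

Image = List[List[float]]  # grayscale frame indexed [row][col]
Mask = List[List[bool]]    # change detection mask: True marks a changed pixel


def threshold_difference(a: Image, b: Image, threshold: float) -> Mask:
    # Mark pixels whose absolute intensity difference exceeds the threshold.
    return [[abs(pa - pb) > threshold for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def cdm_background_subtraction(frame: Image, background: Image,
                               threshold: float) -> Mask:
    # Reference image: a background acquired or estimated beforehand.
    return threshold_difference(frame, background, threshold)


def cdm_temporal_difference(frame: Image, previous: Image,
                            threshold: float) -> Mask:
    # Reference image: the immediately preceding frame.
    return threshold_difference(frame, previous, threshold)


background = [[10.0] * 4 for _ in range(3)]
frame = [row[:] for row in background]
frame[1][2] = 200.0  # a bright object appears at row 1, column 2
print(cdm_background_subtraction(frame, background, threshold=25.0)[1])
# [False, False, True, False]
```

Both variants cost one subtraction and one comparison per pixel, which is the low computational complexity that makes change detection attractive for real-time use.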
Background subtraction methods, e.g., as discussed in Skifstad, K. D. and Jain, R. C., “Illumination Independent Change Detection for Real World Image Sequences,” Computer Vision, Graphics and Image Processing 46(3): 387-399 (1989) (Skifstad and Jain 1989); Long, W. and Yang, Y. H., “Stationary Background Generation: An Alternative to the Difference of Two Images,” Pattern Recognition 23: 1351-1359 (1990) (Long and Yang 1990); Ridder, C., et al., “Adaptive background estimation and foreground detection using Kalman filtering,” Proceedings of International Conf. on Recent Advances in Mechatronics, ICRAM'95, Istanbul, Turkey (1995) (Ridder et al. 1995); Kuno, Y. and Watanabe, T., “Automated detection of human for visual surveillance system,” ICPR96, Vienna, Austria (1996) (Kuno and Watanabe, 1996); Makarov, A., “Comparison of Background Extraction Based Intrusion Detection Algorithms,” ICIP96 (1996) (Makarov 1996); Eveland, C., et al., “Background Modeling for Segmentation of Video-rate Stereo Sequences,” CVPR98 (1998) (Eveland, et al., 1998); Stauffer, C. and Grimson, W. E. L., “Adaptive Background Mixture Models for Real-time Tracking,” CVPR99 (1999) (Stauffer and Grimson, 1999); Toyama, K., et al., “Wallflower: Principles and Practice of Background Maintenance,” ICCV99 (1999) (Toyama, et al. 1999); Elgammal, A., et al., “Non-Parametric Model for Background Subtraction,” ECCV00 (2000) (Elgammal, et al. 2000); Haritaoglu, I., et al., “A Fast Background Scene Modeling and Maintenance for Outdoor Surveillance,” ICPR00 (2000) (Haritaoglu, et al. 2000); Ivanov, Y. A., et al., “Fast Lighting Independent Background Subtraction,” IJCV 37(2): 199-207 (2000) (Ivanov, 2000); Seki, M., et al., “A Robust Background Subtraction Method for Changing Background,” WACV00 (Seki, et al. 2000); and Ohta, N., “A Statistical Approach to Background Subtraction for Surveillance Systems,” ICCV01 (2001) (Ohta 2001), the disclosures of each of which are hereby incorporated by reference, compare the intensity or color of the observed images with that of the background to identify foreground/background regions. The background is either acquired in advance from an empty scene or estimated and updated dynamically. Adaptive background methods, such as those discussed in Collins, R., et al., “A system for video surveillance and monitoring,” Proceedings of the American Nuclear Society (ANS) Eighth International Topical Meeting on Robotics and Remote Systems (1999) (Collins, et al. 1999); Skifstad and Jain, 1989; Ridder, et al., 1995; Hotter, M., et al., “Detection of moving objects using a robust displacement estimation including a statistical error analysis,” ICPR96, Vienna, Austria (1996) (Hotter et al., 1996); Amamoto, N. and Matsumoto, K., “Obstruction detector by environmental adaptive background image updating,” 4th World Congress on Intelligent Transport Systems, Berlin (1997) (Amamoto and Matsumoto, 1997); Wren, C., “Pfinder: Real-time tracking of the human body,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7): 780-785 (1997) (Wren, 1997); Horprasert, T., et al., “A Statistical Approach for Real-time Robust Background Subtraction and Shadow Detection,” Proc. IEEE ICCV'99 FRAME-RATE Workshop (1999) (Horprasert, et al., 1999); and Huwer, S. and Niemann, H., “Adaptive change detection for real-time surveillance applications,” Third IEEE International Workshop on Visual Surveillance VS00, Dublin, Ireland (2000) (Huwer and Niemann, 2000), the disclosures of which are hereby incorporated by reference, can be crucial for real-world scenes in which the background or illumination changes; otherwise, background elements can be erroneously included in the foreground.
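A minimal sketch of dynamic background maintenance follows, assuming a selective exponential running average with a hypothetical learning rate `alpha` and a fixed foreground threshold. This is a generic scheme offered for illustration and is not the specific update rule of any reference cited above.

```python
from typing import List

Image = List[List[float]]  # grayscale image indexed [row][col]


def update_background(background: Image, frame: Image, alpha: float,
                      threshold: float) -> Image:
    """Selective exponential running-average background update (illustrative).

    alpha is an assumed learning rate in (0, 1]. Pixels classified as
    foreground (|frame - background| > threshold) keep their old background
    value so that a moving object is not absorbed into the model.
    """
    updated = []
    for bg_row, f_row in zip(background, frame):
        new_row = []
        for b, f in zip(bg_row, f_row):
            if abs(f - b) > threshold:
                new_row.append(b)                           # foreground: freeze
            else:
                new_row.append((1.0 - alpha) * b + alpha * f)  # blend slowly
        updated.append(new_row)
    return updated
```

Slow illumination drift falls below the threshold and is blended into the model, while a bright intruder pixel is excluded from the update; a small `alpha` trades adaptation speed for stability.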
The numerous approaches to this problem differ in the type of background model used and the procedure used to update the model. Most background estimation methods use statistical methods both to represent and to update the background. Wren et al., 1997 discusses modeling each background pixel as a single Gaussian whose mean and variance are updated continuously. Stauffer and Grimson, 1999, model each pixel as a mixture of Gaussians and use an on-line approximation to update the model. Seki et al., 2000 propose a robust background subtraction method for changing backgrounds that expresses changes in the background using a multi-dimensional image vector space and learns the chronological changes in terms of the distribution of these vectors. These methods can produce holes in the computed foreground if, e.g., the color or intensity of a foreground object matches that of the background.
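The single-Gaussian per-pixel model can be illustrated roughly as follows. This simplified sketch assumes grayscale intensities, an exponentially weighted update with a hypothetical rate `rho`, a k-sigma foreground test, and a variance floor `min_var`; none of these parameter values come from Wren et al., 1997. The mixture-of-Gaussians model of Stauffer and Grimson, 1999 generalizes this by keeping several such components per pixel, which the sketch does not attempt.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PixelGaussian:
    """Single-Gaussian model of one background pixel's grayscale intensity."""
    mean: float
    var: float

    def is_foreground(self, x: float, k: float = 2.5) -> bool:
        # Foreground if the observation lies more than k standard deviations
        # from the background mean (k = 2.5 is an illustrative assumption).
        return abs(x - self.mean) > k * math.sqrt(self.var)

    def update(self, x: float, rho: float = 0.05, min_var: float = 4.0) -> None:
        # Exponentially weighted running estimates of mean and variance;
        # min_var keeps the model from collapsing on a perfectly static pixel.
        self.mean = (1.0 - rho) * self.mean + rho * x
        diff = x - self.mean
        self.var = max(min_var, (1.0 - rho) * self.var + rho * diff * diff)


def classify_and_update(models: List[List[PixelGaussian]],
                        frame: List[List[float]]) -> List[List[bool]]:
    """Return the foreground mask; only pixels judged background are updated."""
    mask = []
    for model_row, frame_row in zip(models, frame):
        mask_row = []
        for model, x in zip(model_row, frame_row):
            fg = model.is_foreground(x)
            if not fg:
                model.update(x)
            mask_row.append(fg)
        mask.append(mask_row)
    return mask
```

The "holes" noted above arise directly from this test: a foreground pixel whose intensity falls within k standard deviations of the background mean is classified as background.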
Temporal differencing methods, as discussed, e.g., in Makarov, A., et al., “Intrusion Detection Using Extraction of Moving Edges,” IAPR Int. Conf. on Pattern Recognition ICPR94 (1994) (Makarov, 1994), and Paragios, N. and Tziritas, G., “Detection and location of moving objects using deterministic relaxation algorithms,” ICPR96, Vienna, Austria (1996) (Paragios and Tziritas, 1996), the disclosures of which are hereby incorporated by reference, subtract consecutive images and then threshold the result to detect the region of change, which can then be attributed to moving foreground. These methods are suited to detecting moving objects and can adapt quickly to changing light conditions. However, they fail to detect objects that were previously moving but have become stationary or nearly stationary. Hybrid approaches based on a combination of background subtraction and temporal differencing, such as those discussed in Amamoto and Matsumoto, 1997 and Huwer and Niemann, 2000, have also been proposed. Unlike previous simple and hybrid methods, Huwer and Niemann, 2000 discuss adapting the background model only on regions detected by the temporal difference method in order to avoid reinforcement of adaptation errors.
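The stationary-object weakness of temporal differencing, and one simple way of combining it with background subtraction, can be sketched as follows. The union of the two masks shown here is an illustrative assumption and is not the specific hybrid of Amamoto and Matsumoto, 1997 or Huwer and Niemann, 2000.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def diff_mask(a: Image, b: Image, threshold: float) -> Mask:
    # Thresholded absolute difference between two images.
    return [[abs(x - y) > threshold for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def hybrid_mask(frame: Image, previous: Image, background: Image,
                threshold: float) -> Mask:
    """Union of a temporal-difference mask and a background-subtraction mask.

    Temporal differencing reacts quickly to motion, while background
    subtraction keeps reporting an object after it has stopped moving.
    """
    temporal = diff_mask(frame, previous, threshold)
    subtraction = diff_mask(frame, background, threshold)
    return [[t or s for t, s in zip(rt, rs)]
            for rt, rs in zip(temporal, subtraction)]


background = [[10.0, 10.0, 10.0]]
arrived = [[10.0, 200.0, 10.0]]   # object enters at column 1
stopped = [[10.0, 200.0, 10.0]]   # next frame: the object has not moved
print(diff_mask(stopped, arrived, 25.0))                # [[False, False, False]]
print(hybrid_mask(stopped, arrived, background, 25.0))  # [[False, True, False]]
```

Once the object stops, the temporal-difference term is empty, but the background-subtraction term still marks it, so the union continues to report the object.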
Motion methods that estimate a dense motion field and then segment the scene based only on this motion information have been discussed, e.g., in Adiv, G., “Determining three-dimensional motion and structure from optical flow generated by several moving objects,” IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7: 384-401 (1985) (Adiv, 1985), and Chang, M. M., et al., “Motion-field segmentation using an adaptive MAP criterion,” ICASSP93, Minneapolis, Minn. (1993) (Chang, et al., 1993), the disclosures of which are hereby incorporated by reference. Adiv, 1985 discusses segmenting the flow field into connected components using a Hough transform and merging segments with similar 3-D motion to obtain the final segmentation. Chang et al., 1993 discusses the use of both motion and intensity information for segmentation in a Bayesian framework. Simultaneous motion estimation and segmentation in a Bayesian framework has also been proposed, as discussed, e.g., in Chang, M. M., et al., “An algorithm for simultaneous motion estimation and segmentation,” ICASSP94, Adelaide, Australia (1994) (Chang, et al., 1994), the disclosure of which is hereby incorporated by reference, where the intensity is the only observation and both the segmentation and the motion field are estimated. While these methods allow the incorporation of mechanisms to achieve spatial and temporal continuity, they are generally unsuitable for real-time applications and may need a priori information about the number of objects.
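Segmenting a dense motion field can be illustrated with a simple region-growing sketch that groups 4-connected pixels with similar motion vectors. This is an assumed simplification offered only for illustration: Adiv, 1985 uses a Hough transform and merges segments by 3-D motion, neither of which is implemented here, the flow field is assumed to be given, and the tolerance `tol` is a hypothetical parameter.

```python
import math
from collections import deque
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # (u, v) motion vector per pixel


def segment_flow(flow: Flow, tol: float) -> List[List[int]]:
    """Label 4-connected regions whose neighboring motion vectors differ by <= tol."""
    rows, cols = len(flow), len(flow[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue
            # Breadth-first region growing from an unlabeled seed pixel.
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                u, v = flow[r][c]
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        nu, nv = flow[nr][nc]
                        if math.hypot(nu - u, nv - v) <= tol:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
            next_label += 1
    return labels
```

Because similarity is checked between neighbors rather than against a region mean, smoothly varying motion (e.g., a rotating object) can chain into a single region; a real system would follow this with a merging step such as the 3-D motion merging described above.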
Segmentation algorithms that specifically address video object plane (VOP) generation have also been proposed, many of which are part of the ISO/MPEG-4 N2 core experiment group on automatic segmentation techniques. These methods can be further divided into those that perform explicit tracking and those that perform implicit tracking. In the implicit tracking area, Neri, A., et al., “Automatic moving object and background segmentation,” Signal Processing, 66(2): 219-232 (1998) (Neri, et al., 1998), the disclosure of which is hereby incorporated by reference, discuss a method for automatic segmentation that separates moving objects from a static background. Potential foreground regions (moving objects and uncovered regions) can be identified by higher order statistics on groups of interframe difference images. Mech, R. and Wollborn, M., “A noise robust method for segmentation of moving objects in video sequences,” IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP97, Munich, Germany (1997) (Mech and Wollborn, 1997), the disclosure of which is hereby incorporated by reference, generate an initial change detection mask by temporal differencing and smooth its boundaries by local adaptive relaxation. Temporal stability can be maintained by incorporating memory of whether pixels belonged to an object in previous CDMs. The object mask can then be calculated from the CDMs by eliminating uncovered background and adapting to gray-level edges to improve the location of boundaries. Both of these methods can, however, lose track of an object that has stopped after previously moving. Choi, J. G., et al., “Automatic segmentation based on spatio-temporal information,” ISO/IEC/JTC1/SC29/WG11/MPEG97/m2091, Bristol, UK (1997) (Choi, et al., 1997), the disclosure of which is hereby incorporated by reference, discuss the use of a watershed algorithm to detect the location of object boundaries, followed by a size filter that can merge small regions into neighboring regions. Every region with more than half of its pixels marked as changed in the CDM so generated can then be assigned to the foreground. To enforce temporal continuity, the segmentation can be aligned with a previous frame, and regions in which, e.g., the majority of the pixels previously belonged to the foreground can be added to the foreground as well. This allows an object to be tracked even when it has stopped for an arbitrary time. Chien, S. Y., et al., “An efficient video segmentation algorithm for realtime MPEG-4 camera system,” Proceedings of Visual Communication and Image Processing (VCIP2000) (Chien, et al., 2000), the disclosure of which is hereby incorporated by reference, discuss the use of a combination of temporal differencing and background subtraction to obtain the CDM. The CDM can be generated using background subtraction when, e.g., the statistics of a pixel have been stationary for a period of time, and using temporal differencing otherwise. Connected component labeling and region filtering can also be performed on the CDM, followed by dilation in the temporal domain and edge smoothing using, e.g., morphological opening and closing operations. Alatan, A. A., et al., “A rule based method for object segmentation in video sequences,” ICIP97 (1997) (Alatan et al., 1997) present an algorithm that fuses motion, color and accumulated previous segmentation data to both segment and track objects. Rule-based processing modules use the motion information to locate objects in the scene, color information to extract the true boundaries, and the segmentation result of the previous frame to track the object.
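The majority rule and the temporal-continuity step attributed above to Choi, et al., 1997 can be sketched as follows. The sketch assumes that a region label map has already been produced (e.g., by a watershed step) and that the previous frame's foreground mask has already been aligned to the current frame; the watershed, the alignment, and the function names are assumptions outside this sketch.

```python
from collections import defaultdict
from typing import List, Set

Labels = List[List[int]]  # region label per pixel (assumed given, e.g. watershed)
Mask = List[List[bool]]


def majority_regions(labels: Labels, mask: Mask) -> Set[int]:
    """Labels of regions in which more than half of the pixels are set in mask."""
    total = defaultdict(int)
    marked = defaultdict(int)
    for label_row, mask_row in zip(labels, mask):
        for label, on in zip(label_row, mask_row):
            total[label] += 1
            if on:
                marked[label] += 1
    return {label for label, n in total.items() if 2 * marked[label] > n}


def foreground_labels(labels: Labels, cdm: Mask, previous_fg: Mask) -> Set[int]:
    # Regions that changed in this frame, plus regions that were mostly
    # foreground in the previous frame's (already aligned) mask, so that an
    # object that has stopped moving is still kept in the foreground.
    return majority_regions(labels, cdm) | majority_regions(labels, previous_fg)
```

The strict "more than half" comparison is written as `2 * marked > total` to avoid floating-point division.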
Wren et al., 1997 discuss a method for tracking people and interpreting their behavior that uses a multi-class statistical model of color and shape to obtain a 2-D representation of, e.g., the head and hands in a wide range of viewing conditions. The method builds a model of the scene and of the object and determines whether each pixel of the current frame belongs to the foreground, e.g., to a person, or to the background scene by tracking and updating statistical models of the object and the scene. This method can, however, fail when sudden or large changes occur in the scene. Meier, T. and Ngan, K. N., “Automatic segmentation of moving objects for video object plane generation,” IEEE Transactions on Circuits and Systems for Video Technology 8(5) (1998) (Meier and Ngan, 1998), the disclosure of which is hereby incorporated by reference, propose tracking and updating the object model by Hausdorff matching. The initial object model can be derived by temporal differencing on images that have been filtered to remove, e.g., stationary objects. The VOP can then be extracted from the object model.
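Hausdorff matching can be sketched as a brute-force search over integer translations of a model point set. The sketch assumes that the object model and the current frame's edge pixels are already available as point lists, and it searches translations only; it illustrates Hausdorff matching in general rather than the full procedure of Meier and Ngan, 1998. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Largest distance from a point of A to its nearest neighbor in B.
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    # Symmetric Hausdorff distance between two non-empty point sets.
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_translation(model: Sequence[Point], edges: Sequence[Point],
                     search: int) -> Tuple[float, int, int]:
    """Try every integer shift in [-search, search]^2 and keep the one that
    minimizes the Hausdorff distance between the shifted model and the edges."""
    best: Optional[Tuple[float, int, int]] = None
    for dx in range(-search, search + 1):
        for dy in range(-search, search + 1):
            shifted = [(x + dx, y + dy) for x, y in model]
            d = hausdorff(shifted, edges)
            if best is None or d < best[0]:
                best = (d, dx, dy)
    return best
```

The exhaustive search is quadratic in the number of points per candidate shift; practical implementations typically use distance transforms to speed this up.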
In the explicit tracking area, Collins et al., 1999 discuss detecting foreground pixels with a background subtraction method that maintains an evolving statistical model of the background, which can adapt to slow changes in the environment. Moving objects, or blobs, can be obtained by connected component labeling of foreground pixels followed by blob clustering, morphological opening and closing, and size filtering. Target tracking can then be performed by matching blobs in the current frame with existing tracks using, e.g., a cost function based on blob features such as size, color histogram, centroid and/or shape. Tracking can persist even when targets become occluded or motionless. Fieguth, P. and Terzopoulos, D., “Color-based tracking of heads and other mobile objects at video frame rates,” Proceedings of the Conference on Computer Vision and Pattern Recognition (1997) (Fieguth, et al., 1997), the disclosure of which is hereby incorporated by reference, have proposed a method for object tracking based on color information only, which is robust with respect to occlusion via an explicit hypothesis-tree model of the occlusion process. They do not, however, address the detection and localization of new objects to track, and they do not handle changes in object shape, scale or color well.
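The blob extraction and track-matching steps can be sketched as below. The sketch assumes 8-connectivity, a minimum-area size filter, and a hypothetical cost that combines centroid distance with relative area change, assigned greedily (cheapest pair first). Collins, et al., 1999 also use features such as color histograms and shape, and their actual cost function and assignment strategy may differ; blob clustering and morphological filtering are omitted.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Blob:
    area: int
    centroid: Tuple[float, float]  # (row, col)


def extract_blobs(mask: List[List[bool]], min_area: int) -> List[Blob]:
    """8-connected component labeling of foreground pixels plus size filtering."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            if len(pixels) >= min_area:  # size filter drops small noise blobs
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                blobs.append(Blob(len(pixels), (cy, cx)))
    return blobs


def match_cost(track: Blob, blob: Blob, area_weight: float = 0.1) -> float:
    # Hypothetical cost: centroid distance plus weighted relative area change.
    dist = math.dist(track.centroid, blob.centroid)
    return dist + area_weight * abs(track.area - blob.area) / max(track.area, 1)


def greedy_match(tracks: Dict[int, Blob], blobs: List[Blob],
                 max_cost: float) -> Dict[int, int]:
    """Assign blobs to tracks, cheapest pair first; returns track id -> blob index."""
    pairs = sorted((match_cost(t, b), tid, bi)
                   for tid, t in tracks.items() for bi, b in enumerate(blobs))
    assigned, used = {}, set()
    for cost, tid, bi in pairs:
        if cost <= max_cost and tid not in assigned and bi not in used:
            assigned[tid] = bi
            used.add(bi)
    return assigned
```

Tracks left unassigned by `greedy_match` correspond to targets that may be occluded or have left the scene; a fuller tracker would keep them alive for some frames rather than dropping them immediately.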
Although the easiest approach to object detection is to use pixel intensities, pixel-based approaches can fail because they do not take into account the structure implicit in many complex objects. Edge-based methods examine only a small local neighborhood at a fine scale; for intruder detection applications, which are often riddled with illumination problems, the edge-based approach can result in spurious patterns. C. Papageorgiou, T. Evgeniou and T. Poggio, “A Trainable Pedestrian Detection System,” Proc. of IEEE Intelligent Vehicles Symposium, pp. 241-246, October (1998) (Papageorgiou, et al., 1998) have suggested applying a multi-scale approach to detect faces and pedestrians.
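As a rough illustration of multi-scale Haar-like features of the kind used by Papageorgiou, et al., 1998, the sketch below computes vertical two-rectangle responses from an integral image at several window sizes. The window sizes, the half-window stride, and the restriction to the vertical orientation are assumptions for illustration; the cited system also uses horizontal and diagonal wavelets and feeds the features to a trained classifier.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]


def integral_image(img: Image) -> Image:
    """ii[r][c] = sum of img over rows < r and cols < c (zero-padded first row/col)."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running = 0.0
        for c in range(cols):
            running += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + running
    return ii


def box_sum(ii: Image, top: int, left: int, height: int, width: int) -> float:
    # Sum of any axis-aligned rectangle in four lookups.
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def vertical_haar(ii: Image, top: int, left: int, size: int) -> float:
    # Left half minus right half of a size x size window: responds to vertical edges.
    half = size // 2
    return (box_sum(ii, top, left, size, half)
            - box_sum(ii, top, left + half, size, half))


def multiscale_vertical(img: Image, sizes: Sequence[int] = (2, 4)) -> Dict[int, Image]:
    """Dense vertical-Haar responses at each window size, stepping by size // 2."""
    ii = integral_image(img)
    rows, cols = len(img), len(img[0])
    out = {}
    for s in sizes:
        step = max(1, s // 2)
        out[s] = [[vertical_haar(ii, r, c, s) for c in range(0, cols - s + 1, step)]
                  for r in range(0, rows - s + 1, step)]
    return out
```

The integral image makes each rectangle sum a constant-time operation, so responses at coarse and fine scales cost the same per window; the coarse windows capture the larger-scale structure that a purely edge-based method would miss.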
The problem of detecting and recognizing humans in images is a well-studied research topic. Work in this area can be broadly classified into two types: recognition based on motion cues and recognition based on shape cues. Human recognition from motion cues relies on the segmentation of objects in the scene using motion information extracted from an image stream with techniques such as optic flow, as discussed in B. Horn and B. G. Schunck, “Determining Optic Flow,” Artificial Intelligence, Vol. 17, pp. 185-203, 1981 (Horn 1981), the disclosure of which is hereby incorporated by reference, and frame differencing, as discussed in O. Faugeras, “Three-Dimensional Computer Vision: A Geometric Viewpoint,” MIT Press, 1993 (Faugeras 1993), the disclosure of which is hereby incorporated by reference. The segmented region is then analyzed to recognize the presence or absence of humans in the scene. In K. Rohr, “Towards model-based recognition of human movements in image sequences,” Computer Vision, Graphics and Image Processing: Image Understanding, vol. 59, pp. 94-115, 1994 (Rohr 1994), the disclosure of which is hereby incorporated by reference, humans are recognized by analyzing the movements of objects segmented using frame differencing and ego-motion subtraction. Texture and contour information of segmented blobs is combined with temporal gait analysis of the walking process to recognize humans, as discussed in C. Curio, J. Edelbrunner, T. Kalinke, C. Tzomakas and W. von Seelen, “Walking Pedestrian Recognition,” Proc. IEEE Intl. Conf. on Intelligent Transportation Systems, pp. 292-297, October, 1999 (Curio 1999), the disclosure of which is hereby incorporated by reference. A stereo-based algorithm is used in C. Wohler, J. K. Anlauf, T. Portner and U. Franke, “A Time Delay Neural Network Algorithm for Real-time Pedestrian Detection,” Proc. of IEEE Intelligent Vehicles Symposium, pp. 247-251, October, 1998 (Wohler 1998), the disclosure of which is hereby incorporated by reference, for detection and tracking of humans by classifying extracted blobs using a time-delay neural network. The neural network classifies a blob as human based on the temporal motion patterns of the human leg. Quantitative geometric descriptions of human movements are used for human recognition in S. Wachter and H. H. Nagel, “Tracking Persons in Monocular Image Sequences,” Computer Vision and Image Understanding, vol. 74, no. 3, pp. 174-192, June, 1999 (Wachter 1999), the disclosure of which is hereby incorporated by reference. These movements are obtained by filtering the projection of the three-dimensional person model into consecutive frames of an image sequence. Several other approaches also use motion cues in combination with three-dimensional kinematic models of human anatomy for human recognition, as discussed in A. Baumberg and D. Hogg, “Generating spatiotemporal models from examples,” Image and Vision Computing, vol. 14, pp. 525-532, 1996 (Baumberg 1996); Z. Chen and H. J. Lee, “Knowledge-guided visual perception of 3-D human body movements,” IEEE Trans. Systems, Man and Cybernetics, vol. 22, pp. 336-342, 1992 (Chen 1992); D. M. Gavrila and L. S. Davis, “3-D Model based tracking of humans in action: A multi-view approach,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 73-80, 1996 (Gavrila 1996); Y. Guo, G. Xu and S. Tsuji, “Tracking human body motion based on a stick model,” Journal of Visual Communication and Image Representation, vol. 5, pp. 1-9, 1994 (Guo 1994); M. K. Leung and Y. H. Yang, “First Sight: A human body outline labeling system,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 17, pp. 359-377, 1995 (Leung 1995); J. O'Rourke and N. I. Badler, “Model-based image analysis of human motion using constraint propagation,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 78, pp. 5-43, 1995 (O'Rourke 1995); A. Pentland and B. Horowitz, “Recovery of non-rigid motion and structure,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 13, pp. 730-742, 1991 (Pentland 1991); C. Wren, A. Azarbayejani, T. Darrell and A. Pentland, “Pfinder: Real-Time Tracking of the Human Body,” SPIE, vol. 2615, pp. 89-98, 1996 (Wren 1996); and L. Q. Xu and D. Hogg, “Neural networks in human motion tracking: An experimental study,” Proc. of 7th British Machine Vision Conference, vol. 2, pp. 405-414, 1996 (Xu 1996), the disclosures of each of which are incorporated herein by reference. In O. Masoud and N. Papanikolopoulos, “A robust real-time multi-level model-based pedestrian tracking system,” Proc. of the ITS America Seventh Annual Meeting, Washington D.C., June, 1997 (Masoud 1997), the disclosure of which is hereby incorporated by reference, blobs are extracted from scenes using a combination of motion cues and background models. These blobs are then analyzed for human-like movement patterns for recognition and also for Kalman filter-based tracking. Motion and size filtering operations are used to extract human-like objects from images in S. Bouzer, J. M. Blosseville, F. Lenoir and R. Glachet, “Automatic Incident Detection: Slow Isolated Vehicle and Pedestrian Detection on Motorway Using Image Processing,” Proc. of Seventh Intl. Conf. on Road Traffic Monitoring and Control, pp. 128-133, October, 1994 (Bouzer 1994), the disclosure of which is hereby incorporated by reference. A motion trajectory is then established for the extracted blob and used for human recognition.
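The optic-flow step referenced above (Horn 1981) can be sketched as a simplified Horn-Schunck iteration. To keep the example short, it assumes central-difference gradients on the first frame, a plain 4-neighbor average in place of Horn and Schunck's weighted neighborhood, replicated borders, and illustrative values for the smoothness weight `alpha` and the iteration count.

```python
from typing import List, Tuple

Grid = List[List[float]]


def _at(img: Grid, r: int, c: int) -> float:
    # Replicate-border access: clamp coordinates into the image.
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def _neighbor_mean(f: Grid, r: int, c: int) -> float:
    # Simplified 4-neighbor average (Horn and Schunck use a weighted 8-neighborhood).
    return (_at(f, r - 1, c) + _at(f, r + 1, c)
            + _at(f, r, c - 1) + _at(f, r, c + 1)) / 4.0


def horn_schunck(i1: Grid, i2: Grid, alpha: float = 1.0,
                 iterations: int = 100) -> Tuple[Grid, Grid]:
    """Return (u, v) flow fields estimated between grayscale frames i1 and i2."""
    rows, cols = len(i1), len(i1[0])
    ix = [[(_at(i1, r, c + 1) - _at(i1, r, c - 1)) / 2.0 for c in range(cols)]
          for r in range(rows)]
    iy = [[(_at(i1, r + 1, c) - _at(i1, r - 1, c)) / 2.0 for c in range(cols)]
          for r in range(rows)]
    it = [[i2[r][c] - i1[r][c] for c in range(cols)] for r in range(rows)]
    u = [[0.0] * cols for _ in range(rows)]
    v = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        u_bar = [[_neighbor_mean(u, r, c) for c in range(cols)] for r in range(rows)]
        v_bar = [[_neighbor_mean(v, r, c) for c in range(cols)] for r in range(rows)]
        for r in range(rows):
            for c in range(cols):
                gx, gy = ix[r][c], iy[r][c]
                # Brightness-constancy residual at the smoothed flow, normalized.
                common = ((gx * u_bar[r][c] + gy * v_bar[r][c] + it[r][c])
                          / (alpha * alpha + gx * gx + gy * gy))
                u[r][c] = u_bar[r][c] - gx * common
                v[r][c] = v_bar[r][c] - gy * common
    return u, v
```

For a horizontal intensity ramp shifted right by one pixel between frames, the recovered horizontal flow `u` near the image center is close to 1 and the vertical flow `v` is 0; the flow field could then be thresholded or grouped to segment moving regions for subsequent human/non-human analysis.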
Shape-based methods exploit the intrinsic shape characteristics of humans to aid in the recognition process. In Papageorgiou et al., 1998, a set of low-level wavelet features is extracted from examples of humans in different poses found in natural scenes. These features are then used to train a support vector machine classifier for recognition of humans. A stereo-based segmentation algorithm is applied in L. Zhao and C. Thorpe, “Stereo and Neural Network-based Pedestrian Detection,” Proc. of IEEE Intl. Conf. on Intelligent Transportation Systems, pp. 298-303, October, 1999 (Zhao 1999), the disclosure of which is hereby incorporated by reference, to extract objects from the background; features from the extracted objects are then fed to a neural network that learns to differentiate humans from non-humans. In A. Broggi, M. Bertozzi, A. Fascioli and M. Sechi, “Shape-based Pedestrian Detection,” Proc. of IEEE Intelligent Vehicles Symposium, pp. 215-220, October, 2000 (Broggi 2000), the disclosure of which is hereby incorporated by reference, objects are segmented from scenes using a combination of a stereo algorithm and constraints based on morphological characteristics. The extracted objects are then analyzed for the strong vertical symmetry of the human shape for recognition.
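The vertical-symmetry cue can be illustrated with a simple score on a binary object mask: the fraction of foreground pixels whose mirror image about the vertical center line of the object's bounding box is also foreground. This scoring rule is a hypothetical simplification for illustration only; Broggi, et al., 2000 analyze symmetry in their own way, which is not reproduced here.

```python
from typing import List


def vertical_symmetry(mask: List[List[bool]]) -> float:
    """Fraction of foreground pixels whose mirror about the bounding box's
    vertical center line is also foreground (1.0 = perfectly symmetric)."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, on in enumerate(row) if on]
    if not coords:
        return 0.0
    left = min(c for _, c in coords)
    right = max(c for _, c in coords)
    # Mirror of column c about the axis (left + right) / 2 is left + right - c,
    # which always lies inside [left, right], so no bounds check is needed.
    matched = sum(1 for r, c in coords if mask[r][left + right - c])
    return matched / len(coords)
```

An upright human silhouette tends to score near 1.0, while many non-human blobs (e.g., a pet seen from the side) score lower, so such a score could serve as one input to a human/non-human decision rather than as a classifier by itself.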