Various techniques are known in the art for automatic computerized segmentation and identification of parts of a human body that appear in an image or image stream. These techniques are used in a variety of computer vision applications, such as in gesture-driven user interfaces. For example, PCT International Publication WO 03/071410, whose disclosure is incorporated herein by reference, describes a gesture recognition system using depth-perceptive sensors. The gestures are recognized based on the shape of the body part and its position and orientation over an interval. Another gesture-based user interface is described in U.S. Patent Application Publication 2010/0235786, whose disclosure is likewise incorporated herein by reference.
Methods of segmentation and body part identification have been applied both to two-dimensional (2D) images, such as color video images, and to three-dimensional (3D) images, also referred to as depth images or depth maps. (In a depth image, each pixel has a value indicating the distance from the camera to the corresponding point in the scene, rather than the brightness and color of the point as in a 2D image.) The 2D and 3D processing approaches have each been found to have their own advantages and drawbacks in this regard.
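The distinction between a 2D color image and a depth map can be illustrated with a minimal sketch. The pixel values, the millimeter units, the use of 0 as a "no depth reading" marker, and the function name `foreground_mask` are all assumptions made for illustration; the naive distance threshold shown here is not a method taken from any of the cited publications.

```python
# Hypothetical 3x4 scene. In the color image each pixel is an (R, G, B) triple;
# in the depth map each pixel is a distance from the camera (assumed millimeters).
color_image = [
    [(90, 90, 90), (90, 90, 90), (91, 90, 89), (90, 91, 90)],
    [(90, 90, 90), (200, 160, 140), (198, 158, 139), (90, 90, 91)],
    [(0, 0, 0), (201, 161, 141), (199, 159, 138), (90, 90, 90)],
]
depth_map = [
    [2500, 2500, 2480, 2510],
    [2500, 900, 950, 2490],
    [0, 880, 910, 2500],  # 0 is assumed to mean "no depth reading"
]


def foreground_mask(depth, max_distance_mm):
    """Mark pixels that have a valid depth no farther than max_distance_mm."""
    return [[0 < d <= max_distance_mm for d in row] for row in depth]


# Pixels closer than 1.2 m are treated as foreground (e.g. a nearby body part).
mask = foreground_mask(depth_map, 1200)
for row in mask:
    print("".join("#" if is_fg else "." for is_fg in row))
```

In this sketch the nearby object separates cleanly by distance alone, whereas the color image would require a model of appearance to achieve the same separation.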
Recently, some researchers have attempted to combine color and depth processing in order to segment and identify body parts. For example, Bleiweiss and Werman describe a technique of this sort in “Fusing Time-of-Flight Depth and Color for Real-Time Segmentation and Tracking,” Dyn3D '09 Proceedings of the DAGM 2009 Workshop on Dynamic 3D Imaging (Springer-Verlag, 2009), pages 58-69, which is incorporated herein by reference. The authors present a framework for real-time segmentation and tracking that fuses depth and RGB color data using an algorithm based on mean shift.
Munoz-Salinas et al. describe another combined processing method in “People Detection and Tracking Using Stereo Vision and Color,” Image and Vision Computing 25 (2007), pages 995-1007, which is also incorporated herein by reference. The authors describe a system that tracks people by using a Kalman filter to combine color and position information. They note that tracking based exclusively on position information is unreliable when people interact closely with one another, and they therefore also include color information about the people's clothes.