Several approaches apply contour-based image processing methods to segmented hand or body shapes. However, these methods typically work only when the extremities or fingers are stretched and parallel to the sensor.
In ACIVS, volume 3708 of Lecture Notes in Computer Science, pages 9-16. Springer, 2005, the authors declare the fingertip to be the contour pixel that is farthest away from the center of gravity of the hand. This method works only for single stretched fingers that do not point towards the sensor.
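The farthest-point rule described above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the binary mask representation, the 4-neighbor contour test, and the function names are assumptions made for the example.

```python
import math

def centroid(pixels):
    """Center of gravity of a list of (row, col) pixels."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)

def farthest_contour_pixel(mask):
    """Return the contour pixel of a binary mask that is farthest from the
    center of gravity of all object pixels (assumed fingertip rule)."""
    rows, cols = len(mask), len(mask[0])
    hand = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    cy, cx = centroid(hand)

    def is_contour(r, c):
        # Assumption: a contour pixel is an object pixel with at least one
        # 4-neighbor that is background or lies outside the image.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                return True
        return False

    contour = [p for p in hand if is_contour(*p)]
    return max(contour, key=lambda p: math.hypot(p[0] - cy, p[1] - cx))
```

Because the rule returns exactly one pixel, it cannot report more than one fingertip, which is consistent with the limitation to single stretched fingers noted above.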
Multiple fingertips are addressed by Itai Katz, Kevin Gabayan, and Hamid Aghajan, A multi-touch surface using multiple cameras, in Proceedings of the 9th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS '07, pages 97-108, Berlin, Heidelberg, 2007. Springer-Verlag. Here, the distance of the contour pixels from the palm center is inspected. Local maxima are treated as fingertip candidates. Remaining false-positive fingertips (e.g. knuckles of the thumb) are eliminated by analyzing the local contour around the fingertip candidates. True fingertips show a high average distance from the fingertip candidate to the local centroid. Again, this does not work for fingers pointing towards the sensor. Further, the frame rate is claimed to be only 10 Hz.
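The candidate step of this approach, finding local maxima of the distance between contour points and the palm center, can be sketched as below. The ordered closed contour, the window size, and the function name are assumptions for illustration; the subsequent false-positive filtering is not shown.

```python
import math

def fingertip_candidates(contour, palm_center, window=2):
    """Return indices of contour points whose distance to palm_center is a
    strict local maximum among the `window` neighbors on either side.
    The contour is assumed to be an ordered, closed list of (x, y) points."""
    n = len(contour)
    dist = [math.dist(p, palm_center) for p in contour]
    candidates = []
    for i in range(n):
        # Wrap around the end of the list because the contour is closed.
        neighbors = [dist[(i + k) % n] for k in range(-window, window + 1) if k != 0]
        if all(dist[i] > v for v in neighbors):
            candidates.append(i)
    return candidates
```

A larger window suppresses small bumps on the contour at the cost of merging fingertips that lie close together.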
In Martin Do, Tamim Asfour, and Rüdiger Dillmann, Particle filter-based fingertip tracking with circular Hough transform features, in MVA2011 IAPR Conference on Machine Vision Applications, 2011, the Hough transform is used to detect fingertips in combination with a particle filter and a mean-shift procedure for tracking. This method is computationally expensive. According to the authors, it runs at 15 frames per second on a 2.4 GHz dual-core CPU.
In Ko-Jen Hsiao, Tse-Wei Chen, and Shao-Yi Chien, Fast fingertip positioning by combining particle filtering with particle random diffusion, in ICME, pages 977-980, IEEE, 2008, a particle diffusion approach propagates particles starting from the center of the palm to positions close to the contour of skin-color segmented input images. Particle clusters identified are treated as fingertip candidates while particles close to the palm are ignored. Again, this works only for stretched fingers that do not point towards the sensor.
Accumulative Geodesic Extrema based on depth data are proposed by Christian Plagemann, Varun Ganapathi, Daphne Koller, and Sebastian Thrun, Real-time identification and localization of body parts from depth images, in IEEE International Conference on Robotics and Automation (ICRA), 2010. This approach assumes that the geodesic distance from the centroid of a body or a hand to its extremities is independent of the pose. Thus, starting at the centroid of an object, extremities are found by successively adding pixels that maximize their geodesic distance from this centroid.
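The accumulation of geodesic extrema can be illustrated with a simplified sketch. It assumes a 2D binary mask with 4-connectivity and unit step costs instead of distances derived from depth data, and it treats every extremum found so far as an additional zero-distance source for the next search; the function names are hypothetical.

```python
from collections import deque

def geodesic_distances(mask, sources):
    """Breadth-first search over object pixels; returns the geodesic distance
    (in 4-connected steps) from the nearest source to each reachable pixel."""
    rows, cols = len(mask), len(mask[0])
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and mask[nr][nc] and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist

def accumulative_geodesic_extrema(mask, start, count):
    """Repeatedly pick the pixel with the largest geodesic distance from the
    start pixel and all previously found extrema."""
    sources = [start]
    extrema = []
    for _ in range(count):
        dist = geodesic_distances(mask, sources)
        farthest = max(dist, key=dist.get)
        extrema.append(farthest)
        sources.append(farthest)  # later searches measure from this point too
    return extrema
```

Because distances are measured along the object rather than through free space, a bent arm or finger keeps roughly the same distance to the centroid, which is the pose independence the approach relies on.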
The above method is optimized by Hui Liang, Junsong Yuan, and Daniel Thalmann, 3d fingertip and palm tracking in depth image sequences, in Proceedings of the 20th ACM International Conference on Multimedia, MM '12, pages 785-788, New York, N.Y., USA, 2012, ACM, by restricting the fingertip candidates a priori to those positions where the depth data become discontinuous. False positives are further reduced by adding a feature detector that measures the ratio of object vs. non-object pixels in a rectangular neighborhood around fingertip candidates. Particle filtering is used to track fingertips across multiple frames.
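The neighborhood feature can be sketched as follows. For simplicity, this example computes the fraction of object pixels in a square window rather than the exact ratio used by the cited authors; the window shape, the handling of image borders, and the function name are assumptions.

```python
def object_ratio(mask, row, col, half_size):
    """Fraction of object pixels in the square window of side
    2 * half_size + 1 centered at (row, col). Pixels outside the image are
    counted as non-object (an assumption of this sketch)."""
    rows, cols = len(mask), len(mask[0])
    object_count = 0
    for r in range(row - half_size, row + half_size + 1):
        for c in range(col - half_size, col + half_size + 1):
            if 0 <= r < rows and 0 <= c < cols and mask[r][c]:
                object_count += 1
    window_area = (2 * half_size + 1) ** 2
    return object_count / window_area
```

A true fingertip is mostly surrounded by background and yields a low value, whereas a candidate inside the palm yields a value close to one, so thresholding the value can reject such false positives.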
The method that is most likely implemented in the Kinect system and proposed by J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, Real-time human pose recognition in parts from single depth images, in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '11, pages 1297-1304, Washington, D.C., USA, 2011, IEEE Computer Society, is based on a large database of motion capture data. Here, a body part classifier is built from depth comparison features using randomized decision forests. However, this approach requires a large database, and obtaining the decision trees took a day on a 1000-core cluster, making it very hard to reproduce.
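A depth comparison feature of the kind used by such classifiers can be sketched as the difference between two depth probes whose pixel offsets are scaled by the inverse depth at the reference pixel. The sketch below follows the commonly described form of this feature; the rounding, the background constant, and the function name are assumptions and are not taken from this document.

```python
def depth_feature(depth, pixel, u, v, background=1e6):
    """Difference of two depth probes around `pixel`. The offsets u and v
    are divided by the depth at `pixel`, so the probes cover a similar
    physical extent at any distance. Probes that fall outside the image
    return a large constant (assumed background depth)."""
    rows, cols = len(depth), len(depth[0])
    r0, c0 = pixel
    d0 = depth[r0][c0]

    def probe(offset):
        r = r0 + int(round(offset[0] / d0))
        c = c0 + int(round(offset[1] / d0))
        if 0 <= r < rows and 0 <= c < cols:
            return depth[r][c]
        return background

    return probe(u) - probe(v)
```

Each node of a decision tree would compare one such feature against a learned threshold; the large training effort mentioned above arises from selecting the offsets and thresholds for many trees.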
Other approaches use 3-dimensional models and project them into the image space.
One of the earliest works, by Jim Rehg and Takeo Kanade, DigitEyes: Vision-based human hand tracking, Technical Report CMU-CS-93-220, Computer Science Department, Pittsburgh, Pa., December, 1993, uses a hand model in which the fingers are cylinders. The directions of the central lines of the cylinders and their joints are estimated from a grey-scale image using a local operator. A non-linear least squares approach is used to estimate the pose of the hand. Fingertip positions are obtained by projecting the end points of the fingertip cylinders into the image space.
In Bjoern Stenger, Paulo R. S. Mendonça, and Roberto Cipolla, Model-based 3d tracking of an articulated hand, in CVPR (2), pages 310-315, IEEE Computer Society, 2001, the authors define a hand model with twenty-seven degrees of freedom from thirty-nine truncated quadrics. Contours are generated from the model, and the model parameters are estimated using an unscented Kalman filter. In this case, fingertip positions are obtained by projecting the 3D positions of the fingertip quadrics into the image space.
In U.S. patent application Ser. No. 13/082,295 (US 2012-0113241), after skin-based segmentation of RGB input images, fingertip candidates are identified as those contour points with the highest curvature. Valleys in between the fingers are eliminated and ellipses are fit to the fingertip candidates.
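One common way to find high-curvature contour points and to discard valleys is the k-curvature measure combined with a turn-direction test. The sketch below illustrates that general technique and is not the method disclosed in the cited application; the counter-clockwise contour orientation (with the y axis pointing up), the parameters, and the function name are assumptions. The ellipse fitting step is not shown.

```python
import math

def sharp_convex_points(contour, k, max_angle):
    """Return indices of contour points where the angle between the vectors
    to the k-th previous and k-th next point is below max_angle (radians)
    and the contour turns left, i.e. the point is a peak, not a valley.
    Assumes an ordered, closed, counter-clockwise contour of distinct points."""
    n = len(contour)
    result = []
    for i in range(n):
        px, py = contour[i]
        ax, ay = contour[(i - k) % n]
        bx, by = contour[(i + k) % n]
        v1x, v1y = ax - px, ay - py
        v2x, v2y = bx - px, by - py
        cos_angle = (v1x * v2x + v1y * v2y) / (
            math.hypot(v1x, v1y) * math.hypot(v2x, v2y))
        angle = math.acos(max(-1.0, min(1.0, cos_angle)))
        # Cross product of incoming and outgoing directions: positive for a
        # left turn (convex peak), negative for a right turn (valley).
        cross = (px - ax) * (by - py) - (py - ay) * (bx - px)
        if angle < max_angle and cross > 0:
            result.append(i)
    return result
```

A small angle corresponds to a high curvature, so the valley between two fingers passes the angle test as easily as a fingertip does; only the sign of the cross product separates the two cases.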
A touchless pointing device is described in U.S. Pat. No. 8,907,894. Although the patent claims to process images for the presence, location, and velocity of objects, it does not disclose a method that actually extracts the locations.
The same holds for U.S. patent application Ser. No. 13/907,925 (US 2013-0343607), where computer vision techniques such as shape recognition are applied for touchless control of a device. However, no details are given on how to apply these methods in a fast and robust way.
In U.S. Pat. No. 9,001,036, fingertips are identified as those pixels that changed in intensity from one image to the next and are much brighter than the surrounding pixels. Such a method will fail if extremities point towards the sensor or if they are close together.
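The two criteria named above, a change in intensity between consecutive images and a brightness well above that of the surrounding pixels, can be sketched as follows. The thresholds, the 8-neighborhood, the use of the neighborhood mean, and the function name are assumptions for illustration and are not taken from the cited patent.

```python
def bright_changed_pixels(previous, current, change_threshold, contrast_threshold):
    """Return (row, col) positions whose intensity changed by more than
    change_threshold between two frames and which exceed the mean of their
    8-neighborhood in the current frame by more than contrast_threshold."""
    rows, cols = len(current), len(current[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            if abs(current[r][c] - previous[r][c]) <= change_threshold:
                continue
            neighbors = [
                current[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if neighbors and current[r][c] - sum(neighbors) / len(neighbors) > contrast_threshold:
                hits.append((r, c))
    return hits
```

In this sketch, two bright pixels next to each other raise each other's neighborhood mean and can both fall below the contrast threshold, which illustrates why extremities that are close together are problematic for such a criterion.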
Mathematical morphological filtering is applied in Tracking method of three-dimensional finger motion locus based on stereo vision, Sep. 19, 2007, CN Patent App. CN 2007/10,039,941.
A combination of edge detection and depth data processing is used in U.S. Pat. No. 8,204,274. Corners of the object outline are treated as extremity candidates and depth information is used to validate the reliability of this estimate.
The present application is directed to improvements in extremity identification.