Consumer electronic devices may include imaging devices that may obtain images or series of images. Such images may be used to perform object detection, object recognition, gesture recognition, or the like for objects in the scene represented by the images. For example, objects may be detected, tracked, and recognized for focusing the imaging device in image capture settings, for gesture recognition, or the like. Furthermore, in gesture recognition contexts, human gestures, typically made via the user's hands or face, may provide input to the device for navigating the device, playing games, and so on. Such gesture recognition may allow users to interact with the device naturally and without an intervening mechanical interface such as a keyboard, mouse, or even touch display.
In some contexts, it may be desirable to detect, track, identify, and label a blob as a hand blob or other object and to generate parameters or the like for a non-rigid model such that, when the parameters are implemented, the non-rigid model matches, or attempts to match, the blob. Determining such articulated body parameters (e.g., determining the skeleton of an articulated body) based on data captured by a single camera may be a challenging problem due to viewpoint variability, the complex articulations of the body being modeled (e.g., fingers in the context of hands), the prevalence of self-occlusions caused by natural motions, and the like. Earlier techniques in the context of object detection and tracking focused on input from RGB and grayscale images. However, the introduction of consumer-grade 3D sensors has shifted the focus to techniques based on the 3D data obtained by such devices. Current techniques include reconstructing a deformable surface model and matching articulated body models (e.g., hand models or the like) to input depth images by solving an optimization problem.
It may be advantageous to detect, track, and provide a pose estimation of an articulated body based on input image data. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to perform object detection, tracking, and pose estimation becomes more widespread.