Consumer electronic devices may include imaging devices that capture images or series of images. Such images may be used to perform object detection, object recognition, gesture recognition, or the like for objects in the scene represented by the images. For example, objects may be detected, tracked, and recognized for focusing the imaging device in image capture settings, for gesture recognition, or the like. Furthermore, in gesture recognition contexts, human gestures, typically made via the user's hands or face, may provide input to the device for navigating the device, playing games, and so on. Such gesture recognition may allow users to interact with the device naturally and without an intervening mechanical interface such as a keyboard, mouse, or even a touch display.
In some contexts, it may be desirable to determine a non-rigid transformation between the surface of an articulated body and a 3-dimensional (3D) point cloud or the like. For example, the 3D point cloud may be determined based on depth images obtained via the device or the like, and the 3D point cloud may represent a hand, a human body, or the like that is to be detected, tracked, recognized, and so on. Furthermore, the articulated body may be a model or representation of a hand, a human body, or any other structure. For example, the articulated body may be modeled by rigid bodies connected by joints. In the context of a hand or a human body, the rigid bodies may be associated with bones and the joints may be associated with anatomical joints. In such contexts, forward kinematics (FK) techniques may determine the pose of the articulated body based on given articulated body parameters (e.g., rigid body lengths, joint angles, and so on). Conversely, inverse kinematics (IK) techniques may attempt to determine the articulated body parameters that best represent a given input.
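For example, the forward kinematics relationship described above may be illustrated with a minimal sketch. The sketch assumes a planar (2D) chain of rigid bodies rooted at the origin, with each joint angle measured relative to the preceding rigid body; the function name `forward_kinematics` and its parameters are hypothetical and are chosen only for illustration.

```python
import math


def forward_kinematics(lengths, angles):
    """Return the 2D position of every joint of a planar chain.

    Assumption (illustrative only): the chain is rooted at the origin and
    each entry of `angles` is measured relative to the previous rigid body.
    """
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle                  # accumulate relative joint angles
        x += length * math.cos(heading)   # advance along the rigid body
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# Two rigid bodies of unit length: the first points up, the second points right.
pose = forward_kinematics([1.0, 1.0], [math.pi / 2, -math.pi / 2])
```

In this sketch, the rigid body lengths and joint angles play the role of the articulated body parameters, and the returned joint positions play the role of the pose.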
For example, in the described context of determining a non-rigid transformation between the surface of an articulated body and a 3D point cloud, the 3D point cloud (e.g., of a hand, a human body, or the like) may provide a target for matching the pose of the articulated body (e.g., an articulated body of the same type as that represented by the 3D point cloud, such as an articulated body of a hand, of a human body, or the like). As discussed, the pose of the articulated body may be defined by articulated body parameters. Therefore, it may be desirable to determine, based on a 3D point cloud and/or similar data, articulated body parameters that provide a pose of the articulated body that best matches the 3D point cloud.
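As a rough illustration of such matching, the following sketch searches for joint angles that bring a planar chain close to a small set of target points. It is not drawn from any particular technique discussed herein: the planar chain, the nearest-joint squared-distance error, the derivative-free coordinate search, and all names (`joint_positions`, `fit_error`, `fit_angles`) are assumptions made only for illustration.

```python
import math


def joint_positions(lengths, angles):
    """Forward kinematics of a planar chain rooted at the origin."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


def fit_error(lengths, angles, cloud):
    """Sum of squared distances from each cloud point to its nearest joint."""
    model = joint_positions(lengths, angles)
    return sum(
        min((px - mx) ** 2 + (py - my) ** 2 for mx, my in model)
        for px, py in cloud
    )


def fit_angles(lengths, cloud, initial_angles,
               step=0.5, min_step=1e-6, max_sweeps=1000):
    """Coordinate search: nudge one angle at a time and keep improvements.

    The step is halved whenever a full sweep yields no improvement.
    """
    angles = list(initial_angles)
    best = fit_error(lengths, angles, cloud)
    for _ in range(max_sweeps):
        if step <= min_step:
            break
        improved = False
        for i in range(len(angles)):
            for delta in (step, -step):
                trial = list(angles)
                trial[i] += delta
                err = fit_error(lengths, trial, cloud)
                if err < best:
                    angles, best = trial, err
                    improved = True
        if not improved:
            step /= 2.0
    return angles, best


# Hypothetical target: points sampled from a known pose of the same chain.
lengths = [1.0, 0.8]
cloud = joint_positions(lengths, [0.4, 0.7])
angles, error = fit_angles(lengths, cloud, [0.0, 0.0])
```

A local search of this kind only guarantees that the returned error is no worse than the error of the initial guess; whether it reaches the best-matching pose may depend on the starting angles.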
Determining such articulated body parameters (e.g., determining the skeleton of an articulated body) based on data captured by a single camera may be a challenging problem due to viewpoint variability, the complex articulations of the body being modeled (e.g., fingers in the context of hands), the prevalence of self-occlusions caused by natural motions, and the like. Earlier techniques in the context of object detection and tracking focused on input from RGB and grayscale images. However, the introduction of consumer-grade 3D sensors has shifted the focus to techniques based on the 3D data obtained by such devices. Current techniques include reconstructing a deformable surface model and matching articulated body models (e.g., hand models or the like) to input depth images by solving an optimization problem.
It may be advantageous to determine a non-rigid transformation between the surface of an articulated body and a 3D point cloud with high accuracy. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to perform object detection, tracking, and pose estimation becomes more widespread.