The present invention relates to constructing a personalized avatar of a human subject, and more particularly, to constructing a 3D mesh model of a human subject from a single image obtained using a depth sensor.
Depth sensors are cameras that provide depth information along with typical image information, such as RGB (Red, Green, Blue) data. A depth camera can be a structured-light camera (such as Microsoft Kinect or ASUS Xtion), a stereo camera, or a time-of-flight camera (such as Creative TOF camera). The image data obtained from a depth camera is commonly referred to as RGB-D (RGB+Depth) data, which typically includes an RGB image, in which each pixel has an RGB value, and a depth image, in which the value of each pixel corresponds to the depth, or distance from the camera, of the scene point imaged at that pixel. With the advent of Kinect, various approaches have been proposed to estimate a human body skeleton from RGB-D data. However, such approaches generally require multiple sensors or video sequences to obtain a mesh of a person.
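To make the depth image described above concrete, the following is a minimal sketch of how per-pixel depth values could be converted into 3D points in the camera coordinate frame. It assumes a pinhole camera model, depth measured along the optical axis, and a depth value of zero marking a missing measurement; none of these assumptions comes from the text above. The function name `backproject_depth` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for illustration.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Convert a depth image into a list of 3D camera-space points.

    depth  : list of rows, each a list of depth values (assumed in meters)
    fx, fy : assumed focal lengths in pixels
    cx, cy : assumed principal point in pixels
    Pixels with depth <= 0 are treated as missing and skipped.
    """
    points = []
    for v, row in enumerate(depth):        # v is the pixel row index
        for u, z in enumerate(row):        # u is the pixel column index
            if z <= 0:
                continue                   # no valid depth at this pixel
            x = (u - cx) * z / fx          # pinhole back-projection
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with one missing pixel (illustrative values only).
depth = [[0.0, 2.0],
         [2.0, 4.0]]
points = backproject_depth(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

A real sensor would supply calibrated intrinsics and may encode depth in other units, so the constants here are placeholders.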
SCAPE is a method for human body modeling that is described in Dragomir Anguelov et al., “SCAPE: Shape Completion and Animation of People”, ACM Trans. Graph., Vol. 24 (2005), pp. 408-416. SCAPE is widely used due to its capability to model human body shape and pose variations in a compact fashion. Instead of learning a complex function for many correlated pose and shape parameters, SCAPE decouples the model and learns a pose deformation model from one person with different poses, and then learns a shape deformation model from different people with one pose. However, SCAPE is only applied to skin clad subjects and does not accurately deal with clothing variations and sensor noise.
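The decoupling described above can be illustrated with a minimal sketch. It assumes the per-triangle formulation of the cited SCAPE paper, in which each edge of a template triangle is transformed by a pose-induced non-rigid deformation Q, then a shape deformation S, then a rigid rotation R of the body part containing the triangle. The function names and the example matrices are hypothetical; the sketch omits the learning of Q and S and the final step of solving for vertex positions.

```python
def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def deform_edge(R, S, Q, edge):
    """Apply pose (Q), shape (S), and rigid rotation (R) to a template edge.

    Q and S are separate factors, which is what allows them to be learned
    from separate data sets in the decoupled scheme sketched here.
    """
    return matvec(matmul(R, matmul(S, Q)), edge)


# Illustrative values: 90-degree rotation about z, uniform scale 2, no pose deformation.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
S = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
Q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_edge = deform_edge(R, S, Q, [1.0, 0.0, 0.0])
```

Here the template edge along the x axis is doubled in length by the shape factor and then rotated onto the y axis by the rigid rotation.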