In the past, visual human-machine interaction systems typically analyzed images from a single camera or relied on color information. Under some conditions, such as when the user's color is close to the background color, when the ambient light changes, or when the background is cluttered with many people, this kind of technology is likely to yield an insufficient image recognition rate. Existing technologies use the information of a depth image to aid the image analysis. For example, some technologies use depth images to track a local area of a user, to capture and track the positions of the user's extremities, or to detect one or more extremities of a human target. Some techniques use both color and depth information to find the hand position, or the hand area and the facial area.
One technology uses a depth image to track a local area of the user, as shown in FIG. 1. This technology finds the edge of the target 106 from a depth image and finds the best-fitting pre-defined contour shape from the edge, such as a contour shape 104 of the right hand of the target 106, wherein the depth image has a corresponding sample edge potential transform map 100. For example, the sample edge potential transform map 100 includes a modeled target to be tracked in a potential field, and the value of each grid point in the sample edge potential transform map 100 indicates how far the grid point is from the edge 101 of the target 106. When the distance between the target and the camera changes, the size of the contour shape varies accordingly. This technique therefore requires a plurality of pre-defined contour shapes.
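The idea of a map whose grid values encode the distance to the nearest edge can be sketched as follows. This is only an illustrative sketch: the city-block distance metric, the function names `edge_distance_map` and `contour_cost`, and the sum-of-distances fitting score are assumptions, since the description above does not specify them.

```python
from collections import deque

def edge_distance_map(edge_mask):
    """Return a grid whose values are the city-block distance from each
    cell to the nearest edge cell (assumed metric). Cells stay None if
    the mask contains no edge cell at all."""
    rows, cols = len(edge_mask), len(edge_mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every edge cell at distance 0.
    for r in range(rows):
        for c in range(cols):
            if edge_mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search over 4-connected neighbors.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist

def contour_cost(dist_map, contour_points):
    """Hypothetical fitting score: sum of map values under the contour.
    A lower cost means the contour lies closer to the detected edge."""
    return sum(dist_map[r][c] for r, c in contour_points)

# Pick the better of two hypothetical pre-defined contour shapes.
mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
dmap = edge_distance_map(mask)
templates = {"on_edge": [(1, 1)], "off_edge": [(0, 0), (0, 1)]}
best = min(templates, key=lambda name: contour_cost(dmap, templates[name]))
```

Because the score is computed in pixel units, a contour of fixed size only matches a target at one distance from the camera, which is why such an approach needs several pre-defined contour shapes.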
Another technology, which uses a depth image to capture and track the positions of a user's extremities, produces a grid of voxels from the depth image, removes the background voxels to isolate the user, and then finds the extremity locations of the user from the isolated user. In other words, this technology finds the extremity locations of the user by creating a three-dimensional grid and removing the background to isolate the human target.
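The voxel-grid pipeline described above can be sketched roughly as below. The sketch makes several assumptions that are not in the description: each grid cell stores a single averaged depth (a simplification of a full three-dimensional grid), the background is removed with a fixed depth threshold, a depth of 0 means "no measurement", and the topmost remaining cell stands in for an extremity. All function names are hypothetical.

```python
def voxelize(depth, block):
    """Downsample a depth image into block x block cells. Each cell holds
    the mean of its valid (non-zero) depths, or None if it has none."""
    rows, cols = len(depth), len(depth[0])
    grid = []
    for r0 in range(0, rows, block):
        row = []
        for c0 in range(0, cols, block):
            vals = [depth[r][c]
                    for r in range(r0, min(r0 + block, rows))
                    for c in range(c0, min(c0 + block, cols))
                    if depth[r][c] > 0]
            row.append(sum(vals) / len(vals) if vals else None)
        grid.append(row)
    return grid

def remove_background(grid, max_depth):
    """Keep only cells nearer than max_depth (assumed background rule)."""
    return [[v if v is not None and v <= max_depth else None for v in row]
            for row in grid]

def topmost_voxel(grid):
    """Return (row, col) of the first remaining cell in scan order,
    used here as a stand-in for one extremity, or None if empty."""
    for r, row in enumerate(grid):
        for c, v in enumerate(row):
            if v is not None:
                return (r, c)
    return None

# Depths in millimeters: a near object in the lower-right corner.
depth = [[3000, 3000, 3000, 3000],
         [3000, 3000, 3000, 3000],
         [3000, 3000, 1000, 1000],
         [3000, 3000, 1000, 1000]]
user = remove_background(voxelize(depth, 2), max_depth=2000)
extremity = topmost_voxel(user)
```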
Yet another technology uses depth images to identify the extremities of each part of the user's body, as shown in FIG. 2. This technology generates a three-dimensional set of surface meshes 210 from the data of the depth image 202, calculates the geodesic distance of each grid point in the set of surface meshes, and classifies the surface meshes in the set according to the lengths of the different paths. Each mesh corresponds to a body part such as the head, a hand, a foot, a shoulder, or another body part. In other words, this technique finds the extremity position of each part of the user's body by generating the three-dimensional set of surface meshes and computing the geodesic distance along the path to each point on the set of surface meshes.
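A geodesic distance is the length of the shortest path that stays on the surface, as opposed to the straight-line distance. The following minimal sketch approximates it with Dijkstra's algorithm on a 4-connected grid; the grid representation, the unit pixel spacing, the choice of source point, and the function name `geodesic_distances` are illustrative assumptions, not details given in the description above.

```python
import heapq
import math

def geodesic_distances(surface, source):
    """Shortest on-surface path lengths from `source` to every reachable
    cell. `surface[r][c]` is a depth value, or None where there is no
    surface. The source cell itself must not be None."""
    rows, cols = len(surface), len(surface[0])
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if surface[nr][nc] is None:
                continue
            # Edge length in 3D, assuming unit spacing between pixels.
            dz = surface[nr][nc] - surface[r][c]
            nd = d + math.sqrt(1.0 + dz * dz)
            if nd < dist.get((nr, nc), math.inf):
                dist[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return dist

# A flat U-shaped surface: the two upper tips are 2 cells apart in a
# straight line, but the path along the surface is much longer.
surface = [[0, None, 0],
           [0, None, 0],
           [0, 0,    0]]
dist = geodesic_distances(surface, (0, 0))
farthest = max(dist, key=dist.get)
```

Points with a large geodesic distance from a central point tend to lie at the ends of long, thin parts of the surface, which is why path length can serve as a cue for extremities.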
One technology uses color and depth information to locate multiple hand areas and face areas. It segments the human body, segments the skin-color areas of the body by color detection, categorizes the skin-color areas by a k-means method, and finally distinguishes hand areas from face areas in conjunction with the depth information. Another technology, which uses color and depth information to locate the hand position, uses the LUV color space coupled with a mixture-of-Gaussians model to find the skin-color areas, and uses the depth information to remove background skin-color areas. Among the remaining foreground skin-color areas, the technology compares the size, height, and depth information of any two areas to locate the positions of the hands.
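The k-means categorization step mentioned above can be illustrated with a minimal sketch. The description does not say which features are clustered or how the centers are initialized, so clustering skin-pixel coordinates from caller-supplied initial centers is an assumption, as is the function name `kmeans`.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers.
    Returns (labels, centers); stops early once the centers settle."""
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((p - q) ** 2
                                  for p, q in zip(pt, centers[k])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical skin-colored pixel coordinates forming two separate areas.
skin_pixels = [(0, 0), (0, 2), (10, 10), (10, 12)]
labels, centers = kmeans(skin_pixels, centers=[(0, 0), (10, 10)])
```

Each resulting cluster would then be a candidate area to be labeled as a hand or a face with the help of its depth values.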
Another technique analyzes the upper and lower arms of a human subject with the convex degree feature (CDF) of the depth image. As shown in FIG. 3, this technique performs an upper arm detection 310 from the detected head and shoulder positions, and calculates the convex degree feature of each pixel in the depth map 320. The technique then uses fixed-size inner regions and outer regions to calculate a pixel ratio of the depth distribution, and performs a hand detection 330 and a lower arm detection 340 by using these convex degree features, so as to determine the position of a full arm in the image 350 according to the results of the upper arm detection and the lower arm detection.
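The description above does not give the exact definition of the convex degree feature, so the following is only one plausible reading of "a pixel ratio of depth distribution" over fixed-size inner and outer regions: the fraction of outer-ring pixels that lie noticeably farther from the camera than the inner region. The function name, the square region shapes, and the `inner`, `outer`, and `margin` parameters are all hypothetical.

```python
def convex_degree(depth, r, c, inner=1, outer=2, margin=100):
    """Assumed convexity score for pixel (r, c), which must lie inside
    the image: the fraction of pixels in the outer ring whose depth
    exceeds the mean depth of the inner square by at least `margin`.
    Values near 1.0 suggest a surface patch that protrudes toward the
    camera."""
    rows, cols = len(depth), len(depth[0])
    inner_vals, ring_vals = [], []
    for rr in range(max(0, r - outer), min(rows, r + outer + 1)):
        for cc in range(max(0, c - outer), min(cols, c + outer + 1)):
            if max(abs(rr - r), abs(cc - c)) <= inner:
                inner_vals.append(depth[rr][cc])
            else:
                ring_vals.append(depth[rr][cc])
    if not ring_vals:
        return 0.0  # no outer ring available at this position
    inner_mean = sum(inner_vals) / len(inner_vals)
    farther = sum(1 for v in ring_vals if v - inner_mean >= margin)
    return farther / len(ring_vals)

# A 3x3 patch at 1000 mm in front of a 2000 mm background.
bump = [[2000] * 5 for _ in range(5)]
for rr in range(1, 4):
    for cc in range(1, 4):
        bump[rr][cc] = 1000
flat = [[2000] * 5 for _ in range(5)]
score_bump = convex_degree(bump, 2, 2)
score_flat = convex_degree(flat, 2, 2)
```

Because the inner and outer regions have a fixed size in pixels, the same physical object covers a different share of them at different distances from the camera, so a score of this kind depends on how far the user stands from the camera.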
Among the above-mentioned image analysis technologies, some cannot build a single model for comparison, because the distance between the user and the video camera device varies and the contour shapes of the local areas in the image therefore differ in size. Some cannot obtain complete skeleton information of the user because of a shelter (an occluding object) in front of the user. Some use skin-color information, so the impact of ambient light may result in a lower recognition rate.
Therefore, an important issue is how to design an object positioning technology that uses only depth image information without establishing a user skeleton, uses real distance information for feature extraction, and positions near or far objects by establishing only a single model that is unaffected by ambient light and shelter.