This disclosure relates generally to depth mapping and, more particularly, to a method of depth mapping using optical projections into a volume, allowing detection and tracking of an object in three dimensions.
Various methods allow users to remotely control certain devices. For example, predefined gestures or postures of a user's body parts (e.g., arms, legs) may control a device. In methods using gestures or postures for device control, a gesture is identified when a user's body part aligns with a specified position, and a computer or other device performs a function or action corresponding to the identified gesture.
In some embodiments, gestures by a user are identified by capturing images or video of the user via an image capture device and analyzing multiple pixels in the images or in the video data. Conventional gesture detection methods analyze a pixel in an image by comparing the pixel's color values with color values of other pixels in proximity to the pixel. Hence, these conventional methods depend on a significant difference in color values between a body part of the user and objects in the background of the image.
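The dependence on color contrast can be illustrated with a minimal sketch. The function names, the 3x3 neighborhood, the Euclidean RGB distance, and the threshold value are all assumptions chosen for illustration; they are not taken from any particular conventional method.

```python
def color_distance(p, q):
    """Euclidean distance between two RGB tuples (illustrative metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def differs_from_neighbors(image, row, col, threshold):
    """Return True if any pixel in the 3x3 neighborhood of (row, col)
    differs in color from the center pixel by more than `threshold`."""
    height, width = len(image), len(image[0])
    center = image[row][col]
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:
                if color_distance(center, image[r][c]) > threshold:
                    return True
    return False


# Hypothetical colors: a body part against a dark or a similar background.
SKIN = (200, 150, 120)
DARK_BACKGROUND = (30, 30, 30)
SIMILAR_BACKGROUND = (195, 148, 118)

high_contrast = [[SKIN, SKIN, DARK_BACKGROUND] for _ in range(3)]
low_contrast = [[SKIN, SKIN, SIMILAR_BACKGROUND] for _ in range(3)]

edge_found_high = differs_from_neighbors(high_contrast, 1, 1, threshold=50)
edge_found_low = differs_from_neighbors(low_contrast, 1, 1, threshold=50)
```

In this sketch the boundary of the body part is found only in the high-contrast image; when the background color is close to the color of the body part, the same comparison finds no boundary.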
Other methods for gesture detection form a skeleton model of one or more body parts of the user (e.g., a three-dimensional model of a user's hand) and analyze the skeleton model to identify gestures by the user. Alternative methods for gesture detection use a three-dimensional depth map in which each pixel includes a distance between a depth camera and the portion of an object corresponding to that pixel. A depth map may be calculated using a variety of methods. For example, depth mapping of scenery is done by projecting a known light pattern (i.e., a structured light pattern) onto the scenery while an image capture device captures images of the scenery with the known light pattern projected onto it. Because the light pattern is fixed and known in advance, sub-portions or unique features of the light pattern may be identified. Distance between portions of the scenery and the image capture device (i.e., "depth" of portions of the scenery) is calculated based on shifts of identified features of the light pattern in images captured by the image capture device. However, capturing images of a light pattern projected onto scenery requires analyzing large portions of a captured image to identify a feature of the light pattern in the captured image that can be correlated with the features of the known light pattern. Additionally, a relatively large separation between the image capture device and a projector projecting the known light pattern is necessary to provide higher-resolution depth detection, because a larger separation produces a larger shift of the image of the known light pattern for a given change in depth of an object in the scenery.
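The relationship between pattern shift, depth, and projector-to-camera separation can be sketched under a simplified model. The sketch assumes an idealized, rectified pinhole geometry in which the shift of a feature is measured relative to its image position for a point at infinity, so that depth equals focal length times separation divided by shift. The function names and all numeric values are hypothetical and are not taken from this disclosure.

```python
def depth_from_shift(focal_length_px, baseline_m, shift_px):
    """Depth in meters from the observed shift of a pattern feature,
    under the assumed model: depth = focal_length * baseline / shift."""
    if shift_px <= 0:
        raise ValueError("shift must be positive under this model")
    return focal_length_px * baseline_m / shift_px


def shift_from_depth(focal_length_px, baseline_m, depth_m):
    """Inverse relation: expected feature shift in pixels at a given depth."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * baseline_m / depth_m


# Hypothetical camera: 600-pixel focal length; object moves from 1.00 m to 1.10 m.
FOCAL_LENGTH_PX = 600.0
shift_change = {}
for baseline_m in (0.05, 0.20):
    near = shift_from_depth(FOCAL_LENGTH_PX, baseline_m, 1.00)
    far = shift_from_depth(FOCAL_LENGTH_PX, baseline_m, 1.10)
    # Change in feature position caused by the same 10 cm change in depth.
    shift_change[baseline_m] = near - far
    print(f"baseline {baseline_m:.2f} m: shift changes by {near - far:.2f} px")
```

Under these assumptions, the same change in depth moves the feature by four times as many pixels when the separation is four times larger, which is why a larger separation yields finer depth resolution for a given pixel size.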
These conventional methods of determining a depth map are computationally expensive and do not produce results accurate enough to identify certain objects. For example, conventional depth mapping methods do not allow accurate detection of fingers or body parts to distinguish between closely related gestures or postures.