The present invention, in some embodiments thereof, relates to object recognition and, more particularly, but not exclusively, to a system and method using optical projections onto a scene, for example to detect and track an object such as a user's hand in three dimensions (3D).
Various methods allow users to remotely control certain devices. For example, predefined gestures or postures of a user's body parts (e.g., arms, legs) may control a device. In methods using gestures or postures for device control, a gesture is identified when a user's body part aligns with a specified position, and a computer or other device performs a function or action corresponding to the identified gesture.
In some embodiments, gestures by a user are identified by capturing images or video of the user via an image capture device and analyzing multiple pixels in the images or in the video data. Conventional gesture detection methods analyze a pixel in an image by comparing the pixel's color values with color values of other pixels in proximity to the pixel. Hence, these conventional methods are dependent on a significant difference in color values between a body part of the user and objects in the background of the image.
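By way of non-limiting illustration, the neighbor-based color comparison described above may be sketched as follows. The sketch assumes an image stored as rows of RGB triples, a 4-connected neighborhood, and a Euclidean color distance; the function names and the threshold value are hypothetical and are not taken from any particular conventional system.

```python
import math


def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_neighbor_contrast(image, row, col):
    """Largest color distance between a pixel and its 4-connected neighbors."""
    height, width = len(image), len(image[0])
    center = image[row][col]
    best = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        # Skip neighbors that fall outside the image.
        if 0 <= r < height and 0 <= c < width:
            best = max(best, color_distance(center, image[r][c]))
    return best


def is_boundary_pixel(image, row, col, threshold=60.0):
    """Flag a pixel as a candidate body-part edge (threshold is an assumed value)."""
    return max_neighbor_contrast(image, row, col) >= threshold


# Illustrative colors only: a skin-like tone against two backgrounds.
SKIN = (200, 160, 140)
DARK_BACKGROUND = (20, 20, 20)
SIMILAR_BACKGROUND = (205, 165, 145)

high_contrast_scene = [[SKIN, DARK_BACKGROUND]]
low_contrast_scene = [[SKIN, SIMILAR_BACKGROUND]]
```

Under these assumptions, the pixel in `high_contrast_scene` is flagged as a boundary while the same pixel in `low_contrast_scene` is not, which illustrates why such methods depend on a significant color difference between the body part and the background.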
Other methods for gesture detection form a skeleton model of one or more body parts of the user (e.g., a three-dimensional model of a user's hand) and analyze the skeleton model to identify gestures by the user. Alternative methods for gesture detection use a three-dimensional depth map where each pixel includes a distance between a depth camera and a portion of an object corresponding to that pixel. A depth map may be calculated using a variety of methods. For example, depth mapping of scenery is done by projecting a known light pattern (i.e., a structured light pattern) onto the scenery, and an image capture device captures images of the scenery when the known light pattern is projected onto the scenery. Because the light pattern is fixed and known in advance, sub-portions or unique features of the light pattern may be identified. Distance between portions of the scenery and the image capture device (i.e., “depth” of portions of the scenery) is calculated based on shifts of identified features of the light pattern in images captured by the image capture device. However, capturing images of a light pattern projected onto scenery involves analyzing larger amounts of a captured image to identify a feature of the light pattern in the captured image that can be correlated with the features of the known light pattern. Additionally, a relatively large separation between the image capture device and a projector projecting the known light pattern is necessary to provide higher-resolution depth detection by creating a larger shift of the image of the known light pattern with respect to a depth shift of an object in the scenery.
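By way of non-limiting illustration, the relationship between feature shift, depth, and projector-to-camera separation described above may be sketched as follows. The sketch assumes a simple pinhole model with a rectified projector and camera pair, in which the observed shift of a pattern feature is inversely proportional to depth. The function names, focal length, and baseline values are hypothetical and chosen only for illustration; practical systems may use different calibration models.

```python
def depth_from_shift(focal_length_px, baseline_m, shift_px):
    """Depth in meters from an observed feature shift (assumed pinhole triangulation)."""
    if shift_px <= 0:
        raise ValueError("shift must be positive to triangulate a finite depth")
    return focal_length_px * baseline_m / shift_px


def shift_from_depth(focal_length_px, baseline_m, depth_m):
    """Expected feature shift in pixels for a surface at the given depth."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_length_px * baseline_m / depth_m


def shift_change(focal_length_px, baseline_m, depth_near_m, depth_far_m):
    """Difference in feature shift when a surface moves between two depths."""
    near = shift_from_depth(focal_length_px, baseline_m, depth_near_m)
    far = shift_from_depth(focal_length_px, baseline_m, depth_far_m)
    return near - far


# Hypothetical values: 600 px focal length, object moving from 1.0 m to 1.1 m.
FOCAL_LENGTH_PX = 600.0
small_baseline_change = shift_change(FOCAL_LENGTH_PX, 0.05, 1.0, 1.1)
large_baseline_change = shift_change(FOCAL_LENGTH_PX, 0.10, 1.0, 1.1)
```

Under these assumptions, doubling the baseline doubles the change in shift produced by the same change in depth, which is consistent with the statement that a larger separation between the projector and the image capture device yields higher-resolution depth detection.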
However, these conventional methods of determining a depth map are computationally expensive and do not produce results that allow accurate determination of certain objects. For example, conventional depth mapping methods do not allow accurate detection of fingers or body parts to distinguish between closely related gestures or postures. Additionally, present posture, skeleton model, and gesture recognition methods and systems require a prior posture or gesture to be identified by the system's camera. For example, the user must present his hand to the camera in a “stop” sign posture that is already well defined and recognized by the system. This prior posture detection step restricts the natural behavior of the user and complicates the gesture recognition procedure, as it requires the user to perform a predefined posture before each interaction with the gesture recognition system.