1. Field of the Invention
The present invention relates to a recognition method, more particularly to a method for recognizing an object from two images, a method for acquiring depth information of an object from two images, and an electronic device for implementing the same.
2. Description of the Related Art
At present, a common input device for an electronic apparatus may be at least one of a computer mouse, a keyboard, and a touch screen which also serves as an output interface. To allow greater freedom in human-machine interaction, there are techniques that use a recognition result of voice, image, etc. as an input command. Moreover, methods which utilize image recognition of body movements and gestures to perform operations have undergone constant improvement and now perform calculations faster. Relevant techniques have developed from requiring a user to wear recognizable articles of clothing or gloves to directly locating a human body or a hand in an image for subsequent recognition of body movements and gestures.
A conventional technique is to generate volume elements (voxels) according to a depth image, and, based on the volume elements, to identify a human body and remove the background behind the human body. In this way, extremity skeletons of the human body may be further identified, so that an input command may be obtained by recognizing body movements according to a series of images containing the extremity skeletons.
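As a rough illustration of how volume elements might be generated from a depth image, the following minimal sketch back-projects each depth pixel into three-dimensional space and quantizes the resulting point into a voxel grid. It is not taken from any particular conventional system: the pinhole camera model, the function name depth_to_voxels, and the parameters fx, fy, cx, cy and voxel_size are all assumptions introduced here for illustration.

```python
import math


def depth_to_voxels(depth, fx, fy, cx, cy, voxel_size):
    """Return the set of occupied voxel indices for a depth image.

    depth      -- list of rows, each a list of depths in metres
                  (0 or None marks a pixel with no depth reading)
    fx, fy     -- assumed focal lengths of a pinhole camera, in pixels
    cx, cy     -- assumed principal point of the camera, in pixels
    voxel_size -- assumed edge length of one voxel, in metres
    """
    voxels = set()
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if not z or z <= 0:
                continue  # skip pixels without a valid depth
            # Back-project pixel (u, v) at depth z to a 3-D point.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantize the point to the index of the voxel containing it.
            voxels.add((
                math.floor(x / voxel_size),
                math.floor(y / voxel_size),
                math.floor(z / voxel_size),
            ))
    return voxels
```

Under these assumptions, a background could then be removed by discarding voxels whose depth index exceeds a chosen threshold, leaving the voxels that belong to the human body in front of it.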
A conventional method for generating a depth image utilizes a traditional camera in combination with a depth camera to capture images.
The aforesaid depth camera adopts a Time of Flight (ToF) technique, which measures a distance between an object and the depth camera by calculating the time it takes for emitted infrared light to reach the object and be reflected back by the object.
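The relationship between the measured time and the distance may be sketched as follows. Because the light travels to the object and back, the distance is half of the product of the speed of light and the round-trip time. The function name tof_distance_m is hypothetical, and the sketch assumes an idealized measurement in which the round-trip time is known directly, which is a simplification of how an actual depth camera operates.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the object, given the round-trip time of the emitted light."""
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The light covers the camera-to-object distance twice, so halve the path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

For example, a round-trip time of about 6.67 nanoseconds corresponds to an object roughly one metre away, which indicates the timing precision such a measurement demands.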
There is another type of depth camera, such as the depth camera provided in the game console available from Microsoft Corporation, that utilizes a Light Coding technique. The Light Coding technique makes use of continuous light (e.g., infrared light) to encode a to-be-measured space, reads the encoded light via a sensor, and decodes the light thus read via chip computation so as to generate an image that contains depth information of the space. The Light Coding technique relies on laser speckles. When laser light illuminates a surface of an object, reflected dots, which are called laser speckles, are formed. The laser speckles are highly random and have shapes that vary according to a distance between the object and the depth camera. The laser speckles on any two spots with different depths in the same space have different shapes, such that the whole space is marked. Therefore, for any object that enters the space and moves in the space, a location thereof may be precisely recorded. In the Light Coding technique, emitting the laser light so as to encode the to-be-measured space corresponds to generation of the laser speckles.
However, the depth camera is not yet widely available at present, and the depth image obtained thereby is not precise enough and is merely suitable for recognizing extremities. If it is desired to recognize a hand gesture by utilizing the aforementioned depth image, individual fingers of a hand may not be recognized once the hand is a short distance away from the depth camera, such that the conventional depth camera is hardly a good solution for hand recognition.