Developers have been trying to implement vision-based control in contemporary devices such as gaming consoles, computers and smartphones. Most attempts have failed to provide a control system that is sufficiently effective to be practical for operation under all real-life scenarios. Some examples of such systems are given below.
The American patent application published as US2011299737 discloses a vision-based hand movement recognition system and method thereof. In an embodiment, a hand posture is first recognized according to consecutive hand images. If the hand posture matches a start posture, the system then separates the consecutive hand images into multiple image groups and calculates motion vectors of these image groups. The distributions of these motion vectors are compared with multiple three-dimensional motion vector histogram equalizations to determine a corresponding movement for each image group. For example, the corresponding movement can be a left moving action, a right moving action, an up moving action or a down moving action. Finally, the combination of these corresponding movements is defined as a gesture, and an instruction mapped to this gesture is then executed.
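By way of illustration only, the step of mapping each image group's motion vectors to a movement and combining the movements into a gesture can be sketched as follows. This is a simplified stand-in and not the method of the cited application: it uses a two-dimensional, four-bin direction count in place of the three-dimensional histogram comparison, and the names `direction_of`, `classify_group` and `gesture_from_groups` are hypothetical.

```python
from collections import Counter

def direction_of(dx, dy):
    # Assumption: image coordinates, with y growing downward.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"

def classify_group(vectors):
    # Count the direction of every non-zero motion vector in one image group
    # and return the most frequent direction, or None if there is no motion.
    counts = Counter(direction_of(dx, dy) for dx, dy in vectors if (dx, dy) != (0, 0))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

def gesture_from_groups(groups):
    # A gesture is taken to be the ordered combination of the movements
    # found for the image groups; groups without motion are skipped.
    movements = (classify_group(group) for group in groups)
    return tuple(m for m in movements if m is not None)
```

For instance, two image groups whose vectors point mostly rightwards and then mostly upwards would yield the combination `("right", "up")`, which a system of this kind would then look up in a table of instructions.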
The international patent application published as WO09128064 discloses a method for man-machine interaction with an electronic device associated with an electronic display, the method comprising capturing images of at least one hand positioned over an input device; tracking position or posture of the hand from the images; switching from interaction based on interaction with an input device to pointing device emulation in response to detecting a gesture performed with the hand; and emulating a pointing device based on the tracking, with the hand no longer performing the gesture.
The American patent published as U.S. Pat. No. 7,970,176 discloses a method of identifying a user's gestures for use in an interactive game application. Videocamera images of the user are obtained, and feature point locations of a user's body are identified in the images. A similarity measure is used to compare the feature point locations in the images with a library of gestures. The gesture in the library corresponding to the largest calculated similarity measure which is greater than a threshold value of the gesture is identified as the user's gesture. The identified gesture may be integrated into the user's movements within a virtual gaming environment, and visual feedback is provided to the user.
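The library matching described above can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the cited patent: cosine similarity is assumed as the similarity measure, feature point locations are assumed to be flattened into a plain list of numbers, and the names `cosine_similarity` and `identify_gesture` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Assumed similarity measure; the cited patent does not prescribe this one.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def identify_gesture(features, library):
    # library maps a gesture name to (template feature vector, threshold).
    # Returns the gesture with the largest similarity that also exceeds
    # that gesture's own threshold, or None if no gesture qualifies.
    best_name, best_score = None, float("-inf")
    for name, (template, threshold) in library.items():
        score = cosine_similarity(features, template)
        if score > threshold and score > best_score:
            best_name, best_score = name, score
    return best_name
```

Keeping a separate threshold per gesture mirrors the wording above, in which the largest similarity measure must exceed "a threshold value of the gesture"; an input that resembles no library entry closely enough is simply not identified.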
The British patent application published as GB2474536 discloses how a user controls an electronic device (TV, DVD player, PC, mobile phone, camera, STB) based on computer vision. An image sensor captures a sequence of images of a field of view. A processor receives the sequence of images; detects movement of at least one object in the images; applies a shape recognition algorithm (such as contour detection) on the at least one moving object; confirms that the object is a user hand by combining information from at least two images of the object; and tracks the object to detect control gestures for controlling the device. Shape recognition may be applied together with or before movement detection. In a first stage, an initializing gesture, such as a wave-like movement, may be detected. In poor lighting conditions a user hand may be identified based mainly on movement detection. User hand gestures may control cursor movement and operation, select and manipulate objects (e.g. icons), or provide button click emulation, e.g. mouse click commands. The image sensor may be a 2D camera, such as a webcam, or a 3D camera, may be integrated with or external to the device, and may be IR sensitive.
The gesture identifications provided by such systems are simply too slow to be effective.
Furthermore, the prior art does not take into account that the camera, especially in a mobile device, may not be aligned perfectly with a user, which could result in an incorrect interpretation of a gesture.
Another major disadvantage is the complexity of the calculations involved in the prior art systems, which require vast computational resources.
There is thus a need for a manner of identifying a gesture performed by an object in a video stream that is able to accommodate misalignment between camera and user.
Furthermore, there is a great need for a manner of tracking an object in an image stream that does not require vast computational resources.