Traditional gamepads, joysticks, mice and other remote control devices allow users to enter commands into video games, personal computers, Internet Protocol TV (IPTV) and other electrical devices. Today, controlling most home environment interaction systems through manual manipulation requires holding a control device.
Traditional hand-held control devices are complex, cumbersome and often demand some form of manual key-based input. Furthermore, these methods are limited by misplacement of the device, battery replacement, non-sterile hand-held surfaces, and the effort invested in direct manual remote control. Hand-held devices do not allow flexible, natural or intuitively expressive means of controlling home environment interaction systems.
With the advent of super-fast computing systems and highly efficient digital imaging systems, the field of computer vision based man-machine interaction has undergone a period of significant technological advancement. Systems ranging from simple motion detection systems, where motion triggers a response from a machine (e.g., surveillance systems), to highly complex three-dimensional imaging sign recognition systems have been the subject of significant development in the last few years.
The conventional techniques utilized for sign and gesture recognition generally require hand markers, fixed color backgrounds or special equipment such as gloves to aid the machine vision system in finding the source of the sign or gesture. These approaches also utilize the tracking of hand motion, which is highly unreliable under varying lighting conditions.
Other systems allow users to wave a wand-shaped interface fitted with motion sensors as a means of sending a command to an electrical system. However, such systems do not provide means for inputting commands based upon bare hand manual gestures. Waving a wand-shaped interface prevents the user from making natural and comfortable movements. This is because most of these systems are based on accelerometers which, for robust recognition, require that the wand be held in a non-twisted, level position.
Most dynamic free hand gesture control methods are designed to use visual feedback such as the position of a graphical object (for example, a mouse cursor), or an image of the user's hand located on a screen (and hence its position). Once the operator visually ascertains that the location is correct, a selection gesture can be made. Such location based visual feedback systems require the design of active areas on the screen and therefore reduce the flexibility of their use. For example, entertainment systems such as TV and music players should not require direct feedback of the user's current hand position, nor require modification of the apparatus to admit visual feedback cues.
Gesture recognition of dynamic gesture movement requires “real time” tracking of the hand. There are a number of methods for tracking an object using motion information. Methods such as “optical flow” and particle filtering are computationally intensive, precluding their use in real-time systems. Kalman filters require Gaussian assumptions and rather smooth movements to operate effectively. Predictive methods fail when gesture trajectories change abruptly as, for example, in gesturing the letter “M”. Fast methods for tracking an object within a sequence of video frames use motion detectors employing either “frame differencing” or “background subtraction”.
“Frame differencing” subtracts one frame image from the preceding one. The result is an image of all pixels whose intensity change is induced by the displacement of an object in the image scene. “Frame differencing” suffers from loss of the object when the object's movement has stopped. This problem can be avoided by the method of “background subtraction”. “Background subtraction” subtracts a pre-stored image of the background (without the presence of the desired object) from the current image. Any object in the image not present in the background image appears in the resulting image, and is thus extracted and surrounded by a tracking window. Thus, successive “background subtraction” can successfully track an object even when it remains stationary. The difficulty with using “background subtraction” to segment a desired object from its background is that it also detects other, unwanted objects in the scene that were not there when the background image was initially captured. One method of overcoming this is to employ a “background maintenance” procedure. Here the background reference image is continually updated to contain new objects as they appear. The background update operation must be performed when the object of interest is not in the new reference image. “Background maintenance” therefore requires that the tracking calculations be interrupted when the object of interest appears in the scene. This causes delays and decreases the robustness of the tracking performance.
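The behavior of the two motion detectors described above can be illustrated with a minimal sketch. It is not taken from any particular prior art system: the tiny synthetic grayscale scenes, the threshold of 30, and the helper names `abs_diff_mask`, `count_foreground` and `make_scene` are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image: rows of 0-255 intensities


def abs_diff_mask(a: Image, b: Image, threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose intensity differs between a and b by more than threshold."""
    return [
        [abs(pa - pb) > threshold for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def count_foreground(mask: List[List[bool]]) -> int:
    """Number of pixels flagged as changed."""
    return sum(sum(row) for row in mask)


def make_scene(width: int, height: int, object_cols: Sequence[int] = ()) -> Image:
    """Dark scene (10) with a bright 2x2 object (200) at each given left column."""
    img = [[10] * width for _ in range(height)]
    for col in object_cols:
        for r in (1, 2):
            for c in (col, col + 1):
                img[r][c] = 200
    return img


# Hypothetical scenes: the "hand" is the 2x2 object that moves, then stops.
background = make_scene(10, 4)        # pre-stored image without the hand
frame_1 = make_scene(10, 4, [1])
frame_2 = make_scene(10, 4, [4])      # hand has moved
frame_3 = make_scene(10, 4, [4])      # hand has stopped

# Frame differencing: responds to motion, loses the hand once it stops.
moving_diff = count_foreground(abs_diff_mask(frame_2, frame_1))
stopped_diff = count_foreground(abs_diff_mask(frame_3, frame_2))

# Background subtraction: still finds the stationary hand.
stopped_bg = count_foreground(abs_diff_mask(frame_3, background))

# A new, unwanted object (clutter) appears in the scene.
clutter_only = make_scene(10, 4, [7])         # captured while the hand is absent
hand_and_clutter = make_scene(10, 4, [4, 7])
stale_bg = count_foreground(abs_diff_mask(hand_and_clutter, background))

# Background maintenance: the reference is replaced by a hand-free frame.
maintained_bg = count_foreground(abs_diff_mask(hand_and_clutter, clutter_only))

print(moving_diff, stopped_diff, stopped_bg, stale_bg, maintained_bg)
# prints: 8 0 4 8 4
```

In this sketch, frame differencing flags both the old and new positions of the moving object (8 pixels) and nothing once it stops, whereas background subtraction keeps flagging the stationary object (4 pixels). With a stale reference the clutter is flagged as well (8 pixels); after the reference is refreshed from a hand-free frame, only the hand remains (4 pixels). The refresh is only possible while the hand is out of view, which is the interruption noted above.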
To avoid some of these problems, several systems employ color properties for image analysis. Using color videos and assuming the hand is uncovered, the gamut of skin colors of the typical hand is stored and used to detect pixels that match one of its colors. One of the problems with using only skin color to detect the hand in an image is the presence of other objects or pixels in the scene that also have skin color, such as the face, a t-shirt, an item of furniture or just random pixels on a wall.
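The limitation of color-only detection can be sketched as follows. This is a minimal illustration rather than the method of any specific system: the stored gamut is assumed to be a simple bounding box in normalized (r, g) chromaticity space, and the sample colors, the margin of 0.02 and the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Gamut = Tuple[float, float, float, float]  # (r_min, r_max, g_min, g_max)


def chromaticity(rgb: RGB) -> Tuple[float, float]:
    """Normalized (r, g) chromaticity, which discards overall brightness."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (0.0, 0.0)
    return (r / total, g / total)


def learn_gamut(samples: List[RGB], margin: float = 0.02) -> Gamut:
    """Store the skin gamut as a bounding box around the sample chromaticities."""
    points = [chromaticity(p) for p in samples]
    rs = [p[0] for p in points]
    gs = [p[1] for p in points]
    return (min(rs) - margin, max(rs) + margin, min(gs) - margin, max(gs) + margin)


def is_skin(rgb: RGB, gamut: Gamut) -> bool:
    """True if the pixel's chromaticity falls inside the stored gamut."""
    r, g = chromaticity(rgb)
    r_min, r_max, g_min, g_max = gamut
    return r_min <= r <= r_max and g_min <= g <= g_max


# Hypothetical skin samples taken from a hand under typical lighting.
skin_samples = [(224, 172, 105), (198, 134, 66), (241, 194, 125)]
gamut = learn_gamut(skin_samples)

# Hypothetical scene pixels.
hand_pixel = (210, 160, 100)
wooden_furniture_pixel = (160, 110, 60)
blue_shirt_pixel = (30, 60, 200)
gray_wall_pixel = (128, 128, 128)

results = {
    "hand": is_skin(hand_pixel, gamut),
    "wooden furniture": is_skin(wooden_furniture_pixel, gamut),
    "blue shirt": is_skin(blue_shirt_pixel, gamut),
    "gray wall": is_skin(gray_wall_pixel, gamut),
}
for name, matched in results.items():
    print(f"{name}: {matched}")
```

With these assumed values the hand pixel matches the gamut, but so does the wooden furniture pixel, while the blue shirt and gray wall pixels do not. The furniture match is the kind of false detection described above: color alone cannot separate the hand from other skin-colored objects in the scene.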
Today, gesture recognition systems impose constraining assumptions such as: a uniform background, fixed illumination, no motion in the scene other than that of the hand, high quality or multiple cameras, high processing power, etc. These constraints are required for accurate capture of the hand's shape, the movements of the hand's blob pixels, the hand's contour, etc. These constraining requirements allow accurate identification and location of the hand by matching its shape to stored templates. Traditional classifiers that are utilized by those systems, such as Hidden Markov Models and Neural Networks, require relatively “clean” trajectory traces or extensive preprocessing. Therefore, these systems cannot operate with low-grade equipment, are relatively expensive and require a large number of gesture samples for training the classifier.
Currently available gesture recognition remote control systems do not provide gesture and sign recognition for remote control of an IPTV that can analyze hand movements without using a special background, gloves, hand-held devices or visual feedback. Prior art methods suffer from the problem of how to fuse motion and color signals, and cannot be utilized with low-grade equipment (such as typical Internet cameras). Therefore, while trying to provide a gesture control system which includes many commands (20 or more), these systems have become very complicated both in terms of the computational requirements and in terms of the high-end equipment which is necessary. However, an IPTV does not require recognition of a large command set, and generally 5 to 8 commands are sufficient.
It is, therefore, an object of the present invention to provide a system for gesture recognition for remote control of an IPTV.
It is another object of the present invention to provide a system for gesture recognition for remote control without using location based visual feedback.
It is another object of the present invention to provide a system for gesture recognition based upon bare hand manual gestures.
It is yet another object of the present invention to provide a system for gesture recognition operating in varying light conditions.
It is still another object of the present invention to provide a system for gesture recognition which reduces the probability of unintentional operations.
It is still another object of the present invention to provide a system for gesture recognition that can be utilized with low-grade equipment.
It is still another object of the invention to provide a gesture control system which, although it includes a relatively small number of commands, can still be satisfactorily adapted to control most IPTV features.
Other objects and advantages of the invention will become apparent as the description proceeds.