Users of electronic devices have become increasingly reliant on the electronic remote control. A remote control permits the user to issue commands to electronic devices from a distance through infrared (IR) and radio signals.
In a typical home, one or more electronic devices, such as a television, cable TV receiver, CD player, video recorder, DVD player, audio receiver, computer system and even lighting, can be controlled using remote controls. In fact, many electronic components rely on commands from a remote control, and some device preferences can be accessed only through the remote. Although electronic remote controls have become very complex, their use has become ever more popular, especially since many remotes have made live media more accessible. Many consumers of electronics have a strong desire to increase interactivity with all forms of multimedia, especially the television.
Consumers have long desired increased interaction and participation with multimedia, and eliminating the electronic remote control would serve that desire. Using human body gestures to command electronic devices has been discussed for years in science fiction. With advances in gesture recognition, however, human gestures have proven viable for issuing commands to electronic devices.
Gesture recognition technology allows users to interact with electronic devices without the use of an intermediate mechanical device, such as an electronic remote control. This technology usually includes a camera that reads the movements of the human body and communicates the collected data to a computer. The computer then recognizes a selected gesture as an intended command for the electronic device. For instance, the user can point a finger at a television or computer screen in order to move a cursor or activate an application command.
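The final step described above, in which the computer treats a recognized gesture as an intended command, can be sketched as a simple lookup. This is only an illustrative sketch: the gesture labels, the command names, the confidence score and the 0.8 threshold are all assumptions introduced here, and the upstream recognizer that would produce the label from camera data is not shown.

```python
from typing import Optional

# Hypothetical gesture labels and device commands, chosen only for illustration.
GESTURE_COMMANDS = {
    "point": "move_cursor",
    "open_palm": "pause",
    "swipe_left": "previous_channel",
    "swipe_right": "next_channel",
}


def interpret_gesture(label: str, confidence: float, threshold: float = 0.8) -> Optional[str]:
    """Return the command for a recognized gesture, or None if it should be ignored.

    `label` and `confidence` are assumed to come from some camera-based
    recognizer; the threshold value is an arbitrary example.
    """
    if confidence < threshold:
        return None  # the recognizer was not sure enough, so issue no command
    return GESTURE_COMMANDS.get(label)  # unknown gestures also yield None


if __name__ == "__main__":
    print(interpret_gesture("point", 0.93))       # confident, known gesture
    print(interpret_gesture("swipe_left", 0.41))  # low confidence, ignored
```

Returning no command for low-confidence or unknown gestures is one way to keep a stray movement from being acted upon.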
An interactive media system is disclosed in U.S. Pat. No. 7,283,983, which teaches a computer coupled to a video camera that uses imaging and recognition techniques to provide augmented interaction for a human user in conjunction with printed media such as books, educational materials, magazines, posters, charts, maps, individual pages, packaging, game cards, etc. The computer system uses a vision-based sensor to identify the printed media and retrieve information corresponding to that view. The sensor then identifies a first user gesture relative to at least a portion of the media. The computer system interprets the gesture as a command and, based at least in part on the first gesture and the retrieved information, electronically speaks aloud at least a portion of the retrieved information.
Human gestures can originate from any bodily motion or state, including the hand movement described above. Facial recognition can further assist a motion detection system by distinguishing where those gestures come from, and filtering out non-relevant movement.
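One way to picture this filtering is to keep only the motion regions that lie close to a detected face and to discard the rest as background movement. The sketch below assumes that a face detector and a motion detector (neither shown) already supply bounding boxes; the box format, the function names and the `reach` parameter are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a bounding box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def relevant_motion(faces: List[Box], motions: List[Box], reach: float = 3.0) -> List[Box]:
    """Keep motion regions whose center lies within `reach` face-widths of a face center.

    `reach` is an assumed tuning value: a hand gesture is expected to occur
    within a few face-widths of the face of the person making it.
    """
    kept = []
    for m in motions:
        mx, my = center(m)
        for f in faces:
            fx, fy = center(f)
            max_dist = reach * f[2]  # scale the allowed distance by face width
            if (mx - fx) ** 2 + (my - fy) ** 2 <= max_dist ** 2:
                kept.append(m)
                break  # one nearby face is enough to keep this region
    return kept


if __name__ == "__main__":
    faces = [(100, 100, 50, 50)]
    motions = [(150, 200, 40, 40), (600, 400, 40, 40)]
    print(relevant_motion(faces, motions))  # only the region near the face remains
```

Scaling the allowed distance by the face width lets the same rule apply whether the user is near the camera or far from it.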
Although humans have the innate ability to recognize and distinguish between faces, it has been quite difficult to build that same capability into computer software. In the past few years, however, facial recognition systems have improved considerably.
Facial recognition, used with computer systems, permits the identification and verification of a person from a digital image or video source. Since the human face has numerous distinguishable characteristics, comparison of these characteristics may be utilized to identify a person. Using algorithms, computer software can measure characteristics such as the distance between the eyes, depth of the eye sockets and shape of the cheekbones, as well as many other facial features, and then compare each feature with existing facial data.
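The comparison of measured features against existing facial data can be illustrated with a nearest-match search over feature vectors. This is a minimal sketch under stated assumptions, not a description of any particular facial recognition product: it assumes the features have already been measured and scaled to comparable units, and it uses Euclidean distance with an arbitrary acceptance threshold. The names and values are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def feature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe: Sequence[float],
             enrolled: Dict[str, Sequence[float]],
             max_distance: float) -> Optional[str]:
    """Return the enrolled name closest to `probe`, or None if none is close enough."""
    best_name, best_dist = None, max_distance
    for name, features in enrolled.items():
        d = feature_distance(probe, features)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


if __name__ == "__main__":
    # Hypothetical features: eye distance, eye-socket depth, cheekbone width.
    enrolled = {
        "alice": [62.0, 24.0, 130.0],
        "bob": [58.0, 27.5, 121.0],
    }
    print(identify([61.5, 24.2, 129.0], enrolled, max_distance=5.0))
    print(identify([10.0, 10.0, 10.0], enrolled, max_distance=5.0))
```

The threshold lets the search report that no enrolled person matches, rather than always returning the nearest entry.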
U.S. Pat. No. 6,377,995, issued to Agraharam et al., provides a method and apparatus for indexing multimedia communications using facial and speech recognition, so that selected portions of the multimedia communications can be efficiently retrieved and replayed. The method and apparatus combine face and voice recognition to identify participants in a multicast, multimedia conference call, which can include data or metadata. A server determines the identity of a particular participant when both the audio and the video patterns match the speech and face models for that participant, and then creates an index of participants based on the identified speech and face patterns, whereby the index is used to segment the multimedia communication.
Depth-aware cameras are also widely available and used to control media. Video pattern recognition systems, such as the Sony EyeToy and PlayStation Eye, use specialized cameras to generate a depth map of what is seen through the camera at short range, allowing a user to interact with media using motion, color detection and even sound, by way of a built-in microphone.
U.S. Pat. No. 6,904,408 issued to McCarty et al. teaches a web content manager used to customize a user's web browsing experience. The manager selects appropriate on-line media according to a user's psychological preferences, as collected in a legacy database, and in response to at least one real-time observable behavioral signal. Skin temperature, pulse rate, heart rate, respiration rate, EMG, EEG, voice stress and gesture recognition are some of the behavioral responses and physiological indicators that are measured and analyzed. Gesture recognition is accomplished by computer analysis of video inputs. The position of the face may indicate an upbeat or downbeat attitude, while the count of blinks per minute may be used to indicate anxiety.
Gesture recognition has proven advantageous for many applications. However, it faces many challenges, including the robustness and accuracy of the gesture recognition software. For image-based gesture recognition, there are limitations associated with the equipment and with the amount of noise found in the field of view. Unintended gestures and background movement hamper full recognition of issued commands.
There has been a need to control media content, especially using human gestures. However, previous approaches have employed gesture recognition techniques that are not robust.