There has been much research on improving the overall experience of human-computer interaction. Multi-modal affective computing, or the automatic extraction of human emotion using multiple input modalities, is a field that is revolutionizing human-computer interfaces (for example, see Afzal et al., “Intentional affect: an alternative notion of affective interaction with a machine,” Proc. 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology, pp. 370-374, 2009). In an article entitled “Human-computer intelligent interaction: A survey” (4th IEEE International Workshop on Human-Computer Interaction, pp. 1-5, 2007), Lew et al. argue that in order to achieve effective human-to-computer communication, as the human interacts with the computer, the computer also needs to interact with the human. The goal of human-computer interaction is twofold: to have the computer engage and embrace all the human subtleties that, as a whole, convey the true underlying message; and to interact with the human in his/her natural setting, eliminating ambiguous or awkward input modalities.
Computers are becoming ubiquitous and are increasingly diffusing into our environment, moving from primarily foreground devices requiring purposeful user interactions (e.g., using a mouse or a keyboard) to invisible background devices. Next-generation computing devices will increasingly need to interact with humans in a way that is very similar to human-to-human communication.
With the introduction of low-cost depth cameras, such as those associated with the Kinect game console available for the Xbox 360 gaming system from Microsoft Corporation of Redmond, Wash., depth estimation has become a viable option for widespread use. Depth information is much more salient for subject gesture recognition than the images provided by RGB or grayscale cameras. The extraction of objects against complex backgrounds, and the tracking of these objects, has been reduced from a highly compute-intensive, error-prone task to one that is much more robust and works with much simpler methods, spurring a revolutionary leap in machine understanding (see Shotton et al., “Real-time human pose recognition in parts from single depth images,” Computer Vision and Pattern Recognition, pp. 1297-1304, 2011).
Gesture recognition using depth cameras is now able to recognize an increasingly sophisticated dictionary of commands. Examples of gesture recognition methods are described by Suma et al. in the article “FAAST: The Flexible Action and Articulated Skeleton Toolkit” (Proc. IEEE Virtual Reality Conference, pp. 247-248, 2011), and by Kaplan in the article “Are gesture-based interfaces the future of human computer interaction?” (Proc. International Conference on Multimodal Interfaces, pp. 239-240, 2009). The rapid development of numerous gesture control platforms has resulted in a plethora of application-specific, gesture-based commands. These commands have been driven by the gaming and home entertainment markets, which generally have one or two users in constrained settings.
U.S. Patent Application Publication No. 2009/0077504 to Bell et al., entitled “Processing of gesture-based user interactions,” discloses methods for extracting hand gestures for interactive displays, as well as the inclusion of visible indicators on a screen, much as a mouse fiducial is used in modern-day computers.
U.S. Patent Application Publication 2011/0157009 to Kim et al., entitled “Display device and control method thereof,” discloses a method for using human gestures to control a device. The method is based upon human silhouette or skeletal joint estimation of the human operator.
U.S. Patent Application Publication 2011/0197263 to Stinson, entitled “Systems and methods for providing a spatial-input-based multi-user shared display experience,” discloses a method for allowing multiple human users to control a device using gesture control. The method primarily uses hand gestures for living room TV control, allowing split-screen and multi-window displays, whereby each user controls each window.
U.S. Pat. No. 5,563,988 to Maes et al., entitled “Method and system for facilitating wireless, full-body, real-time user interaction with a digitally represented visual environment,” discloses a method for allowing a human user to insert himself or herself into a virtual reality environment. The method allows for interaction with the environment and objects within it, including the extraction of information.
As interactive devices become more ubiquitous, gesture commands that are instinctive and intuitive for humans to perform in unconstrained settings will need to be introduced. Intuitive ways for the computer to communicate back to the user will also be needed.