The basic objective of Human-Computer Interaction (HCI) is to improve the interaction between users and computers by making computers more usable and more receptive to users' needs. Furthermore, HCI seeks to design systems that reduce the hurdles between a human's action instructing that a specific task be accomplished and the computer's understanding of that instruction. HCI using visual information as an input has wide applications, ranging from computer games to the control of robots. The main advantage of using visual information as an input is that it makes communication with the computer possible from a distance, without the need for any physical contact. Visual information comprising the movement of skeleton points is chiefly beneficial when the environment surrounding the user is noisy, where speech commands would be less recognizable. On the other hand, speech commands are beneficial when the user is visually impaired or is incapable of offering hand gestures as an input to the computer.
At present, many systems and methods are available for enabling a user to interact with a computer or machine. Most of them either use visual gestures for controlling or interacting with the machine, or use the direction of sound to detect the user. Although these methods have made HCI easier, numerous challenges remain with current Human-Computer Interaction methodologies. An individual mode of interaction, using either visual input alone or speech input alone, is less accurate. The existing vocabulary or dictionary of visual, sound and speech gestures is inadequate. In addition, as the number of gestures increases, the ability of the classifier to recognize the gestures is reduced. Also, in the case of skeleton-based tracking of human postures for the detection of gestures, it is difficult to track the skeleton points when they come close to each other. Moreover, when there are multiple users, the computer may erroneously register the wrong user as the controlling user. Thus, the accuracy of recognizing the controlling user is reduced in a multi-user system. Also, no work has been done on combining or fusing the directionality of sound or speech simultaneously with visual or touch gestures to create a multimodal gesture command.
Thus, there is a need for an intuitive gesture set that combines the directionality of sound or speech simultaneously with visual or touch gestures, both to achieve accuracy in the interaction between humans and computers and to provide a solution for recognizing the user in control among one or more users.