Multimodal human-computer interaction (HCI) systems have inherent advantages over computing systems that use only a keyboard and mouse as inputs. There has been considerable interest in multimodal HCI due to its naturalness, intuitiveness, and closeness to human-to-human interaction. Gesture and speech are among the modalities that have been studied extensively in multimodal HCI.
When speech is used as a modality, a major problem is distinguishing speech commands directed at the computing system from ambient user interactions that are not directed at it. Drawing the attention of the computing system to speech commands only when they are directed at the computing system is therefore a very important aspect of multimodal HCI. In this way, ambient user interactions that are not directed at the computing system can be rejected. In human-to-human interaction and communication, people use a number of methods, such as tapping or establishing eye contact, to draw each other's attention before directing speech or gestures at one another.
In one workaround that aims to distinguish commands directed at the computing system from ambient user interactions, it is assumed that all non-command gesture-speech ceases before and while a multimodal gesture-speech command is being issued. Another workaround restricts gesture-speech commands to those that are not used in natural, gesture-rich communication. Yet another workaround uses a specific gesture or utterance as a cue to indicate the start of an interaction with the computing system, for example calling out the name of the computing system. However, these workarounds are neither user friendly nor robust enough to distinguish multimodal HCI from the ambient human-to-human communication that takes place in and around the computing system.
The design of intuitive and natural multimodal HCI, inspired by human-to-human communication methods, aims to make multimodal HCI as natural as possible. However, this poses the challenge of making multimodal HCI distinctive enough for the computing system to distinguish multimodal commands directed at it from ambient human-to-human interactions.
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way.