Existing computer interface systems can support increasingly natural and complex user input. Handwriting and speech are typical examples of complex input; however, contemporary gaming consoles can now also detect user movements and interpret those movements as input. For example, the Kinect™ for Microsoft's Xbox 360® uses camera and audio technology to sense input without the need for a controller.
At present, known systems cannot handle multimodal input that may need to change in real time according to the user's needs. Moreover, such systems cannot simultaneously leverage multiple input modalities in order to accurately interpret the user's intent. For example, while a system may be able to accept speech as input when running a speech application, or to accept touch or gesture individually, existing systems have no way to capture and interpret these modes together in order to act on or disambiguate a user's request, command, or intent.