Multimodal interaction is, for the most part, the natural way in which humans interact with the world and their surroundings in general, and with other humans in particular. Multimodal interaction may employ various human senses and may include, for example, visual interaction, text interaction, voice interaction and/or tactile interaction. The multimodal interaction may include one or more interaction types, used sequentially or in parallel, to, for example, express needs, share information, explore options and the like. Multimodal interaction is known to provide an information-rich interaction environment in which one or more human senses are used to interpret interaction with other people. For example, facial expressions, body language and/or voice intonation may provide a person with a great deal of information while communicating with one or more other people, in addition to the actual content of the verbal language. Human machine interaction (HMI), on the other hand, is traditionally confined to unimodal interaction or, at best, limited multimodal interaction, for example, using switches, buttons, a keyboard and/or pointing devices for inputting data to the machine, and receiving from the machine text, visual objects displayed on a screen and/or audio playback. Bringing the wealth of information available through human multimodal interaction to the HMI environment may provide major benefits, for example, improving the accuracy of interaction interpretation by analyzing multimodal data generated by a plurality of senses, supporting hands-free interaction, eliminating and/or reducing the need for intermediate devices, such as a keyboard, a pointing device and/or a touchscreen, and/or improving HMI for people with limited accessibility.