Speech is considered one of the most important signals in human-human interaction, and its perception is not limited to signals from the auditory domain alone. During a face-to-face interaction, the percept is formed when the auditory and visual systems process the cues presented to them synchronously. Studies have indicated that speech is perceived faster when visual cues are available, which suggests a strong link between the visual and auditory domains in speech perception. The influence of visual cues on the interpretation of various aspects of speech, such as prosody, speaking rate, co-articulation, lexical tone and emotion, has been explored. In addition, head and eyebrow movements correlate with different prosodic conditions. Variations in speaking rate are considered more prominent in the visual domain than in the acoustic domain.
Research has indicated that visual cues are a useful tool for detecting emotion. However, there has been little visual-speech work from the perspective of assistive technology for the hearing impaired. Such technology demands a highly accurate visual representation of speech, and it is one of the embodiments of the present disclosure. Prior art in this domain comprises models that produce synthetic visual speech after receiving an input containing speech information. The present disclosure describes a technique that increases the accuracy of interpretation and also provides a real-time visual impression of speech features, implemented in an animated model.