Companion robots can advantageously establish an emotional relationship with human beings. Dynamic adaptation of dialogs through voice skins or dialog skins can enable rich interactions.
Existing systems for speech or voice synthesis are mostly passive and uniform: beyond a few options such as the choice of a male or female voice, the tone of the speech generation engine is rather neutral. Moreover, the responses provided lack cultural references. The objective of industrial or mass-market voice answering systems is precisely to provide universally accepted responses, i.e. to be as widely understood as possible. This implies avoiding any contextual, and a fortiori cultural, references. Voice commands are generally limited to specific contexts. For example, voice dictation software is mostly used in the context of a standalone software application (for example word processing software). With the accessibility features increasingly provided by modern operating systems, users can use voice commands to perform certain actions (for example launching an application, copying and pasting, etc.). These predefined actions are rather limited. Such visual or audio interaction modes are generally passive on the machine side (e.g. users actively give orders and the machine merely executes them). Even with recent computer interaction models, such as those implemented in answering systems, few interactions are initiated by the machine toward the user.
In the context of a companion humanoid robot, the interaction model with human users changes significantly compared with the interaction model with personal computers (in their different forms). The cognitive interaction with a robot is fundamentally different from that with a tablet PC or a smartphone. In particular, the ability to modulate the speech synthesis of the robot can be beneficial, if not key, to a rich interaction, which in turn can make it possible to gather relevant data and to improve the services rendered by the robot or by connected devices.
There is a need for methods and systems for handling voice synthesis (form) and associated interactive dialogs (substance), in particular in the specific context of a conversation between a robot and a human user.