The present invention generally relates to voice-controlled devices and more specifically relates to controlling synthesized speech output therefrom.
Speech recognition systems are rapidly increasing in significance in many areas of data and communications technology. In recent years, speech recognition has advanced to the point where it is used by millions of people across various applications. Speech recognition applications now include interactive voice response systems, voice dialing, data entry, dictation systems (including medical transcription), and automotive applications. There are also “command and control” applications that use speech recognition to control tasks such as adjusting the climate control in a vehicle or requesting that a smartphone play a particular song.
Voice-controlled devices, such as robots, smart speakers, and intelligent personal assistants, are typically placed in environments where humans are present. Other voice-controlled devices may also be present. A voice-controlled device faces several obstacles in deciding when to speak or respond. For example, the device may hear an utterance and not know for certain whether the utterance is directed to it. Likewise, the device may be in the presence of one or more people and not know whether it is appropriate to start a conversation. Further, the device may begin speaking and, while speaking, detect a human or another voice-controlled device speaking. In these circumstances, the device may not know whether it should continue or stop speaking. Voice-controlled devices have been very poor at assessing such situations and handling them effectively.