The present invention relates generally to interactive television. More particularly, the invention relates to a system and method for controlling interaction with the television using speech, whereby each user of the system may have a set of predefined preferences that are automatically selected through identification/verification of the speaker's voice.
Interactive television promises to provide a wealth of information content that boggles the mind. Current electronic program guide technology is ill-equipped to support the interactive modality. On-screen electronic program guides and push button remote control devices offer a far too complex and cumbersome user interface for selecting a program the user wishes to watch or record. It can take several minutes to scroll through or navigate through an on-screen program guide display; and pushbutton remote controls have expanded to include so many buttons that they are virtually useless, particularly in a darkened room.
Speech technology offers a way out of the current complexity dilemma. Speech can be a natural way to interact with a system, by narrowing the myriad of possible program selections to a more manageable number that can then be selected by further speech or through more conventional on-screen display and remote control pushbutton techniques.
While great strides have been made in speech technology, the truly natural, interactive environment for interactive television has yet to be achieved. A natural, intuitive interaction between a user and a consumer product, such as the interactive TV, requires more than just good speech recognition. Natural interaction requires a sense of context, so that the semantics or underlying meaning of a user's spoken commands will be properly understood. Different people express themselves in different ways, and these differences need to be taken into account for a good understanding of the user's spoken instructions.
The present invention addresses this concern through a unique system that ascertains the identity of the speaker when that speaker first addresses the system with an appropriate wakeup command. The command can be a polite word, such as "please," that is uttered when the user first wishes to use the system. A speaker verification/identification module within the system ascertains the user's identity, based on his or her speech, and then invokes a pre-defined or pre-stored set of user preferences. These preferences guide further interaction between the user and the system, making the system appear more natural to the user and simultaneously increasing the system's ability to understand the semantic content of the user's instructions.
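By way of illustration only, the sequence just described (wakeup command, speaker identification, retrieval of stored preferences) may be sketched as follows. The sketch assumes that each utterance has already been reduced to a word and a numeric voice feature vector, and it substitutes a simple cosine-similarity comparison against enrolled reference vectors for the speaker verification/identification module. The names `identify_speaker`, `on_utterance`, the `enrolled` and `preferences` dictionaries, and the 0.8 threshold are hypothetical and are not taken from the specification.

```python
import math
from typing import Dict, List, Optional

WAKEUP_WORD = "please"  # example wakeup command named in the text


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_speaker(
    features: List[float],
    enrolled: Dict[str, List[float]],
    threshold: float = 0.8,  # assumed acceptance threshold
) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if no score reaches the threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine(features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def on_utterance(
    word: str,
    features: List[float],
    enrolled: Dict[str, List[float]],
    preferences: Dict[str, dict],
) -> Optional[dict]:
    """Invoke the stored preferences only when the wakeup word is spoken by a known speaker."""
    if word.lower() != WAKEUP_WORD:
        return None  # not a wakeup command; ignore
    speaker = identify_speaker(features, enrolled)
    if speaker is None:
        return None  # speaker not recognized
    return preferences.get(speaker)
```

In this sketch an unrecognized voice yields no preferences at all; an actual embodiment might instead fall back to a default profile.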
The user preferences may store a diverse range of information, such as which channels the user is able to access (in this way adult channels can be blocked from children), customized dialogs that the system will use for that user, system usage log information, recording what that user has previously viewed, and a set of adapted speech models that will allow the recognizer of the system to do a better job in recognizing that user's utterances. The usage log may be used, for example, to learn the user's viewing preferences, thereby assisting the system in understanding the user's current request. The log may also be used to monitor a child's use of the television, thereby limiting the child's viewing to a pre-defined duration.
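One possible arrangement of the stored preferences is sketched below, purely as an example. It assumes that the usage log is kept as (program, minutes) entries and that the adapted speech models are referred to by an identifier. The class `UserPreferences` and all of its field and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class UserPreferences:
    """Hypothetical per-user record holding the kinds of information listed above."""

    allowed_channels: Set[str]                                   # channels this user may access
    dialogs: Dict[str, str] = field(default_factory=dict)       # customized prompts for this user
    usage_log: List[Tuple[str, int]] = field(default_factory=list)  # (program, minutes viewed)
    speech_model_id: Optional[str] = None                        # reference to adapted speech models
    viewing_limit_minutes: Optional[int] = None                  # None means no limit

    def may_access(self, channel: str) -> bool:
        """Channel blocking: only channels on the allowed list can be selected."""
        return channel in self.allowed_channels

    def record_viewing(self, program: str, minutes: int) -> None:
        """Append one viewing session to the usage log."""
        self.usage_log.append((program, minutes))

    def minutes_viewed(self) -> int:
        """Total minutes recorded in the usage log."""
        return sum(minutes for _, minutes in self.usage_log)

    def within_limit(self) -> bool:
        """True while logged viewing is below the pre-defined duration, if one is set."""
        if self.viewing_limit_minutes is None:
            return True
        return self.minutes_viewed() < self.viewing_limit_minutes

    def most_watched(self) -> Optional[str]:
        """Program with the most logged minutes, as a crude viewing-preference hint."""
        totals: Dict[str, int] = {}
        for program, minutes in self.usage_log:
            totals[program] = totals.get(program, 0) + minutes
        return max(totals, key=totals.get) if totals else None
```

For example, a child's record might list only a few allowed channels and carry a viewing limit, while an adult's record leaves the limit unset.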
For a more complete understanding of the invention, its objects and advantages, refer to the following specification and to the accompanying drawings.