1. Field of the Invention
The invention relates to a method of enabling a user to interact with a consumer electronics system, the behavior of the system being modeled by a plurality of dynamically changing system elements, such as, system states or variables.
The invention also relates to a consumer electronics system, the behavior of the system being modeled by a plurality of dynamically changing system elements, such as, system states or variables.
2. Description of the Related Art
Consumer electronics products are getting increasingly complex. This makes the interaction of a user with the products/systems more difficult. Nowadays, much of the functionality of the product is implemented in software and, as such, the behavior of the system is, to a high degree, not directly visible to the user. In many situations, it is required that the user be informed about the behavior of the system in order to adequately interact with the system. It is, therefore, desired that certain system elements, representing, for instance, states and variables of the system, are represented to the user. In particular, for voice-controlled systems, it is desired that, in addition to the system elements relating to the normal behavior of the system, also some of the elements of the voice control/speech recognition are represented. This results in even more elements needing to be represented to the user. Conventionally, different system elements are each represented using a different user interface element, such as, different display windows, or using different textual or graphical objects in a window. In this way, many user interface elements may be presented simultaneously, or may be accessible via hierarchically arranged menus. The large amount of information presented to the user makes it difficult for the user to quickly assess the total behavior of the system. It also requires a large display area or, if hierarchical menus are used, provides a less direct access to information.
The Microsoft ActiveX technology for interactive agents discloses that an agent, such as an anthropomorphic character, can represent a system event. A PC application program can send an event to the ActiveX layer. This layer executes an animation corresponding to the event. In this system, a strict one-to-one coupling exists between an event and an animation. If several events need to be represented to the user at the same time, several animations need to be invoked in parallel. As described before, this makes it difficult for the user to quickly assess the overall behavior of the system. Moreover, a lot of system resources are required.
In user controlled systems, it is generally desired to give feedback to the user on issued commands. It is also desired that such feedback is intuitive so that it can easily be interpreted by the user and does not require any learning. The feedback should be given in such a way that people can easily understand and predict the system behavior. However, in voice-controlled systems, various types of feedback must be presented, often at the same time. These types of feedback are, among others, the time-span (i.e., the period the voice control unit is trying to recognize and interpret voice input), the time-span left, whether the user is heard, whether the user is understood, whether the user spoke a valid command, whether the user used a sensible command, which user is recognized (multi-user situation), whether the system is going to execute the command, and whether the system is busy. The conventional approach of presenting all of this information to the user can easily overwhelm the user and result in an ineffective interaction between the user and the system.
It is an object of the invention to provide an improved method of interaction between a user and a consumer electronics system. It is a further object to provide a consumer electronics system with an improved interaction with the user.
This object is achieved in a method of enabling a user to interact with a consumer electronics system, where the behavior of the system is modeled by a plurality of dynamically changing system elements, such as system states or variables, by presenting the plurality of system elements to the user, the method including the steps:
choosing one appearance from a plurality of different visual and/or auditive appearances of one object in dependence on actual values of the plurality of system elements; and
presenting the chosen appearance to the user.
By using only one object to represent several system elements, the user can assess the overall behavior/state of the system by focussing on only one object. Moreover, the loading on resources, such as, display area, processing power, memory, etc., is kept low.
In a particular embodiment of the invention, the object is preferably an anthropomorphic character having the ability to express many elements. Particularly, for a system which allows speech input, the power to express independent system elements at the same time is very beneficial.
Voice control as an interaction modality for (consumer) products is getting more mature. However, people perceive it as strange, uncomfortable or even unacceptable to talk to a product, e.g., a television. To prevent conversations or utterances not intended for controlling the products from being recognized and executed, most voice-controlled systems require the user to activate the system (resulting in a time span during which the system is activated). Such an activation may be performed via voice, for instance by the user speaking a keyword, e.g., ‘TV’. By using an anthropomorphic character, it is more natural to address the character (instead of the product), e.g., by saying ‘Bello’ to a dog-like character. This removes a barrier in the interaction. Moreover, such a system can make effective use of one object with several appearances, chosen as a result of several state elements. For instance, a basic appearance (e.g., a sleeping animal) can be used when the system is not yet active. A second group of appearances can be used when the system is active (e.g., awake appearances of the animal). The progress of the time span can then, for instance, be expressed by the angle of the ears (fully raised at the beginning of the time span, fully down at the end). The same group of appearances can also express whether or not an utterance was understood (an ‘understanding look’ versus a ‘puzzled look’). Audible feedback can also be included, for example, a ‘glad’ bark if a word has been recognized. A user can quickly grasp the feedback on all such system elements by looking at the one appearance which represents all these elements (e.g., raised ears and an understanding look, or lowered ears and a puzzled look).
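By way of illustration only, the dog-like example above can be sketched in Python as follows. The names Appearance and choose_appearance, the 0 to 90 degree ear scale, and the encoding of the system elements as an activation flag, a remaining-time fraction, and a three-valued understood indicator are assumptions made for this sketch and are not prescribed by the method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Appearance:
    """One appearance of the single object (hypothetical structure)."""
    posture: str          # "sleeping" or "awake"
    ear_angle: float      # degrees; 90 = fully raised, 0 = fully down (assumed scale)
    look: str             # "neutral", "understanding", or "puzzled"
    sound: Optional[str]  # optional audible feedback


def choose_appearance(active: bool, time_left: float,
                      understood: Optional[bool]) -> Appearance:
    """Choose one appearance from the actual values of several system elements.

    time_left is the fraction of the activation time span remaining (1.0 to 0.0).
    understood is None while no utterance has been evaluated yet.
    """
    if not active:
        # Basic appearance: the system is not yet activated.
        return Appearance("sleeping", 0.0, "neutral", None)

    # Ears drop linearly from fully raised to fully down over the time span.
    fraction = min(max(time_left, 0.0), 1.0)
    ear_angle = 90.0 * fraction

    if understood is None:
        look, sound = "neutral", None
    elif understood:
        look, sound = "understanding", "glad bark"
    else:
        look, sound = "puzzled", None
    return Appearance("awake", ear_angle, look, sound)


if __name__ == "__main__":
    print(choose_appearance(active=True, time_left=0.5, understood=True))
```

In this sketch a single returned value carries the activation state, the remaining time span, and the recognition result at once, so that only one object needs to be presented.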
In general, recognition errors still occur in voice-controlled systems, e.g., nothing is recognized even though the user did speak a command, or something different is recognized. Users perceive such interaction difficulties as quite unacceptable; after all, it is a product and shouldn't fail. By using an anthropomorphic character, the user will automatically associate the level of quality to be expected during the interaction with the kind of character chosen for the interaction. By choosing a character like a dog, the user will much more easily accept that some commands are not understood. It is quite normal that a command needs to be given several times to a dog or needs to be rephrased.
The system can already have stored therein a set of appearances derived from the basic object. Any suitable form for selecting an appearance from the set may be used. For instance, tables may be used to map N system elements to one value that specifies one of the appearances in the set. Alternatively, a weighting mechanism may be used, where, for instance, a formula with the N system elements as input parameters produces one descriptor for an appearance. Advantageously, a fuzzy logic algorithm may be used.
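The table-based and weighting-based selection mechanisms mentioned above may, purely as an example, be sketched as follows. The table contents, the appearance names, the normalization of element values to the range 0 to 1, and the weighted-average formula are all assumptions chosen for this sketch; other tables and formulas are equally possible.

```python
from typing import Dict, Optional, Sequence, Tuple

# Table approach: a tuple of element values maps to one stored appearance.
# Keys are (active, understood); the entries are hypothetical.
APPEARANCE_TABLE: Dict[Tuple[bool, Optional[bool]], str] = {
    (False, None): "sleeping",
    (False, True): "sleeping",
    (False, False): "sleeping",
    (True, None): "awake_neutral",
    (True, True): "awake_understanding",
    (True, False): "awake_puzzled",
}


def select_by_table(active: bool, understood: Optional[bool]) -> str:
    """Look up the appearance for the current combination of element values."""
    return APPEARANCE_TABLE[(active, understood)]


def select_by_weighting(elements: Sequence[float],
                        weights: Sequence[float],
                        appearances: Sequence[str]) -> str:
    """Weighting approach: N element values (each assumed in 0..1) are combined
    into one score by a weighted average, and the score picks an appearance
    from an ordered set."""
    if len(elements) != len(weights):
        raise ValueError("one weight is needed per system element")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * x for w, x in zip(weights, elements)) / total
    # Map the score onto an index, clamping a score of 1.0 to the last entry.
    index = min(int(score * len(appearances)), len(appearances) - 1)
    return appearances[index]


if __name__ == "__main__":
    print(select_by_table(True, False))
    print(select_by_weighting([0.5, 0.5], [1.0, 1.0],
                              ["puzzled", "neutral", "glad"]))
```

The table variant suits a small number of discrete element values, whereas the weighting variant reduces any number of continuous elements to a single descriptor.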
Whenever one of the system elements to be represented changes, a new appearance (representing all elements) is generated. Such a generation may start from a basic appearance. Preferably, the currently presented appearance is modified. In general, only one system element changes at a time. A system element preferably relates to one distinct aspect of the object. For instance, the remaining time span in a voice-controlled system is expressed via the angle of the ears of an animal. A change of value of that one element results in only the corresponding aspect being changed. This can be done by selecting a new appearance from a set of appearances. By using suitable graphical engines, it is also possible to modify only that one aspect of the currently presented appearance. For instance, a ‘neutral’ mouth (substantially horizontal) can be changed to a glad expression (curled mouth corners) when the system has recognized a voice command. By only locally changing that one aspect, other aspects of the object, reflecting other system elements, can stay the same, if so desired. For instance, as long as the volume of the speech is relatively low (but still high enough to recognize a word), in all presented appearances the character could hold his hand near his ear, while the mouth can be changed to reflect whether or not a command has been recognized, and the ear angle could reflect the remaining time span.
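The local modification of one aspect described above may be sketched, under stated assumptions, as follows. The element names time_left, command_recognized, and speech_volume_low, the aspect names, and the 0 to 90 degree ear scale are hypothetical and serve only to illustrate that a change of one system element alters one aspect while the others are retained.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Appearance:
    """Currently presented appearance; each field is one distinct aspect."""
    ear_angle: float     # degrees, reflects the remaining time span
    mouth: str           # "neutral" or "glad", reflects command recognition
    hand_near_ear: bool  # reflects a relatively low speech volume


# Each system element drives exactly one aspect (assumed mapping).
ASPECT_FOR_ELEMENT: Dict[str, Callable[[Any], Dict[str, Any]]] = {
    "time_left": lambda v: {"ear_angle": 90.0 * v},
    "command_recognized": lambda v: {"mouth": "glad" if v else "neutral"},
    "speech_volume_low": lambda v: {"hand_near_ear": bool(v)},
}


def update_appearance(current: Appearance, element: str, value: Any) -> Appearance:
    """Return the current appearance with only the aspect tied to the
    changed element modified; all other aspects stay as they were."""
    changes = ASPECT_FOR_ELEMENT[element](value)
    return replace(current, **changes)


if __name__ == "__main__":
    shown = Appearance(ear_angle=90.0, mouth="neutral", hand_near_ear=True)
    shown = update_appearance(shown, "command_recognized", True)
    print(shown)
```

Because the appearance is treated as immutable in this sketch, each update yields a new appearance that can be handed to a graphics engine, which then only needs to redraw the aspect that differs.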
The modification of the object can relate to all kinds of aspects. It may relate to the size or shape of the appearance, as well as to graphical attributes of the appearance, such as, brightness, color, or opacity. With state-of-the-art graphics engines, also the expression, such as, the facial or body expression, of an appearance can be changed. In addition to or instead of changing visual aspects of the object, also audible aspects can be changed, such as, volume of audible output or the prosody (e.g., a rising pitch if an unexpected command has not been recognized).
Advantageously, the appearance is animated. This makes it easier to draw the attention of the user to an important system element reflected at that time by the appearance. It is preferred that the animation is synchronized with the change of the variable. For instance, the drop of the ears is synchronized with the progress of the time span. In a situation where the changes in system elements are shown in real time by modifying the appearance, it is preferred that the engine performing the modification is informed of each change of the variable.
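One possible way of informing the engine of each change of a variable is a simple listener (observer) arrangement, sketched below. The classes SystemElement and AnimationEngine, the callback signature, and the ear-angle scale are assumptions made for this sketch; an actual engine would render a frame where the sketch merely records the angle.

```python
from typing import Any, Callable, List


class SystemElement:
    """A dynamically changing system variable that notifies listeners."""

    def __init__(self, name: str, value: Any) -> None:
        self.name = name
        self._value = value
        self._listeners: List[Callable[[str, Any], None]] = []

    def subscribe(self, listener: Callable[[str, Any], None]) -> None:
        self._listeners.append(listener)

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new_value: Any) -> None:
        # Listeners are informed only when the value actually changes.
        if new_value != self._value:
            self._value = new_value
            for listener in self._listeners:
                listener(self.name, new_value)


class AnimationEngine:
    """Stand-in for the engine that modifies the presented appearance."""

    def __init__(self) -> None:
        self.ear_angle = 90.0
        self.frames: List[float] = []  # recorded ear angles, one per change

    def on_change(self, name: str, value: Any) -> None:
        if name == "time_left":
            # Ear drop follows the progress of the time span.
            self.ear_angle = 90.0 * value
            self.frames.append(self.ear_angle)


if __name__ == "__main__":
    time_left = SystemElement("time_left", 1.0)
    engine = AnimationEngine()
    time_left.subscribe(engine.on_change)
    for fraction in (0.75, 0.5, 0.0):
        time_left.value = fraction
    print(engine.frames)
```

With such an arrangement, the animation advances exactly when the variable changes, rather than at a rate fixed by the engine.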