1. Technical Field
The invention relates to controlling electronic components in a ubiquitous computing environment, and more particularly to a system and process for controlling the components using multimodal integration. In this integration, inputs from a speech recognition subsystem, a gesture recognition subsystem employing a wireless pointing device, and a pointing analysis subsystem associated with the pointing device are combined to determine which component a user wants to control and what control action is desired.
2. Background Art
Increasingly our environment is populated with a multitude of intelligent devices, each specialized in function. The modern living room, for example, typically features a television, amplifier, DVD player, lights, and so on. In the near future, we can look forward to these devices becoming more inter-connected, more numerous and more specialized as part of an increasingly complex and powerful integrated intelligent environment. This presents a challenge in designing good user interfaces.
For example, today's living room coffee table is typically cluttered with multiple user interfaces in the form of infrared (IR) remote controls. Often each of these interfaces controls a single device. Tomorrow's intelligent environment offers the opportunity to provide a single intelligent user interface (UI) that controls many such devices when they are networked. This UI device should give the user a natural way of interacting with the intelligent environment. For example, owing to the extensive use of IR remote controls, people have become quite accustomed to pointing at a piece of electronic equipment that they want to control. It has become almost second nature for a person in a modern environment to point at the object he or she wants to control, even when it is not necessary. Consider, as an example, the small radio frequency (RF) key fobs that have been used to lock and unlock most automobiles in the past few years. Inevitably, a driver will point the free end of the key fob toward the car while pressing the lock or unlock button. This is done even though, owing to the RF nature of the device, the driver could just as well have pointed the fob away from the car, or even pressed the button while the fob was still in his or her pocket. Thus, a single UI device that is pointed at electronic components, or at some extension thereof (e.g., a wall switch that controls the lighting in a room), in order to control those components would exemplify the natural interaction that is desirable for such a device.
There are some so-called “universal” remote controls on the market that are preprogrammed with the known control protocols of a wide range of electronic components, or that are designed to learn the command protocol of an electronic component. Typically, such devices are limited to one transmission scheme, such as IR or RF, and so can control only electronic components operating on that scheme. However, it would be desirable if the electronic components themselves were passive, in that they did not have to receive and process commands from the UI device directly but instead relied solely on control inputs from the aforementioned network. In this way, the UI device would not have to differentiate among the various electronic components, for example by recognizing a component in some manner and then transmitting commands using an encoding scheme applicable only to that component, as existing universal remote controls must do.
Of course, a common control protocol could be implemented such that all the controllable electronic components within an environment use the same control protocol and transmission scheme. However, this would require all the electronic components either to be built for that protocol and transmission scheme or to be modified to recognize them. This could add considerably to the cost of a “single UI-controlled” environment. It would be much more desirable if the UI device could be used to control any networked group of new or existing electronic components, regardless of the remote control protocols or transmission schemes under which the components were designed to operate.
Another current approach to controlling a variety of different electronic components in an environment is the use of speech recognition technology. Essentially, a speech recognition program is used to recognize user commands. Once a command is recognized, it can be acted upon by a computing system that controls the electronic components via a network connection. However, current speech recognition-based control systems typically exhibit high error rates. Although speech technology can perform well under laboratory conditions, recognition rates can drop by 20% to 50% when these systems are used in a normal operating environment. This loss of accuracy is caused mostly by the unpredictable and variable noise levels found in a normal operating setting, and by the way humans alter their speech patterns to compensate for this noise. In fact, environmental noise is currently viewed as a primary obstacle to the widespread commercialization of speech recognition systems.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [2, 3]. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.