More and more devices are network-enabled, and consequently new applications are being produced that make use of these network-connected devices through human-computer interaction (HCI) systems. This poses many challenges and opportunities, both for enabling ordinary users to operate these devices and for assisting those users in an intuitive manner with new, user-knowledge-rich interaction technology.
Prior art systems include intelligent dialog systems for many applications, and they are mostly designed for a single user at any one time. Some systems incorporate speech recognition input. However, many different environments, including indoor, outdoor, and in-vehicle environments, also include a variety of sounds and other acoustic inputs that go beyond simple voice command input. Existing systems treat acoustic inputs from the environment as sources of noise and employ filters and other signal processing techniques to attenuate the non-speech sounds. Additionally, traditional speech recognition systems interact with a single user at a time to operate a single device, without regard for the context of the user in the presence of other individuals or under different environmental conditions. Consequently, improvements to human-computer interaction systems that improve operation in various environments and contexts, and that support more complex interactions, would be beneficial.