Field of the Invention
The embodiments of the present invention relate to a user interface method and device, and more particularly, to a user interface method and device that enable a user to control the user interface device more easily using a multimodal interface.
Discussion of the Related Art
With the recent development of speech recognition and speech synthesis technologies, there has been an increasing need for multimodal interfaces that use voice together with other input units in terminals such as portable terminals, home network terminals and robots.
The term "multimodal" refers to a plurality of modalities. Multimodal channels may be channels obtained by modeling human sensory channels, such as sight, hearing, touch, taste and smell, and converting them by means of a mechanical and/or electrical device. The synthesis and exchange of the respective modalities may be referred to as multimodal interaction.
Meanwhile, speech recognition is the process of mapping an acoustic speech signal to text using a computer. Specifically, speech recognition converts an acoustic speech signal obtained by a microphone (e.g., during a telephone conversation) into a set of words or a sentence. The results of speech recognition may be used in applications such as commands, control inputs or parameters, and data entry (e.g., document preparation). The results may also serve as the input of a language processing function in applications such as speech comprehension. Speech recognition technology enables natural or nearly natural communication between a human being and a computer.
Speech synthesis is the process of automatically generating a speech waveform using a mechanical and/or electrical device, an electronic circuit or a computer. Text-to-speech (TTS) is a technology that analyzes and processes data input in text form and converts the data into speech.
To control a terminal such as a cellular phone, a personal computer (PC) or a tablet PC, a user generally inputs control commands by pressing a keyboard, a keypad or a touchpad by hand.
For example, when a cellular phone receives a text message, the user presses the touchpad or keypad of the phone to view the message or to write a reply. That is, to control a terminal in response to an event detected in the terminal, such as the receipt of a text message, the user generally relies on manual input.
In this case, however, it may be difficult for the user to control the terminal when one or both of the user's hands are occupied. Accordingly, there is a need for a method and device for controlling a terminal in response to an event detected in the terminal using the user's voice rather than the user's hands.
That is, there is a need for a method and device for efficiently handling an event detected in a terminal using speech recognition and speech synthesis.