Conventionally, speech recognition has been implemented by writing a dedicated program. In recent years, however, it has come to be implemented using a hypertext document such as VoiceXML. VoiceXML basically uses speech alone as its input/output means (user interface), although strictly speaking DTMF and the like are also used. Japanese Patent Laid-Open Nos. 2001-166915 and 10-154063, among others, use a hypertext document to describe a user interface that combines speech input/output with a GUI (Graphical User Interface). The GUI is described in a hypertext document such as HTML, and tags corresponding to speech input and output are added to allow speech input/output.
A so-called multimodal user interface, which uses a GUI and speech input/output together, requires a description of how the respective modalities cooperate: speech input by means of speech recognition, speech output by means of speech synthesis, graphical presentation of the user's inputs and of information by means of a GUI, and the like. For example, Japanese Patent Laid-Open No. 2001-042890 discloses a method in which buttons, input fields, and speech inputs are associated with each other, the associated input field is selected when a given button is pressed, and a speech recognition result is input to the selected field.
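The association described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the implementation of the cited publication: the class `Form`, its methods, and the button and field names are all hypothetical, and the speech recognizer is replaced by a plain string passed in as the recognition result.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Form:
    """Hypothetical form model: each button is associated with one input field."""
    button_to_field: Dict[str, str]
    values: Dict[str, str] = field(default_factory=dict)
    selected: Optional[str] = None

    def press(self, button: str) -> None:
        # Pressing a button selects the input field associated with it.
        self.selected = self.button_to_field[button]

    def on_recognition_result(self, text: str) -> bool:
        # Route the recognized text to the currently selected field.
        # Returns False when no field has been selected yet.
        if self.selected is None:
            return False
        self.values[self.selected] = text
        return True


# Hypothetical button and field names, for illustration only.
form = Form(button_to_field={"btn_from": "from_station", "btn_to": "to_station"})
ignored = form.on_recognition_result("Osaka")   # no field selected yet
form.press("btn_to")
accepted = form.on_recognition_result("Tokyo")  # stand-in for a recognizer result
```

In this sketch a recognition result that arrives before any button is pressed is simply dropped; a real interface would have to decide how to handle that case.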
When data is input to input fields in a Web or dialog application, the field that will receive input from a keyboard or the like must be indicated to the user. In general, the field that currently accepts input is distinguished from the other fields by focus emphasis. The same holds for speech input: the field to which data is to be input must be emphasized. In addition, since speech recognition is prone to recognition errors, the user's utterance errors should be reduced by presenting an example utterance for the data to be input in the field.
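The two requirements stated here, emphasizing the active field and showing an utterance example, can be illustrated with a small text-only sketch. Everything in it is an assumption made for illustration: the `InputField` type, the `render` function, the `>` marker used as emphasis, and the sample fields and example utterances do not come from any cited document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputField:
    name: str
    utterance_example: str  # hypothetical hint shown to the user


def render(fields: List[InputField], active: str) -> List[str]:
    """Return one display line per field, emphasizing the active one."""
    lines = []
    for f in fields:
        if f.name == active:
            # Emphasize the field that will receive the speech input and
            # show an example of what the user could say.
            lines.append(f'> [{f.name}] (e.g. say "{f.utterance_example}")')
        else:
            lines.append(f"  [{f.name}]")
    return lines


# Hypothetical fields and example utterances.
FIELDS = [
    InputField("destination", "Tokyo"),
    InputField("date", "March third"),
]
rendered = render(FIELDS, "date")
print("\n".join(rendered))
```

A GUI would replace the `>` marker with visual emphasis such as a highlighted border, but the decision logic is the same: one field is marked as the target of the next utterance, and its example is shown alongside it.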
The present invention has been made in consideration of the aforementioned problems, and has as its object to allow data to be input smoothly to an input field when the data is input by speech. It is another object of the present invention to specify an input field and a speech recognition result to be input to that input field.