Along with advances in speech recognition and synthesis techniques, speech information input apparatuses have been put to practical use. Furthermore, information input apparatuses that combine speech with another input means are also available. In such apparatuses, the respective means can compensate for each other's disadvantages and exploit each other's advantages.
As one such apparatus, an interface apparatus that combines speech input with a GUI is known. When information is input in a way that exploits the merits of both speech input and the GUI, the disadvantages of each are compensated for.
More specifically, speech is a natural interface means for a human being and makes input/output operations easy, but it has no browsability. A GUI, on the other hand, offers browsability as an output means, and as an input means it displays input fields browsably and allows easy input by, e.g., menu selection. However, free-form input is harder with a GUI (this disadvantage is conspicuous in the case of ten-key input and handwriting input).
As an example, a music search system having the interface shown in FIG. 8 will be described below. This system can search for a song based on one or more of an artist name, a song name, and the name of a commercial (CM) in which the song is used. The GUI (screen display) is used as the output means, and speech is used as the input means for the respective input fields.
In this case, since the input fields are displayed on the screen, the user can easily understand that a search can be performed using any of the artist name, the song name, and the CM name. Since each input field accepts speech input, input is easy.
Speech input to the respective input fields is recognized using different grammars. For example, the artist name, the song name, and the CM name are recognized using an artist name grammar, a song name grammar, and a CM name grammar, respectively.
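The per-field arrangement can be pictured with a minimal sketch. It assumes, purely for illustration, that each grammar is a plain list of allowed phrases and that the utterance has already been transcribed to text; the placeholder phrases, the `GRAMMARS` table, and the `recognize`/`recognize_for_field` helpers are hypothetical, and `difflib` string similarity merely stands in for a real acoustic recognizer.

```python
import difflib

# Hypothetical toy grammars: each input field accepts only phrases from its own list.
# Real systems would use a proper grammar format; plain lists are an assumption here.
GRAMMARS = {
    "artist": ["artist alpha", "artist beta"],
    "song": ["song one", "song two"],
    "cm": ["cm red", "cm blue"],
}


def recognize(utterance, grammar, cutoff=0.6):
    """Toy stand-in for a recognizer: return the grammar phrase most similar to
    the (already transcribed) utterance, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(utterance, grammar, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def recognize_for_field(field, utterance):
    # The speech directed at a field is matched against that field's grammar only.
    return recognize(utterance, GRAMMARS[field])


print(recognize_for_field("song", "song on"))  # closest song-name phrase
print(recognize_for_field("song", "cm red"))   # a CM name is outside the song grammar
```

Because each field consults only its own small grammar, an utterance that belongs to another field simply fails to match rather than being misfiled.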
When speech input and a GUI are used together and there are a plurality of input fields, as shown in FIG. 8, the input field corresponding to a given speech input must be identified.
One method for this purpose is to perform speech recognition using the grammars for all the input fields simultaneously, and to determine the input field corresponding to the input from the obtained recognition result.
In the example shown in FIG. 8, speech recognition is performed using the artist name, song name, and CM name grammars simultaneously, and if the recognition result is a CM name, it can be determined that the speech is an input to the CM name input field.
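A minimal sketch of this routing approach follows, under the same illustrative assumptions as before: grammars are plain phrase lists with placeholder entries, the utterance is already text, and `difflib` similarity stands in for real recognition. The names `build_combined_grammar` and `recognize_and_route` are hypothetical. Each phrase is tagged with the field whose grammar it came from, so the field follows directly from the best match.

```python
import difflib

# Hypothetical per-field grammars (placeholder phrases, not from any real system).
GRAMMARS = {
    "artist": ["artist alpha", "artist beta"],
    "song": ["song one", "song two"],
    "cm": ["cm red", "cm blue"],
}


def build_combined_grammar(grammars):
    """Merge all field grammars into one lookup: phrase -> field it came from."""
    combined = {}
    for field, phrases in grammars.items():
        for phrase in phrases:
            # Assumption: if two fields share a phrase, the first field listed wins.
            combined.setdefault(phrase, field)
    return combined


def recognize_and_route(utterance, grammars, cutoff=0.6):
    """Recognize against all grammars at once, then take the input field from
    the grammar the best match belongs to. Returns (field, phrase) or None."""
    combined = build_combined_grammar(grammars)
    matches = difflib.get_close_matches(utterance, list(combined), n=1, cutoff=cutoff)
    if not matches:
        return None
    phrase = matches[0]
    return combined[phrase], phrase


print(recognize_and_route("cm blue", GRAMMARS))  # routed to the CM name field
```

Note that the matcher here searches the union of all phrases rather than any single field's list, which is exactly the larger search space the following paragraph is concerned with.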
Note, however, that the speech recognition rate generally decreases as the grammar grows larger. Hence, when the grammars for a plurality of input fields are used simultaneously, the recognition rate for speech input decreases.