1. Field of the Invention
The present invention relates to a speech recognition interface system to be used as a man-machine interface in data processing systems such as personal computers, workstations, word processors, and speech (or voice) mail systems.
2. Description of the Background Art
In recent years, computer devices have been developed that are equipped with a plurality of input means such as a keyboard, a mouse, speech input means, and image input means, in order to support a variety of command and data input modes.
Among these various input means, the speech input means promises a very natural command and data input mode, but it has not been widely utilized because of difficulties concerning the amount of calculation required for the speech processing and the recognition rate.
Several speech recognition interface systems have been proposed as conventional speech input means, as follows.
FIG. 1 shows an exemplary conventional configuration for the speech recognition interface system, which comprises an application program AP into which a speech recognition function SR is incorporated. In such a configuration, the speech recognition function SR cannot be separated from this application program AP, so that it has been difficult to utilize this speech recognition function SR from a program other than this application program AP.
FIG. 2 shows another conventional configuration for the speech recognition interface system, which comprises one speech recognition system SRS and one application program AP connected with each other. In such a configuration, the speech recognition system SRS is used exclusively by the application program AP to which it is connected. In order for a program other than this application program AP to utilize the speech recognition system SRS, the speech recognition system SRS must be reconnected to that other program, which is quite time consuming.
In addition, the data exchanged between the speech recognition system SRS and the application program AP are limited to the recognition results transmitted from the speech recognition system SRS to the application program AP, so that the speech recognition system SRS cannot know the internal state of the application program AP. As a result, it has been impossible to perform automatic speech recognition control, such as changing the recognition vocabulary according to the internal state of the application program AP, and the operator has had to change the recognition vocabulary manually whenever the need arises, so that this speech recognition interface system has been rather tedious and inconvenient to use.
FIG. 3 shows another conventional configuration for the speech recognition interface system, which comprises one speech recognition system SRS and one application program AP connected with each other bidirectionally, such that various data such as the recognition vocabulary and the recognition results can be exchanged in both directions. In such a configuration, the speech recognition system SRS can know the internal state of the application program AP, so that it can perform automatic speech recognition control such as changing the recognition vocabulary. However, in this configuration, the speech recognition system SRS is used exclusively by the application program AP with which it is connected, so that it has been impossible for other application programs to utilize this speech recognition system SRS at the same time.
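The bidirectional exchange of FIG. 3 can be sketched as follows. This is only an illustrative model and not an implementation taken from any cited system: the class names, the method names, the state names, and the vocabulary words are all assumptions, and the recognizer matches text strings rather than processing speech signals.

```python
class SpeechRecognitionSystem:
    """Hypothetical stand-in for the SRS of FIG. 3 (matches text, not audio)."""

    def __init__(self):
        self.vocabulary = set()
        self.client = None

    def connect(self, app):
        # Exclusive connection: only one application program at a time.
        if self.client is not None:
            raise RuntimeError("SRS is already connected to an application program")
        self.client = app

    def set_vocabulary(self, words):
        # Data flowing from the application program to the SRS.
        self.vocabulary = set(words)

    def recognize(self, utterance):
        # Data flowing from the SRS to the application program.
        if utterance in self.vocabulary:
            self.client.on_result(utterance)
            return utterance
        return None


class ApplicationProgram:
    """Hypothetical AP whose recognition vocabulary follows its internal state."""

    # Assumed states and words, chosen only for illustration.
    VOCAB_BY_STATE = {
        "idle": ["open", "quit"],
        "editing": ["save", "close"],
    }

    def __init__(self, srs):
        self.srs = srs
        self.state = "idle"
        self.received = []
        srs.connect(self)
        self._send_vocabulary()

    def _send_vocabulary(self):
        self.srs.set_vocabulary(self.VOCAB_BY_STATE[self.state])

    def on_result(self, word):
        self.received.append(word)
        if word == "open":
            self.state = "editing"
        elif word == "close":
            self.state = "idle"
        # The vocabulary change is automatic; no operator action is needed.
        self._send_vocabulary()
```

In this sketch, the word "save" is rejected while the application program is idle and accepted after "open" has been recognized, and an attempt to connect a second application program raises an error, which corresponds to the exclusive use noted above.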
FIG. 4 shows another conventional configuration for the speech recognition interface system disclosed by Schmandt et al. in "Augmenting a Window System with Speech Input", COMPUTER, Vol. 23, pp. 50-58, August 1990, which comprises one speech recognition system SRS and a plurality of application programs AP, in which the recognition results are selectively transmitted from the speech recognition system SRS to one of the application programs AP. In this system, the speech input is achieved by translating the speech recognition result into an input from the keyboard or mouse, by utilizing the window system. In such a configuration, a plurality of application programs AP can utilize the same speech recognition system SRS at the same time, but the speech recognition system SRS cannot know the internal state of each application program AP, so that it is impossible to perform automatic speech recognition control according to the internal state of the application programs AP.
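The one-way flow of FIG. 4 can be sketched in the same manner. The sketch below is not the implementation of Schmandt et al.; the class names, the key map, and the explicit target selection are assumptions made only to illustrate that a recognition result is translated into a keyboard-like input for one selected application program, and that nothing flows back to the recognition side.

```python
class Window:
    """Hypothetical application window that only receives key input."""

    def __init__(self, name):
        self.name = name
        self.key_events = []

    def send_keys(self, text):
        self.key_events.append(text)


class SpeechToKeyboardBridge:
    """Hypothetical bridge translating recognized words into key input."""

    def __init__(self, key_map):
        # Fixed mapping: it cannot follow the internal state of any window.
        self.key_map = key_map
        self.target = None

    def select_target(self, window):
        self.target = window

    def on_recognized(self, word):
        keys = self.key_map.get(word)
        if keys is None or self.target is None:
            return False
        # Only the selected window receives the translated input.
        self.target.send_keys(keys)
        return True
```

Because the key map in this sketch is fixed on the recognition side, the windows have no means of changing the recognition vocabulary, which corresponds to the limitation described above.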
FIG. 5 shows another conventional configuration for the speech recognition interface system disclosed by Rudnicky et al. in "Spoken language recognition in an office management domain", Proc. ICASSP '91, S12.12, pp. 829-832, 1991, which comprises one speech recognition system SRS and a plurality of application programs AP, where the speech recognition system SRS further comprises a task manager TM connected with each of the application programs AP bidirectionally, and a speech recognition unit SR connected with the task manager TM, such that various data such as the recognition vocabulary and the recognition results can be exchanged between the speech recognition system SRS and the application programs AP in both directions. This system has a feature that the continuous speech recognition function provided by the speech recognition system SRS can be shared by a plurality of application programs AP, so that it can be considered an efficient manner of utilizing an expensive speech recognition device. However, this reference does not give sufficient consideration to real time processing or to a manner of utilization suitable for workstations.
Also, in such a configuration, a plurality of application programs AP can share the same speech recognition system SRS, and it is also possible to perform automatic speech recognition control on the speech recognition system SRS side according to the internal state of each application program AP. However, this system only accounts for a case in which one of the application programs AP is connected with the speech recognition system SRS at a time, so that it has been impossible to handle a plurality of application programs AP simultaneously and thereby take full advantage of the characteristics of the speech input. Also, in this system, the decision as to which application program AP the obtained recognition result is to be transmitted to is made on the speech recognition system SRS side, so that the recognition result may not necessarily be obtained on the application program AP side at the desired timing.
Thus, the conventional speech recognition interface systems have been associated with the following practical problems.
(1) As the application program AP cannot manage the speech recognition target itself, the application program AP cannot take the initiative in the speech input control, so that there are cases in which, even when the application program AP would like to urge the user to make the speech input, the application program AP must wait until the speech input permission is received from the speech recognition system SRS.
(2) A plurality of application programs AP cannot be controlled simultaneously by one speech input, so that it has been impossible to realize a highly convenient operation mode such as, for example, finishing a plurality of application programs AP all at once by a single speech input of "Finish" alone.
(3) It has been impossible to distribute the speech inputs to a plurality of application programs AP according to the recognition results obtained therefrom, so that there has been a need to specify the input target before the input of the speech.
(4) Since only one speech recognition system SRS has been operated with respect to one speech input, it has been impossible to simultaneously utilize different types of speech recognition schemes, such as the isolated word speech recognition scheme and the continuous speech recognition scheme.