1. Field of the Invention
The present invention generally relates to a spoken dialog system capable of performing automated answering operation in voice or speech by recognizing a speech of a speaker. More specifically, the present invention is directed to a voice automated answering apparatus capable of providing voice (speech) services such as information provisions and reservation business over the telephone to users.
2. Description of the Related Art
As man-to-machine interface techniques are introduced into information systems, the need for voice (speech) interactive techniques has increased more and more. The voice interactive techniques may realize automated answering systems capable of performing interactive operations with users by way of voice. As an application of this man-to-machine interface technique, for instance, telephone voice automated answering apparatuses are known by which various information required by users may be provided, and various sorts of services may be carried out on behalf of operators. As these telephone voice automated answering apparatuses become popular, 24-hour services become available, business efficiency is increased, and manpower may be reduced.
Generally speaking, such a spoken dialog system is typically arranged by a voice recognizing unit for recognizing a speech of a user, a dialog managing unit for managing an interactive operation executed between the user and this spoken dialog system, and a voice synthesizing unit for notifying an answer made by this spoken dialog system in the form of voice. Also, a vocabulary to be recognized by the voice recognizing unit is set as a recognized dictionary.
In this system, the speech (voice) recognition precision of the voice recognizing unit is closely related to the scale of the vocabulary to be recognized. The larger the scale of the vocabulary becomes, the more difficult the recognition becomes. As a consequence, when all of the words which the user may be predicted to produce are set as a recognized dictionary, a large number of word recognition errors will occur, and the total number of confirmation operations about recognized results to the user is increased, so that the interactive operation is carried out with very low efficiency. Furthermore, the interactive operation between the user and the spoken dialog system cannot be continued and will break down, so that the goal of the dialog of the user cannot be achieved.
As a consequence, in the conventional spoken dialog system, in order to maintain recognition precision at high degrees while executing an interactive operation, a recognized vocabulary is changed based on the context of the interactive operation, and recognized dictionaries are replaced. Thus, the recognition operation for the next speech made by the user may be prepared.
The method for changing a recognized vocabulary used under a certain interactive condition into another vocabulary which may be produced in the next speech by a user may be mainly classified into the following two methods in accordance with how the interactive operation proceeds between the user and the spoken dialog system.
As one interactive method, there is a system initiative type interactive operation in which interactive operations proceed in such a manner that a system mainly inquires of a user and the user answers this inquiry. In this case, the system may determine the flow of the interactive operation, and the recognized vocabulary with respect to the next speech by the user is basically set for each of the interactive conditions when the interactive procedure is designed.
As another interactive method, there is a user initiative type interactive operation in which interactive operations proceed in such a manner that a user mainly inquires of a system and the system answers this inquiry. In this case, since free inquiries are made by the user, it is practically difficult to determine the flow of the interactive operation at the system designing stage. The recognized vocabulary with respect to the next speech by the user is basically predicted in a dynamic manner from the context of the interactive operation, which corresponds to histories such as the inquired content of the user and the system answer.
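The per-condition vocabulary setting described above can be pictured with a minimal sketch. The state names, the word lists, and the functions `active_vocabulary` and `recognize` are hypothetical and are introduced only for illustration; the sketch merely assumes that each interactive condition is given a preset word list at design time and that a word outside that list is not accepted.

```python
# Hypothetical sketch: the recognized vocabulary is fixed per dialog state
# at design time. State names and word lists are illustrative assumptions.
STATE_VOCABULARY = {
    "ask_room_name": ["KONA A", "KONA B", "KONA C"],
    "ask_start_time": ["9-JI", "10-JI", "11-JI"],
    "confirm": ["HAI", "IIE"],
}


def active_vocabulary(state):
    """Return a copy of the vocabulary preset for the given dialog state."""
    return list(STATE_VOCABULARY[state])


def recognize(utterance, state):
    """Toy recognizer: accept the utterance only if the active vocabulary contains it."""
    return utterance if utterance in active_vocabulary(state) else None
```

In this sketch, a time expression spoken while the room name is being asked is simply rejected, because the time vocabulary is not active in that state.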
The prior art (hereinafter referred to as the "first prior art") related to the method for changing the recognized vocabulary in the above-explained system initiative type interactive operation is described in, for example, Japanese Laid-open Patent Application No. Hei-9-114493, entitled "Interactive Control Apparatus", as shown in FIG. 21.
In FIG. 21, reference numeral 11 indicates a topic determining unit for determining a topic, and reference numeral 12 shows a recognized word predicting unit. This recognized word predicting unit 12 is equipped with a recognized word dictionary 121, a resemble word table 122, a word focusing table 123, a resemble word retrieving unit 124, and a focused word retrieving unit 125. The resemble word retrieving unit 124 retrieves resemble words contained in the recognized word dictionary 121 by referring to the resemble word table 122. The focused word retrieving unit 125 checks whether or not a recognized word contained in the recognized word dictionary 121 has been erroneously recognized, with reference to the word focusing table 123, so as to retrieve a recognized word which has no history of erroneous recognition. Also, reference numeral 13 indicates a voice output sentence producing unit, reference numeral 14 shows a recognition control unit, and reference numeral 15 denotes a correct/incorrect judging unit.
In accordance with the first prior art shown in FIG. 21, an interactive operation smoothing technique is disclosed, using a meeting room reservation service as an interactive example. That is, the total number of speech reissuing actions required when an erroneous recognition occurs is reduced, and the interactive operation can therefore be carried out smoothly. As a concrete interactive operation, the system makes such an inquiry "KAIGI SHITSU MEI O DOZO (please let me know the name of the meeting room)" to the user. Then, the user speaks "KONA "A" DESU (It is corner "A")." As a result of speech recognition, when the system erroneously recognizes "KONA "B" DESU (it is corner "B")," the system confirms to the user, asking "KONA "B" DESU KA? (corner "B" is correct?)," and the user answers "IIE (No)."
In such a context of the interactive operation, the system does not urge the user to reenter the voice "MO ICHIDO OSSHATTE KUDASAI (please say it again)," but stores in advance such a resemble word into the resemble word table 122. This resemble word may be mistakenly recognized as the word "KONA "B" (corner "B")."
Then, for example, in such a case that the word "KONA "A" (corner "A")" is present, as a resemble word of "KONA "B" (corner "B")," among the lower-graded candidates of the recognized result, the system confirms, asking "KONA "A" DESU KA? (corner "A" is correct?)." As a result, the total number of speech reissuing operations when the erroneous recognition occurs can be reduced, and the next recognized word candidate can be quickly specified.
The recognized vocabulary is changed by the following two methods.
First, when the system is designed, the following topics are determined as a preset vocabulary: a name of a reserving person, a date, a time when use of a meeting room is started, a time when use of a meeting room is ended, and a name of a meeting room, which are the items required for reserving a meeting room. Also, this system is equipped with the recognized word dictionary 121, which stores a plurality of recognized words for each topic, so as to select the recognized words corresponding to the topic determined by the topic determining unit 11.
Furthermore, in the case that the system makes an erroneous recognition, a history of the word which is denied because the user says "NO" is stored in the word focusing table 123. Then, this system eliminates this denied word from the subsequent recognized vocabularies for the user. The foregoing description is related to the first prior art.
Next, a conventional method for changing a recognized vocabulary in the user initiative type interactive operation (hereinafter referred to as the "second prior art") is described in, for instance, Japanese Laid-open Patent Application No. 6-208389, entitled "Information Processing Method and Apparatus", as shown in FIG. 22.
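The confirmation behavior described for the first prior art may be sketched as follows. This is only an illustrative reading of the description, not the implementation of the cited application: the table contents, the function name `next_confirmation_candidate`, and the use of a plain set as the denial history (standing in for the word focusing table 123) are all assumptions.

```python
# Hypothetical sketch of choosing the next word to confirm after a denial.
# Plays the role of the resemble word table 122: word -> words it is easily confused with.
RESEMBLE_WORD_TABLE = {
    "KONA B": ["KONA A", "KONA D"],
    "KONA A": ["KONA B"],
}


def next_confirmation_candidate(denied_word, ranked_candidates, denied_history):
    """Pick the next word to confirm after the user denies `denied_word`.

    ranked_candidates: recognition results ordered from best to worst.
    denied_history: set of already denied words (word focusing table role).
    """
    denied_history.add(denied_word)  # remember the denial
    resemble = RESEMBLE_WORD_TABLE.get(denied_word, [])
    for word in ranked_candidates:
        if word in denied_history:
            continue  # denied words are eliminated from later candidates
        if word in resemble:
            return word  # a lower-graded candidate that resembles the denied word
    return None  # nothing suitable; the system would have to ask again
```

For example, when "KONA B" is denied and the lower-graded candidates include "KONA A", the sketch returns "KONA A", so the system can ask "KONA "A" DESU KA?" without requesting a new utterance.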
In FIG. 22, reference numeral 301 indicates an interactive answer producing unit, reference numeral 302 shows a dialog managing unit, reference numeral 303 represents an information retrieving unit for retrieving a database, reference numeral 312 is a voice recognizing unit, and reference numeral 320 shows a recognized subject producing unit. This recognized subject producing unit 320 is equipped with a stationary word dictionary unit 304, a stationary grammar unit 305, a next speech predicting word dictionary producing unit 306, a retrieve content word dictionary producing unit 307, a word reading unit 308, a dynamic grammar unit 309, a dynamic grammar selecting unit 310, and a recognized subject sentence producing unit 311.
That is, the word dictionaries and grammars for recognizing speech entries that may be accepted at any point of an interactive operation are produced by the stationary word dictionary unit 304 and the stationary grammar unit 305, whereas the word dictionaries for recognizing speech entries that change dynamically as the interactive operation proceeds are produced by the next speech predicting word dictionary producing unit 306 and the retrieve content word dictionary producing unit 307. Also, the grammar is produced in the recognized subject sentence producing unit 311 by having the dynamic grammar selecting unit 310 select a grammar from the dynamic grammar unit 309 in response to the content of the produced word dictionary.
In the second prior art shown in FIG. 22, the interactive operation about the travel information retrieving operation is carried out by way of example, and the entry by the user during the interactive operation is classified into two sorts of travel information retrieving operations.
That is to say, such unfocused questions as "what item is available?", and also such very global questions (namely, very loose question) as "golf playgrounds located in Tokyo, please" may be made at any time when a user performs an interactive operation. As a result, these questions are handled as a stationary vocabulary in a recognized vocabulary.
Also, other questions related to detailed contents such as "a telephone number of YUMOTO ONSEN hot spa at HAKONE, please" and "what is an address of the hot spa located at YOSHII-MACHI of GUNMA?" are dynamically changed as the interactive operation proceeds. As a result, these questions are handled as a dynamic vocabulary in a recognized vocabulary.
As previously explained, the recognized vocabulary is divided into the stationary vocabulary and the dynamic vocabulary. The word contained in the next speech is predicted based upon the retrieved result and the content of the interactive operation so that the dynamic vocabulary is changed.
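The division into a stationary vocabulary and a dynamic vocabulary may be pictured with the following minimal sketch. The record layout (`name`, `attributes`), the word lists, and the function names are hypothetical; the sketch only assumes that the dynamic part is rebuilt from the latest retrieved result while the stationary part is always active.

```python
# Hypothetical sketch: recognized vocabulary = stationary part + dynamic part.
# The stationary part is accepted at any time during the dialog.
STATIONARY_VOCABULARY = {"what item is available", "golf playgrounds", "Tokyo"}


def predict_dynamic_vocabulary(retrieved_records):
    """Collect words likely to appear in the next speech from the retrieved result."""
    words = set()
    for record in retrieved_records:
        words.add(record["name"])
        words.update(record["attributes"])
    return words


def recognized_vocabulary(retrieved_records):
    """Combine the fixed stationary words with the freshly predicted dynamic words."""
    return STATIONARY_VOCABULARY | predict_dynamic_vocabulary(retrieved_records)
```

With a retrieved record such as `{"name": "YUMOTO ONSEN", "attributes": ["telephone number", "address"]}`, the combined vocabulary contains both the stationary words and the words predicted for the next detailed question.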
As previously explained, in accordance with both the first prior art and the second prior art, the recognized vocabulary is changed based upon the context of the interactive operation, and the recognized dictionaries are replaced so as to accept the recognition of the next speech issued by the user. However, the following case is conceivable. That is, a user who tries to execute an interactive operation with the system will make a speech related to a vocabulary in a recognized dictionary of the next speech, and at the same time, will make a voice related to an item not to be recognized.
For example, in the meeting room reservation in the first prior art, when the system inquires of the user "KAIGI SHITSU MEI O DOZO (please enter the name of the meeting room)", some users may answer "KONA "A" O 10-JI KARA (corner "A" from 10 a.m.), please". At this time, since only "KONA "A" (corner "A")" is correctly recognized by the system, this system will ask the user such a question as "NANJI KARA OTSUKAI DESUKA (what time will you use)?" related to the starting time of using the meeting room, which is the not-yet-acquired item.
In such a context of the interactive operation, since the starting time of using the meeting room is again inquired by the system, the user must again enter the same item, so that the interactive operation is carried out with very low efficiency.
Furthermore, it is unnatural for a user to be asked once again about an item that the user believes has already been entered. This may cause the user to be confused. In response, the user says, for instance, "E? SAKKI IIMASHITA KEDO (What? I've already said it)." The system fails to recognize this voice, because this voice is out of the recognized vocabulary, and answers "MO ICHIDO OSSHATTE KUDASAI (please say it again)". Subsequently, the user again says "SAKKI IIMASHITA KEDO (I've already said it)", since the user cannot understand that this vocabulary is out of the recognized vocabulary. As a result, this condition may cause such a problem that such an interactive operation is repeated, and the goal of the dialog cannot be achieved.
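The inefficiency described above can be reproduced with a toy sketch. The vocabularies, the slot names, and the function `fill_slots` are hypothetical, and the substring matching stands in for speech recognition only for illustration; the sketch assumes that just the vocabulary of the currently asked item is active.

```python
# Hypothetical sketch of the problem: words outside the active vocabulary are dropped.
ROOM_VOCABULARY = ["KONA A", "KONA B"]        # active while the room name is asked
TIME_VOCABULARY = ["9-JI", "10-JI", "11-JI"]  # not active in this dialog state


def fill_slots(utterance, active_vocabulary, slot_name, slots):
    """Fill only the slot whose vocabulary is active; everything else is ignored."""
    for word in active_vocabulary:
        if word in utterance:
            slots[slot_name] = word
            break
    return slots


# The user states both the room and the starting time in one utterance.
slots = fill_slots(
    "KONA A O 10-JI KARA",
    ROOM_VOCABULARY,
    "room",
    {"room": None, "start_time": None},
)
```

After this call the `room` slot holds "KONA A" while `start_time` is still empty, even though "10-JI" was spoken, so the system has to ask for the starting time again.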
Similarly, in the second prior art, in the speech containing words out of the predicted vocabulary for the next speech, the words out of the vocabulary cannot be accepted. As a consequence, there is another problem that the information contained in these words cannot be effectively utilized in the interactive operation.