1. Field of the Disclosure
The present disclosure relates generally to a spoken dialogue interface and, more particularly, to a spoken dialogue interface apparatus and method that, in a spoken dialogue system, provide a dialogue model capable of processing various dialogue states using the advantages of a conventional frame-based model and a conventional plan-based model, and that can actively accommodate movement between domains and the expansion of services using the dialogue model.
2. Description of the Related Art
FIG. 1 is a block diagram showing the operation of a conventional spoken dialogue interface apparatus. First, the conventional spoken dialogue interface apparatus performs speech recognition on a user's speech (110). Then, the conventional spoken dialogue interface apparatus interprets the language spoken by the user by analyzing the recognized speech (120), and then performs dialogue processing using the interpreted language (130).
For example, when the interpreted language is a control command that controls a specific device, the conventional spoken dialogue interface apparatus performs an operation of controlling the corresponding device (150). Such an operation is referred to as “service performance” below.
The service performance may include the performance of information retrieval in response to a request made through a user's speech, besides the control of the specific device. That is, “service performance” refers to the performance of a specified operation that a user requests through speech.
Meanwhile, the conventional spoken dialogue interface apparatus can perform plan management 140 in the performance of the dialogue processing. The plan management 140 refers to the management and planning of a series of detailed operations required for the performance of a specific service. That is, when performing the dialogue processing 130, the conventional spoken dialogue interface apparatus sequentially performs services suitable for situations based on the plan management 140.
When the conventional spoken dialogue interface apparatus cannot interpret the language of the user's speech, or when it receives the results of service performance, it must inform the user accordingly.
Therefore, the conventional spoken dialogue interface apparatus generates a language to be used to respond to the user based on a specific spoken dialogue model, and conveys the generated language to the user either through a predetermined display (180), or through a speaker after a speech synthesis process (170) that transforms the responding language into speech.
The method shown in FIG. 1 is generally used in a spoken dialogue interface apparatus which recognizes a user's speech, interprets the recognized speech, performs a specific function and then verbally informs the user of performance results.
Various spoken dialogue models have been proposed for how the dialogue processing 130 should process the interpreted language. FIGS. 2 to 5 are diagrams illustrating examples of four representative spoken dialogue models.
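The overall flow of FIG. 1 may be easier to follow as a short sketch. The following is only an illustration under simplifying assumptions: the function names, the "turn on/off" command grammar and the response wording are hypothetical and do not appear in FIG. 1, and the input is treated as already-transcribed text rather than audio.

```python
from typing import Optional

def recognize_speech(audio: str) -> str:
    """Speech recognition (110). Assumption: the 'audio' is already text."""
    return audio.strip().lower()

def interpret_language(text: str) -> Optional[dict]:
    """Language interpretation (120): map recognized text to a command."""
    words = text.split()
    if len(words) == 3 and words[0] == "turn" and words[1] in ("on", "off"):
        return {"device": words[2], "action": words[1]}
    return None  # the utterance could not be interpreted

def perform_service(command: dict) -> str:
    """Service performance (150): stands in for controlling a device."""
    return f"{command['device']} turned {command['action']}"

def process_dialogue(audio: str) -> str:
    """Dialogue processing (130): run the stages and build the response
    that would be passed on to the display or to speech synthesis."""
    command = interpret_language(recognize_speech(audio))
    if command is None:
        # The user must be informed when interpretation fails.
        return "Sorry, I did not understand."
    return "OK, " + perform_service(command) + "."
```

In this sketch, process_dialogue("Turn on lamp") yields a confirmation of the performed service, while an utterance that cannot be interpreted yields the failure notice.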
FIG. 2 is a diagram illustrating an example of a conventional spoken dialogue model using pattern matching.
First, a plurality of keywords is extracted from a user's speech (210), and a pattern matching operation is performed using the list of extracted keywords and pattern information stored in a dialogue script database (DB) (220). If a matching pattern exists, the corresponding dialogue script is selected and a response is generated using a template in the selected dialogue script (240). The spoken dialogue interface apparatus then transfers the generated response to the user.
The spoken dialogue model using the pattern matching method is disclosed in U.S. Pat. No. 6,604,090.
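The pattern matching steps described with reference to FIG. 2 can be sketched roughly as follows. This is a minimal illustration, not the method of the cited patent: the contents of DIALOGUE_SCRIPTS, the keyword extraction rule and the {value} template slot are all assumptions made for the example.

```python
import re

# Hypothetical dialogue script DB: each script pairs a keyword pattern
# with a response template.
DIALOGUE_SCRIPTS = [
    (frozenset({"weather", "today"}), "Today's weather is {value}."),
    (frozenset({"play", "music"}), "Now playing {value}."),
]

def extract_keywords(utterance):
    """Keyword extraction (210): here, simply every lowercase word."""
    return set(re.findall(r"[a-z']+", utterance.lower()))

def respond(utterance, value):
    """Pattern matching (220) followed by response generation (240)."""
    keywords = extract_keywords(utterance)
    for pattern, template in DIALOGUE_SCRIPTS:
        # A script matches when all of its pattern keywords were spoken.
        if pattern <= keywords:
            return template.format(value=value)
    return None  # no matching pattern exists
```

Because a script is selected only when its keywords literally appear, an utterance phrased with different words returns None, which illustrates why simple pattern matching has difficulty with varied dialogues.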
FIG. 3 is a diagram illustrating an example of a conventional spoken dialogue model using a finite state model.
In the finite state model, for each state, the spoken dialogue interface apparatus queries the user and interprets the user's response to the query. Each state retains the history of the preceding states. For example, in state-4 shown in FIG. 3, a dialogue with the user is performed while the results of state-1 and state-2 remain known.
The dialogue in the finite state model is mainly led by the spoken dialogue interface apparatus, an example of which is an Automatic Response System (ARS).
The spoken dialogue model based on the finite state model is disclosed in U.S. Pat. No. 6,356,869.
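A system-led finite state dialogue of this kind can be sketched as a table of states, each with one query and one successor. The state names and queries below are hypothetical and chosen only for illustration; they are not the states of FIG. 3, and branching between states is omitted for brevity.

```python
# Hypothetical states: each maps to (system query, next state).
STATES = {
    "ask_origin": ("Where are you departing from?", "ask_destination"),
    "ask_destination": ("Where are you going?", "ask_date"),
    "ask_date": ("On what date?", None),
}

def run_dialogue(answers, start="ask_origin"):
    """Walk the states in order; the apparatus leads by asking each query."""
    history = {}      # results of earlier states stay available to later ones
    transcript = []   # queries issued by the apparatus
    answers = iter(answers)
    state = start
    while state is not None:
        query, next_state = STATES[state]
        transcript.append(query)
        history[state] = next(answers)  # the user's reply to this query
        state = next_state
    return history, transcript
```

The history dictionary plays the role of the retained results of previous states: by the time the last state is reached, the answers given in the earlier states are still known.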
FIG. 4 is a drawing illustrating an example of a conventional spoken dialogue model using a frame-based model.
The frame-based model conducts a spoken dialogue based on a table-type frame 400.
The frame 400 includes parameter fields 410, which are required in order for a user's language to be recognized by the spoken dialogue interface apparatus, and a response field 420, in which the content to be used to respond to the user is set depending on the values set in the parameter fields 410.
For example, in FIG. 4, the frame structure of the frame-based spoken dialogue interface apparatus for a flight reservation is shown.
The parameter fields 410 include a departure location field, a departure time field, a destination field, a flight number field, and a current reservation status field. For example, when recognizing only information about a departure location and departure time from a user's speech, the spoken dialogue interface apparatus queries the user about the destination in response to the speech. Alternatively, when recognizing information about a departure location, departure time and a destination from the user's speech, the spoken dialogue interface apparatus informs the user of a corresponding flight number and reservation status by searching a DB for flight reservation status.
The spoken dialogue model based on the frame-based model is disclosed in U.S. Pat. No. 6,044,347.
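The flight reservation behavior described for FIG. 4 can be sketched as slot filling over a frame. The following is an illustrative sketch only: the field names, the query wording, and the single entry in FLIGHT_DB (including the flight number "XX100") are invented for the example and are not taken from the cited patent.

```python
# Hypothetical flight reservation DB keyed by
# (departure location, departure time, destination).
FLIGHT_DB = {
    ("Seoul", "09:00", "Tokyo"): ("XX100", "seats available"),
}

# Parameter fields the user must supply, in the order they are queried.
PARAMETER_FIELDS = ("departure_location", "departure_time", "destination")

QUERIES = {
    "departure_location": "Where are you departing from?",
    "departure_time": "What time do you want to depart?",
    "destination": "What is your destination?",
}

def respond(frame):
    """Choose the response content from the values set in the frame."""
    for name in PARAMETER_FIELDS:
        if frame.get(name) is None:
            return QUERIES[name]  # ask for the first missing parameter
    key = tuple(frame[name] for name in PARAMETER_FIELDS)
    flight = FLIGHT_DB.get(key)
    if flight is None:
        return "No matching flight was found."
    number, status = flight
    return f"Flight {number}: {status}."
```

With only the departure location and time set, the sketch asks for the destination; once all three parameters are set, it reports the flight number and reservation status found in the DB.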
FIG. 5 is a drawing illustrating an example of a conventional spoken dialogue model using a plan-based model.
The plan-based model uses a hierarchical tree structure. In this hierarchical tree structure, the ultimate purpose of a user is located in the uppermost layer and elements necessary in order to accomplish the purpose are located in lower layers.
In FIG. 5, an example of the tree structure for a train trip is shown. For example, information about a selected train, ticket purchase, boarding time and gate is located in the lower layers. When a user requests service for a train trip, the plan-based spoken dialogue interface apparatus responds to the user request based on the tree structure shown in FIG. 5.
The spoken dialogue model based on the plan-based model is disclosed in U.S. Pat. No. 6,786,651.
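The hierarchical tree of the plan-based model can be sketched as nodes holding a goal and its sub-goals. This is a simplified illustration: the PlanNode class, the next_step function and the particular nesting of the train trip goals are assumptions made for the example and are not the exact tree of FIG. 5.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    goal: str
    done: bool = False
    children: list = field(default_factory=list)

def next_step(node):
    """Return the first unfinished leaf goal in depth-first order, or None."""
    if node.done:
        return None
    if not node.children:
        return node.goal
    for child in node.children:
        step = next_step(child)
        if step is not None:
            return step
    return None

# The user's ultimate purpose is the root; required elements sit below it.
trip = PlanNode("take a train trip", children=[
    PlanNode("select a train", done=True),
    PlanNode("purchase a ticket"),
    PlanNode("board the train", children=[
        PlanNode("find the boarding time"),
        PlanNode("find the gate"),
    ]),
])
```

Here next_step(trip) returns the ticket purchase, since a train has already been selected. Even this small tree requires every sub-goal to be written out by hand, which hints at the volume of dialogue knowledge a plan-based model needs.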
Of the spoken dialogue models for the above-described spoken dialogue interface, the spoken dialogue model using the pattern matching shown in FIG. 2 enables dialogue knowledge to be easily established, but has a problem in that it cannot easily process various dialogues because it performs only simple pattern matching. Conversely, the plan-based spoken dialogue model shown in FIG. 5 can process various dialogues, but has a problem in that vast dialogue knowledge for the various dialogues must be established. Furthermore, the established dialogue knowledge is not easy to maintain.
As a result, a spoken dialogue model that can process various dialogues and allows knowledge to be easily established is required.