1. Field of the Invention
The present invention relates to a method and a system for generating dialogue managers with diversified dialogue acts.
2. Description of Related Art
As spoken dialogue systems have come into wide use, the dialogue manager (DM) of a dialogue system has become increasingly complicated. In the design of a dialogue system, the DM plays the role of associating technique with design. Therefore, in addition to determining appropriate system responses according to the analyzed user speech data, the DM design must also consider the impression that the system makes on the user. For this reason, most current dialogue systems are designed manually. Although manual design ensures the accuracy of the dialogue system, the design cost is rather high, especially for a complicated dialogue system. Moreover, as the total number of DM rules increases, it becomes hard to maintain the consistency of the whole system.
A typical example is given below. Based on design experience, the system designer puts forward a DM with 19 states. However, after analyzing a large number of dialogue logs, the designer finds that four of the user-defined states are never used. Meanwhile, the system acts are unevenly distributed and favour one particular act. If there are relatively few states or DM rules, the system designer can easily check the rules to avoid such problems. However, once the number of rules exceeds a certain level, it is quite difficult to quickly find the appropriate rule to improve. Moreover, modifying one rule may affect other rules, which may have unexpected impacts on the system acts.
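The kind of log analysis described in this example can be sketched as follows. This is only an illustrative sketch: the state names, the act names, the log format, and the helper functions `unused_states` and `act_distribution` are all assumptions and are not taken from any particular dialogue system.

```python
from collections import Counter

# Hypothetical set of designed states; a real DM could have many more.
DESIGNED_STATES = {"greet", "ask_date", "ask_city", "confirm", "help", "goodbye"}

# Hypothetical log: each entry is a (state, system_act) pair for one turn.
dialogue_log = [
    ("greet", "prompt"),
    ("ask_city", "prompt"),
    ("ask_date", "prompt"),
    ("confirm", "confirm"),
    ("ask_city", "prompt"),
    ("goodbye", "close"),
]

def unused_states(designed, log):
    """Return the designed states that never occur in the log."""
    visited = {state for state, _ in log}
    return designed - visited

def act_distribution(log):
    """Return the relative frequency of each system act in the log."""
    counts = Counter(act for _, act in log)
    total = sum(counts.values())
    return {act: n / total for act, n in counts.items()}

print(sorted(unused_states(DESIGNED_STATES, dialogue_log)))
print(act_distribution(dialogue_log))
```

A state that never appears in the log and a single act that dominates the distribution correspond to the two problems noted in the example.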
In the development of dialogue systems, it has become common to facilitate design through user simulation. By using user simulation, the system designer learns how the dialogue system responds to certain dialogues, and this information can be used to further improve the dialogue system.
Through user simulation, simulation data is generated before the system is delivered to the customer, which enables the system designer to adjust the acts of the dialogue system; however, this process also requires considerable labour. After such adjustment, the designed acts substantially meet the requirements of the user, i.e., they achieve the final purpose of a dialogue (for example, ticket booking or information query). However, the designed DM has a fixed act mode.
A fixed dialogue act mode meets the basic requirements of a conventional dialogue system. However, as the applications of dialogue systems expand, many applications require more diversified and varied dialogue systems. Taking language learning as an example, if the system act is the same every time the user interacts with the system, the motivation of the user/learner to use the system is lowered. Conversely, if the system act is diversified, the learning motivation of the learner may be enhanced even though the content of the textbook is fixed. Therefore, for next-generation dialogue systems, how to efficiently design a dialogue system that generates diversified dialogue acts has become an important issue.
In U.S. Pat. No. 5,694,558, entitled “Method and System for Interactive Object-oriented Dialogue Management”, an interactive object-oriented DM system is provided, in which a state-based DM divides the whole content of a dialogue into several sub-dialogues (i.e., several different states) according to topic or type, and each sub-dialogue has its own dialogue content and dialogue flow. The DM determines whether to transition to another state according to the circumstances of the current dialogue. Each state (i.e., each sub-dialogue) can be represented by an object.
In U.S. Pat. No. 7,167,832, entitled “Method for Dialog Management”, a DM system is provided, in which the flow architecture of the DM focuses on the design of motivators. The DM disclosed by the patent includes a plurality of motivators, and the dialogue content in a dialogue system is processed according to the motivators. The DM of the patent includes at least two motivators: assumption and confirmation.
The above two patents both emphasize the content architecture of the DM without mentioning a method for designing a DM. Furthermore, U.S. Pat. No. 7,024,348, entitled “Dialogue Flow Interpreter Development Tool”, provides a dialogue flow development tool for a dialogue system, which generates a data file through a particular control language. The data file contains the prompts, responses, branches, and dialogue flows required in a speech system. Through special processing, speech applications can be generated automatically from the data file, which saves the cost of developing the whole speech dialogue system. However, this patent clearly states that the speech interaction between the user and the system must still be designed during system design with the aid of the flow design tool.
In relevant publications and papers, the conventional methods for designing a DM generally include the dialog grammar-based DM, the plan-based DM, and the collaborative DM. Different methods have different characteristics and are applicable to different fields. Moreover, in recent years, it has become quite popular to combine these methods in practice.
In two works, “Plain-Speaking: a Theory and Grammar of Spontaneous Discourse”, a PhD thesis by Reichman, Department of Computer Science, Harvard University, Cambridge, Mass., 1981, and “A Syntactic Approach to Discourse Semantics” by Polanyi and Scha, published in Proceedings of the 10th International Conference on Computational Linguistics, Stanford University, California, ACL, 1984, a DM based on dialog grammar is provided. However, this method requires compiling a large number of rules to describe how a dialogue proceeds. As a result, although this method appeared early and has been the most widely used, its portability is low because of the rules that must be compiled.
Furthermore, in “Automatic Acquisition of Probabilistic Dialogue Models” by Kita et al., published in Proceedings of ICSLP'96, pp. 196-199, Philadelphia, 1996, and “Using Markov Decision Process for Learning Dialogue Strategies” by E. Levin et al., published in Proceedings of ICASSP'98, pp. 201-203, Seattle, 1998, the dialogue rules are further expressed as a finite state network (FSN). The content of a dialogue is divided into different states, and dialogue management is performed by transitioning among the states. In addition, one state may be integrated with another to alter the weight of a dialogue path.
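A finite state network of this kind can be illustrated with a minimal sketch. The ticket-booking states, the user acts, the weights, and the function `run_dialogue` are hypothetical, and treating the weight of a dialogue path as the product of its transition weights is an assumption made here for illustration rather than something stated in the cited papers.

```python
# Hypothetical network: state -> {user_act: (next_state, transition_weight)}
FSN = {
    "ask_city": {"give_city": ("ask_date", 0.9), "unclear": ("ask_city", 0.1)},
    "ask_date": {"give_date": ("confirm", 0.8), "unclear": ("ask_date", 0.2)},
    "confirm": {"yes": ("done", 0.7), "no": ("ask_city", 0.3)},
}

def run_dialogue(fsn, start, user_acts):
    """Follow the transitions selected by user_acts.

    Returns the visited path and the path weight, taken here (as an
    assumption) to be the product of the transition weights.
    """
    state, path, weight = start, [start], 1.0
    for act in user_acts:
        next_state, w = fsn[state][act]
        weight *= w
        state = next_state
        path.append(state)
    return path, weight

path, weight = run_dialogue(FSN, "ask_city", ["give_city", "give_date", "yes"])
print(path, weight)
```

Under this reading, changing a transition weight or merging two states changes the weight of every dialogue path that passes through them.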
The plan-based DM considers not only the literal content of each sentence, but also the actions involved in communication (for example, confirmation and query), on the premise that people plan certain actions in order to achieve the purpose of a communication. Relevant techniques are mentioned in, for example, “Analyzing Intentions in Dialogues” by J. F. Allen and C. R. Perrault, published in Artificial Intelligence, 15(3):143-178, 1980, and “Intentions in Communication” by P. R. Cohen, J. Morgan, and M. E. Pollack, published by MIT Press, Cambridge, Mass.
In addition, the collaborative DM treats the dialogue as a collaborative process. This method mainly captures the intentions of both parties in a dialogue, confirms the intention of each party over several rounds, continues the dialogue after a common basis is established, and finally accomplishes the purpose of the dialogue. Relevant techniques are mentioned in, for example, “Conversational Agency: The TRAINS-93 Dialogue management” by D. R. Traum, in Luperfoy et al., 1996, “Beliefs, Stereotypes and Dynamic Agent Modeling” by Y. Wilks and A. Ballim, or the publication “User Modeling and User-Adapted Interaction”, Vol. 1, No. 1, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1991.
Moreover, in the article “Spoken Dialog Technology: Enabling the Conversational User Interface” by M. F. McTear, published in ACM Computing Surveys, vol. 34, pp. 90-169, March 2002, the DM is further classified into three types:
    (i) System-initiative: the DM is defined by finite states and operates through state transitions. This type of DM is suitable for a relatively narrow application field with relatively fixed dialogue content.
    (ii) User-initiative: the user intentions are captured by a frame-based mode. This type of DM has flexible dialogue content and lets the user express intentions freely, but the dialogue process is difficult to handle.
    (iii) Mixed-initiative: the system-initiative and user-initiative types are combined, so that the system can carry out a natural dialogue within certain restrictions.
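The difference between these types can be illustrated with a minimal frame-based sketch of mixed-initiative behaviour. The slot names and the functions `next_system_act` and `update_frame` are hypothetical: the system prompts for the first missing slot (the system-initiative part) but accepts any slot values that the user volunteers (the user-initiative part).

```python
SLOTS = ["city", "date"]  # hypothetical slots of a booking frame

def next_system_act(frame):
    """Ask for the first unfilled slot, or confirm when the frame is full."""
    for slot in SLOTS:
        if frame.get(slot) is None:
            return f"ask_{slot}"
    return "confirm"

def update_frame(frame, user_input):
    """Merge every known slot the user supplied, whether or not it was asked for."""
    updated = dict(frame)
    updated.update({k: v for k, v in user_input.items() if k in SLOTS})
    return updated

frame = {"city": None, "date": None}
act1 = next_system_act(frame)  # the system takes the initiative
# The user answers the question and volunteers the date in the same turn.
frame = update_frame(frame, {"city": "Taipei", "date": "Friday"})
act2 = next_system_act(frame)  # no further question is needed
print(act1, act2)
```

A purely system-initiative design would instead ignore the volunteered date and ask for it in a separate turn.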
The above methods cannot be strictly ranked as good or bad, but merely differ from each other in specific properties as well as applicable circumstances.
In addition to the methods commonly used in the past decades, some scholars have recently proposed making the dialogue system learn relevant responses through interactions with the user. In such a method, the designer generally defines the states of the dialogue domain and the relevant objective functions, and then employs reinforcement learning. For example, in articles such as “Using Markov Decision Processes for Learning Dialogue Strategies” by E. Levin, R. Pieraccini, and W. Eckert, published in Proceedings of the IEEE Transactions on Speech and Audio Processing, 1998, vol. 8, pp. 11-23, or “Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJ-fun System” by S. Singh, D. Litman, M. Kearns, and M. Walker, published in Journal of Artificial Intelligence Research, vol. 16, pp. 105-133, 2002, the dialogue system is made to learn the weights of the state-to-state transitions through objective functions. With this method, the weights can be obtained by automatic training, but the designer must define the transitions among states beforehand. Moreover, a DM obtained in this way is a fixed DM and cannot be trained to generate diversified variations with the same dialogue purpose.
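The general idea of learning transition weights over a designer-defined state structure can be sketched as follows. Tabular Q-learning is used here only as a generic reinforcement-learning stand-in and is not necessarily the algorithm of the cited articles; the states, actions, rewards, and the function `train` are hypothetical.

```python
import random

# Hypothetical transitions fixed beforehand by the designer:
# state -> {action: (next_state, reward)}
TRANSITIONS = {
    "start": {"ask": ("got_info", -1.0)},
    "got_info": {"confirm": ("done", 10.0), "ask_again": ("got_info", -1.0)},
}

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over the designer-defined transitions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in TRANSITIONS.items() for a in acts}
    for _ in range(episodes):
        state = "start"
        while state != "done":
            actions = list(TRANSITIONS[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = TRANSITIONS[state][action]
            # Value of the best follow-up action; zero at the terminal state.
            future = max(
                (q[(next_state, a)] for a in TRANSITIONS.get(next_state, {})),
                default=0.0,
            )
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = train()
best = max(TRANSITIONS["got_info"], key=lambda a: q[("got_info", a)])
print(best)
```

In this sketch the trained table yields one greedy action per state, which illustrates why a DM obtained in this way behaves as a fixed DM.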