In recent years, smart devices have become more widely used around the world, and people have become more demanding of the intelligence of those devices. Under these conditions, speech recognition has emerged as a requirement of many systems. This technology is progressing but is still not as intelligent as people require. Most existing technology cannot be referred to as smart because it lacks accuracy and cannot understand the user's intention very well. Some systems are highly accurate but consume a large amount of resources to run their algorithms.
Accordingly, it is desirable to provide an improved method for understanding the user's intention, and especially for managing speech dialog, so as to improve the accuracy of the searched content and matched result, as well as of the response operation.
Dialog management systems aim to provide the ability to design, observe, and develop verbal dialogs between human users and computer systems. In general, creating natural dialogs is the target of these systems. The topology generally comprises a speech recognizer, an AI or intent manager and/or a slot parser, and an announce module (which can be a TTS engine or a player for recorded prompts).
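The topology described above can be illustrated with a minimal Python sketch of one dialog turn. All names here (`run_turn`, the stub recognizer, the stub intent manager, and the `responses` table) are hypothetical and chosen only for illustration; a real system would plug in an actual speech recognizer and TTS engine.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stage signatures for the three modules named in the text.
Recognizer = Callable[[bytes], str]                      # audio -> text
IntentManager = Callable[[str], Tuple[str, Dict[str, str]]]  # text -> (intent, slots)
Announcer = Callable[[str], None]                        # prompt -> TTS or recording


def run_turn(audio: bytes,
             recognize: Recognizer,
             understand: IntentManager,
             announce: Announcer,
             responses: Dict[str, str]) -> str:
    """Run one turn: recognize, determine intent and slots, announce a prompt."""
    text = recognize(audio)
    intent, slots = understand(text)
    template = responses.get(intent, "Sorry, I did not understand.")
    prompt = template.format(**slots)
    announce(prompt)
    return prompt


# Stubs standing in for real components (assumptions, not real engines).
spoken: List[str] = []
prompt = run_turn(
    b"call Ada",
    recognize=lambda audio: audio.decode("utf-8"),
    understand=lambda text: ("call", {"name": text.split()[-1]}),
    announce=spoken.append,
    responses={"call": "Calling {name}."},
)
```

Keeping each stage behind a plain callable makes it possible to swap the recognizer or the announce module without touching the intent logic.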
After an utterance is recognized by the recognizer, the first step should be to determine the user's intent. The intended operation and its parameters differ depending on the identified intent. To determine the true intent, many applications use statistical matching algorithms. In addition, “Apparatus and method for spoken language understanding by using semantic role labeling” suggests semantic role labeling, such as part-of-speech tagging. Although this method is a solution for intent matching, it is difficult to implement for different applications and languages. Moreover, automatic part-of-speech tagging systems have a margin of error, which may lead to inaccuracies in intent determination.
Most previous systems assume that they have prior information about the context when determining the user's intent. However, in a natural dialog system, users may often move out of context and switch their intent. This leads to inaccuracies in the dialog system's responses.
In most applications, dynamic field slots are filled with information that matches an item in a predefined list. For example, in “Adaptive knowledge base of complex information through interactive voice dialog”, pre-determined frames contain pre-determined vocabulary with slot-filling intention. This structure fills the slots only after the intention has been revealed, which means that utterances comprising related or important information may be missed while the intention is still undetermined.
Moreover, this situation may cause loss of information about the true intent. An utterance may contain words that carry information about the intent and that should, at the same time, also be taken as dynamic fields. In this case, both the solution mentioned above and the one explained in “Trainable sentence planning system” lack an algorithm that is able to perform intent matching and form filling simultaneously.
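To make the idea of simultaneous processing concrete, the following Python sketch scans an utterance once and lets each word contribute both to intent scoring and to slot filling. It is only an illustration under stated assumptions: the `LEXICON` table, the intent names, and the simple vote counting are hypothetical and are not taken from any of the cited systems.

```python
from collections import Counter

# Hypothetical lexicon: a word may vote for an intent, fill a slot, or both.
LEXICON = {
    "buy": {"intent": "order"},
    "ice-cream": {"intent": "order", "slot": ("item", "ice-cream")},
    "pie": {"intent": "order", "slot": ("item", "pie")},
    "cancel": {"intent": "cancel"},
    "two": {"slot": ("quantity", 2)},
}


def parse(utterance):
    """Single pass over the words: collect intent votes and slot values together."""
    votes = Counter()
    slots = {}
    for raw in utterance.lower().split():
        entry = LEXICON.get(raw.strip(".,!?"))
        if entry is None:
            continue  # out-of-vocabulary word, ignored in this sketch
        if "intent" in entry:
            votes[entry["intent"]] += 1
        if "slot" in entry:
            name, value = entry["slot"]
            slots[name] = value
    intent = votes.most_common(1)[0][0] if votes else None
    return intent, slots


intent, slots = parse("I want to buy two cones of ice-cream")
```

Because slots are collected in the same pass, a word such as "ice-cream" is kept as a dynamic field even before the intent has been decided, and it also counts as evidence for that intent.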
Most current IVR-based call routing systems are based on VoiceXML grammar design. However, designing a VoiceXML grammar requires special expertise, and it is very difficult for such grammars to handle out-of-grammar words and garbage words. For each new implementation, the grammar has to be built from scratch, which requires significant effort.
In a natural dialog, a correction of information or a change of decision is made smoothly by replacing the dynamic slots in an utterance. In response, the listener accepts the new information as a correction. For instance, if the utterance “I want to buy two cones of ice-cream with no-topping” is followed by “make it chocolate sauce toppings”, the slot “no-topping” is replaced by “chocolate sauce topping”. In other cases, however, the content should be expanded instead of replaced. If the utterance “I want to buy two cones of ice-cream with no-topping” is followed by “and a slice of pie”, both desserts are requested by the user. Most current dialog systems have significant difficulty in handling these situations.
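The two cases above, replacing a slot versus expanding the request, can be sketched in Python as follows. This is a minimal sketch that assumes the follow-up utterance has already been parsed into a hypothetical dictionary with a `mode` field; deciding the mode from the raw utterance is the hard part and is not shown.

```python
from copy import deepcopy


def apply_follow_up(order, follow_up):
    """Return a new order updated by a follow-up turn.

    `follow_up` is a hypothetical, already-parsed turn:
      {"mode": "replace", "slot": name, "value": value}  corrects the last item
      {"mode": "expand", "item": {...}}                  adds another item
    """
    updated = deepcopy(order)  # leave the previous dialog state untouched
    if follow_up["mode"] == "replace":
        updated[-1][follow_up["slot"]] = follow_up["value"]
    elif follow_up["mode"] == "expand":
        updated.append(follow_up["item"])
    else:
        raise ValueError("unknown follow-up mode: " + repr(follow_up["mode"]))
    return updated


# "I want to buy two cones of ice-cream with no-topping"
order = [{"item": "ice-cream cone", "quantity": 2, "topping": None}]

# "make it chocolate sauce toppings" -> the topping slot is replaced
corrected = apply_follow_up(
    order, {"mode": "replace", "slot": "topping", "value": "chocolate sauce"})

# "and a slice of pie" -> the order is expanded with a second item
expanded = apply_follow_up(
    order, {"mode": "expand", "item": {"item": "slice of pie", "quantity": 1}})
```

Returning a copy rather than mutating the order keeps the earlier state available, which can be useful if the user later reverts a correction.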
In current dialog systems, the prompts that the system reads to users are managed by the client application. The intents are therefore designed in the dialog system, while the prompts are designed in the client application. This requires both systems to be managed at the same time and requires significant coding or configuration in the client application.
The responses of the dialog system in current solutions are static in general. The system always responds with the same information or question for the same user intent. This leads to a mechanical dialog experience for the user.
Traditional dialog systems do not take into account the demographics of the user profile or the past dialog history in order to determine the speaking style or persona of the dialog manager. This leads to an unnatural dialog. For example, the user informally says “What's up” and the system responds with a formal answer such as “I am very fine, thank you”.
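As a rough illustration of matching the reply style to the user, the Python sketch below picks a response by the register detected in the utterance. The marker list, the reply table, and the function names are hypothetical; a real system would also draw on the user profile and dialog history mentioned above.

```python
import re

# Hypothetical informal cues, matched as whole words or phrases.
INFORMAL_PATTERN = re.compile(r"\b(what's up|hey|sup)\b")

# Hypothetical replies for the same greeting intent in two registers.
REPLIES = {
    "informal": "Not much, you?",
    "formal": "I am very fine, thank you.",
}


def detect_register(utterance):
    """Classify an utterance as informal or formal from surface cues only."""
    text = utterance.lower().replace("\u2019", "'")  # normalize curly apostrophes
    return "informal" if INFORMAL_PATTERN.search(text) else "formal"


def reply(utterance):
    """Answer the greeting in the register the user used."""
    return REPLIES[detect_register(utterance)]
```

For instance, `reply("What's up")` returns the informal reply, while `reply("How are you today?")` returns the formal one.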