Spoken Language Understanding (SLU) is an existing technology that enables a variety of applications, mostly in the field of telephony customer care automation. Call routing is one example: users who contact a telephone call center need to be routed to the appropriate agent to receive a particular service. Callers are greeted with an open prompt that invites them to state the reason for their call in their own words. SLU technology is used to automatically map the spoken utterance to one of a number of possible call reasons. SLU is also used in more complex applications, such as technical support, where users call to solve a problem they are experiencing with a service or an appliance. Automated technical support systems first need to identify the problem, or the symptom of the problem, that the caller is experiencing. In this context, SLU is used to map callers' utterances to symptoms in order to proceed to the correct resolution of the problem.
In general, SLU performs a mapping between input utterances and a number of categories defined a priori. State-of-the-art technology performs the mapping either with handwritten rules represented in the form of grammars or, most effectively, with statistical machine learning algorithms. With the latter, a set of training utterances is collected, transcribed, and annotated with the true category to which each utterance belongs. A machine learning algorithm then trains an automatic classifier on the corpus of annotated utterances. The classifier can then be used to assign any input utterance to one of the pre-defined categories that were used to annotate the training utterances.
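The statistical approach described above can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes classifier with add-one smoothing, which is only one of many possible machine learning algorithms and is not named in the text. The class name, the category labels, and the training utterances are all hypothetical and serve only to show the train-then-classify flow.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split a transcribed utterance into lowercase word tokens."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Illustrative multinomial naive Bayes classifier (an assumed algorithm)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.category_counts = Counter()         # category -> number of utterances
        self.vocabulary = set()

    def train(self, annotated_utterances):
        """Train on (transcription, true category) pairs."""
        for text, category in annotated_utterances:
            self.category_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def classify(self, text):
        """Return the pre-defined category with the highest log-probability."""
        total = sum(self.category_counts.values())
        best_category, best_score = None, -math.inf
        for category, count in self.category_counts.items():
            score = math.log(count / total)  # log prior
            # Add-one smoothing so that unseen words do not zero out a category.
            denom = sum(self.word_counts[category].values()) + len(self.vocabulary)
            for word in tokenize(text):
                score += math.log((self.word_counts[category][word] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical annotated corpus: (transcribed utterance, true category).
TRAINING = [
    ("i want to pay my bill", "billing"),
    ("question about my bill", "billing"),
    ("my picture is fuzzy", "picture_quality"),
    ("the picture keeps freezing", "picture_quality"),
    ("i cannot order a movie", "ordering"),
    ("problem ordering a movie on demand", "ordering"),
]

classifier = NaiveBayesClassifier()
classifier.train(TRAINING)
print(classifier.classify("the picture is fuzzy"))
```

Note that such a classifier can only return one of the categories present in the annotated corpus, which is why the choice of categories matters so much in what follows.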
The issue with certain applications of SLU is that the defined categories may have different levels of resolution, or specificity, that are adequate for routing a call to be handled by a live agent but insufficient for calls that need to be handled by a machine. The following are examples of callers' utterances in the domain of technical support for cable TV service in response to a request such as “Please say the reason you are calling” played immediately after callers reach an automated technical support number:
1. I have a technical problem with the cable TV service.
2. I am experiencing a problem with ordering.
3. I would like to order a movie on demand, but I have a problem.
4. I get an error code on the TV while trying to order a movie on demand.
5. When I order a movie on demand it asks for a PIN, which I don't have.
The first utterance is vague, and one cannot tell which type of problem the caller is experiencing: it could be a problem with picture quality, movies on demand, pay-per-view, or any other of the many problems one can experience with cable TV service. The utterance in the second example is more specific. The caller states clearly that the problem concerns ordering a show or a movie, but it is not clear which type of problem it is. Similarly, it is not clear whether the problem concerns ordering an on-demand or a pay-per-view event, which are two different ways to order movies or shows on cable TV. The third example is more specific still, and states that the problem is related to on-demand ordering, but does not say which type of problem it is. Finally, the fourth and fifth examples are the most specific and completely describe the problem.
It is clear from the above examples that a well-designed voice user interface should follow up with a different dialog for each one of the above utterances. Existing SLU systems use a number of well-defined categories that do not present variable levels of specificity, and thus do not allow vague utterances to be treated differently from more specific ones.
Another drawback of existing SLU technology is that it does not take advantage of the intrinsic hierarchy of categories in order to control the classification error. In other words, errors made at different levels of the hierarchy may have a different impact on the final outcome of the interaction. For instance, a substitution error between a specific category and a less specific one in the same hierarchy is less severe than an error between categories at the same level of specificity, or an error between categories in different hierarchical paths.
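The distinction between more and less severe errors can be made concrete with a small sketch. It assumes that each category is represented as a path from the root of the hierarchy to a node, so that a less specific category is a prefix of a more specific one. The category names and the numeric severity values (0, 1, 2) are hypothetical; the text only states the relative ordering of the errors, not any particular cost.

```python
# Hypothetical category paths for the cable TV examples, from vague to specific.
TV = ("tv",)
ORDERING = ("tv", "ordering")
ON_DEMAND = ("tv", "ordering", "on_demand")
ERROR_CODE = ("tv", "ordering", "on_demand", "error_code")
MISSING_PIN = ("tv", "ordering", "on_demand", "missing_pin")
PAY_PER_VIEW = ("tv", "ordering", "pay_per_view")
PICTURE_QUALITY = ("tv", "picture_quality")


def is_prefix(shorter, longer):
    """True if `shorter` is the same path as `longer` or one of its ancestors."""
    return len(shorter) <= len(longer) and longer[:len(shorter)] == shorter


def error_severity(true_category, predicted_category):
    """Return an assumed severity: 0 = correct, 1 = mild, 2 = severe.

    Mild: one category is a less specific version of the other, so both lie
    on the same hierarchical path. Severe: the categories are at the same
    level of specificity or on different hierarchical paths.
    """
    if true_category == predicted_category:
        return 0
    if is_prefix(true_category, predicted_category) or is_prefix(
        predicted_category, true_category
    ):
        return 1
    return 2


# Confusing a specific symptom with its less specific parent is mild.
print(error_severity(ERROR_CODE, ON_DEMAND))
# Confusing two symptoms at the same level of specificity is severe.
print(error_severity(ERROR_CODE, MISSING_PIN))
# Confusing categories on different hierarchical paths is severe.
print(error_severity(ERROR_CODE, PICTURE_QUALITY))
```

Under this representation, a mild error still leaves the dialog on the correct path, so a follow-up question can recover the missing detail, whereas a severe error sends the caller toward the wrong resolution.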