1. Field of the Invention
The present invention relates to speech processing and more specifically to using semantic role labeling for spoken language understanding.
2. Introduction
Spoken language understanding aims to extract the meaning of speech utterances. In the last decade, a variety of practical goal-oriented spoken dialog systems (SDS) have been built for limited domains. These systems aim to identify the intents of humans, expressed in natural language, and to take actions accordingly to satisfy their requests. In such systems, typically, the speaker's utterance is first recognized using an automatic speech recognizer (ASR). Then, the intent of the speaker is identified from the recognized word sequence using a spoken language understanding (SLU) component. Finally, a dialog manager (DM) interacts with the user in a natural way and helps the user to achieve the task that the system is designed to support.
A typical SDS example may be a machine-initiative system, in which the user answers the computer prompts using the allowed corresponding phrases, for example, “Please say hotel reservation or car reservation.” In such a directed dialog, “understanding” is reduced to the task of detecting one of the keywords allowed in the users' utterances.
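By way of illustration only, the keyword detection described for a directed dialog can be sketched as follows. The phrase list is taken from the example prompt above; the function name `detect_keyword` and the text normalization are assumptions made for this sketch and do not describe any particular deployed system.

```python
import re

# Phrases allowed by the example prompt quoted above.
ALLOWED_PHRASES = ("hotel reservation", "car reservation")


def detect_keyword(recognized_text, allowed=ALLOWED_PHRASES):
    """Return an allowed phrase contained in the ASR output, or None.

    Phrases are checked in the order given by `allowed`; the first one
    found anywhere in the text is returned.
    """
    # Lowercase, replace anything that is not a letter or space, and
    # collapse runs of whitespace so phrase matching is robust.
    normalized = re.sub(r"[^a-z ]+", " ", recognized_text.lower())
    normalized = " ".join(normalized.split())
    for phrase in allowed:
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized):
            return phrase
    return None


print(detect_keyword("Um, car reservation please."))
print(detect_keyword("I need a flight"))
```

In such a sketch, an utterance containing none of the allowed phrases yields no result, and the system would typically re-prompt the user.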
A more sophisticated understanding system would allow the users to talk naturally about a single given intent. Such systems have been developed mostly in the framework of government-funded projects. For example, in the early 1990s, the Defense Advanced Research Projects Agency (DARPA) initiated the Airline Travel Information System (ATIS) project, in which the aim was to integrate the efforts of the speech and language processing communities. In this task, the users could utter queries regarding flight information. An example would be, “I want to fly to Boston from New York next week.” In this case, understanding is reduced to the problem of extracting task-specific arguments in a given frame, such as Destination and Departure Date. Participating systems employed either a data-driven statistical approach (mostly from the speech processing community) or a knowledge-based approach (mostly from the computational linguistics community). Although both the DARPA ATIS project and the subsequent Communicator projects are over, they left a test-bed for other SLU approaches and led to similar mixed-initiative or machine-initiative commercial applications.
A more general approach would both determine the intent of the user and extract the corresponding arguments, as in the AT&T How May I Help You?℠ (HMIHY) spoken dialog system used for customer care centers. As an example, consider the utterance, “I have a question about my June bill.” Assuming that the utterance is recognized correctly, the corresponding intent (call-type) would be Ask(Bill), and the argument for this call-type, i.e., the named entity Date, would be extracted as June. The action that is then taken depends on the DM. The DM may ask the user to further specify the problem or may route the call to the billing department. Following the HMIHY system, a number of similar systems were built.
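By way of illustration only, the output of such an understanding step for the example utterance can be sketched as a call-type paired with named entities. The hand-written rules below are a toy stand-in assumed for this sketch; the names `SluResult` and `understand` are hypothetical, and the sketch does not reproduce how the HMIHY system itself performs classification or extraction.

```python
from dataclasses import dataclass, field

MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


@dataclass
class SluResult:
    """Intent (call-type) plus its extracted arguments (named entities)."""
    call_type: str
    named_entities: dict = field(default_factory=dict)


def understand(utterance):
    """Toy rules: one hard-coded call-type and a month lookup for Date."""
    words = [w.strip(".,?!").lower() for w in utterance.split()]
    call_type = "Ask(Bill)" if "bill" in words else "Other"
    entities = {}
    for word in words:
        if word in MONTHS:
            entities["Date"] = word.capitalize()
            break
    return SluResult(call_type, entities)


result = understand("I have a question about my June bill.")
print(result.call_type, result.named_entities)
```

A dialog manager could then inspect `call_type` and `named_entities` to decide whether to ask a follow-up question or to route the call.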
In all these previous works, the semantic representation of the meaning depended heavily on the corresponding task and was predefined. For example, ATIS includes flight reservation related arguments, such as arrival and departure cities. In HMIHY, the representation is the call-type and the corresponding arguments (named entities), designed according to the incoming call traffic. Call-type classification is used to determine the intent, and named entity extraction is used to find the associated arguments. For this purpose, one can use a domain-dependent approach as in the previous works. But this approach has some serious drawbacks:
        Training statistical models for intent classification and named entity extraction requires large amounts of labeled in-domain data, which is very expensive and time-consuming to prepare. Rule-based methods for these tasks require human expertise and have similar problems.
        Preparation of the labeling guide (i.e., designing the intents and named entities) for a given spoken language understanding task involves non-trivial design decisions. For example, if the user says “I wanna cancel my long distance service,” one alternative would be labeling it as the intent Cancel(Service) with a named entity Service Type with values such as long distance, international, local, etc. The other option is labeling it as a single intent Cancel(LD_Service) with no associated named entity and having other intents such as Cancel(Local_Service), etc. Such decisions depend on the expert who is designing the task structure and on the frequency of the intents and named entities for a given task. Furthermore, one expects the intents and named entities to be clearly defined in order to ease the job of the classifier and the human labelers.
        Another issue is consistency between different tasks. This is important for manually labeling the data quickly and correctly and for making the labeled data reusable across different applications. For example, in most applications, utterances like “I want to talk to a human not a machine” appear and can be processed similarly.
On the other hand, in the computational linguistics domain, task-independent semantic representations have been proposed over the last few decades. Two notable studies are the FrameNet and PropBank projects. The PropBank project aims at creating a corpus of text annotated with information about basic semantic propositions. Predicate/argument relations are added to the syntactic trees of the existing Penn Treebank, which is mostly grammatical written text. Very recently, the PropBank corpus was used for semantic role labeling (SRL) as the shared task at the 2004 Conference on Computational Natural Language Learning (CoNLL-2004). SRL aims to assign “who did what to whom” structures to sentences without considering the application using this information. More formally, given a predicate of the sentence, the goal of SRL is to identify all its arguments and their semantic roles.
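By way of illustration only, the predicate/argument structure that SRL aims to produce can be sketched as a small data structure. The sentence is the cancellation utterance used earlier; the ARG0 and ARG1 labels follow the PropBank numbering convention but are assigned by hand for this sketch rather than produced by a labeler, and the names `Argument`, `Proposition`, and `who_did_what` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    role: str   # PropBank-style label, e.g. "ARG0" or "ARG1"
    text: str   # the words of the sentence filling that role


@dataclass(frozen=True)
class Proposition:
    predicate: str
    arguments: tuple

    def role(self, label):
        """Return the text of the first argument with this label, or None."""
        for argument in self.arguments:
            if argument.role == label:
                return argument.text
        return None


# Hand-labeled example for "I wanna cancel my long distance service".
example = Proposition(
    predicate="cancel",
    arguments=(
        Argument("ARG0", "I"),                         # who
        Argument("ARG1", "my long distance service"),  # what
    ),
)


def who_did_what(proposition):
    """Flatten a proposition into a task-independent summary."""
    return {
        "who": proposition.role("ARG0"),
        "did": proposition.predicate,
        "what": proposition.role("ARG1"),
    }


print(who_did_what(example))
```

Nothing in this structure refers to a particular application, which is the sense in which such a representation is task independent.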
The relationship between the arguments of the predicates in a sentence and named entities has been previously exploited by Surdeanu et al. as described in “Using predicate-argument structures for information extraction,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2003, the contents of which are herein incorporated by reference in their entirety.