Speech language understanding refers to analyzing utterances that have undergone speech recognition and extracting slots from them according to a semantic structure. Speech language understanding plays a key role in various natural language processing systems, such as dialog systems.
In general, speech language understanding for a dialog system uses a semantic structure based on three elements: a dialog act, a main act, and a named entity.
A dialog act represents the domain-independent intention of an utterance and is tagged based on the sentence expressed in the utterance. A main act represents the domain-dependent intention of an utterance and is tagged based on a function. A named entity represents a word whose meaning is needed to perform the function. Each sentence has exactly one dialog act and one main act, whereas named entities are tagged per word, so a sentence may contain none, one, or several.
FIG. 1 illustrates an example of a semantic structure extracted from a spoken sentence in speech language understanding.
Referring to FIG. 1, for the input sentence (‘let me know phone number of auditorium’), the dialog act is ‘request’, the main act is ‘search_phone’, and the named entity is ‘auditorium.’
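As an illustration only, the semantic structure of FIG. 1 could be held in a small data structure such as the following Python sketch. The class names `TaggedWord` and `SemanticFrame`, the helper `frame_for_fig1`, and the entity label `PLACE` are hypothetical and are not taken from the figure; only the sentence, the dialog act ‘request’, the main act ‘search_phone’, and the named entity ‘auditorium’ come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaggedWord:
    text: str
    # Named-entity label for this word; None when the word is not a named entity.
    entity: Optional[str] = None


@dataclass
class SemanticFrame:
    dialog_act: str  # one per sentence, independent of the domain
    main_act: str    # one per sentence, dependent on the domain
    words: List[TaggedWord] = field(default_factory=list)

    def named_entities(self) -> List[str]:
        """Return the words that carry a named-entity label (possibly none)."""
        return [w.text for w in self.words if w.entity is not None]


def frame_for_fig1() -> SemanticFrame:
    """Build the frame for the example sentence of FIG. 1."""
    sentence = "let me know phone number of auditorium"
    # "PLACE" is a hypothetical label name chosen for this sketch.
    words = [
        TaggedWord(token, "PLACE" if token == "auditorium" else None)
        for token in sentence.split()
    ]
    return SemanticFrame(dialog_act="request", main_act="search_phone", words=words)
```

Calling `frame_for_fig1().named_entities()` yields the single entity word of the example, while a sentence with no entity words would yield an empty list.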
However, developing speech language understanding requires designing labels for the dialog act, the main act, and the named entity, and this label design must be repeated for each domain in which speech language understanding is performed. In addition, to generate a model for speech language understanding, a collected corpus must be tagged according to the designed labels. This tagging process is very costly.
FIG. 2 is a flowchart showing a process of tagging a corpus for speech language understanding.
Referring to FIG. 2, developing conventional language understanding technology includes defining a domain to which the language understanding is applied (S210), collecting a corpus for the defined domain (S220), designing labels for the dialog act, the main act, and the named entity (S230), and performing a tagging task (S240).
A trainer learns from the tagged corpus (S250) and creates a model 200. The created model 200 is used in a language understanding module.
The label designing and tagging processes require human labor and are together referred to as a manual tagging task.
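The flow of FIG. 2 can be sketched roughly as follows, under strong simplifying assumptions. The trainer here is a toy word-count voter that predicts only the main act; it stands in for whatever trainer is actually used and is not the method of the description. The function names `train` and `predict_main_act`, the second corpus sentence, and the label ‘search_route’ are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# (sentence, main-act label); the label is assigned by a human annotator (S240).
TaggedSentence = Tuple[str, str]


def train(tagged_corpus: List[TaggedSentence]) -> Dict[str, Counter]:
    """Toy stand-in for S250: count how often each word occurs with each main act."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence, main_act in tagged_corpus:
        for word in sentence.split():
            counts[word][main_act] += 1
    return dict(counts)


def predict_main_act(model: Dict[str, Counter], sentence: str) -> Optional[str]:
    """Use the model: each known word votes for the main acts it was seen with."""
    votes: Counter = Counter()
    for word in sentence.split():
        votes.update(model.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else None


# S210-S240: a domain is chosen, sentences are collected, labels are designed,
# and each sentence is tagged by hand. The second entry is hypothetical.
tagged_corpus: List[TaggedSentence] = [
    ("let me know phone number of auditorium", "search_phone"),
    ("show me the way to auditorium", "search_route"),
]

# S250: the trainer produces the model used by the language understanding module.
model = train(tagged_corpus)
```

Even in this toy form, every entry of `tagged_corpus` has to be written and labeled by a person, and both the label set and the corpus have to be redone for each new domain, which is the cost of the manual tagging task.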