This invention relates to configuration of a spoken language understanding system, and more particularly, to determination of parameters for configuring a classifier based on partially annotated linguistic data.
A language understanding system may be designed to accept a linguistic input from a user, for example, in the form of text or as a spoken utterance, and to determine the user's intent or the values of semantically meaningful items in the input. Some systems can respond to a wide range of user intents, and these intents may be grouped. For example, intents related to interacting with a messaging system (e.g., Twitter) may be grouped. Each such group may be referred to as a “skill.” For example, “post a tweet” may be an intent in the “Twitter” skill. Often, a system may be configured to support a large number of skills, and each skill may have anywhere from one to hundreds or more different intents.
It is desirable to have the system improve its accuracy in the task of determining the skill and intent from a user's inputs. One way to do this is to collect input data during operational use of the system, and then have a human reviewer manually annotate each input with its correct skill and intent. Once a sufficient amount of such annotated data is prepared, the configuration of the system may be updated to better match the annotated inputs, and may thereby provide improved accuracy on subsequent user inputs.
However, the task of manually annotating sufficient amounts of collected input may require a prohibitive amount of effort (e.g., as measured in person-hours of annotation time), and therefore may not be feasible. Nevertheless, it is important to provide a way to improve accuracy, for example, during an initial period after a skill is first introduced, when relatively little data may be available from which to configure the system.