Recording dialogs is a common business practice. The dialogs may be voice (e.g., telephone) or text (e.g., chat, email, website comments, etc.) and may represent discussion between individuals (e.g., between a customer and a customer service agent) or may represent statements from a single individual (e.g., product buyer/user). For the purposes of this disclosure, the term “transcript” will be used to refer generally to any of these cases.
Transcripts of a dialog may be created in real-time as part of a recording process or in batch-mode as part of an archiving process. In some cases, a transcription may include a voice-to-text conversion and/or a speaker separation (e.g., diarization). In other cases (e.g., email, chat, etc.), a transcript is provided or recorded as-is with no conversion or separation. Transcripts are typically text documents that may include associated transcript data describing additional features associated with the dialog (e.g., time-stamp, dialog type, speaker identity, etc.).
Often it is desirable to identify the sentiment (e.g., feeling, emotion, attitude, judgement, thought, etc.) of a dialog participant from the content of a dialog transcript. For example, a transcript indicating an unhappy customer may trigger sending the customer a coupon to improve relations and/or ensure loyalty. In another example, transcripts with a sentiment identified as negative (e.g., negative review, irate customer call) may be used for product/process improvement and/or agent training. Businesses (e.g., call centers, commercial websites, etc.) may generate thousands of dialog transcripts daily, which makes manual review of each transcript impractical. Thus, a need exists for the sentiment classification to be automated.
Classifying the sentiment of a dialog transcript (e.g., from a business transaction) is challenging because people express emotion in a variety of ways. A negative emotion may be clearly implied by some statements (e.g., “they should have called me”) but not by others. For example, some statements or questions (e.g., “I originally asked to cancel my account on February 26th”) may be interpreted as either a neutral (e.g., informative) statement or as a negative comment. Knowledge of the context surrounding a statement may help to distinguish its corresponding sentiment. For example, “let me get this straight” may be a simple request for clarification or may be a strongly negative expression, depending on the context. Further, the communication environment has a great impact on sentiment classification. For example, keywords, punctuation, and phrases that indicate a negative sentiment in an email or chat may be poor indicators of negative sentiment in a transcript of a telephone call, and vice versa. Thus, a need exists for the sentiment classification to accommodate implicit interpretations, context-dependent interpretations, and environment-dependent interpretations.
Classification is also challenging because sentiment has a subjective aspect. For example, indicators of negative sentiment may vary by culture, region, or language. Thus, a need exists for the sentiment classification to accommodate cultural, regional, business practice, or language variations.
Classification is also challenging because the difference between sentiments in a conversation may be subtle. For example, the detectable differences between a negative dialog and a neutral dialog in transcripts from a customer service center may be slight because both dialogs may contain indicators of a negative sentiment. Thus, a need exists for the sentiment classification to be sensitive to subtle indicators.
Classification is also challenging because sentiment may change within a dialog, especially in long documents. As a result, it is often necessary to characterize dialogs on a phrase-by-phrase (or even a word-by-word) basis. Thus, a need exists for the sentiment classification to accommodate classification at various levels of resolution (e.g., dialog, sentence, utterance, phrase, word, etc.).
One approach to classification uses a single general-purpose lexicon that contains the words, phrases, and characteristics (e.g., punctuation, utterance length, etc.) that indicate the sentiment of a dialog. This single-lexicon approach is not practical, however, because it is difficult to include all possible indicators (i.e., a large breadth requirement) while also maintaining the classification sensitivity needed to detect subtle indicators (i.e., a large depth requirement) in a reasonably fast classification process. In addition, adapting the general-purpose lexicon to a particular environment, culture, or language is complicated and cumbersome.
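For illustration only, a single-lexicon classifier of the kind described above may be sketched as follows. The lexicon entries, weights, threshold, and function names are hypothetical and are not taken from any particular system; the sketch simply sums the weights of the indicators found in each utterance and in the dialog as a whole.

```python
import re

# Hypothetical toy lexicon: indicator -> weight (negative weight suggests negative sentiment).
LEXICON = {
    "cancel": -0.5,
    "unacceptable": -2.0,
    "should have": -1.5,
    "thank you": 1.0,
    "great": 1.5,
}


def score_utterance(text, lexicon=LEXICON):
    """Sum the weights of every lexicon indicator found in one utterance."""
    lowered = text.lower()
    score = 0.0
    for indicator, weight in lexicon.items():
        # Word-boundary match so that "great" does not fire inside "greatly".
        pattern = r"\b" + re.escape(indicator) + r"\b"
        score += weight * len(re.findall(pattern, lowered))
    return score


def label(score, threshold=1.0):
    """Map a numeric score to a coarse sentiment label (threshold is an assumption)."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"


def classify_dialog(utterances, lexicon=LEXICON):
    """Return per-utterance labels and one label for the dialog as a whole."""
    scores = [score_utterance(u, lexicon) for u in utterances]
    return [label(s) for s in scores], label(sum(scores))


per_utterance, overall = classify_dialog([
    "They should have called me.",
    "I originally asked to cancel my account on February 26th.",
    "Thank you, that is great.",
])
```

Even in this toy form, the limitations noted above are visible: every indicator must be entered and weighted by hand, context is ignored, and the same weights are applied regardless of whether the text came from an email, a chat, or a telephone call.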
Another approach to classification uses supervised machine learning to train a lexicon for classification. This approach offers more flexibility than the single-lexicon approach because a lexicon may be trained to accommodate a particular type of dialog from a particular type of environment. For example, it has been proposed to classify the sentiment of documents (e.g., product or movie reviews) using a model trained with tagged (i.e., annotated) training data. In other words, the machine learning is supervised and requires a training set of previously classified product reviews. This approach is not efficient because it requires human participation in the creation of the tagged training data.
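As a minimal sketch of the supervised approach, a lexicon may be learned from tagged documents by comparing how often each word appears in documents tagged negative versus all other documents. The smoothed log-odds weighting, the two-label scheme, the training examples, and the function names below are assumptions chosen for illustration, not a description of any specific proposed method.

```python
import math
from collections import Counter


def train_lexicon(tagged_docs):
    """Learn word -> weight from (text, label) pairs; label is "negative" or "other"."""
    neg, other = Counter(), Counter()
    for text, tag in tagged_docs:
        words = text.lower().split()
        (neg if tag == "negative" else other).update(words)
    vocab = set(neg) | set(other)
    # Add-one smoothing keeps unseen words from producing log(0).
    neg_total = sum(neg.values()) + len(vocab)
    other_total = sum(other.values()) + len(vocab)
    # Positive weight: the word is more typical of documents tagged negative.
    return {
        w: math.log((neg[w] + 1) / neg_total) - math.log((other[w] + 1) / other_total)
        for w in vocab
    }


def classify(text, lexicon):
    """Sum learned weights; words absent from the lexicon contribute nothing."""
    score = sum(lexicon.get(w, 0.0) for w in text.lower().split())
    return "negative" if score > 0 else "other"


# Hypothetical training set; every tag had to be assigned by a person.
TAGGED = [
    ("this is unacceptable", "negative"),
    ("terrible service never again", "negative"),
    ("thanks for the quick help", "other"),
    ("the order arrived today", "other"),
]

lexicon = train_lexicon(TAGGED)
```

The learned lexicon adapts to whatever documents it is trained on, but each entry in the training set carries a tag supplied by a human annotator, which is the source of the inefficiency noted above.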
A need, therefore, exists for a system and method for classifying (e.g., distinguishing, identifying, tagging, etc.) the sentiment of a document (e.g., dialog transcript) that is sensitive and that may be adapted to various environments using an automatic, unsupervised, machine-learning process.