In telephony applications, text-to-speech (TTS) systems may be used to produce speech output as part of an automatic dialog system. Typically, during a call session, an automatic dialog system first transcribes the words spoken by a caller using a speech recognition engine. A natural language understanding (NLU) unit in communication with the speech recognition engine then determines the meanings of the caller's words. These meanings may be interpreted to determine the requested information, which a dialog manager may retrieve from a database. The retrieved information is passed to a natural language generation (NLG) block, which forms a sentence in response to the caller. The sentence is then output, or spoken, to the caller through a speech synthesis system.
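The call flow described above can be sketched as a chain of stages. The following is a minimal illustration only, not an implementation of any particular dialog system: every function name, the `Meaning` type, and the `FLIGHT_DB` table are hypothetical, and each stage is a text-only stand-in for the real component (speech recognition engine, NLU unit, dialog manager, NLG block, speech synthesis system).

```python
from dataclasses import dataclass, field
import re

# Hypothetical in-memory stand-in for the database queried by the dialog manager.
FLIGHT_DB = {"BA123": "10:45", "UA456": "17:20"}


@dataclass
class Meaning:
    """Hypothetical NLU result: an intent name plus extracted slot values."""
    intent: str
    slots: dict = field(default_factory=dict)


def recognize_speech(audio: str) -> str:
    """Stand-in for the speech recognition engine; the 'audio' is already text."""
    return audio.strip()


def understand(text: str) -> Meaning:
    """Stand-in for the NLU unit: look for a flight number and a departure query."""
    match = re.search(r"\b([A-Za-z]{2}\d{3})\b", text)
    if match and "depart" in text.lower():
        return Meaning("departure_time", {"flight": match.group(1).upper()})
    return Meaning("unknown")


def manage_dialog(meaning: Meaning) -> dict:
    """Stand-in for the dialog manager: retrieve the requested information."""
    if meaning.intent == "departure_time":
        flight = meaning.slots["flight"]
        return {"flight": flight, "time": FLIGHT_DB.get(flight)}
    return {}


def generate_sentence(info: dict) -> str:
    """Stand-in for the NLG block: form a response sentence."""
    if info.get("time"):
        return f"Flight {info['flight']} departs at {info['time']}."
    return "Sorry, I could not find that information."


def synthesize(sentence: str) -> str:
    """Stand-in for the speech synthesis system; returns the text to be spoken."""
    return sentence


def handle_call(audio: str) -> str:
    """Run one caller utterance through all five stages in order."""
    text = recognize_speech(audio)
    meaning = understand(text)
    info = manage_dialog(meaning)
    sentence = generate_sentence(info)
    return synthesize(sentence)
```

Calling `handle_call("When does flight BA123 depart?")` passes the utterance through each stage in turn. Note that the final stage receives only the sentence, so its output cannot vary with the calling environment.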
A TTS system may be used in many current real-world applications as part of an automatic dialog system. For example, a caller to an air travel system may communicate with a TTS system to receive air travel information, such as reservations, confirmations, and schedules, in the form of TTS-generated speech.
A well-known phenomenon in human-to-human communication is the “Lombard effect,” in which a speaker speaks louder and articulates more carefully when a conversation takes place in a noisy environment, in order to increase the amount of information received by the listener. In contrast, automatic dialog systems output speech in the same manner for a given text, regardless of the ambient noise level.
Therefore, it is desirable for an automatic dialog system to act similarly to a human speaker and adjust characteristics of the outgoing speech, for example, through increased volume and careful articulation, according to the environment or context of the incoming communication. Contextual variables, such as the time of day, the date, characteristics of the listener, and the location where the speech is to be heard, may assist in shaping desired characteristics of the speech produced by a TTS engine of an automatic dialog system. Currently, no dialog system exists that has the ability, or sophistication, to adapt its output in accordance with the context in which communication is taking place.
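As a purely illustrative sketch of the kind of adjustment described above, the following maps an ambient noise estimate to a volume gain and a speaking-rate scale, with a slower rate standing in for more careful articulation. The `SpeechSettings` type, the `settings_for_noise` function, the linear mapping, and all threshold values are assumptions chosen for the example; none of them is taken from an existing system or from measured data.

```python
from dataclasses import dataclass


@dataclass
class SpeechSettings:
    """Hypothetical output characteristics for a TTS engine."""
    volume_gain_db: float  # gain added to the default output level
    rate_scale: float      # 1.0 = default speaking rate, lower = slower


def settings_for_noise(
    noise_db: float,
    quiet_db: float = 40.0,     # assumed level below which no adjustment is made
    loud_db: float = 80.0,      # assumed level at which adjustment is maximal
    max_gain_db: float = 12.0,  # assumed maximum volume increase
    min_rate: float = 0.8,      # assumed slowest speaking-rate scale
) -> SpeechSettings:
    """Interpolate linearly between default and maximal adjustment."""
    # Normalize the noise level to the range [0, 1], clamping at both ends.
    level = (noise_db - quiet_db) / (loud_db - quiet_db)
    level = min(1.0, max(0.0, level))
    return SpeechSettings(
        volume_gain_db=level * max_gain_db,
        rate_scale=1.0 - level * (1.0 - min_rate),
    )
```

With these assumed defaults, a quiet environment leaves the output unchanged, while noisier environments yield progressively louder and slower speech up to the configured limits. Other contextual variables, such as the time of day or characteristics of the listener, could in principle be mapped to output characteristics in a similar way.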