There have been advances in hyperarticulation detection in the field of audio signal processing. In general, hyperarticulation is an occurrence where a speaker emphasizes particular syllables, words, and/or phrases in a spoken sentence. Hyperarticulation can indicate frustration with a particular activity, or it can be a means of helping a listener discern the syllables, words, and/or phrases in the spoken sentence.
With the increase in the number of applications for automatic speech recognition (ASR), understanding meta-information in a speaker's voice, in addition to the spoken words themselves, is important. Generally, meta-information may include the volume at which the speaker is speaking, the cadence of the spoken words, the emphasis (e.g., hyperarticulation) on particular words and/or phrases, changes in pitch, the prosody of the speech, and other such meta-information.
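As an illustration only, two of the kinds of meta-information listed above (volume and cadence) can be approximated with simple measurements. The following is a minimal sketch under stated assumptions, not a description of any particular ASR system: the function names `rms_volume`, `speaking_rate`, and `sine_wave` are hypothetical, audio is assumed to be a list of samples in the range [-1.0, 1.0], and a synthetic sine tone stands in for recorded speech.

```python
import math


def rms_volume(samples):
    """Root-mean-square amplitude of samples assumed to lie in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speaking_rate(word_count, duration_seconds):
    """Words per second, used here as a crude proxy for cadence."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count / duration_seconds


def sine_wave(amplitude, freq_hz, sample_rate, duration_seconds):
    """Synthetic tone standing in for recorded speech (illustration only)."""
    n = int(sample_rate * duration_seconds)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


# Two hypothetical utterances that differ only in loudness.
quiet = sine_wave(0.2, 220.0, 8000, 1.0)
loud = sine_wave(0.8, 220.0, 8000, 1.0)

print("quiet RMS:", round(rms_volume(quiet), 3))
print("loud RMS:", round(rms_volume(loud), 3))
print("rate (words/s):", speaking_rate(6, 2.0))
```

Measurements such as these describe how something was said rather than what was said; pitch changes and prosody would require more involved analysis than this sketch attempts.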
A typical application of ASR is a voice-enabled personal assistant. A voice-enabled personal assistant may be software-implemented and configured to execute within the context of an operating system. A voice-enabled personal assistant can perform a variety of tasks relating to applications within the operating system or relating to the operating system itself, such as web search, command and control, navigation, and other such tasks. In addition, a voice-enabled personal assistant may be implemented on different types of devices, from mobile phones to desktop computers.
In using a voice-enabled personal assistant, a user may exhibit the behavior of query reformulation. When users are not satisfied with the results shown by a personal assistant, they tend to repeat or paraphrase their queries in order to get better results. There may be multiple reasons for the reformulation. When a user reformulates the query, he or she may hyperarticulate one or more words and/or phrases from the initial query. Hyperarticulation detection is a challenging task because a user's normal speaking style is generally not known before he or she presents the initial query; without that baseline, it is difficult to distinguish the user's normal speaking style from hyperarticulation.