The ability to communicate information effectively through speech remains a challenge for many individuals. One source of difficulty is speech disfluency. A speech disfluency is any of various breaks, irregularities, or non-lexical vocables that occur within the flow of otherwise fluent speech. Disfluencies include false starts, i.e., words and sentences that are cut off mid-utterance, phrases that are restarted or repeated, and repeated syllables; fillers, i.e., grunts or non-lexical utterances such as “huh”, “uh”, “erm” and “well”; and repaired utterances, i.e., instances of speakers correcting their own slips of the tongue or mispronunciations. In addition to speech disfluencies, other problems, such as interrupting, fast talking, mumbling, and shouting, can have long-term consequences for a person's career or personal life. As more work is done remotely, e.g., via conference calls, online broadcasts, etc., the ability to speak effectively becomes even more important because visual body language cues are removed from the communication process. Instead, listeners concentrate on the speaker's voice, grammar, and audible style.
Current methods of teaching public speaking, teaching proper speaking, or correcting speech problems require humans to detect the problems and provide all of the feedback. Typically, speech training is done in person, and the speech is often not recorded or analyzed for later review. Even if the speech is recorded or notes are taken, feedback is typically obtained through a manual process and is not typically linked directly to specific portions of the speech. Real-time feedback is even more difficult to provide, because it typically cannot be delivered without interrupting the speaker.
U.S. Pat. No. 8,595,015 B2 to Lee et al. describes a device for audio communication assessment. The device includes a communication interface configured to receive audio signals associated with audible communications from a user, an output device, and logic. The logic is configured to determine one or more audio qualities associated with the audio signals, map the one or more audio qualities to at least one value, generate audio-related information based on the mapping, and provide the audio-related information to the user, via the output device, during the audible communications.
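By way of illustration only, the general sequence described above (determine an audio quality, map it to a value, and generate audio-related information from the mapping) might be sketched as follows. This sketch is not taken from the cited patent. The choice of loudness as the audio quality, the function names, the threshold values, and the message text are all hypothetical assumptions made for the example.

```python
import math
from typing import Sequence

# Hypothetical thresholds on RMS amplitude for samples normalized to [-1.0, 1.0].
# These values are assumptions for illustration, not values from any reference.
QUIET_THRESHOLD = 0.05
LOUD_THRESHOLD = 0.5

# Hypothetical feedback messages keyed by the mapped value.
MESSAGES = {1: "too quiet", 2: "good volume", 3: "too loud"}


def rms_level(samples: Sequence[float]) -> float:
    """Determine one audio quality: root-mean-square loudness of the samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def map_to_value(rms: float) -> int:
    """Map the audio quality to a discrete value (1 = quiet, 2 = normal, 3 = loud)."""
    if rms < QUIET_THRESHOLD:
        return 1
    if rms < LOUD_THRESHOLD:
        return 2
    return 3


def assess(samples: Sequence[float]) -> str:
    """Generate audio-related information for the user from the mapped value."""
    return MESSAGES[map_to_value(rms_level(samples))]
```

In such a sketch, assess would be called on each short window of incoming audio, and the returned message would be presented on an output device while the user is still speaking.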