Humans naturally use speech to communicate with other people and also to communicate with machines (such as mobile computing devices and appliances). A speech signal as used herein generally refers to an audio signal that contains sounds and/or speech made by one or more human persons. An audio signal as used herein generally refers to an acoustic signal, which can be captured or otherwise received by a sound capture device, such as a microphone. The received acoustic signal may be converted to digital form by, e.g., an analog-to-digital converter (ADC) or other suitable device or component. The digital form of the acoustic signal may be stored in non-transitory media, e.g., computer memory or a data storage device. Computer-implemented algorithms can extract various information from the speech signal. The extracted information may include the words spoken by the speaker, non-word sounds (such as grunts, sighs, and laughter), and/or information about how those words (or non-word sounds) are produced by the speaker (e.g., intonation, loudness, speaking rate, and timing).