The background description provided herein is for the purpose of generally presenting the context of the disclosure. Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Modern electronic devices, including devices for presentation of content, increasingly utilize speech recognition for control. For example, a user of a device may speak a request to search for content or to play back stored or streamed content. However, many speech recognition solutions are not well optimized for commands relating to content consumption. As such, existing techniques may make errors when analyzing speech received from a user. In particular, while existing techniques may utilize processes and systems that have been trained with real speech, that speech may have been recorded in very clean conditions, such as scripted newscasts or speeches. As a result, these techniques may not be sufficiently robust to analyze speech that is made under non-ideal conditions. For example, existing techniques may not be trained on speech made in noisy environments, speech made by children, and/or accented speech, and may therefore exhibit errors when asked to recognize speech in such scenarios.