This specification relates to speech recognition.
As used by this specification, a “search query” includes one or more query terms that a user submits to a search engine when the user requests the search engine to execute a search query, where a “term” or a “query term” includes one or more whole or partial words, characters, or strings of characters. Among other things, a “result” (or a “search result”) of the search query includes a Uniform Resource Identifier (URI) that references a resource that the search engine determines to be responsive to the search query. The search result may include other things, such as a title, preview image, user rating, map or directions, description of the corresponding resource, or a snippet of text that has been automatically or manually extracted from, or otherwise associated with, the corresponding resource.
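By way of illustration only, the definitions above can be sketched as simple data structures. The type names `SearchQuery` and `SearchResult`, the helper `query_from_text`, and the choice to split terms on whitespace are assumptions made for this example and are not part of the specification; a term may equally be a partial word or a string of characters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchQuery:
    """One or more query terms submitted to a search engine (hypothetical type)."""
    terms: List[str]

    def __post_init__(self) -> None:
        # A search query includes one or more query terms.
        if not self.terms:
            raise ValueError("a search query includes one or more query terms")


@dataclass
class SearchResult:
    """A result that references a responsive resource (hypothetical type)."""
    uri: str  # Uniform Resource Identifier of the resource
    # Optional items that a result may also include.
    title: Optional[str] = None
    description: Optional[str] = None
    snippet: Optional[str] = None
    user_rating: Optional[float] = None


def query_from_text(text: str) -> SearchQuery:
    """Split typed or transcribed text on whitespace into query terms.

    Whitespace splitting is an assumption made for this sketch.
    """
    return SearchQuery(terms=text.split())
```

For example, `query_from_text("coffee shops near me")` would yield a query with four terms, and `SearchResult(uri="https://example.com/cafe")` would represent a result that carries only the URI, with the optional items left unset.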
Among other approaches, a user may enter query terms of a search query by typing on a keyboard or, in the context of a voice query, by speaking the query terms into a microphone of a mobile device. When the user submits a voice query, the microphone of the mobile device may record ambient noises or sounds, or “environmental audio,” in addition to spoken utterances of the user. For example, environmental audio may include background chatter or babble of other people situated around the user, or noises generated by nature (e.g., dogs barking) or man-made objects (e.g., office, airport, or road noise, or construction activity). The environmental audio may partially obscure the voice of the user, making it difficult for an automated speech recognition (“ASR”) engine to accurately recognize spoken utterances.