Voice recognition systems and user devices configured to receive and respond to voice queries are becoming increasingly common. A voice query may be, for example, a spoken command to the user device to perform some action, a spoken request to view or play particular content, a spoken request to search for content or information based on search criteria, or any other spoken request or command that may be uttered by a user of the user device. By removing the need to use buttons and other modes of selection, such devices may be controlled in a hands-free manner, allowing the user to issue voice queries while performing other tasks. When a user device in communication with a voice recognition engine receives a voice query from a user, the user device may be configured to send an audio file of the voice query to the voice recognition engine, where the audio file may be processed to determine the meaning of what the user uttered. Processing of the voice query may require complex automated speech recognition that the voice recognition engine itself is not capable of performing efficiently. For this reason, the voice recognition engine may send the audio file to an automated speech recognition service capable of transcribing the voice query and sending a transcription of the voice query back to the voice recognition engine, where a response may be generated. The automated speech recognition service may be operated by a third party. This process may occur in a matter of seconds and may be transparent to the user. However, the round trip to the automated speech recognition service may sometimes introduce undesirable delay, possibly affecting the user experience.
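
By way of illustration only, the round trip described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular implementation: the names `handle_voice_query`, `fake_asr_service`, and `QueryResult` are hypothetical, the automated speech recognition service is replaced by a local stand-in function, and the 50 ms sleep merely simulates network and processing delay.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class QueryResult:
    transcription: str   # text returned by the speech recognition service
    response: str        # response generated by the voice recognition engine
    asr_seconds: float   # time spent waiting on the speech recognition service


def handle_voice_query(audio: bytes, asr_service: Callable[[bytes], str]) -> QueryResult:
    """Hypothetical voice recognition engine: forward the audio, then build a response."""
    start = time.perf_counter()
    # Round trip to the (possibly third-party) automated speech recognition service.
    transcription = asr_service(audio)
    asr_seconds = time.perf_counter() - start
    # Placeholder for whatever response generation the engine actually performs.
    response = f"Handling request: {transcription}"
    return QueryResult(transcription, response, asr_seconds)


def fake_asr_service(audio: bytes) -> str:
    """Stand-in for a remote service (assumption); the sleep simulates delay."""
    time.sleep(0.05)
    return "play the news"


if __name__ == "__main__":
    result = handle_voice_query(b"\x00\x01", fake_asr_service)
    print(result.response)
    print(f"time spent waiting on the service: {result.asr_seconds:.3f} s")
```

In this sketch the engine cannot begin generating a response until the service call returns, so any delay in the service is added directly to the time the user waits.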