Voice-based client devices can be placed in a home, office, or other environment and can transform the environment into a speech-enabled environment. In a speech-enabled environment, a user can speak a query or command to prompt the voice-based client to generate an answer or to perform another operation in accordance with the user's query or command. To prevent a voice-based client from picking up all utterances made in a speech-enabled environment, the client may be configured to activate only when a pre-defined hotword is detected in the environment. A hotword, which is also referred to as an “attention word” or “voice action initiation command,” is generally a predetermined word or term that is spoken to invoke the attention of a system. When the system detects that the user has spoken a hotword, the system can enter a ready state for receiving further voice queries.
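The hotword gating described above can be illustrated with a minimal sketch. It is not a definitive implementation and makes several assumptions that do not come from the description: utterances arrive as already-transcribed text rather than audio, the hotword is matched by exact comparison after normalization, and the client returns to its idle state after a single query. The names `HotwordGate`, `State`, `handle_stream`, and the example hotword "hey device" are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()    # utterances are ignored until the hotword is heard
    READY = auto()   # hotword heard; the next utterance is treated as a query


class HotwordGate:
    """Toy gate that passes an utterance on only after a hotword (illustrative)."""

    def __init__(self, hotword):
        self.hotword = hotword.strip().lower()
        self.state = State.IDLE

    def on_utterance(self, text):
        """Return the query to act on, or None if the utterance is ignored."""
        normalized = text.strip().lower()
        if self.state is State.IDLE:
            # Assumption: exact text match stands in for acoustic hotword detection.
            if normalized == self.hotword:
                self.state = State.READY
            return None
        # READY: pass the utterance on as a query.
        # Assumption: the gate returns to IDLE after one query.
        self.state = State.IDLE
        return text.strip()


def handle_stream(utterances, hotword="hey device"):
    """Run a sequence of utterances through the gate and collect accepted queries."""
    gate = HotwordGate(hotword)
    queries = []
    for utterance in utterances:
        query = gate.on_utterance(utterance)
        if query is not None:
            queries.append(query)
    return queries
```

In this sketch, an utterance spoken before the hotword is dropped, while the same utterance spoken immediately after the hotword is returned as a query. A real client would typically run detection on the audio signal itself and might remain in the ready state for more than one query.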