Users may consume audio content via a number of content consumption devices. Certain content consumption devices may be configured to receive voice-based commands, or may otherwise be configured to recognize speech. Such devices, however, may lack an ability to determine an intent or a meaning of certain speech. As a result, voice interaction with certain content consumption devices may be limited. For example, certain devices may include voice assistants that reply to user speech, but such replies may be irrelevant to a meaning of the speech.