As electronic devices become increasingly sophisticated, people are using them in new and interesting ways. Some of these devices have adopted voice control, whereby the device can perform various actions in response to a spoken question or instruction. For example, in response to a question or instruction, these devices can provide information, music, audiobooks, news, weather, traffic, and sports, and can control connected devices, among other actions. In various situations involving media content, the user may want to refine or otherwise control the playback of the media content using spoken instructions. Conventional approaches typically enable the user to control only basic navigation and selection of media content. This limited control can be frustrating to some users and, in some instances, can negatively affect the overall user experience associated with using computing devices to manage media content.