In addition to having requested media content played back, users sometimes wish to sing along with the media as it plays. A user may, for example, wish to overlay a music track with their own vocals by singing into a microphone while the music plays. Systems that provide this functionality typically consist of a wired microphone physically plugged into a device that plays only locally stored content. Consequently, such systems substantially limit the ability of users to select and control media content for playback, and to easily provide their vocals for overlay on that media content.