As computing devices evolve, so do the ways users are able to interact with them, such as through mechanical devices (e.g., keyboards and mice), touch screens, motion, and gesture. Users may also interact with computing devices through natural language input using speech and through computer vision-based input using gestures and movements.
Some computing devices are capable of audio output and are used for playback of music and other audio content. Natural language input has made it easy for users to initiate playback of audio content on such connected devices. Additionally, content streaming services support audio playback by providing users with a library of audio content. In some cases, users group multiple connected devices together for synchronized output of audio. Discussed herein are technological improvements for, among other things, these connected devices and systems.