A wireless device, such as a speaker or wireless headset, can interact with an electronic device (e.g., a mobile phone) to play music stored at the electronic device. The wireless device can also output a voice prompt to identify a triggering event detected by the wireless device. For example, the wireless device outputs a voice prompt indicating that the wireless device has connected to the electronic device. To enable output of the voice prompt, pre-recorded (e.g., pre-packaged or “native”) speech data is stored at a memory of the wireless device. Because the pre-recorded speech data is generated without knowledge of user-specific information (e.g., contact names or user configurations), providing natural-sounding and detailed voice prompts based on the pre-recorded speech data is difficult. To provide more detailed voice prompts, text-to-speech (TTS) conversion can be performed at the electronic device using a text prompt generated based on the triggering event. TTS conversion, however, uses significant processing and power resources. To reduce resource consumption, TTS conversion can be offloaded to an external server, but accessing the external server to convert each text prompt consumes power at the electronic device and requires an Internet connection each time. Additionally, poor quality of the Internet connection or a heavy processing load at the server can disrupt or prevent completion of the TTS conversion.