Computing systems may be configured to receive multiple voice queries and to generate reply content that is responsive to each of the voice queries. The generated reply content can be provided for output to a user in an audio data format, a visual data format, or both. For example, a computing system can receive voice queries in which a user seeks information about a particular media or content item, a meeting location, or subject matter relating to any of a variety of conversational topics. A user who provides voice queries to an example client or computing device may wish to receive at least a subset of the reply content in a particular data format that offers a more convenient user experience. However, the user device to which the user provides the voice query may be unable to provide content in that format.