The use of voice searches and voice-enabled devices is rapidly expanding. Currently, many network-connected digital devices, such as smartphones, smart watches, smart in-home appliances, and so forth, use voice-enabled assistants. A voice-enabled assistant may include a software agent that uses voice recognition technology to perform tasks or services for a user. A conventional voice-enabled assistant running on a digital device can receive a voice query from the user, convert the voice query into text, and perform a command based on the text (for example, perform an action on the digital device or facilitate a search on the Internet).
Processing a voice query typically entails providing the text to a third-party voice recognition system that analyzes the text and conducts a search based on that analysis. The voice-enabled assistant receives results from the third-party voice recognition system and provides the results to the user. However, the selected third-party voice recognition system may not be the best such system in general, or may not be the optimal fit for a specific query. Thus, existing solutions may lack the ability to automatically select an optimal third-party voice recognition system.
Furthermore, although third-party search engines can perform semantic analysis of the text, various parameters associated with the voice query can be left out of consideration. Such parameters can include emotions expressed by the user when speaking the voice query, the urgency of the request, environmental sounds present in the voice query, emphasis placed by the user when uttering a phrase, and so forth. Without an analysis of the whole range of parameters of the voice query, the search results may be incomplete or insufficiently relevant to the voice query.