Many computing devices, such as smartphones, desktops, laptops, tablets, game consoles, televisions, and the like, include functionality for receiving an input (e.g., voice input) for identifying and selecting items displayed on a screen. For example, a user interacting with an entertainment search application executing on a computing device may wish to request the display of movie titles that share a common theme (e.g., HARRY POTTER movies) or a list of restaurants sharing a common attribute (e.g., Middle Eastern cuisine). Current applications, however, rely on rule-based grammars that cover a very strict set of language constructs comprising a limited number of acceptable commands. Thus, the user often does not know which commands will work (i.e., what the application can handle) and which will not, leading to a time-consuming trial-and-error approach. It is with respect to these considerations and others that the various embodiments of the present invention have been made.