Intelligent automated assistants (or digital assistants) can provide a beneficial interface between human users and electronic devices. Such assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can provide a speech input containing a user request to a digital assistant operating on an electronic device. The digital assistant can interpret the user's intent from the speech input and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more services of the electronic device, and a relevant output responsive to the user request can be returned to the user.
A digital assistant can be used to obtain media items based on a user's speech input. For example, in an attempt to get a song, a user may say "Play Skrrt Skrrt by 21 Savage." For a number of reasons (e.g., similar pronunciations, similar names among available media items, or a lack of context information for intent inference), interpreting the user intent of a speech input for obtaining a media item can be difficult and error-prone. An inaccurate interpretation of user intent may cause the digital assistant to obtain an incorrect media item or to fail to obtain any media item at all. In the above example, the digital assistant may fail to obtain the correct media item if it interprets the user input as "Skirt by Kodak Black," "Skrrt by Kodak Black," or "Skrt Skrt by 21 Savage." In another example, the user may want an album named "Candyman by Zedd," and the digital assistant may erroneously interpret the request as "Candy Man by Zedd," and consequently fail to identify and return the correct album.