Intelligent automated assistants (or virtual assistants) provide an intuitive interface between users and electronic devices. These assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can access the services of an electronic device by providing a spoken user input in natural language form to a virtual assistant associated with the electronic device. The virtual assistant can perform natural language processing on the spoken user input to infer the user's intent and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more functions of the electronic device, and a relevant output can be returned to the user in natural language form.
A common problem in natural language processing is that user inputs containing frequently used words, such as “it,” “her,” “no,” “yes,” and the like, can be difficult to interpret because those words can be determined to match a large number of potential results. For example, a search query that includes the word “it” can produce a long list of results, many of which are irrelevant, because “it” can be determined to match nearly any searched item. To avoid the problems associated with frequently used words, natural language processing systems typically use a “stop list” to identify words that have been determined to produce a large number of irrelevant results. Any word of a user input that appears in the stop list can be dropped or ignored, and the remaining words of the input can then be processed.
While stop lists can be used to effectively reduce the number of irrelevant results, they can sometimes prevent natural language processing systems from using highly relevant information from the user input. For example, one or more words of a named entity, such as a song having the name “Her,” may also appear as a frequently used word in the stop list. In this example, if a user asks a virtual assistant the question “Who sang Her?”, the virtual assistant would drop the name of the song (“Her”) from the search query, and would be unable to provide the user with the desired information.
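The stop-list filtering described above, and the way it discards the song title in “Who sang Her?”, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the contents of `STOP_LIST` and the `tokenize` and `apply_stop_list` helpers are hypothetical, and tokenization is reduced to a lowercase regular-expression split.

```python
import re

# Hypothetical stop list of frequently used words (illustrative only;
# practical stop lists are larger and curated).
STOP_LIST = frozenset({"who", "it", "her", "no", "yes", "the", "a"})


def tokenize(text: str) -> list[str]:
    """Lowercase the input and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def apply_stop_list(text: str, stop_list=STOP_LIST) -> list[str]:
    """Drop every token that appears in the stop list; keep the rest in order."""
    return [tok for tok in tokenize(text) if tok not in stop_list]


print(tokenize("Who sang Her?"))         # ['who', 'sang', 'her']
print(apply_stop_list("Who sang Her?"))  # ['sang']
```

Because the filter only tests set membership, it cannot distinguish “her” used as a pronoun from “Her” used as a song title, so the one word that identifies the requested song never reaches the search step.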
A similar problem can also occur in virtual assistant systems that categorize information into domains (e.g., categories of information that can represent a subject, genre, area of interest, or the like) to limit the scope of information used to process a user input. In these systems, when a user input is received by the virtual assistant, the user input can be analyzed to identify the domain that is most likely to be relevant to the user input. The user input can then be processed within that domain. If the virtual assistant includes a frequently used word in the domain identification process, the user input can be determined to be related to a large number of irrelevant domains. Alternatively, if a stop list is used to eliminate certain words from the user input, the virtual assistant may be unable to identify the correct domain, and/or may be unable to identify relevant information within a selected domain.
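Both failure modes of domain identification can be illustrated with a toy keyword-overlap matcher. The domain names, their vocabularies, `STOP_LIST`, and `candidate_domains` are all hypothetical, and matching a domain by shared keywords is a simplification chosen for illustration, not the domain-identification technique of any particular assistant.

```python
import re

# Hypothetical domains, each represented only by a small keyword vocabulary.
# "her" appears in several vocabularies because it is a frequently used word
# (e.g. "call her", "remind her") as well as a possible song title.
DOMAINS = {
    "music": {"sang", "song", "album", "her"},
    "contacts": {"call", "text", "her"},
    "reminders": {"remind", "her"},
    "weather": {"rain", "forecast"},
}

# Hypothetical stop list of frequently used words.
STOP_LIST = frozenset({"who", "it", "her", "no", "yes"})


def candidate_domains(text: str, use_stop_list: bool) -> list[str]:
    """Return, alphabetically, every domain sharing at least one token with the input."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    if use_stop_list:
        tokens -= STOP_LIST  # drop frequently used words before matching
    return sorted(name for name, vocab in DOMAINS.items() if tokens & vocab)


print(candidate_domains("Her", use_stop_list=False))  # ['contacts', 'music', 'reminders']
print(candidate_domains("Her", use_stop_list=True))   # []
```

Keeping the frequently used word spreads the input across several unrelated domains, while removing it leaves nothing to match, so no domain is identified at all.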