This specification relates to speech recognition and speech understanding systems.
Speech recognition and speech processing systems are prevalent in many consumer electronic devices. Many of these electronic devices now utilize speech command processing techniques to invoke and perform particular operations. For example, a user device, such as a smart phone, can process speech commands to perform specified operations that include searching the web, setting an alarm, calling a particular person, and so on.
A user device uses a speech recognition processing system to recognize and process speech commands. A provider of the speech recognition and processing system develops parsing rules for the various commands a user may speak. Upon a successful parse of a command input by a rule, the action associated with the rule is performed (or may be performed subject to user confirmation). The parsing rules are often in the form of a command structure for a particular action, e.g., an action n-gram, followed by a preposition, followed by one or more n-grams that define a subject of the action, such as:
<Image_Search_Action_Term> of <Image_Subject>
The above generalized parsing rule, for example, successfully parses the following command inputs, where image search action terms include [image], [pictures] and [photos]:
Image of giraffes
Pictures of flowers
Photos of bridges
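By way of illustration only, a parsing rule of this general form might be sketched as a regular expression. The sketch below is not the implementation of any particular speech processing system. The names IMAGE_SEARCH_ACTION_TERMS and parse_image_search, the "image_search" label, and the inclusion of both singular and plural term variants are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical term list. The text names [image], [pictures] and [photos];
# the other singular and plural variants are assumed for illustration.
IMAGE_SEARCH_ACTION_TERMS = ("image", "images", "picture", "pictures", "photo", "photos")

# <Image_Search_Action_Term> of <Image_Subject>
_RULE = re.compile(
    r"^\s*(?P<action_term>" + "|".join(IMAGE_SEARCH_ACTION_TERMS) + r")"
    r"\s+of\s+(?P<subject>.+?)\s*$",
    re.IGNORECASE,
)


def parse_image_search(command: str) -> Optional[dict]:
    """Return the parsed action and subject, or None if the rule does not parse."""
    match = _RULE.match(command)
    if match is None:
        return None
    return {"action": "image_search", "subject": match.group("subject")}
```

Under these assumptions, parse_image_search("Pictures of flowers") yields the subject "flowers", while an input that does not fit the command structure, such as "Set an alarm", yields None. The rule examines only the surface form of the input and carries no information about what the user intends.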
Some inputs, however, are successfully parsed by a rule associated with an action even though the user does not intend for that action to be performed. For example, for the input “Picture of Dorian Gray,” the user may actually be more interested in a search of a web corpus or a book corpus for resources related to the book than in a search of an image corpus for images of “Dorian Gray.” Accordingly, some commands may parse to an action that is different from the action the user actually desires to be performed.