Some voice-controlled machines use speaker identification as all or part of a mechanism for letting a user access information or control a system. Speaker identification can be done by comparing the user's speech audio to a previously collected “voiceprint”: a set of characteristics of the user's voice that allows the system to uniquely determine the user's identity (UID).
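The voiceprint comparison can be illustrated with a minimal sketch. It assumes that speech audio has already been reduced to a fixed-length feature vector and that each enrolled voiceprint is stored as a vector of the same length. The cosine-similarity measure, the 0.8 threshold, and the names `identify_speaker` and `voiceprints` are illustrative assumptions, not details taken from any particular system.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def identify_speaker(features, voiceprints, threshold=0.8):
    """Return the UID of the best-matching voiceprint, or None.

    `features` is a hypothetical feature vector extracted from the
    incoming speech audio; `voiceprints` maps each enrolled UID to a
    stored vector. The threshold value is an assumed tuning parameter.
    """
    best_uid, best_score = None, threshold
    for uid, voiceprint in voiceprints.items():
        score = cosine_similarity(features, voiceprint)
        if score >= best_score:
            best_uid, best_score = uid, score
    return best_uid


# Toy enrollment data, invented for illustration only.
voiceprints = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}
```

With this toy data, a vector close to an enrolled voiceprint resolves to that UID, while a vector that resembles no enrolled voiceprint closely enough yields `None`, which a system could treat as an unidentified speaker.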
Some voice-controlled machines use utterance classification as all or part of a mechanism for controlling system behavior. This is done by analyzing the speech audio and assigning the utterance to one or more categories. Typical classification categories include the speaker's gender; the speaker's age group; the speaker's accent, ethnicity, or nationality; the prosody of the utterance (such as speed, emphasis, and other vocal variations); the speaker's mood; and the speaker's health.
Some voice-controlled machines perform natural language processing using a grammar, which comprises rules. Some grammars group rules into domains of knowledge. A semantic parser takes as input one or more transcriptions likely to represent the words in the speech audio; processes the transcriptions using the grammar rules; and outputs one or more interpretations likely to represent the meaning of the system user's speech. Interpretations are computer data structures that represent the meaning of sentences by capturing sentence constituents and their relationships. Action modules take interpretations as input and perform appropriate actions. For example, some modules access data through web application programming interface (API) requests. Some modules actuate motors to control the movements of mechanical devices. Some modules perform communication operations, such as sending text messages. Some modules store information. Innumerable other functions are possible with appropriate action modules.
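The flow from transcriptions through grammar rules to interpretations and action modules can be sketched as follows. This is a simplified illustration under stated assumptions: grammar rules are modeled as regular expressions grouped by domain, an interpretation is a small record holding a domain, an intent, and named slots, and the action modules only return strings rather than performing real operations. The names `GRAMMAR`, `Interpretation`, `parse`, `ACTION_MODULES`, and `act` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Hypothetical data structure representing the meaning of a sentence."""
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)


# Assumed toy grammar: rules grouped into domains of knowledge.
GRAMMAR = {
    "messaging": [
        ("send_text", re.compile(r"^text (?P<recipient>\w+) (?P<body>.+)$")),
    ],
    "weather": [
        ("get_forecast", re.compile(r"^what is the weather in (?P<city>.+)$")),
    ],
}


def parse(transcriptions):
    """Apply every grammar rule to every candidate transcription."""
    interpretations = []
    for text in transcriptions:
        normalized = text.lower().strip()
        for domain, rules in GRAMMAR.items():
            for intent, pattern in rules:
                match = pattern.match(normalized)
                if match:
                    interpretations.append(
                        Interpretation(domain, intent, match.groupdict())
                    )
    return interpretations


def send_text(slots):
    # Stand-in for a communication operation; nothing is actually sent.
    return f"sending '{slots['body']}' to {slots['recipient']}"


def get_forecast(slots):
    # Stand-in for a web API request; no network call is made.
    return f"looking up forecast for {slots['city']}"


# Action modules keyed by the intent they handle.
ACTION_MODULES = {"send_text": send_text, "get_forecast": get_forecast}


def act(interpretation):
    """Dispatch an interpretation to its action module."""
    return ACTION_MODULES[interpretation.intent](interpretation.slots)
```

Passing several candidate transcriptions to `parse`, for instance `["text bob running late", "tax bob running late"]`, yields an interpretation only for the candidates that some rule accepts, which is one simple way a grammar can help select among competing transcriptions.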
The systems and methods disclosed herein provide an improved approach for generating interpretations of speech inputs.