1. Field of the Invention
The present invention relates to speech recognition and, more particularly, to an apparatus and method for identifying command boundaries from natural conversational speech.
2. Description of the Related Art
Natural language user interface systems include systems which permit a speaker to input commands to the system by saying the commands. However, state-of-the-art conversational natural language user interface systems typically require the user to indicate the end of a command, or the command boundary, through some form of manual input, such as pausing between commands or clicking a microphone control button on the display. Such a requirement makes the user interface quite cumbersome to use and may result in unwanted delays.
Therefore, a need exists for a trainable system that can automatically identify command boundaries in a conversational natural language user interface.
An apparatus for automatically identifying command boundaries in a conversational natural language system, in accordance with the present invention, includes a speech recognizer for converting an input signal to recognized text and a boundary identifier coupled to the speech recognizer for receiving the recognized text and determining if a command is present in the recognized text, the boundary identifier outputting the command if present in the recognized text.
In alternate embodiments, the boundary identifier may output to an application which executes the command. The boundary identifier may include an input processor for processing the recognized text. The input processor may process the recognized text by augmenting each word in the recognized text by the word's relative position with respect to a hypothesized command boundary. The boundary identifier may further include a feature detector coupled to the input processor, the feature detector for determining which feature functions, from a set of feature functions, are present in the processed recognized text. The boundary identifier may further include a decision maker for determining if a command is present in the processed recognized text according to a set of feature weights corresponding to the feature functions in the processed recognized text. The decision maker may be coupled to the feature detector and may decide if the processed recognized text includes a command boundary.
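By way of illustration only, the input processor, feature detector and decision maker described above might be sketched as follows. The function names, the `word+offset` tag format, the example weights and the logistic scoring rule are all assumptions made for this sketch; the text above does not specify how the feature weights are combined.

```python
import math

def augment(words, boundary):
    """Tag each word with its position relative to a hypothesized boundary.

    `boundary` is the number of words preceding the hypothesized boundary.
    Words before it receive negative offsets (-1 is the word just before
    the boundary); words after it receive positive offsets.
    The tag format is an assumption of this sketch.
    """
    tagged = []
    for i, word in enumerate(words):
        offset = i - boundary if i < boundary else i - boundary + 1
        tagged.append(f"{word}{offset:+d}")
    return tagged

def detect_features(tagged, feature_set):
    """Return the tagged words that match a known feature function."""
    return [t for t in tagged if t in feature_set]

def is_boundary(words, boundary, weights, threshold=0.5):
    """Decide whether a command boundary lies at the hypothesized position.

    Assumption: the weights of the active features are summed and passed
    through a logistic function; other decision rules are possible.
    """
    active = detect_features(augment(words, boundary), weights.keys())
    score = sum(weights[f] for f in active)
    return 1.0 / (1.0 + math.exp(-score)) >= threshold

# Hypothetical feature weights, chosen by hand for the example.
example_weights = {"mail-1": 1.5, "then+1": 1.0, "my-1": -2.0}
example_words = ["open", "my", "mail", "then", "close"]
```

With these hand-picked weights, a boundary hypothesized after "mail" activates two positively weighted features, whereas a boundary hypothesized after "my" activates only a negatively weighted one.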
In still other embodiments, a training system for training the apparatus to recognize text and to recognize complete commands may be included. The training system may include an input processor for processing a collection of training data comprising utterances which include complete commands and other than complete commands. The input processor may insert a token before each utterance in the training data. The input processor may insert a token before a first utterance in the recognized text, and after every command in the recognized text. A feature extractor may be included for extracting feature functions including words and relative positions of the words with respect to a hypothesized command boundary location. The speech recognizer may include a language model that has been trained using training data, the training data including a token inserted to indicate a location of a command boundary in the training data. The speech recognizer may include additional baseforms for the token. The speech recognizer may produce the recognized text including the token. The boundary identifier may declare a command boundary when there is an extended period of silence in the recognized text.
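The token insertion performed on the training data might be sketched as below. This is a minimal illustration, assuming that each training utterance is supplied as a list of segments already labeled as complete or incomplete commands; the token string `<boundary>` and the function name are hypothetical.

```python
BOUNDARY_TOKEN = "<boundary>"  # hypothetical token string

def mark_utterance(segments):
    """Insert boundary tokens into one training utterance.

    `segments` is a list of (text, is_complete_command) pairs; this
    labeled representation is an assumption of the sketch.
    A token is placed before the utterance and after every complete command.
    """
    marked = [BOUNDARY_TOKEN]
    for text, is_complete_command in segments:
        marked.extend(text.split())
        if is_complete_command:
            marked.append(BOUNDARY_TOKEN)
    return marked

example_utterance = [("check new mail", True), ("show me the", False)]
marked_example = mark_utterance(example_utterance)
```

In this example the incomplete command "show me the" is not followed by a token, so the marked data includes both positive and negative examples of command boundaries.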
A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for identifying commands in recognized text, the method steps including inputting recognized text, processing the recognized text by augmenting words of the recognized text with a position relative to a hypothesized command boundary, determining feature functions in the processed recognized text in accordance with a set of feature functions, deciding whether the processed recognized text with feature functions identified includes a command, the decision being made based on a weighting of the feature functions, and, if a command is included, outputting the command.
In alternate embodiments, a program of instructions for training the program storage device by inputting training data including utterances comprising commands and other than commands may be included. The step of placing a token before each utterance may be included. The step of placing a token after each command boundary included in the utterances may also be included. The program of instructions for training the program storage device may include the step of extracting feature functions from the training data. The program of instructions for training the program storage device may include the step of determining feature weights for all feature functions. The program of instructions for processing the recognized text may include the step of placing a token before a first utterance in the recognized text and after each command in the recognized text. The program storage device may further include a speech recognizer for providing the recognized text.
A method for identifying commands in natural conversational language includes the steps of inputting recognized text, processing the recognized text by augmenting words of the recognized text with a position relative to a hypothesized command boundary, determining feature functions in the processed recognized text in accordance with a set of feature functions, deciding whether the processed recognized text with feature functions identified includes a command, the decision being made based on a weighting of the feature functions, and, if a command is included, outputting the command.
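The overall flow of the method, from recognized text to output commands, might be sketched as the loop below. The sketch assumes that the augmenting, feature-determining and deciding steps are wrapped in a single caller-supplied function, here called `decide`, which is a hypothetical name; the toy rule used in the example merely stands in for a weighted feature decision.

```python
def extract_commands(words, decide):
    """Scan recognized text and output each command that is identified.

    `decide(before, after)` is a hypothetical callable that returns True
    when a command boundary is hypothesized between `before` (the words
    accumulated since the last boundary) and `after` (the remaining words).
    """
    commands = []
    start = 0
    for end in range(1, len(words) + 1):
        if decide(words[start:end], words[end:]):
            commands.append(" ".join(words[start:end]))
            start = end  # begin accumulating the next command
    return commands

def toy_decide(before, after):
    """Toy stand-in for the weighted decision: a fixed set of final words."""
    return before[-1] in {"mail", "window"}

recognized = "check new mail close this window".split()
found = extract_commands(recognized, toy_decide)
```

Words that are not followed by an identified boundary remain in the buffer and are not output, which corresponds to the case where the recognized text does not yet include a complete command.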
In other methods, the step of inputting training data including utterances comprising commands and other than commands may be included. The step of placing a token before each utterance of the training data may also be included. The method may further include the step of placing a token after command boundaries included in the utterances. The method may include the step of extracting feature functions from the training data. The method may further include the step of determining feature weights for all feature functions. The step of placing a token before a first utterance in the recognized text and after each command in the recognized text may be included. The step of outputting the command to a device for executing the command may also be included, as may a speech recognizer for providing the recognized text.
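The training steps of extracting feature functions and determining feature weights might be illustrated as follows. The text above does not state how the weights are determined; the smoothed log-odds estimate used here is a simple stand-in chosen for the sketch, and the function names, window size and data layout are likewise assumptions.

```python
import math
from collections import Counter

def extract_features(words, position, window=2):
    """Collect word/relative-position features around a hypothesized boundary.

    `position` is the number of words preceding the hypothesized boundary.
    Only words within `window` positions of the boundary are used
    (the window size is an assumption of this sketch).
    """
    features = []
    for offset in range(1, window + 1):
        if position - offset >= 0:
            features.append(f"{words[position - offset]}-{offset}")
        if position + offset - 1 < len(words):
            features.append(f"{words[position + offset - 1]}+{offset}")
    return features

def estimate_weights(examples, window=2):
    """Estimate one weight per feature from labeled training utterances.

    `examples` is a list of (words, boundary_positions) pairs.
    Stand-in estimator: add-one smoothed log-odds of a feature occurring
    at a true boundary versus at a non-boundary position.
    """
    at_boundary, elsewhere = Counter(), Counter()
    for words, boundaries in examples:
        for position in range(1, len(words) + 1):
            target = at_boundary if position in boundaries else elsewhere
            target.update(extract_features(words, position, window))
    all_features = set(at_boundary) | set(elsewhere)
    return {f: math.log((at_boundary[f] + 1) / (elsewhere[f] + 1))
            for f in all_features}

training = [("check new mail close this window".split(), {3, 6})]
trained_weights = estimate_weights(training)
```

In this one-utterance example, "mail" immediately before a boundary is seen only at a true boundary and so receives a positive weight, while "new" immediately before a boundary is seen only at a non-boundary position and receives a negative weight.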
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.