The present invention relates generally to speech recognition, and more specifically to a system for providing actions to be performed during processing of recognition results.
Speech recognition systems are available today which allow a computer system user to communicate with an application computer program using spoken commands. In order to perform command and control operations on the application program in response to a user's speech, existing speech recognition systems employ a type of speech recognition grammar referred to as a “rule grammar”, which is loaded into a speech recognizer program. Rule grammars are also sometimes referred to as “command and control” or “regular” grammars. Rule grammars are often written in a special grammar language that is different from the programming language used to write the application program itself. For example, in a system in which the application program is written in the Java™ programming language provided by Sun Microsystems™, the Java Speech API grammar format (JSGF) may be used to write the rule grammar. Accordingly, application programs and grammars are typically developed and maintained in separate files by different programmers or teams. As a result, the application code that handles the logic of the application is usually separate from the files that define the rule grammars. These factors result in a parallel maintenance problem: changes to the application code may require changes to the rule grammar files and vice versa.
An example of a simple rule grammar for an application program associated with a hypothetical multi-media player is as follows:
<play> = (play | go | start) {PLAY};
<stop> = (stop | halt | quit running) {STOP};
<lineno> = 1 {ONE} | 2 {TWO} | 3 {THREE} | 4 {FOUR};
<goto> = go to line <lineno> {GOTO};
public <command> = <play> | <stop> | <goto>;
In the above illustrative grammar, each line is a recognition rule having a rule name within < > on the left side of the equation, specifically <play>, <stop>, <lineno>, <goto>, and <command>. Rule definitions are on the right side of the equations. In the above example, the rule definitions include a set of one or more alternative utterances (sometimes referred to as “tokens”) or rule names separated by “|”. The utterances and rule names in each rule definition define the speech patterns matching the rule.
The above rule grammar may be loaded into a speech recognizer program to enable the speech recognizer program to listen for any of the following spoken commands: “play”, “go”, “start”, “stop”, “halt”, “quit running”, “go to line 1”, “go to line 2”, “go to line 3”, and “go to line 4”. In existing systems, for an application program to respond to the above commands, the application program must include program logic mapping the specific speech patterns defined by the rule grammar to the appropriate actions. This is sometimes accomplished by embedding static strings, referred to as “tags”, in the rule grammar, which are passed to the application program within recognition results from the speech recognizer program. In the above illustrative rule grammar, the tags associated with each rule are shown within curly brackets {}, specifically “{PLAY}”, “{STOP}”, “{ONE}”, “{TWO}”, “{THREE}”, “{FOUR}”, and “{GOTO}”.
When the speech recognizer program recognizes a command defined by the rule grammar, the speech recognizer program sends the application program recognition result information describing what was spoken by the user. Result information is passed to the application program in what is sometimes referred to as a “recognition result object.” The result information in the recognition result object includes any tag or tags associated with the grammar rule or rules matching what was spoken by the user. The application program then must determine what action or actions are to be performed in response to what was spoken by the user by interpreting the tags in the recognition result.
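The kind of tag-interpreting logic an application needs under this approach might look like the following minimal Java sketch. The class names MediaPlayer and StaticTagInterpreter are hypothetical, and the sketch assumes the tags arrive as a list of strings in the order the rules matched (for “go to line 3”, the tags THREE and then GOTO). The method is named gotoLine because goto is a reserved word in Java.

```java
import java.util.List;

/** Hypothetical stand-in for the application's media player. */
class MediaPlayer {
    String lastAction = "none";
    void play() { lastAction = "play"; }
    void stop() { lastAction = "stop"; }
    void gotoLine(int line) { lastAction = "goto " + line; }
}

/** Maps the static tags carried by a recognition result to player actions. */
class StaticTagInterpreter {
    private final MediaPlayer player;

    StaticTagInterpreter(MediaPlayer player) { this.player = player; }

    /** Returns the line number for a tag ONE..FOUR, or -1 for any other tag. */
    static int lineNumberFor(String tag) {
        switch (tag) {
            case "ONE": return 1;
            case "TWO": return 2;
            case "THREE": return 3;
            case "FOUR": return 4;
            default: return -1;
        }
    }

    /** Assumption: tags are supplied in match order, e.g. [THREE, GOTO]. */
    void interpret(List<String> tags) {
        int line = -1;
        for (String tag : tags) {
            int n = lineNumberFor(tag);
            if (n > 0) {
                line = n;          // remember the number until GOTO is seen
                continue;
            }
            switch (tag) {
                case "PLAY": player.play(); break;
                case "STOP": player.stop(); break;
                case "GOTO":
                    if (line > 0) player.gotoLine(line);
                    break;
                default: break;    // unknown tags are ignored in this sketch
            }
        }
    }
}
```

Every tag added to the grammar requires a matching branch in code of this kind, which illustrates the parallel maintenance problem noted earlier.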
In more sophisticated existing systems, tags embedded in rule grammars may include or consist of portions of scripting language. For example, a more elaborate rule grammar for a hypothetical multi-media player application program might include the following rules:
<lineno> = (1 | 2 | 3 | 4) {line=this.tokens;};
<goto> = go to line <lineno> {action=goto; lineno=<lineno>.line;};
<play> = (play | go | start) {action=play;};
<stop> = (stop | halt | quit running) {action=stop;};
public <command> = <play> | <stop> | <goto>;
In the above example, the tag for the <lineno> rule is “line=this.tokens;”, which is a scripting language command for assigning the value of the number that was spoken (1, 2, 3 or 4) to a “line” feature field within a feature/value table. Similarly, the tag for the <goto> rule in the above rule grammar is the scripting language “action=goto; lineno=<lineno>.line;”. When a user says “go to line 3”, the speech recognizer generates a recognition result including the tags “line=this.tokens;” and “action=goto; lineno=<lineno>.line;”. The application program receives the recognition result, and, for example, passes the tags it contains to a tags parser program for interpretation of the scripting language they contain. The application program may, for example, pass the result object to the tags parser program using the following command:
FeatureValueTable fv=TagsParser.parseResult(recognitionResult);
The above command loads the result from the tags parser program (TagsParser.parseResult), operating on the above tags for example, into the feature/value table “fv”. In this example, the tags parser would first associate the value “3” with the “line” field. The tags parser would then associate the value “goto” with the “action” feature and copy the value “3” from the “line” field to the “lineno” feature. This results in logical feature/value pairs stored in the feature/value table fv, including the “action” feature with the value “goto” and the “lineno” feature with the value “3”.
Upon receipt of the feature/value table, the application program must employ specialized post-processing code to interpret the feature/value pairs it contains. An example of such post-processing is as follows:
public void interpretResult(RecognitionResult recognitionResult) {
    // Parse the scripting language in the tags into feature/value pairs.
    FeatureValueTable fv = TagsParser.parseResult(recognitionResult);
    String action = fv.getvalue("action");
    if (action.equals("goto")) {
        String lineno = fv.getvalue("lineno");
        int line = Integer.parseInt(lineno);
        player.goto(line);
    } else if (action.equals("play")) {
        player.play();
    } . . .
}
The above code example illustrates the complexity required in the application program to process the feature/value table. Some existing scripting language systems permit class references to be embedded in the scripting language of a tag. Class references are references to globally defined “static” objects as may be obtained using the “static” keyword in the Java programming language. The following rule definition shows such an approach:
<lineno> = (1 | 2 | 3 | 4) {line=HelperClass.parseInt(this.tokens);};
The above recognition rule contains a class reference to a static object named “HelperClass”. The class reference performs, for example, conversion of the string spoken by the user to an integer value. However, the only permitted references in such systems are class references, and object instance references are not supported.
As discussed above, existing systems suffer from various shortcomings. Specifically, they require cumbersome application program logic to interpret the feature/value pairs received from a tags parser or its equivalent while processing recognition results. Moreover, existing systems impose an undesirable separation between the description of rule results in the rule grammar tag definitions and the handling of such results by the application program.
It would therefore be desirable to have a speech recognition system which reduces the specialized post-processing program logic in an application program that is typically needed to process speech recognition results. The system should also lessen the separation between rule result definition and rule result processing found in existing systems.
A system and method are disclosed for referencing object instances of an application program, and invoking methods on those object instances from within a recognition grammar. The disclosed system also enables application program object instances to be created by the recognition grammar. A mapping is maintained between at least one string formed using characters in the character set of the recognition grammar and instances of objects in the application program. The mappings may be formed either by the application program or by script within the recognition grammar. In an illustrative embodiment, the mappings are maintained in a table, such that an index of a table entry containing a reference to an object instance associated with a given string may be obtained by applying a predetermined function to the string. Alternatively, any other appropriate technique may be used to ensure that the strings are uniquely associated with references to the corresponding object instances.
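One way such a mapping table could be realized is sketched below in Java, as an illustration only. The class name InstanceRegistry and its methods are hypothetical. The sketch relies on java.util.HashMap, which locates an entry by applying a hash function to the string key, as one possible reading of the “predetermined function” described above. Rejecting duplicate names is an assumption made here to keep each string uniquely associated with one object instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mapping table from unique strings to application object instances. */
class InstanceRegistry {
    // HashMap hashes the string key to find the table entry holding the reference.
    private final Map<String, Object> instances = new HashMap<>();

    /** Registers an instance under a unique name; duplicate names are rejected. */
    void register(String name, Object instance) {
        if (name == null || instance == null) {
            throw new IllegalArgumentException("name and instance must be non-null");
        }
        if (instances.containsKey(name)) {
            throw new IllegalStateException("name already registered: " + name);
        }
        instances.put(name, instance);
    }

    /** Returns the instance registered under the name, or null if there is none. */
    Object lookup(String name) {
        return instances.get(name);
    }

    /** Removes a registration and returns the instance it referred to, if any. */
    Object unregister(String name) {
        return instances.remove(name);
    }
}
```

Because each name maps to its own instance, two simultaneously active objects of the same class can be registered under different strings and addressed separately.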
During operation of the disclosed system, when an object instance is created by either the application program or the recognition grammar, the application program or the recognition grammar may add a reference to the object to the mapping table, together with an associated unique string. The unique string may then be used within scripting language in tags of a rule grammar, in order to refer to the object instance that has been “registered” in this way by the application program. In one embodiment, the mapping table is included in a tags parser utility program that is used to interpret these object instance “names” while parsing the scripting language contained in tags included in a result object. The tags parser program calls methods on such application object instances directly, eliminating the need for the application program to process a feature/value table to determine which methods are to be invoked on which object instances.
With reference to the above media-player application program example, employing an embodiment of the disclosed system allows a unique string “playerName” to be registered by the application program into the mapping table, for example in association with an instance of the multi-media player application program itself. The application program may then be referenced directly from scripting language within the tags defined by the rule grammar. A portion of the rule grammar for such an illustrative embodiment is shown below:
<lineno> = (1 | 2 | 3 | 4) {line=HelperClass.parseInt(this.tokens);};
<goto> = go to line <lineno> {playerName.goto(<lineno>.line);};
<play> = (play | go | start) {playerName.play();};
<stop> = (stop | halt | quit running) {playerName.stop();};
public <command> = <play> | <stop> | <goto>;
In this illustrative embodiment, a tags parser program is invoked to interpret the tags in a recognition result matching one of the above rules. Accordingly, processing of recognition results in the application program may be simplified to an invocation of the tags parser using code such as:
public void interpretResult(RecognitionResult recognitionResult) {
TagsParser.parseResult(recognitionResult);
}
In this way the disclosed system eliminates the need for extensive, costly post-processing of a feature/value table by an application program. Interpretation of tags and the calling of the methods invoked by scripting language contained within the tags may both be done directly by tag parsing code. Since the grammar itself can be used to define actions performed in connection with specific recognized speech, the overall program environment is easier to develop and maintain. Additionally, since application program object instances can be referenced from within the grammar, multiple, simultaneously active instances of a single object class can conveniently be handled.
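How tag parsing code might call a method on a registered object instance can be sketched with Java reflection, as follows. This is a simplified illustration under stated assumptions, not the disclosed tags parser: the classes DemoPlayer and TagInvoker are hypothetical, the tag is assumed to have the form name.method() or name.method(n) with any rule reference such as <lineno>.line already resolved to an integer literal, and gotoLine stands in for goto, which is a reserved word in Java.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical application object that records the calls made on it. */
class DemoPlayer {
    final StringBuilder log = new StringBuilder();
    public void play() { log.append("play;"); }
    public void stop() { log.append("stop;"); }
    public void gotoLine(int line) { log.append("goto ").append(line).append(';'); }
}

/** Hypothetical helper that resolves a registered name and invokes a method on it. */
class TagInvoker {
    private final Map<String, Object> registered = new HashMap<>();

    void register(String name, Object instance) { registered.put(name, instance); }

    /** Executes a tag of the form name.method() or name.method(intLiteral). */
    void execute(String tag) {
        String s = tag.trim();
        if (s.endsWith(";")) s = s.substring(0, s.length() - 1).trim();
        int dot = s.indexOf('.');
        int open = s.indexOf('(');
        int close = s.lastIndexOf(')');
        if (dot < 0 || open < dot || close < open) {
            throw new IllegalArgumentException("unsupported tag: " + tag);
        }
        String name = s.substring(0, dot).trim();
        String methodName = s.substring(dot + 1, open).trim();
        String arg = s.substring(open + 1, close).trim();

        Object target = registered.get(name);
        if (target == null) {
            throw new IllegalArgumentException("no instance registered as " + name);
        }
        try {
            if (arg.isEmpty()) {
                Method m = target.getClass().getMethod(methodName);
                m.invoke(target);
            } else {
                // This sketch supports only a single int argument.
                Method m = target.getClass().getMethod(methodName, int.class);
                m.invoke(target, Integer.parseInt(arg));
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + methodName, e);
        }
    }
}
```

With a DemoPlayer registered under “playerName”, executing the tag text playerName.play( ); calls play() on that instance, and the application itself needs no feature/value post-processing for that command.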
Thus the disclosed system reduces or eliminates specialized post-processing application program logic used by existing systems to process speech recognition results. The disclosed system also lessens the separation between rule definition and rule result processing found in existing systems.