Many database programs include user interface software and programming tools for defining data entry forms, and for linking fields in those data entry forms to fields in database tables. A related application, System and Method for Generating Database Input Forms, U.S. Ser. No. 08/328,362, filed Oct. 25, 1994, teaches a system and method for converting an existing non-computerized (i.e., paper) data entry form into a computer based data entry form that uses speech recognition for verbal data entry. Application Ser. No. 08/328,362, pending, is hereby incorporated by reference.
The present invention provides a tool for automatic generation of the speech input part of an application program without requiring the application developer to know anything about speech recognition systems. The application developer provides only a set of "menu files" listing the longest word sequence for identifying each predefined input to the program, and the present invention then generates all the syntax and dictionary files needed to enable the developer's application program to be used with a syntax based speech recognition system.
Phoneme based speech recognition systems (also called extendable vocabulary speech recognition systems) are considered desirable because they are speaker independent: there is no need to train the speech recognition system to learn each user's voice patterns. Syntax based speech recognition systems are speech recognition systems that define a set of alternate verbal inputs for each predefined multiple word input value. The set of alternate verbal inputs accepted as matching a particular multiple word input value is defined by a "syntax rule," sometimes called a syntax statement. Syntax based speech recognition systems are usually also phoneme based speech recognition systems, although it would be possible to have a syntax based speech recognition system that is not phoneme based. The preferred embodiment of the present invention uses phoneme based word recognition and syntax based word sequence recognition.
In a typical application of a speech recognition system, there will be either one speech input context for an entire associated application program, or there will be multiple contexts, such as one for each pull down menu of the application program and one for each special dialog box used by the application program. Alternately, in a data entry context, each defined region of a data entry form can be defined as a separate context. Each context, whether in the application program or data entry form, will typically include a set of global commands (such as "save file," "help," or "exit program") as well as a set of navigation commands (such as "tools menu") for switching to another context.
Within any given context, it is desirable that the speech recognition system be as flexible as possible as to the set of words the user can speak to identify each predefined input value, while still uniquely identifying that predefined input value. In the past, this has been accomplished by a person, typically a computer programmer, manually generating a syntax statement for each predefined input value, where the syntax statement defines all word sequences that will be accepted as identifying that predefined input value. For example, a syntax statement for the input value "move to back" may be of the form:
TAG23 → move back | move to back
or
TAG23 → move (to) back
where the symbol "|" is the logical OR operator and parentheses indicate that the word "to" is optional. However, although the author of the above statement may not have thought of it, in the context of this predefined input value, the word "back" might be sufficient to uniquely identify it. In other words, the syntax statement should probably read:
TAG23 → (move) (to) back
indicating that both the word "move" and the word "to" are optional.
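The effect of the OR operator and of parenthesized optional words can be illustrated with a short sketch that expands the right-hand side of a syntax statement into the word sequences it accepts. The function name `expand` and the whitespace-separated rule format are assumptions made for illustration only; they are not the notation parser of any particular speech recognition system.

```python
from itertools import product


def expand(rule):
    """Return the set of word sequences accepted by a rule body such as
    '(move) (to) back' or 'move back | move to back' (illustrative format)."""
    accepted = set()
    for alternative in rule.split("|"):
        choices = []
        for token in alternative.split():
            if token.startswith("(") and token.endswith(")"):
                # Optional word: it may be omitted ("") or spoken.
                choices.append(("", token[1:-1]))
            else:
                # Required word: exactly one choice.
                choices.append((token,))
        for combo in product(*choices):
            phrase = " ".join(word for word in combo if word)
            if phrase:  # ignore the empty utterance
                accepted.add(phrase)
    return accepted


print(sorted(expand("move back | move to back")))
print(sorted(expand("(move) (to) back")))
```

Under these assumptions, the first two forms of the statement accept the same two word sequences, while the form with both words optional additionally accepts "to back" and "back".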
When a predefined input value has more than the three words of the above example, defining the optimal syntax for the input value gets considerably more complex. For instance, for a predefined input value having seven words, there are 127 potential word sequences that maintain the same word order as the original sequence and that might acceptably and uniquely identify that input value.
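The figure of 127 follows from counting the non-empty, order-preserving subsets of seven words, i.e. 2^7 - 1. The following minimal sketch enumerates those candidate sequences; the helper name `ordered_subsequences` is hypothetical and the code is only a brute-force illustration of the count, not a description of how the invention generates syntax statements.

```python
from itertools import combinations


def ordered_subsequences(phrase):
    """List every non-empty word sequence that keeps the original word order."""
    words = phrase.split()
    result = []
    for size in range(1, len(words) + 1):
        # combinations() yields index tuples in increasing order,
        # so the relative order of the words is preserved.
        for indices in combinations(range(len(words)), size):
            result.append(" ".join(words[i] for i in indices))
    return result


print(ordered_subsequences("move to back"))
print(len(ordered_subsequences("one two three four five six seven")))
```

For the three-word value "move to back" this yields seven candidates, and for any seven-word value it yields 127, counting by word position.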
Of course, the sequences of words that uniquely identify an input value depend on the other predefined input values within the same defined context. Thus, if the same context that included the input value "move to back" also included the input value "back one step", then the word "back" could not be used to uniquely identify either of those input values. In contexts with even ten or so predefined input values, checking for all possible conflicts between word sequences can be difficult to do properly. In contexts with several dozen or more input values, this task is extremely difficult for a human to perform manually without spending inordinate amounts of time on it. Such large contexts can arise, for example, in applications where the application designer has decided to make as many commands as possible available while minimizing verbal navigation command requirements.
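The conflict check described above can be sketched as a brute-force search: enumerate the order-preserving word sequences of every input value in a context and keep only those that belong to exactly one value. The function names and the representation of a context as a list of strings are assumptions for illustration; this is not presented as the method of the invention, and it ignores whether a surviving sequence (such as "to" alone) would be a sensible utterance.

```python
from collections import defaultdict
from itertools import combinations


def ordered_subsequences(phrase):
    """Set of non-empty word sequences that keep the original word order."""
    words = phrase.split()
    return {
        " ".join(words[i] for i in indices)
        for size in range(1, len(words) + 1)
        for indices in combinations(range(len(words)), size)
    }


def unique_identifiers(context):
    """Map each input value to the word sequences that match no other value."""
    owners = defaultdict(set)
    for value in context:
        for sequence in ordered_subsequences(value):
            owners[sequence].add(value)
    unique = {value: set() for value in context}
    for sequence, values in owners.items():
        if len(values) == 1:  # claimed by exactly one input value
            (value,) = values
            unique[value].add(sequence)
    return unique


result = unique_identifiers(["move to back", "back one step"])
print(sorted(result["move to back"]))
print(sorted(result["back one step"]))
```

With both input values in the same context, "back" is dropped from both sets, whereas a context containing only "move to back" would keep it. The number of candidate sequences grows exponentially with the length of each input value, which gives some sense of why the manual task becomes impractical.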
It is common for many data entry forms, and for many application programs, to use abbreviations, numbers, ordinals, acronyms and initialisms to identify predefined input values and commands. The corresponding syntax statements for a speech recognition system must include equivalent word sequences that correspond to the standard verbalizations of such abbreviations, numbers, ordinals, acronyms and initialisms.
The difference between an acronym and an initialism is as follows. An acronym is formed from pronounceable syllables, even if it represents a "made up word," while an initialism is not pronounceable except as a sequence of letters and numbers. Thus, "IBM" and "YMCA" are initialisms, while "NASA" and "UNICEF" are acronyms. Some words, such as "MSDOS" and "PCNET," are a mix of acronym and initialism components.
In order for a speech recognition system to work with such data entry forms and application programs, the syntax statements defining the range of word sequences for each predefined input value must include equivalent verbal word sequences. For initialisms, this means the user will need to speak the same or substantially the same sequence of letters as found in the initialism. For an acronym, the user will need to speak the equivalent verbalization of the acronym, and thus the corresponding syntax statement will need to accurately reflect the equivalent verbalization. For an abbreviation, the corresponding syntax statement will need to include the corresponding full word. For numbers and ordinals, the syntax statement will need to include the equivalent full text words.
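As a rough illustration of these equivalences, the sketch below maps a written token to a spoken word sequence. Everything in it is an assumption made for the example: the lookup tables `ABBREVIATIONS` and `ACRONYMS` are hypothetical and tiny, numbers are handled only from 0 to 99, and any other all-uppercase token is treated as an initialism and spelled letter by letter. Mixed forms such as "MSDOS" would need an explicit table entry and are not handled here.

```python
import re

# Hypothetical lookup tables; a real system would need far larger ones.
ABBREVIATIONS = {"dept": "department", "no": "number"}
ACRONYMS = {"NASA": "nasa", "UNICEF": "unicef"}  # spoken as single words

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
IRREGULAR_ORDINALS = {"one": "first", "two": "second", "three": "third",
                      "five": "fifth", "eight": "eighth", "nine": "ninth",
                      "twelve": "twelfth"}


def number_words(n):
    """Spell out an integer from 0 to 99 (range limited for brevity)."""
    if not 0 <= n < 100:
        raise ValueError("this sketch only covers 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def ordinal_words(n):
    """Spell out an ordinal from 0 to 99, e.g. 21 -> 'twenty first'."""
    words = number_words(n).split()
    last = words[-1]
    if last in IRREGULAR_ORDINALS:
        last = IRREGULAR_ORDINALS[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return " ".join(words[:-1] + [last])


def verbalize(token):
    """Return an assumed spoken form for one written token."""
    if token.isdigit():
        return number_words(int(token))
    match = re.fullmatch(r"(\d+)(st|nd|rd|th)", token.lower())
    if match:
        return ordinal_words(int(match.group(1)))
    if token.lower() in ABBREVIATIONS:
        return ABBREVIATIONS[token.lower()]
    if token in ACRONYMS:
        return ACRONYMS[token]
    if token.isupper():
        return " ".join(token.lower())  # initialism: one letter at a time
    return token.lower()


for token in ["IBM", "NASA", "Dept", "42", "2nd"]:
    print(token, "->", verbalize(token))
```

The order of the checks is a design choice of this sketch: table lookups take precedence over the uppercase fallback so that a known acronym is never spelled out letter by letter.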
It is therefore a goal of the present invention to provide a speech recognition system in which syntax statements for all predefined input values for all contexts are automatically generated.
Another goal of the present invention is for the automatically generated syntax statement to provide maximal flexibility in terms of word sequences accepted as identifiers for each predefined input value while still providing a unique syntax identification for each predefined input value in each defined context.