1. Field of the Invention
The present invention relates to a speech recognition grammar creating apparatus that creates grammars by describing speech-recognizable words and sentences, a control method therefor, a program for implementing the method, and a storage medium storing the program.
2. Description of the Related Art
Conventionally, when speech-recognizable contents (a speech-recognizing object) are described in advance as a speech recognition grammar, the grammar is generally written in augmented BNF (Augmented Backus-Naur Form), a notation for describing sentence structure. A speech recognition grammar written in augmented BNF can mark a portion of the speech-recognizing object as omissible. However, this marking merely designates a certain range within the speech-recognizing object as one that may or may not be spoken. Therefore, for a single speech-recognizing object, the only choice that can be described is whether the omissible portion is omitted or is spoken.
On the other hand, a speech recognition apparatus has been proposed (for example, in Japanese Laid-Open Patent Publication (Kokai) No. 2001-188560), which can describe a speech recognition grammar with extended functions to combine component elements (words), such that the order of component elements which constitute a sentence of a speech-recognizing object is not fixed, that is, can be changed. However, this proposed speech recognition apparatus does not deal with omitting component elements.
Further, although a speech recognition grammar is generally written as text using a text editor, speech recognition apparatuses that describe a speech recognition grammar graphically using a GUI (Graphical User Interface) have recently come into use. However, these speech recognition apparatuses also do not deal with combining component elements that can be omitted.
In describing a speech recognition grammar as mentioned above, it is often necessary to write rules, with some kind of default values set, such that the speech recognition apparatus functions even if the user does not speak all of the component elements (words) that constitute a sentence of a speech-recognizing object. For example, consider a rule that expresses time and that accepts, as the speech-recognizing object, not only the case in which the user of the speech recognition apparatus speaks all of the component elements, “xx hours, xx minutes, xx seconds”, but also speech such as “xx hours, xx minutes”, “xx minutes, xx seconds”, “xx hours”, “xx minutes”, and “xx seconds”.
In this case, if every component element is marked as omissible as shown below, the rule allows all of the component elements to be omitted at once. The rule therefore also matches an utterance in which nothing is spoken at all.
<time>=[<hour>][<minute>][<second>]
In the above expression, contents in < > indicate non-terminal nodes (rule names), and contents in [ ] indicate elements that can be omitted.
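The effect of marking every element as omissible can be checked with a minimal Python sketch that enumerates every sequence the rule above accepts. The helper name `expand_optional` and the list representation of the rule are illustrative assumptions and do not belong to any grammar format.

```python
from itertools import product

def expand_optional(elements):
    """Return every sequence accepted when each element may be omitted.

    `elements` is an ordered list of rule names; each one is treated as
    optional, mirroring <time>=[<hour>][<minute>][<second>].
    (Hypothetical helper, for illustration only.)
    """
    sequences = []
    # For each element, independently choose to keep it (True) or omit it (False).
    for keep in product([True, False], repeat=len(elements)):
        sequences.append(tuple(e for e, k in zip(elements, keep) if k))
    return sequences

accepted = expand_optional(["<hour>", "<minute>", "<second>"])
for seq in accepted:
    print(" ".join(seq) if seq else "(nothing spoken)")
```

Among the eight sequences produced, one is empty, which corresponds to the unwanted case in which the rule matches without anything being spoken.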
To describe the rule precisely, that is, so that at least one component element must be spoken, it is necessary to create the following three types of combinations, and the number of combinations increases with an increase in the number of component elements.
<time>=<hr>[<min>][<sec>]|<min>[<sec>]|<sec>
In the above expression, | indicates an OR connection and the component element sequence in each combination indicates an AND connection.
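As a rough illustration of how the combinations grow with the number of component elements, the following Python sketch builds the alternatives of the above form for any ordered list of elements and expands them. The function names and the tuple representation are assumptions made for this sketch only.

```python
from itertools import product

def precise_alternatives(elements):
    """One alternative per element: that element is mandatory and every
    later element is optional. (Hypothetical helper, for illustration.)"""
    return [(elements[i], elements[i + 1:]) for i in range(len(elements))]

def render(alternatives):
    """Write the alternatives in the bracket notation used above."""
    parts = []
    for head, optional in alternatives:
        parts.append(head + "".join("[" + e + "]" for e in optional))
    return "|".join(parts)

def expand(alternatives):
    """Enumerate every sequence accepted by the alternatives."""
    sequences = set()
    for head, optional in alternatives:
        for keep in product([True, False], repeat=len(optional)):
            tail = tuple(e for e, k in zip(optional, keep) if k)
            sequences.add((head,) + tail)
    return sequences

alts = precise_alternatives(["<hr>", "<min>", "<sec>"])
rule = "<time>=" + render(alts)
sequences = expand(alts)
print(rule)
print(len(alts), "alternatives,", len(sequences), "accepted sequences")
```

With three elements the sketch yields three alternatives and seven accepted sequences, none of them empty. Each additional element adds one more alternative that the author must write by hand.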
To describe a rule that makes it possible to change the order of component elements including component elements that can be omitted, it is necessary to create the rule by taking into account combinations of the order of the component elements, and thus, the number of combinations of the component elements further increases.
For example, in the case of an operation task of a copying machine that is capable of carrying out various settings related to copying operations based on speech input, the operation task has designating elements such as <select paper>, <enlarge/reduce>, <single side/double side>, <sort>, <density>, and <number of copies> as component elements of a speech-recognizing object. Assuming that a default setting is provided for each designating element, to be applied when the speech corresponding to that designating element is omitted, the user must create all of the rules by taking into account every possible order of the designating elements as well as every possible combination of omitted designating elements. As a result, the rules become extremely complex and time-consuming to create.
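The scale of the problem can be estimated with a short Python sketch. It assumes, for illustration only, that each designating element is spoken at most once, that any order is allowed, and that at least one element is spoken. The function name `count_ordered_selections` is hypothetical.

```python
from itertools import permutations
from math import perm

def count_ordered_selections(n):
    """Count non-empty selections of n distinct elements in any order.

    For each size k from 1 to n there are n!/(n-k)! ordered selections.
    """
    return sum(perm(n, k) for k in range(1, n + 1))

# Cross-check the formula by brute-force enumeration for the three time elements.
time_elements = ["<hour>", "<minute>", "<second>"]
enumerated = [
    p
    for k in range(1, len(time_elements) + 1)
    for p in permutations(time_elements, k)
]

time_count = count_ordered_selections(3)
copier_count = count_ordered_selections(6)
print(time_count, len(enumerated), copier_count)
```

Under these assumptions, the three time elements give 15 distinct sequences and the six designating elements of the copying machine give 1956. The number of rule alternatives actually written may be smaller when omissible elements are bracketed, but the count indicates how quickly the manual work grows.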