The present invention relates to a speech processing system which subjects input speech or input words to speech recognition and outputs various operation instructions on the basis of a result of the speech recognition.
Many conventional systems accept operation instructions for various units through speech recognition technology. Such a system is equipped with a speech processing system that subjects speech or words, spoken by a user to input an operation instruction or the like, to a speech recognition process, thereby specifying both a unit to be operated and an operation to be performed, and that outputs a corresponding operation instruction. Taking as an example a vehicle system comprising various vehicle units, such as a navigation unit, an audio unit and an air conditioner, the speech processing system can give operation instructions to these units by means of speech, thereby allowing the user to operate them with ease, particularly when the user's body is shaken during driving.
In the above-mentioned conventional speech processing system, an operation instruction given in the form of speech must clearly specify a unit to be operated and an operation to be performed. For example, when the user wishes to find a restaurant close to the current vehicle position in the above-mentioned vehicle system, the user speaks words such as “find a restaurant.” The speech processing system then specifies the navigation unit as the unit to be operated from the word “restaurant” and an operation of finding a restaurant as the specific operation to be performed from the word “find,” and outputs a corresponding operation instruction to the navigation unit.
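The keyword-based specification described above can be illustrated with a minimal sketch. It is not taken from any actual system: the table names, the unit identifiers and the function specify_instruction are hypothetical, and the recognized speech is assumed to be already available as text.

```python
from typing import Optional, Tuple

# Hypothetical keyword tables (illustrative assumptions, not from the source).
# An object word selects the unit to be operated.
UNIT_BY_OBJECT_WORD = {
    "restaurant": "navigation",
    "temperature": "air_conditioner",
}
# A verb selects the operation to be performed.
OPERATION_BY_VERB = {
    "find": "find",
    "reduce": "reduce",
}


def specify_instruction(phrase: str) -> Optional[Tuple[str, str, str]]:
    """Return (unit, operation, target), or None if the phrase does not
    contain both a known object word and a known verb."""
    unit = target = operation = None
    for word in phrase.lower().replace(".", " ").split():
        if unit is None and word in UNIT_BY_OBJECT_WORD:
            unit, target = UNIT_BY_OBJECT_WORD[word], word
        if operation is None and word in OPERATION_BY_VERB:
            operation = OPERATION_BY_VERB[word]
    if unit is None or operation is None:
        return None
    return (unit, operation, target)


print(specify_instruction("find a restaurant"))
```

In this sketch, an instruction is produced only when the phrase contains both a word that identifies the unit and a word that identifies the operation.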
However, a person who has some kind of feeling, desire or the like often first speaks a phrase that honestly expresses that feeling, desire or the like. This can be seen in the words spoken by a child who has only a small vocabulary. When hungry, the child first says “I'm hungry,” which directly expresses the desire, rather than “I want to eat something” or “I want to go to a restaurant,” which indicate more specific objects. The conventional speech processing system, however, cannot specify a unit to be operated and an operation to be performed from a phrase expressing a feeling, desire or the like, and thus the user has to memorize the predetermined phrase necessary to give a desired operation instruction. Hence, the conventional speech processing system has not necessarily been user-friendly.
For the conventional speech processing system to respond to the above-mentioned phrases honestly expressing a feeling, desire or the like, a table of correspondence between such phrases and specific operation instructions must be stored. For example, the phrase “I'm hungry” is made to correspond to an operation instruction to “find a restaurant,” the phrase “I feel hot” to an operation instruction to “reduce the temperature setting of the air conditioner,” and the phrase “I'm tired” to an operation instruction to “find a rest area.” That is, as long as phrases are made to correspond to operation instructions on a one-to-one basis, even the conventional speech processing system can respond to phrases honestly expressing a feeling, desire or the like. Taking the phrase “I'm hungry” as an example, however, a user who speaks this phrase may wish to find a nearby convenience store to buy something to eat right away, or may wish to find a good restaurant to dine at a little later because the hunger is still bearable. When phrases are made to correspond to operation instructions on a one-to-one basis as mentioned above, the operation instruction to “find a restaurant” is always specified for the phrase “I'm hungry,” with no possibility of giving an operation instruction to “find a nearby convenience store.” That is, the conventional speech processing system cannot determine the user's request level from the spoken phrase, and therefore cannot give an operation instruction corresponding to that request level.
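The one-to-one table of correspondence described above can be sketched as follows. This is only an illustration under stated assumptions: the table entries are the three examples given in the text, while the names PHRASE_TO_INSTRUCTION and lookup_instruction, and the simple text normalization, are hypothetical.

```python
from typing import Optional

# One-to-one table of correspondence, using the three example phrases
# from the text. The identifier names are illustrative assumptions.
PHRASE_TO_INSTRUCTION = {
    "i'm hungry": "find a restaurant",
    "i feel hot": "reduce the temperature setting of the air conditioner",
    "i'm tired": "find a rest area",
}


def lookup_instruction(phrase: str) -> Optional[str]:
    """Return the single instruction stored for the phrase, or None if
    the phrase is not in the table."""
    return PHRASE_TO_INSTRUCTION.get(phrase.strip().lower())


# The same phrase always yields the same instruction, regardless of how
# urgent the user's hunger actually is.
print(lookup_instruction("I'm hungry"))
```

Because each phrase maps to exactly one instruction, the lookup has no input through which a request level could select a different instruction, such as “find a nearby convenience store.”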