1. Field of Invention
This invention relates to a method of notifying a speaker of whether a sound of the speaker is input in an appropriate state when the sound spoken by the speaker is recognized. The invention also relates to a sound recognition device that uses this method, and to a recording medium on which is recorded a processing program that identifies an input state of sound to be recognized.
2. Description of Related Art
Recently, sound recognition technology has been widely used in various fields. In particular, it has been recently used for children's toys and household electrical appliances which have become daily necessities.
If sound recognition technology is used for a device used by a variety of non-specific users, rather than a specific user, in order to recognize sounds spoken by the users at high reliability, it is important to guide the users in the use of the device, such as how to properly input sound, in an easy-to-understand manner, and thus provide an easy-to-use device.
For example, as one device which uses sound recognition which is targeted for a variety of users, a so-called sound clock has been recently developed. That is, when a button or the like disposed on the clock is pressed, a sound informs the user of a current time.
This sound clock is convenient because it is possible to find out the current time in the dark. For example, when a user wakes up in the middle of the night, he/she can find out the current time while in the dark. Furthermore, those who are blind can take advantage of this device. In addition, it is also possible to apply this type of sound clock to children's toys.
In this type of sound clock, setting a current time and alarm can be performed by sound, in addition to outputting the time by sound. For example, if the current time is 6:30 a.m., the user speaks the necessary words in a determined order, such as “a.m.”, “6”, and “30” by using a sound clock in a current time setting mode. In addition, on the sound clock side, the sound spoken by the user is recognized, and based upon the recognition result, the time setting process is performed. Setting an alarm time can be performed in the same manner, and the user speaks a desired alarm time in an alarm time setting mode.
While time setting can be performed by this type of operation, the user may have a concern as to whether the sound spoken by himself/herself has been input in an appropriate state (an appropriate state for a recognition process).
In order to solve this problem, there are methods in which the device responds with a recognition result for each word as the user speaks it. For example, in the example described earlier, the user speaks “a.m.” and a response such as “a.m.” is returned from the device as the recognition result. Next, when the user speaks “6”, a response such as “6” is returned from the device. Furthermore, when the user speaks “30”, a response such as “30” is returned from the device. In addition, in this case, when the sound spoken by the user is inappropriate and the sound is not recognized, the device may return no response, and/or may return a response such as “please speak again”.
Thus, if the recognition result is returned for each word spoken by the user, and some response is also returned when the sound is not recognized, the user can find out whether the content spoken by himself/herself was appropriate and how the sound was recognized, so that the user feels relieved and can easily use the device.
However, as described earlier, if each word is recognized and a response is returned to the user word by word, a single setting operation such as time setting takes a long time, which can create problems. Furthermore, if this type of sound recognition technology is applied to a device which requires low cost, such as daily necessities and toys, it is necessary to reduce the cost as much as possible, so there are significant restrictions on the processing ability of the CPU and on the memory capacity. Therefore, operations which place a large burden on the CPU on the device side, and operations which use a large amount of memory, must be reduced as much as possible.
In order to solve this problem, for example, in the case of the time setting described earlier, instead of recognizing the sound and responding with a recognition result when a user speaks one word, it is conceivable to have the user speak words that form one group, such as “a.m.”, “6”, and “30”, intermittently while leaving a small interval after every word, as the necessary content to set the time, and to perform sound recognition with respect to this spoken content. In this case, because there is no word-for-word response of the recognition result described earlier from the device for each of a plurality of words forming one group, it is possible to shorten the time setting period.
However, in a method in which a relatively long series of sounds forming a plurality of words is input to the device from beginning to end, as described earlier, the user may have a concern as to whether a sound per word spoken by himself/herself has been input in an appropriate state. Therefore, it is becoming necessary to inform the user of whether the sound spoken by the user was input in an appropriate state, without having troublesome processing.
Therefore, one aspect of this invention is to improve the convenience of a device during a sound inputting operation, and to inform a user whether the sound is appropriately input by performing a simple process when a sound is recognized with respect to a sound spoken by the user.
In order to accomplish this aspect, the method of notifying of an input state of sound to be recognized includes detecting an effective sound division for a sound to be recognized based upon the sound power which is obtained from a sound wave form of a sound to be recognized that is spoken by a speaker, determining whether the sound to be recognized has been input in an appropriate state, depending upon a time length of the effective sound division and magnitude of sound power within the effective sound division, and generating information showing that the sound is appropriate immediately after the completion of inputting of the sound to be recognized when it is determined that the sound is appropriate.
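The determination described above can be illustrated by a minimal sketch. It assumes that the sound power is already available as one value per analysis frame, that the effective sound division is taken as the span from the first to the last frame whose power exceeds a threshold, and that the magnitude of sound power is judged by its peak value. The function names, the threshold, and the limit parameters are hypothetical and are not taken from this specification.

```python
from typing import List, Optional, Tuple


def detect_effective_division(power: List[float],
                              threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, spanning the first
    to the last frame whose power exceeds the threshold, or None."""
    active = [i for i, p in enumerate(power) if p > threshold]
    if not active:
        return None
    return active[0], active[-1] + 1


def is_input_appropriate(power: List[float], threshold: float,
                         min_frames: int, max_frames: int,
                         min_peak: float, max_peak: float) -> bool:
    """Judge the input from the time length of the effective sound division
    and the peak power inside it (all limits are assumed values)."""
    division = detect_effective_division(power, threshold)
    if division is None:
        return False  # nothing above the threshold was input
    start, end = division
    length = end - start
    peak = max(power[start:end])
    return min_frames <= length <= max_frames and min_peak <= peak <= max_peak


frame_powers = [0.0, 0.1, 5.0, 6.0, 4.0, 0.1, 0.0]
if is_input_appropriate(frame_powers, 1.0, 2, 10, 3.0, 50.0):
    print("beep")  # stands in for the information emitted right after input
```

Because only a threshold comparison and two range checks are involved, a check of this kind is far lighter than running recognition on each word before responding.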
Furthermore, the sound to be recognized, for which it has been determined whether the sound has been input in an appropriate state, may be sound spoken with a plurality of words as one group, and may be spoken having a space, between each sound for each word forming this one group, as divisions for each word.
The information which is generated when the sound to be recognized, for which it is determined whether the sound has been input in an appropriate state, is determined to be appropriate, may be at least one of a sound signal, light, a sound message, and a display on a display screen which is instantly output in the spaces that form divisions for each of the words that form the one group.
The plurality of words may include one group belonging to a first through an nth (n is a positive integer) word group, the order of being spoken being determined from a word which belongs to the first word group to a word which belongs to the nth word group, and a reference which determines the time length of the effective sound division being set for each word group.
Additionally, the sound recognition device may also include a sound inputting device which inputs sound to be recognized spoken by a speaker and outputs the sound as digitized sound data; a sound analysis device which analyzes the sound data which has been output from the sound inputting device per predetermined time interval and calculates sound power and characteristic data per predetermined time interval; a sound division detection/determination device which detects an effective sound division for the sound to be recognized, based upon the sound power which has been obtained by the sound analysis device, determines whether the sound to be recognized has been input in an appropriate state, based upon the time length of the effective sound division and the magnitude of sound power within the effective sound division, and outputs a signal which shows that the sound is appropriate immediately after the completion of the input of the sound to be recognized when it is determined that the sound is appropriate; a sound recognition processing device which recognizes and processes the sound to be recognized, based upon the characteristic data which has been obtained by the sound analysis device and the effective sound division for the sound to be recognized which has been obtained by the sound division detection/determination device; and an information outputting device which outputs a sound message for a user from the device and a response from the device for the recognition result, and also outputs information which shows that the sound to be recognized is appropriate when the signal which shows that the sound to be recognized is appropriate is received from the sound division detection/determination device.
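As an illustration of the sound analysis step only, the following sketch computes sound power per predetermined time interval from digitized sound data. It assumes non-overlapping frames of a fixed number of samples and takes the mean of the squared samples as the power; the calculation of characteristic data used for recognition is omitted. The function name and the frame length are hypothetical.

```python
from typing import List, Sequence


def frame_power(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square power of each complete, non-overlapping frame.
    Trailing samples that do not fill a whole frame are ignored."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(s * s for s in frame) / frame_len)
    return powers


# Seven samples with a frame length of 2 give three complete frames.
print(frame_power([1, -1, 2, -2, 0, 0, 3], 2))
```

The resulting list of per-frame power values is the kind of input the sound division detection/determination device would work from.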
The invention may also include a recording medium on which is recorded a program that notifies of an input state of sound to be recognized, which when sound to be recognized is input from a speaker, may determine whether the sound has been input in an appropriate state, and may notify the speaker of the determination result. The processing program may include a procedure which inputs the sound to be recognized spoken by the speaker and outputs the sound as digitized sound data; a procedure which analyzes the sound data which has been thus obtained per predetermined time interval and calculates the sound power per predetermined time interval; a procedure which detects the effective sound division for the sound to be recognized, based upon the sound power which has been thus obtained, determines whether the sound to be recognized has been input in an appropriate state, based upon the magnitude of sound power within the effective sound division and the time length of the effective sound division, and outputs a signal which shows that the sound is appropriate immediately after the completion of the input of the sound to be recognized when it is determined that the sound is appropriate; and a procedure which outputs information which shows that the sound to be recognized is appropriate when a signal is received which shows that the sound to be recognized is appropriate.
This invention relates to improving the convenience of the device by informing the speaker of whether a sound to be recognized which has been input by the user is input in an appropriate state by performing simple processing. In order to realize this, based upon a time length of an effective sound division in the sound to be recognized spoken by the user and the magnitude of sound power of the effective sound division, it is determined whether the sound to be recognized has been input in an appropriate state. If it is determined that the sound has been appropriately input, information which shows that the sound has been appropriately input is given immediately after the input of the sound to be recognized. By so doing, the user can easily find out whether the sound spoken by himself/herself is input in an appropriate state. When the input operation of the sound is performed, the user will not have a concern as to whether the sound spoken by himself/herself has been input in an appropriate state.
Furthermore, the sound to be recognized, for which it is determined whether the sound has been input in an appropriate state, is a sound of a plurality of words spoken as one group, and is sound spoken having a space, between each sound division for each word forming this one group, as divisions for each word. For example, in the case of a clock in which time setting such as the current time can be performed by sound, a plurality of words such as “a.m.”, “o'clock”, and “minute” can form one group, and sound which is intermittently spoken by the user while leaving spaces between the words of the group as divisions is used.
When a plurality of words is thus considered as a group, in a state in which the sound spoken by the user is one-sidedly input, without the response of the recognition result from the device between each word, the user might have a concern as to whether each word has been input in an appropriate state for recognizing the sound.
In order to solve this problem, when a user speaks a sound as a plurality of words which can be considered as a group, the user feels relieved to find out the status of his/her sound if the device provides an indication between each word. For example, within the division time of each word, a sound signal may be instantly emitted (for example, a sound signal such as “beep”), a light may be instantly emitted by a light emitting diode (LED) or the like, a sound message may be output (for example, an extremely short sound message such as “yes”), or, in a device provided with a display part such as a liquid crystal display (LCD) or the like, a brief display such as “O.K.” may be shown on the LCD. By having this type of brief information instantly generated from the device after the sound of each word, the user can find out whether the sound spoken by himself/herself has been input in an appropriate state. Therefore, the user feels confident with respect to the sound inputting operation.
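The per-word indication can be sketched as follows. This is only an illustrative sketch: it assumes per-frame sound power as input, treats a run of at least min_gap frames at or below a threshold as the space between words, and judges each word by its length alone for brevity. The function names, the parameter values, and the use of None to mean that no confirmation is emitted are assumptions, not details taken from this specification.

```python
from typing import List, Optional, Tuple


def split_words(power: List[float], threshold: float,
                min_gap: int) -> List[Tuple[int, int]]:
    """Split per-frame power into word divisions (start, end), end exclusive,
    separated by at least min_gap consecutive quiet frames."""
    words = []
    start = None
    quiet = 0
    for i, p in enumerate(power):
        if p > threshold:
            if start is None:
                start = i          # a new word begins
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_gap:   # the space after the word is long enough
                words.append((start, i - quiet + 1))
                start = None
                quiet = 0
    if start is not None:          # input ended before a full gap was seen
        words.append((start, len(power) - quiet))
    return words


def feedback_for_words(words: List[Tuple[int, int]], min_frames: int,
                       max_frames: int) -> List[Optional[str]]:
    """Return "beep" for each word whose length is within the limits."""
    return ["beep" if min_frames <= end - start <= max_frames else None
            for start, end in words]


power = [0, 5, 5, 0, 0, 6, 6, 6, 0, 0, 7, 0]
words = split_words(power, threshold=1, min_gap=2)
print(words, feedback_for_words(words, min_frames=2, max_frames=5))
```

In a real device the confirmation would be emitted as soon as each space is detected rather than collected in a list; the list form is used here only to keep the sketch easy to follow.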
Furthermore, the plurality of words forming one group belong to a first through an nth (n is a positive integer) word group. The order of the words spoken by the user is determined such that a word which belongs to a first word group is first, and then a word which belongs to a second word group is second. A reference which determines a time length of the effective sound division described earlier is set, based upon the respective word groups. This is because the length of the word (the time length needed for the user to speak) which belongs to the respective word groups may depend upon the word group. Therefore, by setting a reference to determine a time length of the effective sound division for the respective word groups, it is possible to determine a length of an appropriate effective sound division with respect to words which belong to the respective word groups.
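The use of a separate reference for each word group can be sketched as a lookup keyed by the position of the word in the spoken order. The limits below are hypothetical frame counts chosen only for illustration; the specification does not give concrete values.

```python
from typing import Dict, List, Tuple

# Hypothetical (minimum, maximum) lengths in frames for the first,
# second, and third word groups, e.g. "a.m.", the hour, and the minute.
DURATION_LIMITS: Dict[int, Tuple[int, int]] = {
    1: (10, 40),
    2: (5, 30),
    3: (5, 50),
}


def check_group_durations(durations: List[int],
                          limits: Dict[int, Tuple[int, int]]) -> List[bool]:
    """Compare the length of the k-th spoken word with the reference
    set for the k-th word group."""
    results = []
    for position, frames in enumerate(durations, start=1):
        low, high = limits[position]
        results.append(low <= frames <= high)
    return results


# The second word (3 frames) is shorter than the reference for its group.
print(check_group_durations([20, 3, 45], DURATION_LIMITS))
```

Keying the reference to the spoken order works because the order of the word groups is determined in advance, so the device knows which group each effective sound division should belong to without recognizing the word first.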
In addition, because the sound recognition device of this invention uses the method of notifying a user of an input state of sound to be recognized explained above, the user can easily use the device and it is possible for a user who is not familiar with this type of device to easily use the device.
According to this invention explained above, it is determined whether the sound to be recognized has been input in an appropriate state, based upon the magnitude of sound power within the effective sound division and the time length of the effective sound division in the sound to be recognized spoken by the speaker. When it is determined that the sound is appropriate, information which shows that the sound is appropriate is emitted immediately after the input of the sound to be recognized. By so doing, when the user performs a sound inputting operation to the device, he/she will not have worries as to whether the sound spoken by himself/herself has been input in an appropriate state, and the operability of the device when the sound input is performed can be improved.
In particular, this invention is effective when the input sound is a plurality of words spoken as one group. For example, in the case of a clock in which the time setting, such as of a current time, can be set by sound, a plurality of words such as “a.m.”, “o'clock”, and “minutes” can be considered as one group and the user intermittently speaks the sound while leaving a space between each word as a division. By so doing, when the respective words are intermittently spoken as one group formed by a plurality of words, a sound signal which is instantly emitted from the device side is returned after each word is spoken. Therefore, the user can instantly find out whether the sound spoken by himself/herself has been input in an appropriate state, and feels confident with respect to the sound inputting operation.
In addition, as information which shows that the sound spoken by the user has been input in an appropriate state, instant information is simply emitted, so the processing can be less burdensome compared to the direct response of the recognition result after recognizing the respective words, and the processing time can be significantly shortened.
Furthermore, a sound recognition device which uses the method of notifying the user of an input state of sound to be recognized can be conveniently used. Even a user who is not familiar with this type of device can easily use the device, and the processing as a whole can be less burdensome, so a lower cost can be expected for the CPU and memory and the overall price of the device can be reduced.