1. Field of the Invention
The present invention relates to an interactive apparatus.
2. Related Art
In recent years, an increasing number of everyday telephone interactions have been automated, thereby removing the need for a human operator to progress the interaction.
One of the first interactions to be automated was simply the leaving of a message for an intended recipient who was not present to take the call. Recently, more complex services such as telephone banking, directory enquiries and dial-up rail timetable enquiries have also been automated. Many answerphones now additionally offer a facility enabling their owner to telephone them and hear messages which have been left. Another service which has now been automated is the reading of stored e-mail messages over the telephone.
In each of the above cases, a user in effect carries out a spoken dialogue with an apparatus which includes an interactive apparatus, the telephone he or she is using and elements of the Public Switched Telephone Network.
In the spoken dialogue it is often useful if the user is able to interrupt. For example, a user might wish to interrupt if he or she is able to anticipate what information is being requested part way through a prompt. The facility enabling interruption (known as a “barge-in” facility to those skilled in the art) is even more desirable in relation to message playback apparatuses (such as answerphones) where a user may wish to move onto another message without listening to intervening messages.
Providing a barge-in facility is made more difficult if some of the output from the interactive apparatus is fed back to the input which receives the user's commands. This feedback arises owing to, for example, junctions in the network where voice-representing signals transmitted from the interactive apparatus are reflected back to its input. It is also caused by the acoustic echo of the speech output from the speaker of the user's telephone back to the microphone (this is especially problematic in relation to handsfree operation). There is therefore a need to distinguish fed-back output signals from the user's input in order to provide a more reliable barge-in facility than has hitherto been possible.
According to the present invention there is provided an interactive apparatus comprising:
signal output means arranged in operation to output a signal representative of conditioned speech;
signal input means arranged in operation to receive a signal representative of a user's spoken command;
wherein the conditioned speech lacks a component normally present in speech;
command detection means operable to detect a user's command spoken during issuance of the conditioned speech by detecting the input of a signal which represents speech including the component lacking from the conditioned speech.
The advantage of providing such an apparatus is that it is better able to detect the presence of a user's commands. This is particularly useful in relation to an apparatus which uses a conventional speech recogniser, as the performance of such recognisers falls off sharply if the voice signal they are analysing is in any way corrupted. In an interactive apparatus, distortion caused by an echo of the interactive apparatus's output can cause the user's command to be corrupted. The present invention alleviates this problem by enabling the apparatus to stop outputting voice-representing signals or speech as soon as the user's response is detected.
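By way of illustration only, the behaviour of stopping the output as soon as the user's response is detected might be sketched as the frame-by-frame loop below. This is a minimal sketch under stated assumptions, not the claimed apparatus: the function name `play_with_barge_in`, the frame representation and the `is_user_speech` callable are all hypothetical, and the detector shown in the usage lines is a crude amplitude check standing in for the command detection means.

```python
from typing import Callable, Iterable, Sequence

Frame = Sequence[float]  # hypothetical: one short block of audio samples


def play_with_barge_in(
    prompt_frames: Iterable[Frame],
    input_frames: Iterable[Frame],
    is_user_speech: Callable[[Frame], bool],
) -> int:
    """Output the prompt frame by frame, stopping once user speech is detected.

    Returns the number of prompt frames that were output before the stop.
    """
    played = 0
    for out_frame, in_frame in zip(prompt_frames, input_frames):
        # Check the input first so that output ceases as soon as the
        # user's response is detected.
        if is_user_speech(in_frame):
            break
        # A real apparatus would pass out_frame to the signal output here.
        played += 1
    return played


# Usage: the user starts speaking during the third input frame.
prompt = [[0.1] * 4 for _ in range(5)]
received = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4, [0.0] * 4]
frames_played = play_with_barge_in(
    prompt, received, lambda frame: max(abs(s) for s in frame) > 0.5
)
print(frames_played)
```

The input is checked before each output frame so that, in this sketch, no further prompt audio is emitted once a detection has occurred.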
In some embodiments, the apparatus further comprises a means for conditioning signals representative of speech output by the interactive apparatus. Because the quality of recorded speech is better than the quality of speech synthesised by conventional synthesisers, many conventional interactive apparatuses use recorded speech for those parts of the dialogue which are frequently used. However, for apparatuses such as those which are required to output signals representing a spoken version of various telephone numbers or amounts of money it is currently impractical to record a spoken version of every possible output. Hence, such outputs are synthesised when required. A recorded speech signal can be pre-conditioned to lack the said component at the time that the speech signal is recorded. Hence, apparatuses whose entire output is recorded speech do not require a means for conditioning the signals representative of speech to be output by the interactive apparatus. Such apparatuses have the clear advantage of being less complex in their construction and are hence cheaper to manufacture.
Preferably, the said lacking component comprises one or more portions of the frequency spectrum. This has the advantage that the apparatus is easy to implement.
The apparatus is found to be most effective when the portion of the frequency spectrum lies in the range 1000 Hz to 1500 Hz.
Preferably, the width of the frequency band is in the range 80 Hz to 120 Hz. It is found that if the width of the frequency band is greater than 120 Hz then the output which the user hears is significantly corrupted, whereas if the width is less than 80 Hz the conditioning of the output of the interactive apparatus is made more difficult and it also becomes harder to discriminate between situations where the user is speaking and situations where he or she is not.
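As an illustration of removing such a band, the sketch below conditions a signal with a single second-order notch filter centred at 1250 Hz with a nominal width of 100 Hz. Several points are assumptions made for the example and are not taken from the specification: the 8000 Hz sampling rate, the choice of 1250 Hz as the centre, the use of a biquad notch (whose stop band is only approximately the nominal width and is not sharp-edged), and the helper names `notch_coefficients`, `condition`, `tone` and `rms`. Pure tones stand in for speech.

```python
import math
from typing import List, Sequence, Tuple


def notch_coefficients(
    centre_hz: float, width_hz: float, sample_rate_hz: float
) -> Tuple[List[float], List[float]]:
    """Return normalised (b, a) coefficients of a second-order notch filter."""
    w0 = 2.0 * math.pi * centre_hz / sample_rate_hz
    q = centre_hz / width_hz  # quality factor from centre and nominal width
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def condition(samples: Sequence[float], b: List[float], a: List[float]) -> List[float]:
    """Apply the filter sample by sample (direct form I difference equation)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def tone(freq_hz: float, count: int, sample_rate_hz: float) -> List[float]:
    """Generate a unit-amplitude sine wave (a stand-in for speech content)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(count)]


def rms(samples: Sequence[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))


SAMPLE_RATE_HZ = 8000.0  # assumed telephony sampling rate
b, a = notch_coefficients(1250.0, 100.0, SAMPLE_RATE_HZ)
in_band = condition(tone(1250.0, 8000, SAMPLE_RATE_HZ), b, a)
out_of_band = condition(tone(400.0, 8000, SAMPLE_RATE_HZ), b, a)

# Measure the second half only, after the filter's start-up transient has decayed.
print("1250 Hz residual RMS:", rms(in_band[4000:]))
print("400 Hz retained RMS:", rms(out_of_band[4000:]))
```

In this sketch a tone at the centre of the removed band is almost entirely suppressed, while a tone well outside the band passes nearly unchanged. A practical conditioner would more likely use a higher-order band-stop design to obtain sharper band edges.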
According to a second aspect of the present invention there is provided a method of detecting a user's spoken command to an interactive apparatus, said method comprising the steps of:
outputting a signal representative of conditioned speech, wherein the conditioned speech lacks a component normally comprised in users' spoken commands;
monitoring signals input to the interactive apparatus for the presence of signals representative of speech including said component; and
determining that the input signal represents the user's spoken command on detecting the presence of signals representative of speech including said component.
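The monitoring and determining steps above might be sketched as follows, assuming the lacking component is a band of roughly 1200 Hz to 1300 Hz. The sketch measures energy in that band with the Goertzel algorithm and compares it with a threshold. The Goertzel algorithm, the 8000 Hz sampling rate, the 320-sample frame, the probe frequencies, the threshold value and all function names are assumptions chosen for illustration; they are not specified in the text. Sums of pure tones stand in for the echoed prompt and for the user's speech.

```python
import math
from typing import List, Sequence


def goertzel_power(frame: Sequence[float], freq_hz: float, sample_rate_hz: float) -> float:
    """Estimate the squared amplitude of the component of `frame` at `freq_hz`."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    # Normalise so that a sine of amplitude A spanning whole cycles gives about A**2.
    return power / (len(frame) / 2.0) ** 2


def contains_missing_component(
    frame: Sequence[float],
    sample_rate_hz: float = 8000.0,
    probes_hz: Sequence[float] = (1200.0, 1225.0, 1250.0, 1275.0, 1300.0),
    threshold: float = 0.01,  # illustrative value only
) -> bool:
    """Return True if the frame carries energy in the band absent from the prompt."""
    band_energy = sum(goertzel_power(frame, f, sample_rate_hz) for f in probes_hz)
    return band_energy > threshold


def tones(freqs_hz: Sequence[float], amplitude: float, count: int, sample_rate_hz: float) -> List[float]:
    """Sum of equal-amplitude sines, used here as a crude stand-in for speech."""
    return [
        amplitude * sum(math.sin(2.0 * math.pi * f * i / sample_rate_hz) for f in freqs_hz)
        for i in range(count)
    ]


FRAME_LENGTH = 320  # 40 ms at the assumed 8000 Hz sampling rate
# Fed-back conditioned prompt: no content in the removed band.
echo_only = tones((300.0, 2000.0), 0.5, FRAME_LENGTH, 8000.0)
# User's speech: includes content at 1250 Hz, inside the removed band.
user = tones((300.0, 1250.0, 2000.0), 0.3, FRAME_LENGTH, 8000.0)
echo_plus_user = [e + u for e, u in zip(echo_only, user)]

print(contains_missing_component(echo_only))
print(contains_missing_component(echo_plus_user))
```

With real speech and a real echo path, the fed-back signal would retain some residual energy in the band, so the threshold would need to be set from measurements rather than taken from this sketch.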
According to a third aspect of the present invention there is provided a voice-controllable apparatus comprising:
an interactive apparatus according to the first aspect of the present invention;
means for converting said signal representative of conditioned speech to conditioned speech; and
means for converting a user's spoken command to a signal representative thereof.
The problems addressed by the present invention also occur in relation to apparatuses which are directly voice-controlled (i.e. where there is no intermediate communications network). Embodiments of the third aspect of the present invention therefore include, amongst other things, domestic and work-related apparatuses such as personal computers, televisions, and video-recorders offering interactive voice control.