The present invention relates to a self-service terminal (SST). In particular, the invention relates to an SST having an acoustic interface for receiving and/or transmitting acoustic information, such as a voice-controlled ATM.
Voice-controlled ATMs allow a user to conduct a transaction by speaking and listening to an ATM, thereby obviating the need for a conventional monitor. In some voice-controlled ATMs a biometric identifier, such as a human iris recognition unit, is used to avoid the user having to insert a card into the ATM. When a biometric identification unit is used, there is no requirement for a conventional keypad.
Voice-controlled ATMs make the human-to-machine interaction at an ATM more like a human-to-human interaction, thereby improving usability of the ATM. Voice-controlled ATMs also improve access to ATMs for people having certain disabilities, such as visually-impaired people.
Although voice-controlled ATMs have a number of advantages compared with conventional ATMs, they also have some disadvantages. These disadvantages mainly relate to privacy and usability.
Some disadvantages relate to the ATM speaking to the user. For example, if an ATM that is located in a public area audibly confirms withdrawal of one hundred pounds, then the user may feel vulnerable to attack and may believe that there is a lack of privacy for the transaction, as passers-by may overhear the ATM confirming the large amount of cash to be withdrawn.
Other disadvantages relate to the user speaking to the ATM. For example, in noisy environments such as a busy street or a shopping center, the ATM may not be able to discriminate between the user's voice and background noise. The user may become frustrated by the ATM's failure to understand a spoken command; this may lead to the user shouting at the ATM, which further reduces the privacy of the transaction.
It is an object of an embodiment of the present invention to obviate or mitigate one or more of the above disadvantages or other disadvantages associated with SSTs having acoustic interfaces.
According to a first aspect of the present invention there is provided a self-service terminal having an acoustic interface characterized in that the terminal comprises a user locating mechanism, a controller, and an array of individually controllable acoustic elements; whereby, in use, the locating mechanism is operable to locate a user and to convey user location information to the controller, and the controller is operable to focus each acoustic element to the user's location.
It will be appreciated that the acoustic elements may be microphone or loudspeaker elements. When the acoustic elements are loudspeakers, the controller is operable to control the loudspeakers so that sound from the loudspeakers is only audible in the area in the immediate vicinity of the user. This ensures that the privacy of the user is increased. When the acoustic elements are microphones, the controller is operable to control the microphones so that only sound from the area in the immediate vicinity of the user is conveyed, thereby removing the effect of background noise. The microphone elements may detect all sound indiscriminately and the controller may operate on all the sound to mask out sound from areas other than the vicinity of the user. Alternatively, the microphone elements may only detect the sound from the vicinity of the user.
The term "focus" as used herein denotes directing the acoustic elements to a relatively small area or zone. Where the elements are microphones, when the microphones are focused, audible signals are conveyed only from this zone, even if the microphones detect sound from areas outside this zone. Where the elements are loudspeakers, when the loudspeakers are focused they transmit audible signals to only this zone.
The zone may be defined by a certain angular beam width. For example, if a linear array is used and the array can focus anywhere between −45 degrees and +45 degrees relative to a line normal to the array, then the elements may be able to focus to a zone of five degrees, such as −20 to −15 degrees. The zone may also be defined by an angular beam width and a distance, for example two meters from the array and at an angular beam width of −20 to −15 degrees.
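As an illustration of the angular zones described above, the following Python sketch divides the −45 to +45 degree range into five-degree zones and maps an angle to its zone. The function names and the convention that each zone includes its lower bound are assumptions made for this example; they are not taken from the described terminal.

```python
import math

MIN_ANGLE_DEG = -45.0   # lower steering limit from the example in the text
MAX_ANGLE_DEG = 45.0    # upper steering limit from the example in the text
ZONE_WIDTH_DEG = 5.0    # example angular beam width from the text


def zone_count() -> int:
    """Number of adjacent zones that cover the steerable range."""
    return int(round((MAX_ANGLE_DEG - MIN_ANGLE_DEG) / ZONE_WIDTH_DEG))


def zone_index(angle_deg: float) -> int:
    """Index of the zone containing angle_deg (hypothetical helper)."""
    if not MIN_ANGLE_DEG <= angle_deg <= MAX_ANGLE_DEG:
        raise ValueError("angle outside the steerable range")
    index = math.floor((angle_deg - MIN_ANGLE_DEG) / ZONE_WIDTH_DEG)
    # The upper limit itself is assigned to the last zone.
    return min(index, zone_count() - 1)


def zone_bounds(index: int) -> tuple:
    """(low, high) angular bounds of a zone, in degrees."""
    low = MIN_ANGLE_DEG + index * ZONE_WIDTH_DEG
    return (low, low + ZONE_WIDTH_DEG)
```

With these values there are 18 zones, and an angle of −17 degrees falls in zone 5, which spans −20 to −15 degrees.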
Preferably, the locating mechanism uses visual detection to locate the user and to output user location information to the controller in real time. For example, the visual detection may be a stereo imager. One advantage of using a visual detection mechanism is that the user will be located accurately even though the background noise is louder than the user's voice; whereas, if an audio detection mechanism is used then the background noise may be targeted because it is the loudest noise being detected.
Another advantage of using a visual detection system is that the acoustic elements can be focused on the user prior to the user speaking to the SST. This ensures that all of the user's speech will be detected by the SST; whereas, if an audio detection mechanism is used, the user cannot be targeted until he/she speaks to the SST, so the first few words spoken by a user may not be detected very clearly.
Yet another advantage of using a visual detection system is that the visual system can continue detecting the user's position during a transaction, so that if the user moves then the acoustic elements can be re-focused to the user's new position.
In one embodiment where an SST includes an iris recognition unit, the stereo cameras that are used to locate the user's head may be modified to output a value indicative of the position of the user's head. This value may relate to the angular position of the user's head relative to a line normal to the array of elements. Some additional processing may be performed to locate the user's mouth and ears, as iris recognition units generally detect the location of a user's eye.
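One way such an angular position value might be derived from a stereo camera pair is sketched below in Python. The sketch assumes an idealized, rectified pinhole stereo pair whose optical axis coincides with the line normal to the array; the function name and all parameters are hypothetical and are not specified in the text.

```python
import math


def head_position(pixel_x: float, disparity_px: float, focal_px: float,
                  centre_x_px: float, baseline_m: float) -> tuple:
    """Return (depth_m, bearing_deg) for a point seen by both cameras.

    Assumes rectified pinhole cameras (an assumption of this sketch):
      depth   = focal length * baseline / disparity
      lateral = (pixel offset from image centre) * depth / focal length
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    depth_m = focal_px * baseline_m / disparity_px
    lateral_m = (pixel_x - centre_x_px) * depth_m / focal_px
    # Bearing measured from the camera axis, taken here as the array normal.
    bearing_deg = math.degrees(math.atan2(lateral_m, depth_m))
    return depth_m, bearing_deg
```

For example, with an assumed focal length of 800 pixels, a 0.1 m baseline, and a 40 pixel disparity, the depth is about two meters; a head imaged 280 pixels to the left of the image centre then lies at a bearing of roughly −19 degrees.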
In less preferred embodiments, the locating mechanism may use an audio mechanism, such as acoustic talker direction finding (ATDF), for locating the position of a user.
Preferably, the array is a linear array. In more complex embodiments, the array may be a planar array for focusing a beam in two dimensions rather than one dimension.
In one embodiment the array may be an array of ultrasonic emitters or transducers that are powered by an ultrasonic amplifier, under control of an ultrasonic signal processor, to produce a narrow beam of sound.
The controller may control an array of microphones and an array of loudspeakers. The two arrays may be integrated into the same unit.
Preferably, the controller controls the array using a spatial filter to operate on the acoustic elements in the array. One suitable type of filter is based on the electronic beamforming technique, and is called "Filter and Sum Beamforming". By using beamforming, the amplitude of a coherent wavefront can be enhanced relative to background noise and directional interference, thereby achieving a narrower response in a desired direction. In one implementation of a spatial filter, the controller includes a digital signal processor (DSP) and an associated memory, where the DSP applies a Finite Impulse Response filter to each element.
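The filter-and-sum idea for a microphone array can be sketched in Python as follows. This is a minimal illustration, not the DSP implementation: each channel is passed through its own FIR filter and the results are added. The helper `delay_taps` and the demonstration values are assumptions of this sketch; it uses the simplest possible FIR filter, a pure delay with a gain, which reduces filter-and-sum to delay-and-sum.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def filter_and_sum(channels: Sequence[Sequence[float]],
                   taps_per_channel: Sequence[Sequence[float]]) -> List[float]:
    """Filter each element's signal with its own FIR taps, then sum."""
    if len(channels) != len(taps_per_channel):
        raise ValueError("one set of taps is required per channel")
    filtered = [fir_filter(ch, taps)
                for ch, taps in zip(channels, taps_per_channel)]
    length = min(len(f) for f in filtered)
    return [sum(f[n] for f in filtered) for n in range(length)]


def delay_taps(delay_samples: int, gain: float = 1.0) -> List[float]:
    """FIR taps for a pure integer-sample delay (hypothetical helper)."""
    return [0.0] * delay_samples + [gain]


def demo_peaks() -> tuple:
    """A pulse reaches four microphones one sample apart (assumed data)."""
    channels = [[0.0] * 20 for _ in range(4)]
    for m in range(4):
        channels[m][10 + m] = 1.0
    # Steered: delay the early channels so all pulses line up at sample 13.
    steered = filter_and_sum(
        channels, [delay_taps(3 - m, 0.25) for m in range(4)])
    # Unsteered: no delays, so the pulses do not add coherently.
    unsteered = filter_and_sum(
        channels, [delay_taps(0, 0.25) for _ in range(4)])
    return max(steered), max(unsteered)
```

In the demonstration, the steered output peaks at 1.0 because the four pulses add coherently, whereas the unsteered output peaks at only 0.25. A practical filter would use longer FIR responses so that fractional delays and frequency-dependent weighting can be applied.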
Alternatively, but less preferred, the controller may control the elements by adjusting the physical orientation of the elements.
Preferably, the memory is pre-programmed with a plurality of algorithms, one algorithm for each zone at which the elements can be focused. The algorithms comprise coefficients (which may include weighting and delaying values) for applying to each element.
Preferably, the DSP receives the user location information, accesses the memory to select an algorithm corresponding to the user location information, and applies the coefficients within the algorithm to the acoustic elements to focus the elements at the desired zone.
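The pre-programmed memory and the selection step described above might be organized as in the following Python sketch. The text does not say how the coefficients are derived; this sketch assumes far-field steering of a uniformly spaced linear array, where adjacent elements differ in delay by d·sin(θ)/c. The element count, element spacing, speed of sound, sign convention, and all names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate value in air (assumed)
NUM_ELEMENTS = 8         # assumed array size
SPACING_M = 0.04         # assumed spacing between adjacent elements
MIN_ANGLE_DEG = -45.0
MAX_ANGLE_DEG = 45.0
ZONE_WIDTH_DEG = 5.0


def steering_delays(angle_deg: float) -> list:
    """Per-element delays in seconds, shifted so the smallest is zero."""
    s = math.sin(math.radians(angle_deg))
    raw = [m * SPACING_M * s / SPEED_OF_SOUND for m in range(NUM_ELEMENTS)]
    base = min(raw)
    return [t - base for t in raw]


def build_table() -> dict:
    """Precompute one coefficient set per zone, keyed by zone index."""
    zones = int(round((MAX_ANGLE_DEG - MIN_ANGLE_DEG) / ZONE_WIDTH_DEG))
    table = {}
    for z in range(zones):
        centre = MIN_ANGLE_DEG + (z + 0.5) * ZONE_WIDTH_DEG
        table[z] = {
            "centre_deg": centre,
            "delays_s": steering_delays(centre),
            # Uniform weighting; a real table could hold tapered weights.
            "weights": [1.0 / NUM_ELEMENTS] * NUM_ELEMENTS,
        }
    return table


def select_coefficients(table: dict, user_angle_deg: float) -> dict:
    """Pick the stored coefficient set for the zone containing the user."""
    clamped = min(max(user_angle_deg, MIN_ANGLE_DEG), MAX_ANGLE_DEG - 1e-9)
    zone = int((clamped - MIN_ANGLE_DEG) // ZONE_WIDTH_DEG)
    return table[zone]
```

Storing the coefficients per zone means that no trigonometry is needed at run time; the controller only performs a table lookup each time new user location information arrives.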
Preferably, each microphone element includes a transducer, a pre-amplifier, and an analog-to-digital (A/D) converter. Preferably, each loudspeaker element includes a power amplifier, a transducer, and a digital-to-analog converter (D/A).
By virtue of this aspect of the invention, the acoustic elements can be used to create a privacy zone around the user's head so that only the user can hear an SST's spoken commands, and the SST only listens to the user's spoken commands; thereby improving privacy and usability for the user, and the speech recognition of the terminal.
According to a second aspect of the present invention there is provided a self-service terminal having an acoustic interface characterized in that the terminal comprises a directional acoustic element array capable of interacting with a user located anywhere in a broad zone, a steering mechanism operable to direct the array to a narrow zone within the broad zone, and a locating mechanism operable to detect the location of a user within the broad zone and to inform the steering mechanism of the location of the user.
The broad zone may be at least five times the size of the narrow zone; preferably, the broad zone is at least ten times the size of the narrow zone; advantageously, the broad zone is at least sixteen times the size of the narrow zone. In one embodiment, the narrow zone is defined by an angular beam width of 5 degrees and the broad zone is defined by a beam angle of 90 degrees.
According to a third aspect of the invention there is provided a method of interacting with a user of an SST, characterized by the steps of detecting the location of the user and adjusting one or more acoustic element arrays to focus the arrays at the location of the user.
According to a fourth aspect of the invention there is provided a self-service terminal having an acoustic interface characterized in that the terminal comprises a user locating mechanism, a controller, and an array of individually controllable loudspeaker elements; whereby, in use, the locating mechanism is operable to locate a user and to convey user location information to the controller, and the controller is operable to direct first audio signals to the location of the user and second audio signals to other locations.
The first audio signals may relate to a transaction being conducted by the user. The second audio signals may be audio advertisements to passers-by or people waiting in a queue to use the SST. Alternatively, the second audio signals may be noise (such as white or pink noise) or warnings to increase the privacy of the user. Additional audio signals may also be used, so that the terminal may simultaneously transmit different audio signals to a user, to passers-by, to people queuing behind the user, and to people standing too close to the user.
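One way a loudspeaker array might carry several audio signals at once is sketched below in Python. The text does not describe the mechanism; this sketch assumes linear superposition, in which each loudspeaker element is fed the sum of every program, each delayed by the per-element delays that steer that program to its own zone. Integer-sample delays and all names are simplifications chosen for illustration.

```python
from typing import List, Sequence, Tuple


def delayed(signal: Sequence[float], delay: int, length: int) -> List[float]:
    """Copy of signal shifted later by `delay` samples, padded with zeros."""
    out = [0.0] * length
    for n, x in enumerate(signal):
        if n + delay < length:
            out[n + delay] = x
    return out


def drive_signals(programs: Sequence[Tuple[Sequence[float], Sequence[int]]],
                  num_elements: int, length: int) -> List[List[float]]:
    """Per-element feeds for several simultaneously steered programs.

    Each program is (signal, per_element_delays); the delays are assumed
    to come from the coefficient set for that program's target zone.
    """
    feeds = [[0.0] * length for _ in range(num_elements)]
    for signal, delays in programs:
        if len(delays) != num_elements:
            raise ValueError("one delay is required per element")
        for m in range(num_elements):
            shifted = delayed(signal, delays[m], length)
            for n in range(length):
                feeds[m][n] += shifted[n]
    return feeds
```

For example, a transaction prompt steered with delays `[0, 1, 2]` and an advertisement steered with delays `[2, 1, 0]` can be passed together as two programs; each of the three elements then receives one combined feed containing both signals.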
The SST may include a proximity detector for detecting the presence or entrance of people within a zone around the user. On detecting a person within the zone around a user, the terminal may direct an audio signal to the person in the zone around the user.
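The proximity check might be sketched in Python as follows. It assumes that the detector reports positions as (x, y) coordinates in meters, with x measured along the array and y measured outward along the line normal to the array; the radius, the coordinate convention, and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x along the array, y outward), in meters


def intruder_bearings(user: Point, others: Sequence[Point],
                      radius_m: float) -> List[float]:
    """Bearings, in degrees from the array normal, of people near the user.

    A person counts as being inside the zone around the user when they
    stand within radius_m of the user's position.
    """
    bearings = []
    for x, y in others:
        if math.hypot(x - user[0], y - user[1]) <= radius_m:
            bearings.append(math.degrees(math.atan2(x, y)))
    return bearings
```

Each returned bearing could then be used to select the zone toward which a warning or masking signal is directed.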
By virtue of this aspect of the invention, a steerable loudspeaker array may be used to supply different audio information to a user of an SST than to those people who are in the vicinity of the SST, thereby creating an acoustic privacy shield for the user of the SST.