Existing communication networks interconnect various technologies, ranging from landline and cellular devices to IP-based phones and voice services running on computers. From a quality standpoint, maintaining speech and audio communication performance solely within the network is a complex task that requires a large amount of resources. A general solution is not feasible due to the sheer number of communication devices, which have different acoustical characteristics and are used in different circumstances. At best, networks attempt to reduce the effects of noise and echo produced by the equipment where various communication technologies interface in the network. For example, line echo cancellers placed before the four-wire to two-wire converters, known as hybrids, in landline networks remove the echo caused by the impedance mismatch at these devices.
Therefore, most, if not all, communication devices contain a considerable amount of hardware and signal processing to remove unwanted noise and acoustic echoes. Unwanted noise usually originates from the environment in which the device is being used, such as a busy street or windy conditions, and is captured by the device's microphone. Acoustic echoes are produced by the coupling between the device's loudspeaker, which plays the far-end signal, and the microphone. Depending on the design and parts used in the communication device, this coupling can range from mild and linear to strong and highly nonlinear, requiring different forms of processing. The choice of processing is left to the device manufacturer, who must deliver a standard of performance acceptable to the user of the device. An acoustic echo canceller, which is usually implemented using a low-cost linear adaptive filter, attempts to model the coupling between the device loudspeaker and microphone and to remove the echo. Acoustic echo cancellers usually perform satisfactorily under linear echo conditions, but their performance degrades in the presence of nonlinearities.
Cellular modules (e.g. GSM, CDMA) that are built into user devices contain both echo and noise suppression functionality, where the echo suppression consists of an echo canceller and a post-processor characterized by a frequency-dependent gain function. It is also common for noise suppression functionality to be handled by this post-processor in combination with the echo suppression. A block diagram of an exemplary speech enhancement system 2 that encompasses many of the systems available today is shown in FIG. 1. The system 2 outputs processed audio signals from a far-end user via a speaker 4 and receives audio signals from the near-end user via a microphone 6. As is known, the audio signals output by the speaker 4 will be picked up by the microphone 6 and will produce an echo for the far-end user.
Before digital to analog conversion and amplification for reproduction by the speaker 4, the digital far-end signal serves as an input to an acoustic echo canceller, AEC 8, which uses an adaptive filtering algorithm. The most commonly used algorithm in mobile devices is the Normalized Least Mean Squares (NLMS) algorithm due to its simplicity and low cost. The goal of this algorithm is to estimate the acoustic echo path between the loudspeaker 4 and microphone 6 and to create a replica of the echo signal on the microphone 6 in order to remove it from the microphone signal, resulting in an echo-free residual signal that ideally only contains the near-end user's speech.
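By way of illustration only, the NLMS adaptation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in any particular device: the function names, the filter length of 8 taps, the step size of 0.5, and the three-tap echo path used for the simulation are all hypothetical, and the simulated microphone signal contains only a purely linear echo with no near-end speech or noise.

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Adaptively estimate the echo path and subtract the echo replica.

    Returns the residual (echo-reduced) signal and the final filter weights.
    All parameter values are illustrative assumptions.
    """
    weights = [0.0] * num_taps   # current estimate of the echo path
    history = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Replica of the echo as predicted by the current filter.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalize the update by the far-end signal energy (the "N" in NLMS).
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def simulate_echo(num_samples=4000, seed=0):
    """Create a far-end signal and a microphone signal with linear echo only."""
    rng = random.Random(seed)
    echo_path = [0.6, -0.3, 0.1]  # hypothetical linear loudspeaker-to-microphone path
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    mic = [
        sum(c * far_end[i - k] for k, c in enumerate(echo_path) if i - k >= 0)
        for i in range(num_samples)
    ]
    return far_end, mic, echo_path


if __name__ == "__main__":
    far_end, mic, echo_path = simulate_echo()
    residual, weights = nlms_echo_cancel(far_end, mic)
    print("estimated leading taps:", [round(w, 3) for w in weights[:3]])
```

In this idealized linear case the filter weights approach the simulated echo path and the residual decays towards zero. With a nonlinear or under-modeled echo path, a linear filter of this kind would leave some echo in the residual.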
Due to a number of factors, including under-modeling and the presence of nonlinearities in the acoustic echo path (which can be produced by any or all of the amplifier (not shown in FIG. 1), the speaker 4, and the mechanical housing of the device 2), a portion of the echo will inevitably remain in the residual signal, and a tunable post-processor 10 has to remove it. The post-processing block 10 usually operates in the frequency domain and offers a trade-off between the amount of echo/noise suppression and the amount of speech distortion. Depending on the acoustics of the device, and the consequent severity of the residual echo contribution, the resulting two-way audio communication can be characterized as full-duplex or half-duplex. In half-duplex communication, the send (microphone) path is muted by the post-processor 10 when the far-end (loudspeaker) signal is active, while in full-duplex communication, both near-end and far-end speakers can interrupt each other. Sometimes full-duplex communication is achieved only by allowing some residual echo to remain on the send path, i.e. the post-processor 10 avoids distorting the near-end speech at the cost of not fully suppressing the residual echo component. In half-duplex mode, the post-processor 10 acts as a basic muting switch.
Single-microphone noise suppression algorithms commonly make use of statistics-based noise estimation methods to derive a corresponding gain function. However, for non-stationary noise signals which vary with time, properly tracking and estimating the noise statistics becomes difficult using such an approach. Therefore, many systems also apply over-subtraction, which consequently leads to undesirable distortion of the resulting speech signal.
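The effect of over-subtraction on the gain function can be illustrated with a simple power spectral subtraction rule. This is one common textbook form chosen here as an assumption, since the text above does not name a specific method; the function name, the spectral floor of 0.1, and the example band powers are hypothetical. The gains are intended to be applied to the power of each frequency bin.

```python
def suppression_gains(noisy_power, noise_power, over_subtraction=1.0, floor=0.1):
    """Per-bin gains from power spectral subtraction (illustrative assumption).

    noisy_power: power of the microphone signal in each frequency bin.
    noise_power: estimated noise power in each bin.
    over_subtraction: factor greater than 1 subtracts more than the estimate.
    floor: lower limit on the gain, to avoid negative or zero gains.
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noisy <= 0.0:
            gains.append(floor)
            continue
        gain = 1.0 - over_subtraction * p_noise / p_noisy
        gains.append(max(gain, floor))
    return gains


if __name__ == "__main__":
    noisy = [4.0, 2.0, 1.0]   # hypothetical per-bin powers (speech plus noise)
    noise = [1.0, 1.0, 1.0]   # hypothetical noise estimate
    print(suppression_gains(noisy, noise, over_subtraction=1.0))  # [0.75, 0.5, 0.1]
    print(suppression_gains(noisy, noise, over_subtraction=2.0))  # [0.5, 0.1, 0.1]
```

In the second call, the middle bin, where half of the power belongs to speech, is pushed down to the floor. This is the kind of speech distortion that over-subtraction introduces in exchange for stronger noise suppression.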
In addition to signal enhancement systems that aim to suppress unwanted echoes and noise from the microphone signal, algorithms also exist which adjust or equalize the loudspeaker signal depending on the noise in the environment. Also known as speech reinforcement algorithms, these increase the loudness of the loudspeaker signal so that it is not masked out by the surrounding environmental noise. Speech reinforcement systems also exist that are tailored to the needs of the elderly. For example, in US 2006/0088154, a system is presented that analyzes a user's speech to determine if he or she is an elderly person and adjusts the loudspeaker signal accordingly. Most of these algorithms analyze the surrounding noise present on the microphone 6 and apply frequency-dependent gain values to the loudspeaker signal before reproduction.
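A frequency-dependent reinforcement gain of the kind described above might be derived as in the following sketch. It is a simplified illustration and does not reproduce the method of US 2006/0088154 or of any specific product: the function name, the target signal-to-noise ratio of 6 dB, the maximum boost of 12 dB, and the example band levels are all assumptions.

```python
def reinforcement_gains_db(speech_db, noise_db, target_snr_db=6.0, max_boost_db=12.0):
    """Per-band boost (in dB) for the loudspeaker signal.

    speech_db: level of the far-end (loudspeaker) signal in each band.
    noise_db: ambient noise level measured on the microphone in each band.
    Each band is boosted just enough to sit target_snr_db above the noise,
    never attenuated, and never boosted by more than max_boost_db.
    """
    gains = []
    for speech, noise in zip(speech_db, noise_db):
        needed = (noise + target_snr_db) - speech
        gains.append(min(max(needed, 0.0), max_boost_db))
    return gains


if __name__ == "__main__":
    speech = [60.0, 55.0, 50.0]  # hypothetical band levels in dB
    noise = [40.0, 55.0, 70.0]
    print(reinforcement_gains_db(speech, noise))  # [0.0, 6.0, 12.0]
```

The cap on the boost reflects a practical design choice: a small loudspeaker cannot be driven arbitrarily hard without distortion, so bands that are heavily masked by noise are boosted only up to the limit.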
FIG. 2 is a block diagram of a cellular system 20 that consists of two mobile terminals (MT-A and MT-B) representing the users' mobile devices and a mobile switching centre (MSC) or base station that establishes and manages the connection between the two devices.
It has been proposed in WO 98/43368 to move acoustic echo cancellers (AECs) from the mobile devices to the MSCs in order to relieve the MTs of the associated power burden. In FIG. 2 this is represented by the audio block 21. Placing the echo cancellation functionality at the MSC, however, comes at the cost of poorer echo cancellation performance with standard echo cancellation/suppression algorithms, due to the presence of speech encoders/decoders (22, 24, 26, 28) which introduce nonlinearities into the echo path. Therefore, more complex algorithms would be required. Furthermore, the acoustical characteristics of devices connected via the MSC are not known in advance. The advantage of having AECs in the MT is that the coupling between loudspeaker 30 and microphone 32 can usually be modeled using a linear finite impulse response (FIR) filter, and the processing in the MT is tuned to its specific acoustics. With the nonlinear processing introduced by the encoders, the problem of acoustic echo cancellation becomes more challenging.
FIG. 2 does not illustrate the functionality provided by the MSC. Depending on the types of coders available in the MTs, the MSC has the task of translating the encoding scheme between MTs. It performs this tandeming or transcoding by decoding the MT signal and then re-encoding the signal for proper reception by the second MT. For example, if mobile terminal MT-A uses encoder 22 Enc-A and mobile terminal MT-B uses encoder 28 Enc-B, then the MSC first decodes the incoming signal from MT-A using decoder Dec-A and then re-encodes this signal using encoder Enc-B for transmission to MT-B.
Many elderly people now carry personal help buttons (PHBs) or personal emergency response systems (PERS) that they can activate if they need urgent assistance, such as when they fall. Automated fall detectors are also available that monitor the movements of the user and automatically trigger an alarm if a fall is detected.
These devices (i.e. PHBs, PERS and fall detectors) can, when activated, initiate a landline call to a dedicated call centre via a base unit located near the user (typically in the user's home). The personnel in the call centre can talk to the user and arrange for assistance to be sent to the user in an emergency. As the user is a registered subscriber to the PHB/PERS service, their home location (or other location where the base unit is found) will be known, and the emergency assistance can be directed to that location by the call centre personnel.
However, systems are now available that make use of a mobile telephone or other mobile telecommunications-enabled device carried by the user to allow the PHB, PERS or fall detector device to initiate a call over a mobile telecommunications network to the call centre. These devices are sometimes referred to as mobile PERS (MPERS) devices and can be used anywhere that there is cellular network coverage. As the typical users of these MPERS devices are elderly or have some form of physical or mental impairment, it is important for the devices to be as simple to operate as possible. As a result, mobile telecommunications functionality is preferably integrated into a dedicated PHB or PERS pendant that is worn by the user and that typically has only a single activation button or a very small number of manual controls. On activation of the MPERS device, a call is automatically placed to the call centre number preset in the device.
Given the nature of the MPERS devices discussed above, it is desirable to minimize their power consumption in order to maximize the battery life and reduce the frequency with which the user has to recharge or replace the batteries.
In addition, as a typical user of these devices may have hearing difficulties, it is desirable to ensure that audio output by the MPERS device is as clear and audible as possible for the user.