When a user operates a computer, noises are often generated. For example, when a key on a computer keyboard is pressed there is a short mechanical sound (i.e. a clicking sound). Similarly, when the buttons on a mouse are pressed a clicking sound is produced.
A microphone of a computer can be used to receive audio signals, such as speech from a user. The user may enter into a call with another user, such as a private call (with just two users in the call) or a conference call (with more than two users in the call). The user's speech is received at the microphone and is then transmitted over a network to the other user(s) in the call. The audio signals received at the microphone will typically include speech components from the user and also noise from the surrounding environment. In order to improve the quality of the signal, such as for use in the call, it is desirable to suppress the noise in the signal relative to the speech components in the signal. When a user is operating a peripheral device of a computer at the same time as partaking in the call, the noise in the audio signal might include the noise generated by the user's operation of the peripheral device. For example, clicking noise such as the sound from a key stroke on a keyboard might be picked up by the microphone and included in the signal that is sent to the other participants in the call. The noise (e.g. clicking noise) can be annoying to the other participants in the call and can interfere with their experience of the call.
One approach for suppressing noise in an audio signal is to use background noise reduction methods. Background noise reduction methods analyse the audio signal in a time and/or frequency domain during periods of speech inactivity (i.e. when the user is not speaking). The background noise reduction methods identify signal components that reduce the perceived quality of speech and attenuate those identified components. Background noise algorithms which can be used in the background noise reduction methods are usually successful in removing stationary noise (e.g. noise comprising a periodic signal and its potential harmonics) from the audio signal. Stationary noise comprises noise components for which the statistical distribution functions do not vary over time. However, background noise algorithms have difficulty in identifying and attenuating transient and non-stationary components of noise, such as clicking noise generated for example from keyboard activity. Clicking noise is a good example of non-stationary noise in that clicking noise fluctuates in time, and any clicking noise generated by a user (such as by typing on a keyboard) is likely to be treated by the background noise algorithm as if it were a speech signal, and therefore would not be attenuated.
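The behaviour described above can be illustrated with a minimal sketch. It is not taken from any particular noise reduction product. It assumes the audio has already been split into frames with one energy value per frame, and that a hypothetical voice activity detector supplies a speech flag for each frame. The names `update_noise_floor`, `suppression_gain` and `gains_for`, the smoothing factor and the minimum gain are all illustrative assumptions.

```python
def update_noise_floor(floor, energy, speech_active, alpha=0.9):
    """Recursively average frame energy, but only while speech is inactive."""
    if speech_active:
        return floor  # freeze the estimate while speech is (believed to be) present
    return alpha * floor + (1.0 - alpha) * energy


def suppression_gain(energy, floor, min_gain=0.1):
    """Subtraction-style gain: min_gain near the noise floor, close to 1.0 far above it."""
    if energy <= 0.0:
        return min_gain
    return max(min_gain, 1.0 - floor / energy)


def gains_for(energies, speech_flags, initial_floor):
    """Return the gain applied to each frame, updating the floor as frames arrive."""
    floor = initial_floor
    gains = []
    for energy, active in zip(energies, speech_flags):
        gains.append(suppression_gain(energy, floor))
        floor = update_noise_floor(floor, energy, active)
    return gains


# Assumed toy data: a steady hum of energy 1.0 with a single click of energy 50.0.
# The click frame is flagged as speech, mirroring the misclassification described above.
energies = [1.0, 1.0, 1.0, 50.0, 1.0]
speech_flags = [False, False, False, True, False]
gains = gains_for(energies, speech_flags, initial_floor=1.0)
```

In this toy data the steady hum frames receive the minimum gain, whereas the click frame lies far above the estimated noise floor and passes through almost unattenuated, which is the limitation noted above for transient noise.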
Another approach for suppressing noise in an audio signal is to use noise attenuation algorithms that each target a specific type of noise, such as keyboard noise attenuation algorithms for attenuating keyboard noise. Keyboard noise attenuation algorithms typically analyse the audio signal received at a microphone to detect and filter out components of the audio signal that are identified as keyboard clicking noise. In this sense, keyboard noise attenuation algorithms comprise two major steps. The first step is detection of the clicking noise in the audio signal and the second step is attenuation of the clicking noise. The detection step can be problematic when the user is engaged in a call because some types of noise such as clicking noise (e.g. keyboard tapping noise) have similar initial characteristics to those of speech, in particular to those of the onset of speech. It is therefore difficult to detect these types of noise reliably, and to differentiate between speech and these types of noise, without adding a delay in order to observe a full click. In the second step of attenuating the noise it is preferred to remove only those components of the signal coming from the noise generating activity (e.g. the keystrokes on the keyboard) while not modifying other components in the audio signal. In particular it is preferable not to modify the speech components of the audio signal when attenuating the clicking noise. However, as described above, it can be difficult to distinguish some types of noise (such as clicking noise) from the onset of speech, and therefore it is problematic to attenuate those types of noise without distorting the speech components of the audio signal. This problem is compounded by the fact that speech onsets are crucial for the intelligibility of speech, so any attenuation of a speech onset can seriously affect the intelligibility of the speech in the audio signal.
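The two steps, and the ambiguity between a click and a speech onset, can be sketched as follows. This is a deliberately naive illustration under stated assumptions, not a real keyboard noise attenuation algorithm. The detector simply flags a frame whose energy jumps sharply relative to the previous frame. The function names, the jump ratio, the gain and the toy energy sequences are all hypothetical.

```python
def detect_onsets(energies, ratio=8.0, floor=1e-6):
    """Step 1 (detection): indices of frames whose energy exceeds `ratio` times the previous frame."""
    onsets = []
    for i in range(1, len(energies)):
        if energies[i] > ratio * max(energies[i - 1], floor):
            onsets.append(i)
    return onsets


def attenuate(energies, indices, gain=0.1):
    """Step 2 (attenuation): scale only the flagged frames, leaving all others unchanged."""
    flagged = set(indices)
    return [e * gain if i in flagged else e for i, e in enumerate(energies)]


# Assumed toy data: a click decays almost immediately, whereas speech is sustained.
click = [1.0, 1.0, 40.0, 3.0, 1.0, 1.0]
speech = [1.0, 1.0, 40.0, 45.0, 42.0, 38.0]

click_onsets = detect_onsets(click)
speech_onsets = detect_onsets(speech)

# Looking only at the first loud frame, both signals are flagged at the same index,
# so attenuating the flagged frame also suppresses the onset of the speech.
damaged_speech = attenuate(speech, speech_onsets)
```

Both sequences are flagged at the same frame because the detector sees only the initial jump in energy. Applying the attenuation step to the speech sequence therefore suppresses the speech onset, which is the distortion described above.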
Existing clicking noise attenuation algorithms can be split into two groups. The algorithms of the first group are effective in attenuating clicking noise from the audio signal without distorting the speech components of the audio signal to an extent that would be unacceptable to a user. However, the algorithms of the first group require data from the future audio signal, such that a delay of around 100 ms is added, which makes them impractical for real time communications, such as a voice call. Any delay added to the audio signal will have a detrimental effect on the user's perception of the quality of a call or other real time communication. The algorithms of the second group do not add a significant delay to the processing of the audio signal, and so are suitable for use in real time communications, such as a voice call. However, the algorithms of the second group are not as effective at attenuating clicking noise from the audio signal as the algorithms of the first group. The algorithms of the second group have a tendency to distort the speech components of the audio signal because they will occasionally mistake a speech onset for a click, such as a tap on the keyboard.
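The trade-off between the two groups can be made concrete with a small sketch. It assumes, purely for illustration, 10 ms frames and a classifier that waits for a number of future frames before deciding whether an onset was a click or speech. The names `classify_onset` and `added_delay_ms`, the frame length and the thresholds are hypothetical and do not describe any specific existing algorithm.

```python
FRAME_MS = 10  # assumed frame length in milliseconds


def classify_onset(energies, onset, lookahead_frames, sustain_ratio=0.5):
    """Label an onset using only the `lookahead_frames` frames that follow it.

    Returns "speech" if at least half of those frames stay above
    sustain_ratio * peak, "click" if the energy decays quickly, and
    "unknown" if no future frames are available.
    """
    peak = energies[onset]
    window = energies[onset + 1 : onset + 1 + lookahead_frames]
    if not window:
        return "unknown"
    sustained = sum(1 for e in window if e >= sustain_ratio * peak)
    return "speech" if sustained * 2 >= len(window) else "click"


def added_delay_ms(lookahead_frames, frame_ms=FRAME_MS):
    """Delay introduced by waiting for the look-ahead frames before output."""
    return lookahead_frames * frame_ms


# Assumed toy data: onset at index 2 followed by ten further frames.
click = [1.0, 1.0, 40.0, 3.0] + [1.0] * 9
speech = [1.0, 1.0, 40.0, 45.0, 42.0, 38.0, 41.0, 44.0, 40.0, 39.0, 43.0, 42.0, 40.0]

with_lookahead = (classify_onset(click, 2, 10), classify_onset(speech, 2, 10))
without_lookahead = (classify_onset(click, 2, 0), classify_onset(speech, 2, 0))
delay = added_delay_ms(10)
```

With ten look-ahead frames the two onsets can be told apart, at the cost of a delay of 100 ms under the assumed frame length. With no look-ahead there is no added delay, but the classifier has nothing with which to separate the click from the speech onset.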
There is therefore a problem of reliably suppressing noise generated by user activities such as keyboard clicking from an audio signal for use in a real time communication event, without significantly distorting speech components in the audio signal.