The ubiquity of high-speed internet connections has made personal computers a popular basis for teleconferencing applications. While the microphones, loudspeakers, and webcams embedded in laptop computers have made conference calls very easy to set up, these features have also introduced specific noise nuisances such as feedback, fan noise, and button-clicking noise. Button-clicking noise, which is generally due to the mechanical impulses caused by keystrokes, has been a particularly persistent problem. In laptop computers, it can be a significant nuisance because of the mechanical connection between the keyboard and the microphone within the laptop case.
The noise pulses produced by keystrokes can vary greatly with factors such as keystroke speed and length, microphone placement and response, laptop frame or base, keyboard or trackpad type, and even the surface on which the computer is placed. It is also noted that in many scenarios the microphone and the noise source are not mechanically linked at all, and in some cases the keystrokes originate from an entirely different device, which makes any attempt to incorporate software cues futile.
There are a handful of approaches that attempt to address the problem described above. However, none of these proposed solutions attempts to tackle the issue in real time, and none is based purely on the audio stream. For example, a first approach applies a linear predictive model to frequency bins in a neighborhood of the audio frame in question. While this first approach has the advantage of dealing with speech segments with sharp attacks, the required look-ahead is between 20 and 30 milliseconds (ms), which delays any detection by at least that much. Such an approach has been suggested only as an aid where the final detection decision requires confirmation from the hardware keyboard.
It should be noted that with frame lengths of 20 ms and overlaps of 10 ms, the exact localization of the transient is lost. Exact localization of the transient is of interest when the transient is to be removed from the audio stream. It is also worth noting that many transient noises might not be detectable as a hardware input through the keyboard, and a more general approach will provide more consistent noise reduction performance on transient noise.
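The loss of localization can be illustrated with a small arithmetic sketch. The 16 kHz sample rate and the helper names below are assumptions made for illustration only; the frame length of 20 ms and the overlap of 10 ms are taken from the text above. A frame-level decision can place a click only somewhere inside each flagged frame.

```python
SAMPLE_RATE = 16000  # Hz; assumed for illustration, not stated in the text


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return sample_rate * ms // 1000


def frames_covering(sample_index, frame_len, hop):
    """Indices k of frames [k*hop, k*hop + frame_len) that contain the sample."""
    first = max(0, (sample_index - frame_len) // hop + 1)
    last = sample_index // hop
    return list(range(first, last + 1))


def frame_span_ms(frame_index, frame_len, hop, sample_rate=SAMPLE_RATE):
    """Start and end time of a frame in milliseconds."""
    start = frame_index * hop
    return (1000 * start / sample_rate, 1000 * (start + frame_len) / sample_rate)


frame_len = ms_to_samples(20)  # 320 samples
hop = ms_to_samples(10)        # 10 ms overlap between 20 ms frames -> 10 ms hop

click_at = 400                 # a click at 25 ms
hit_frames = frames_covering(click_at, frame_len, hop)
spans = [frame_span_ms(k, frame_len, hop) for k in hit_frames]
```

Under these assumptions the click falls in two overlapping frames spanning 10 to 30 ms and 20 to 40 ms. Even intersecting the two flagged frames narrows the position only to a 10 ms window, which is 160 samples at the assumed sample rate.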
A second approach proposes relying on a median filter to identify outlying noise events and then restoring audio based on the median filter data. This second approach is primarily designed for much faster corruption events with only a few corrupted samples.
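The general idea of median-filter detection and restoration can be sketched as follows. This is a minimal illustration and not the implementation of the second approach: the function name, window size, and threshold are hypothetical choices.

```python
from statistics import median


def median_declick(x, half_width=2, threshold=0.5):
    """Flag samples that deviate from the local median and replace them with it.

    half_width and threshold are illustrative values, not taken from the source.
    """
    n = len(x)
    restored = list(x)
    flagged = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        local_median = median(x[lo:hi])
        if abs(x[i] - local_median) > threshold:
            flagged.append(i)
            restored[i] = local_median
    return restored, flagged


# A single corrupted sample is detected and restored.
short_click = [0.0, 0.1, 0.0, 1.0, 0.1, 0.0, 0.1]
restored, flagged = median_declick(short_click)

# A burst as long as the window dominates the median and goes undetected.
long_burst = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
_, flagged_burst = median_declick(long_burst)
```

In this sketch the isolated sample at index 3 is flagged, while the five-sample burst is not flagged at all. This is consistent with the observation that such a filter suits corruption events of only a few samples rather than longer keystroke pulses.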
A third approach is similar to the second approach described above, but with wavelets used as the basis. While this third approach increases the temporal resolution of detection, the approach considers the scales independently, which might give rise to false detections based on the more transient voiced speech components.
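Treating the scales independently can be sketched with a Haar wavelet decomposition in which each scale is thresholded on its own. The choice of the Haar wavelet, the function names, and the threshold rule are assumptions for illustration; the source does not specify them.

```python
import math
from statistics import median


def haar_details(x, levels):
    """Return Haar detail coefficients per scale, finest scale first."""
    approx = list(x)
    details = []
    for _ in range(levels):
        pairs = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(pairs)]
        details.append(detail)
    return details


def detect_per_scale(x, levels=3, k=5.0):
    """Flag coefficients larger than k times the median magnitude of their own scale.

    Each scale is examined in isolation; no agreement across scales is required.
    """
    hits = []
    for detail in haar_details(x, levels):
        scale = median(abs(c) for c in detail) or 1e-12  # guard against a zero median
        hits.append([i for i, c in enumerate(detail) if abs(c) > k * scale])
    return hits


# Low-level alternating background with one click at sample 13.
signal = [0.01 * (-1) ** i for i in range(32)]
signal[13] = 1.0
hits = detect_per_scale(signal)
```

Because every scale raises its own detections, any component that is sharp at a single scale, such as an abrupt voiced onset, can trigger a detection without corroboration from the other scales.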
A fourth approach to resolving the nuisance of button-clicking noise proposes an algorithm relying on no auxiliary data. In this fourth approach, detection is based on the Short Time Fourier Transform, and detections are identified by spectral flatness and the rate of increase of high-frequency components, criteria that can falsely flag voiced segments with a sudden onset. The algorithm proposed in this fourth approach is meant for post-processing, and a computationally efficient real-time implementation of this algorithm would lose temporal resolution. It is also not clear that this fourth approach would work well for the range of transient noise seen in real-life applications. A probabilistic interpretation of the detection state could yield a more adaptable and dependable basis for detection. This fourth approach also proposes restoration based on scaled frequency components which, coupled with the low temporal resolution, could be overly invasive and unsettling to the listener.
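The two detection cues named for the fourth approach, spectral flatness and a rise in high-frequency content, can be sketched per frame as follows. This is a hedged illustration rather than the algorithm of the fourth approach: the function names, the definition of the high band as the upper half of the spectrum, and both thresholds are assumptions.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of the one-sided DFT of a frame (naive DFT, for short frames only)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (near 1 when flat)."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power) + eps)


def high_band_energy(power, start_fraction=0.5):
    """Energy in the upper part of the spectrum; the split point is an assumption."""
    return sum(power[int(len(power) * start_fraction):])


def looks_transient(prev_frame, frame, flat_min=0.3, hf_rise_min=4.0, eps=1e-12):
    """True when the frame is spectrally flat and its high band rose sharply."""
    p_prev, p_cur = power_spectrum(prev_frame), power_spectrum(frame)
    hf_rise = high_band_energy(p_cur) / (high_band_energy(p_prev) + eps)
    return spectral_flatness(p_cur) > flat_min and hf_rise > hf_rise_min


n = 32
tone = [0.1 * math.sin(2 * math.pi * 2 * t / n) for t in range(n)]  # steady low tone
click = [0.0] * 16 + [1.0] + [0.0] * 15                             # single impulse

click_after_tone = looks_transient(tone, click)
tone_after_tone = looks_transient(tone, tone)
```

In this sketch the impulse frame is flagged and the steady tone is not. Both cues are computed from whole frames, so the decision inherits the frame-level temporal resolution, and a voiced onset that is broadband enough to pass both thresholds would be flagged in the same way.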