In communications functions, one of the difficulties is to ensure sufficient intelligibility of the signal picked up by the microphone, i.e. the signal representing the speech of the near speaker (the wearer of the headset).
The headset may be used in an environment that is noisy (subway, busy street, train, etc.), such that the microphone picks up not only speech from the wearer of the headset, but also interfering noises from the surroundings.
The wearer may be protected from these noises by the headset, particularly if it is of a kind comprising closed earpieces that isolate the ears from the outside, and even more so if the headset is provided with “active noise control”. In contrast, the remote listener (i.e. the party at the other end of the communication channel) will suffer from the interfering noises picked up by the microphone, which noises are superposed on and interfere with the speech signal from the near speaker (the wearer of the headset).
In particular, certain speech formants that are essential for understanding the voice are often buried in noise components that are commonly encountered in everyday environments, which components are for the most part concentrated at low frequencies.
In such a context, the general problem of the invention is to provide effective noise reduction, so that the voice signal delivered to the remote listener is truly representative of the speech uttered by the near speaker, the interference components from external noises present in the environment of the near speaker having been removed from that signal.
An important aspect of this problem is the need to play back a speech signal that is natural and intelligible, i.e. that is not distorted and that has a frequency range that is not cut down by the denoising processing.
One of the ideas on which the invention is based consists in picking up certain voice vibrations by means of a physiological sensor applied against the cheek or the temple of the wearer of the headset, so as to access new information relating to speech content. This information is then used for denoising and also for various auxiliary functions that are explained below, in particular for calculating a cutoff frequency of a dynamic filter.
When a person is uttering a voiced sound (i.e. producing a speech component that is accompanied by vibration of the vocal cords), the vibration propagates from the vocal cords to the pharynx and to the mouth-and-nose cavity, where it is modulated, amplified, and articulated. The mouth, the soft palate, the pharynx, the sinuses, and the nasal cavity form a resonance box for the voiced sound, and since their walls are elastic, they vibrate in turn, and this vibration is transmitted by internal bone conduction and is perceptible from the cheek and from the temple.
By its very nature, such voice vibration from the cheek and from the temple presents the characteristic of being corrupted very little by noise from the surroundings: in the presence of external noise, the tissues of the cheek or of the temple vibrate very little, and this applies regardless of the spectral composition of the external noise.
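By way of illustration only, the following minimal Python sketch shows why a cutoff frequency is of interest when a noise-robust vibration signal is available alongside the microphone signal. The passages above do not state how the two signals are combined, so everything here is an assumption: the function names `one_pole_lowpass` and `combine_signals`, the first-order filter, the fixed cutoff value, and the idealization that the physiological sensor delivers clean speech content below the cutoff. The sketch takes the band below the cutoff from the physiological sensor and the band above it from the microphone.

```python
import math


def one_pole_lowpass(x, cutoff_hz, fs_hz):
    """First-order IIR low-pass filter (illustrative choice, not from the text)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)  # pole position
    y = []
    state = 0.0
    for sample in x:
        state = (1.0 - a) * sample + a * state
        y.append(state)
    return y


def combine_signals(contact, mic, cutoff_hz, fs_hz):
    """Hypothetical combiner: low band from the contact sensor, high band from the mic.

    The high band is obtained as the complement of the low-pass output
    (mic minus low-passed mic), so the two bands sum back to the input
    when both inputs are identical.
    """
    contact_low = one_pole_lowpass(contact, cutoff_hz, fs_hz)
    mic_low = one_pole_lowpass(mic, cutoff_hz, fs_hz)
    return [cl + (m - ml) for cl, m, ml in zip(contact_low, mic, mic_low)]


if __name__ == "__main__":
    fs = 8000.0  # sampling rate in Hz (assumed)
    n = 4000
    # Synthetic "speech": one low and one higher component (assumed test data).
    speech = [math.sin(2 * math.pi * 200 * i / fs)
              + 0.5 * math.sin(2 * math.pi * 2000 * i / fs) for i in range(n)]
    # Low-frequency ambient noise reaching only the microphone.
    noise = [math.sin(2 * math.pi * 60 * i / fs) for i in range(n)]
    mic = [s + v for s, v in zip(speech, noise)]

    out = combine_signals(speech, mic, cutoff_hz=1000.0, fs_hz=fs)

    def rms(values):
        return math.sqrt(sum(v * v for v in values) / len(values))

    residual = [o - s for o, s in zip(out, speech)]
    print("noise RMS at microphone:", rms(noise))
    print("residual noise RMS after combining:", rms(residual))
```

In this idealized setting the low-frequency noise picked up by the microphone is strongly attenuated, because the band in which it is concentrated is taken from the sensor that the surrounding noise barely affects. A real physiological sensor would have a limited bandwidth and its own response, which is one reason a cutoff frequency that adapts to conditions, rather than the fixed value assumed here, would be preferable.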