The acoustic echo present in the microphone signal, caused by parasitic coupling between the loudspeaker and the microphone of a computer terminal such as a personal computer (PC), a workstation or any other machine, is the major obstacle to the proper hands-free operation of software for voice communication between users. This acoustic echo arises because the signal emitted by the loudspeaker is picked up, at least partially, by the microphone on account of this parasitic coupling.
Echo cancellation algorithms are tailored to such an application context, and hands-free communication between videoconferencing terminals is undeniably attractive despite the delay inherent in this mode of communication.
With this aim, the solutions currently proposed consist essentially of a system external to the terminal, or to the host machine, termed “add on audio”.
The implementation of such systems nevertheless hinders the spread of communication products, on account of both the additional costs generated and the difficulties of installation.
One way of removing this obstacle is to incorporate the echo cancellation function into the host machine as real-time software processing, as a task in its own right alongside the other specific tasks required, such as sound coding/decoding, image processing and interfacing with the network.
A priori, porting echo cancelling software onto a host machine such as a PC does not in itself constitute an obstacle: the processor of such machines is programmable in a high-level language and generally has computational power compatible with the intended application, at least on machines of recent generation.
However, such an operation is confronted with the problem of synchronizing the audio data streams, incoming streams and outgoing streams, for the implementation of the echo cancellation function. These streams are generated by the sound card of the host machine.
When a task uses just one input stream and one output stream, as in the case of speech coding/decoding, image processing and network interfacing, the synchronization process is relatively simple: the end of the filling of an input buffer, or buffer memory, triggers the execution of the relevant task, and an output buffer is filled once execution of the task has ended.
By contrast, in the case of echo cancellation between the loudspeaker and the microphone of a computer terminal, as represented in FIG. 1, the echo canceller element AEC consists of an adaptive filter which reinjects a fraction of the loudspeaker signal for subtraction from the microphone signal. The aforesaid echo canceller requires two input streams: the signal originating from the microphone, termed the microphone signal smic, and the signal originating from the remote party and bound for the loudspeaker, therefore termed the loudspeaker signal shp.
The echo cancellation process is based on estimating the impulse response of the loudspeaker/microphone parasitic coupling. The echo canceller AEC generates one or two output streams comprising at least the reinjected fraction of the loudspeaker signal.
Such a mode of operation therefore makes it necessary to wait until both input buffers have been filled before launching the echo cancellation task. Furthermore, and crucially, the two incoming data streams must be perfectly synchronous, so as to allow correct estimation of the acoustic transfer function of the parasitic coupling, obtained from the impulse response.
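By way of illustration only, the principle of an adaptive-filter echo canceller operating on two sample-aligned input buffers can be sketched as follows. The sketch assumes a normalized LMS (NLMS) update, which is one common choice and is not stated in the present description; the function names, the filter length `taps`, the step size `mu` and the toy coupling response are likewise hypothetical.

```python
import random


def nlms_echo_cancel(shp, smic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of shp from smic.

    shp  : loudspeaker samples (signal from the remote party)
    smic : microphone samples, assumed sample-aligned with shp
    Returns (residual samples, estimated coupling impulse response).
    NLMS is an assumed algorithm, chosen here for brevity.
    """
    if len(shp) != len(smic):
        raise ValueError("both input buffers must hold the same number of samples")
    w = [0.0] * taps      # running estimate of the coupling impulse response
    x = [0.0] * taps      # latest loudspeaker samples, newest first
    residual = []
    for s, m in zip(shp, smic):
        x = [s] + x[:-1]                                  # shift in the new reference sample
        echo_hat = sum(wi * xi for wi, xi in zip(w, x))   # estimated echo
        e = m - echo_hat                                  # microphone signal minus estimated echo
        norm = sum(xi * xi for xi in x) + eps             # reference energy (eps avoids division by zero)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual, w


def simulate_echo(shp, h):
    """Toy coupling: convolve the loudspeaker signal with impulse response h."""
    return [sum(h[k] * shp[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(shp))]


def demo(seed=0, n=2000):
    """Run the canceller on a synthetic, perfectly synchronous pair of streams."""
    rng = random.Random(seed)
    shp = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    h = [0.0, 0.5, 0.25]          # hypothetical loudspeaker/microphone coupling
    smic = simulate_echo(shp, h)
    return nlms_echo_cancel(shp, smic)
```

In this synthetic case the two streams are synchronous by construction, so the estimated coefficients approach the toy coupling response and the residual becomes negligible. The sketch says nothing about how a host machine would guarantee that alignment.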
When, as in the prior art, the echo cancellation function is carried out by way of a DSP card, that is, a card fitted with a dedicated signal processor, or where appropriate by way of an additional audio element or “add on audio”, the aforesaid vital condition of perfect synchronism is satisfied. A single clock, that of the DSP card or “add on”, simultaneously drives the analog/digital converter operating on the microphone signal, the transfer of the digital data to the signal processor, the synchronization (by interrupt, for example) of the computational program in the dedicated signal processor, and the supply of the computed samples to the digital/analog converter intended for the loudspeaker. Furthermore, when the signal processor is a dedicated one, the echo cancellation task and the corresponding computational operations are the only ones it carries out, or at the very least they are carried out with the very highest priority.
The porting of the aforesaid task to a host machine nevertheless comes up against the major technical difficulties set out below.
The audio data streams are managed, in such a case, through software layers, such as those provided by the operating system or by the APIs (Application Program Interfaces) when the WINDOWS® operating system is used. These software layers mask the real-time constraints related to the audio signal acquisition/playback processes. However, to obtain maximum portability of the software, it is not conceivable to use software layers whose object code is specially tailored to the structure of the host machine: it would then be necessary to rewrite them, which would in practice require one software version per type of machine.
Even when a high priority is allocated to the aforesaid echo cancellation task, the system tasks required by the operating system may still interrupt the running of the echo cancellation processing program. They may consequently block one, the other or both of the audio data streams, and thus cause a discontinuity in the speech signal acquisition/playback process.
Thus, by way of illustration, it is recalled that, to play a sound from speech samples, the sound card is first initialized by designating the sampling frequency and the size and number of the buffers, or buffer memories, used for the transfer of data. A first phase then consists in filling all the designated buffers and in enabling their read-out. When this first phase is completed, the operating system goes on standby, waiting for buffers to be played, that is, read. Specifically, the “thread”, or processing task in a multitask operating system, is activated on indication from the sound card only once at least one of the buffers has been read.
A similar manner of operation governs the acquisition, that is, the writing, of samples originating from the microphone input of the sound card. Here too, the number of buffers which can be used by the APIs of the sound card is designated. The sound card returns an indication to the operating system making it possible to identify the buffer which has just been filled by writing.
To circumvent the problems of fine management of the operations for writing/reading the buffers, it is preferable to designate a considerable number of buffers of large size.
Unfortunately, such a choice leads to a considerable delay in the audio chain, possibly of up to a few seconds. Although such a delay has no major consequence for common applications which use only the sound output of a personal computer, such as games, it proves catastrophic for a bidirectional communication system or application.
Specifically, in an audio full duplex communication application, any delay introduced into one or the other of the communications has a particularly harmful influence on the naturalness and fluency of the conversation.
For this reason, it is necessary to work with the smallest possible size and number of buffers, in order to discretize and reduce each delay time introduced. However, the management and interruption of the sound acquisition/playback tasks are rendered particularly critical by this constraint.
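The trade-off between buffer count, buffer size and delay can be made concrete with a short calculation. The sketch below assumes that, in the worst case, every queued buffer must be played before a new sample is heard; the function name and the numerical values are illustrative assumptions and do not come from the present description.

```python
def buffering_delay_ms(num_buffers, buffer_samples, sample_rate_hz):
    """Worst-case delay, in milliseconds, contributed by a queue of full buffers."""
    return 1000.0 * num_buffers * buffer_samples / sample_rate_hz


# Hypothetical configurations at an assumed 8 kHz sampling frequency.
large = buffering_delay_ms(num_buffers=8, buffer_samples=4096, sample_rate_hz=8000)
small = buffering_delay_ms(num_buffers=2, buffer_samples=160, sample_rate_hz=8000)
print(f"many large buffers: {large:.0f} ms, few small buffers: {small:.0f} ms")
```

Under these assumed figures, the first configuration adds several seconds of delay, whereas the second adds a few tens of milliseconds at the price of far tighter scheduling deadlines for the acquisition and playback tasks.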
Furthermore, the halting of the aforesaid acquisition and playback software tasks translates into a decrease or an increase in the delay between the microphone signal and the loudspeaker signal, this delay thus being rendered variable. Specifically, the sound card is not completely tied to the operating system, its degree of autonomy lightening the load and the processing time of the central processor and of the operating system. A time shift therefore occurs between the hardware modules of the card, which are partially autonomous, and the software modules, which are sensitive to the various demands of the operating system.
Although the human ear is only moderately sensitive to these delays, or is sensitive to them only beyond a certain duration of stream interruption, the echo cancellation processes, and the echo cancellation systems implementing them, lose their time reference completely. Consequently, a discontinuity in the audio streams translates into a time shift in the impulse response of the loudspeaker/microphone parasitic coupling estimated by the echo canceller. The time window for estimating this impulse response is limited by the number of coefficients of the echo canceller, which is constituted by an adaptive digital filter. In the worst case, the temporal discontinuity in sound acquisition/playback may therefore lead to the situation in which the aforesaid parasitic physical coupling, which can be observed only while the loudspeaker signal is present, appears outside this estimation window. The echo canceller then no longer produces any effect.
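The worst case described above can be illustrated numerically. The sketch below assumes a filter of `taps` coefficients covering delays of 0 to `taps - 1` samples, and treats the coupling as a single delayed path for simplicity; the names and figures are hypothetical.

```python
def estimation_window_ms(taps, sample_rate_hz):
    """Duration, in milliseconds, spanned by the adaptive filter coefficients."""
    return 1000.0 * taps / sample_rate_hz


def echo_in_window(delay_samples, taps):
    """True if an echo delayed by delay_samples can be modelled by the filter."""
    return 0 <= delay_samples < taps


# Assumed figures: 256 coefficients at 8 kHz, initial echo delay of 160 samples.
taps, rate = 256, 8000
delay_before = 160                    # 20 ms, inside the window
delay_after = delay_before + 160      # a further 20 ms lost to a stream discontinuity
print(estimation_window_ms(taps, rate))       # window length in ms
print(echo_in_window(delay_before, taps))     # echo can still be estimated
print(echo_in_window(delay_after, taps))      # echo has left the window
```

With these assumed values, a single interruption of 20 ms is enough to move the echo beyond the 32 ms estimation window, at which point no setting of the coefficients can represent it.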
Thus, for a software implementation of the echo cancellation functions allowing real time processing on a computer terminal, the problems to be solved are, on the one hand, that the initial delay between the loudspeaker signal and the microphone signal is variable from one computer terminal to another and, on the other hand, that this delay varies over time, either as a result of actions controlled by the operating system of the terminal, or as a result of a phenomenon of drifting of the clocks with which the terminal is equipped.
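The order of magnitude of the clock drift contribution can be estimated with a simple calculation. The sketch below assumes two nominally identical sampling clocks whose rates differ by a fixed relative amount expressed in parts per million; the function name and the numerical values are illustrative assumptions, not measurements.

```python
def drift_samples(sample_rate_hz, mismatch_ppm, seconds):
    """Samples of relative offset accumulated between two mismatched clocks."""
    return sample_rate_hz * mismatch_ppm * seconds / 1e6


# Assumed figures: 8 kHz nominal rate, 100 ppm mismatch, one minute of speech.
offset = drift_samples(sample_rate_hz=8000, mismatch_ppm=100, seconds=60)
print(f"offset after one minute: {offset:.1f} samples")
```

Even a small mismatch of this kind accumulates steadily, so the delay between the loudspeaker signal and the microphone signal cannot be treated as a constant measured once at start-up.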