The present invention concerns a device and a method for multichannel acoustic echo compensation with a variable number of channels, as used in particular for acoustic human-machine interfaces with hands-free devices and multichannel output, in order to make multichannel full-duplex communication possible.
The basic problems of acoustic echo compensation are described in detail in the review article “Stereophonic Acoustic Echo Cancellation—An Overview of the Fundamental Problem”, IEEE Signal Processing Letters, Vol. 2, No. 8, August 1995, by M. Mohan Sondhi et al.
If only a single full-duplex audio channel is used for bidirectional speech transfer between a first and a second audio transmission and receiving unit in acoustic human-machine interfaces (for example, the microphones and loudspeakers of video conference or telephone conference systems), then acoustic echo compensation can be performed with adaptive filters in order to suppress undesirable echoes that arise from feedback between loudspeakers and microphones in the first and second audio transmission and receiving units.
In conventional single-channel acoustic echo compensators, a single FIR (finite impulse response) filter with adaptively adjustable filter coefficients is sufficient to model the acoustic impulse response of the echo path. The echo estimate produced by the adapted filter is subtracted from the actual echo signal to obtain an error signal. The filter coefficients are continuously readjusted to the echo path, which may change in the course of time, so that the error signal is kept as low as possible.
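The single-channel scheme described above can be sketched as follows. This is a minimal illustration only: the normalized LMS (NLMS) update, the step size mu, the function name and the four-tap echo path h are assumptions chosen for the example and are not taken from the cited articles.

```python
import random

def nlms_echo_canceller(x, d, num_taps, mu=0.5, eps=1e-8):
    """Cancel the echo of loudspeaker signal x contained in microphone signal d.

    Returns (error signal, final filter coefficients).
    NLMS is an assumed adaptation rule; the text does not prescribe one.
    """
    w = [0.0] * num_taps      # adaptive FIR coefficients (echo path model)
    buf = [0.0] * num_taps    # latest loudspeaker samples, newest first
    errors = []
    for x_n, d_n in zip(x, d):
        buf = [x_n] + buf[:-1]
        echo_estimate = sum(w_k * b_k for w_k, b_k in zip(w, buf))
        e_n = d_n - echo_estimate                    # error signal
        power = sum(b_k * b_k for b_k in buf) + eps  # input power for normalization
        w = [w_k + mu * e_n * b_k / power for w_k, b_k in zip(w, buf)]
        errors.append(e_n)
    return errors, w

random.seed(0)
h = [0.5, -0.3, 0.2, 0.1]   # hypothetical echo path, for illustration only
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
# Microphone signal: pure echo (loudspeaker signal filtered by the echo path).
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]
errors, w = nlms_echo_canceller(x, d, num_taps=len(h))
```

In this noise-free toy setting the coefficients w approach the assumed echo path h and the error signal decays towards zero.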
However, especially in video conference or telephone conference transmissions, it may be desirable to use several acoustic transmission channels, each with at least one assigned loudspeaker, in order to transfer an acoustic pattern that is as true to the room as possible from a first to a second audio transmission and receiving unit. This is of interest, for example, when several speakers are located in a first room and their speech is to be transferred to a receiver in a second room. If two or more acoustic transmission channels are used to the second room, where a listener is located, then this listener receives a stereo or multichannel acoustic pattern from the first room, which makes it easier for the listener, for example, to assign the speech sound to the individual speakers.
As explained in the above review article and also, for example, in “Stereo Projection Echo Canceller with True Echo Path Estimation”, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 95), Detroit, Mich., USA, pp. 3059–3062, May 1995, by S. Shimauchi et al. or in “A better understanding and an improved solution to the problems of stereophonic acoustic echo cancellation”, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 97), Munich, pp. 303–306, April 1997, by J. Benesty et al., a number of additional problems occur in stereo or multichannel compensation in comparison to the mono-channel situation, in which an individual adaptive filter is sufficient for echo compensation. These problems are due to the mutual influence of the individual transmission channels on one another.
Various approaches to the problems that occur in the multichannel case are explained in particular in the article “Stereophonic Acoustic Echo Cancellation—An Overview and Recent Solutions”, Proc. 6th Int. Workshop on Acoustic Echo and Noise Control, Pocono Manor, Pa., USA, pp. 12–19, September 1999, by S. Makino et al. Specifically, the following are dealt with: the addition of statistically independent noise signals to the loudspeaker signals, nonlinear signal processing, the use of decorrelation filters, the use of various time-variable filter techniques, and the use of special adaptive algorithms in the filters.
Especially in the multichannel case, according to the present state of knowledge, signal processing for partial (acoustically not detectable) decorrelation of the loudspeaker signals is necessary in order to make unequivocal convergence of the adaptive filters to the true room impulse responses possible. As already stated, the basic idea of echo compensation is to simulate, using digital filter structures, the echo paths which arise from the interplay of certain loudspeaker characteristics, certain room acoustics and certain microphone characteristics.
This will be explained below in more detail with the aid of FIG. 3. In the echo compensation device according to the state of the art shown there, the audio signals emitted by a multichannel audio signal processing unit 1 are sent through separate loudspeaker channels LK1, . . . , LKD to the corresponding loudspeakers L1, . . . , LD. A channel-specific preprocessing unit V1, . . . , VD is located in each of the loudspeaker channels LK1, . . . , LKD. The audio signals running through the preprocessing units V1, . . . , VD can each be processed there individually in a channel-specific manner.
The loudspeakers L1, . . . , LD assigned individually to loudspeaker channels LK1, . . . , LKD emit acoustic signals corresponding to the received audio signals into the surrounding room.
Furthermore, a microphone M is provided which serves as input interface for acoustic signals, for example, speech sounds from a person speaking into the microphone.
The microphone M converts the received acoustic signals into microphone signals, which are sent back to the multichannel audio signal processing unit 1 through a microphone channel MK for further processing.
The acoustic signals radiated by loudspeakers L1, . . . , LD are superimposed depending on the structures in the room, in which the loudspeakers L1, . . . , LD are set up, and are also received by microphone M.
As a result, echo signals are produced: the acoustic signals emitted by the loudspeakers L1, . . . , LD are received by the microphone M, sent from there to the multichannel audio signal processing unit 1 and, under certain circumstances, sent from there again to the loudspeakers L1, . . . , LD.
The basic idea of echo compensation is to simulate by digital filter structures the “echo paths” that result from the acoustic signals emitted by the loudspeakers L1, . . . , LD, from their different paths to microphone M as predetermined by the spatial propagation conditions, and from the microphone characteristics. For this purpose, the digital filter structures produce estimate signals for the echo signals expected through the echo paths, and the estimate signals are subtracted from the microphone signals, which contain the actual echo signals.
If there were exact agreement between the real room impulse responses and the impulse responses of the digital filter, the echo signals would be cancelled in the microphone signal.
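The subtraction described above can be written compactly. The symbols are assumptions introduced here for illustration only: x_i(n) denotes the signal of loudspeaker channel i, h_i the true impulse response of the echo path from loudspeaker Li to microphone M, \hat{h}_i its model in the digital filter, s(n) the local signal (for example, speech) at the microphone, and * convolution.

```latex
\begin{align}
  y(n) &= \sum_{i=1}^{D} (h_i * x_i)(n) + s(n)
       && \text{(microphone signal)} \\
  e(n) &= y(n) - \sum_{i=1}^{D} (\hat{h}_i * x_i)(n)
        = s(n) + \sum_{i=1}^{D} \bigl((h_i - \hat{h}_i) * x_i\bigr)(n)
       && \text{(signal after echo compensation)}
\end{align}
```

Under these assumptions, \hat{h}_i = h_i for every channel makes the sum on the right vanish, so that only the local signal s(n) remains.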
However, since the echo paths generally have a very complex structure which is not known beforehand and which, in addition, can change in time, the echo paths must be continuously reidentified, that is, adaptively identified.
The adaptive filter 2 shown in FIG. 3 serves this purpose: the audio signals fed through channels LK1, . . . , LKD to loudspeakers L1, . . . , LD are supplied to this filter through branch lines A1, . . . , AD. In the adaptive filter 2, the audio signals supplied through branch lines A1, . . . , AD are weighted with coefficients (filter coefficients) that are optimized according to specified adaptation algorithms. The adaptive adjustment is based on mathematical models which adjust the currently valid filter coefficients to the currently valid echo path conditions.
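A multichannel adaptive filter of this kind can be sketched as follows. The sketch assumes one FIR coefficient set per branch line, a common error signal and a normalized LMS update; these choices, the function names and the two three-tap echo paths are hypothetical and serve only to illustrate the structure.

```python
import random

def multichannel_nlms(channels, mic, num_taps, mu=0.5, eps=1e-8):
    """channels: list of D loudspeaker signals; mic: microphone signal.

    Returns (error signal, list of D coefficient lists).
    NLMS is an assumed adaptation rule; the text does not prescribe one.
    """
    num_ch = len(channels)
    w = [[0.0] * num_taps for _ in range(num_ch)]    # one FIR model per channel
    buf = [[0.0] * num_taps for _ in range(num_ch)]  # recent samples, newest first
    errors = []
    for n, mic_n in enumerate(mic):
        for i in range(num_ch):
            buf[i] = [channels[i][n]] + buf[i][:-1]
        # Echo estimate: weighted branch signals, summed over all channels.
        estimate = sum(w[i][k] * buf[i][k]
                       for i in range(num_ch) for k in range(num_taps))
        e_n = mic_n - estimate
        power = sum(b * b for ch in buf for b in ch) + eps
        for i in range(num_ch):
            w[i] = [w[i][k] + mu * e_n * buf[i][k] / power
                    for k in range(num_taps)]
        errors.append(e_n)
    return errors, w

def fir(h, x):
    """Filter signal x with impulse response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(x))]

random.seed(1)
paths = [[0.6, -0.2, 0.1], [0.3, 0.4, -0.1]]   # hypothetical echo paths
# Statistically independent channel signals: the favorable case.
x = [[random.gauss(0.0, 1.0) for _ in range(3000)] for _ in paths]
mic = [a + b for a, b in zip(fir(paths[0], x[0]), fir(paths[1], x[1]))]
errors, w = multichannel_nlms(x, mic, num_taps=3)
```

With the independent test signals used here, each coefficient set approaches its own echo path. This is the favorable case; it does not hold when the channel signals are strongly correlated.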
In order to make unequivocal convergence of the filter coefficients to the true room impulse responses possible in the multichannel case, the signal preprocessing for partial (acoustically not detectable) decorrelation of the loudspeaker signals, which is necessary according to present-day knowledge (see, for example, the article by J. Benesty et al. mentioned above), is carried out in the preprocessing units V1, . . . , VD shown in FIG. 3.
However, it can be shown theoretically and experimentally that, in spite of this preprocessing, the expenditure for echo compensation generally increases with an increasing number of channels, and the convergence behavior of the adaptive filter, in which the individual channel signals are combined, becomes worse. If D different preprocessing units are used, this leads to very slow convergence of the filter coefficients when the actual number of channels C of the audio signal is smaller than the available number of channels D, that is, when C<D. This case is typical for use in multimedia terminal equipment (for example, when a multimedia terminal is used as a stereo television unit to watch a broadcast whose sound is reproduced in only one mono channel).
Performing multichannel echo compensation for acoustic interfaces in multimedia terminals is a relatively new application. Conventional approaches for telephone conference applications provide a fixed number of channels, D, for the audio signals.
The relatively slow convergence behavior arises in this case from insufficient decorrelation of audio signals that were originally exactly the same and are passed through separate audio channels.
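The difficulty with identical signals on separate channels can be made concrete with a small numerical example. It assumes C = 1 source signal on D = 2 channels without any preprocessing; the echo paths h1 and h2 and the offset delta are hypothetical values.

```python
import random

def fir(h, x):
    """Filter signal x with impulse response h (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(len(x))]

random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
x1, x2 = x, x                       # the same mono signal on both channels

h1, h2 = [0.6, -0.2, 0.1], [0.3, 0.4, -0.1]   # hypothetical true echo paths
delta = [0.25, -0.5, 0.75]                    # arbitrary offset
g1 = [a + b for a, b in zip(h1, delta)]       # wrong model for channel 1
g2 = [a - b for a, b in zip(h2, delta)]       # wrong model for channel 2

echo_true = [a + b for a, b in zip(fir(h1, x1), fir(h2, x2))]
echo_wrong = [a + b for a, b in zip(fir(g1, x1), fir(g2, x2))]
# Largest difference between the two echo estimates (rounding error only).
max_diff = max(abs(a - b) for a, b in zip(echo_true, echo_wrong))
```

Both coefficient pairs produce the same echo estimate, so the error signal alone cannot distinguish the true echo paths from infinitely many wrong ones. In this example, only decorrelating the channel signals can remove that ambiguity.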
The approach known from the article by J. Benesty et al. cited above as state of the art provides D identical nonlinear preprocessing units, which lessens the above problem. However, the decorrelation possibilities of this approach are also limited, especially when the signals of the individual channels differ mainly in their levels (for example, in the case of intensity stereophony).
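The limitation for level differences can be illustrated under one assumption: that the identical nonlinear preprocessing takes the form of a half-wave rectifier added to the signal, x' = x + alpha (x + |x|) / 2. This form, the value of alpha and the gain 0.4 are assumptions made for the example; the text above does not specify the nonlinearity.

```python
import math
import random

def half_wave_preprocess(x, alpha=0.5):
    """Assumed nonlinearity: add a scaled half-wave rectified copy of the signal."""
    return [v + alpha * 0.5 * (v + abs(v)) for v in x]

def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

random.seed(3)
source = [random.gauss(0.0, 1.0) for _ in range(1000)]
left = source
right = [0.4 * v for v in source]   # intensity stereophony: level difference only

rho = correlation(half_wave_preprocess(left), half_wave_preprocess(right))
```

Because this nonlinearity scales with positive gains, the two preprocessed signals remain exactly proportional and rho stays at 1 up to rounding. Identical units of this assumed form therefore introduce no decorrelation between channels that differ only in level.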