The present invention relates to a signal source localization arrangement comprising a plurality of receivers having different positions, the signal source localization arrangement comprising delay estimation means for estimating a delay difference between the signals received by at least two receivers, and position determining means for determining from the delay difference a signal source location.
The present invention relates also to a delay estimation arrangement, a video communication system and a signal source localization method.
An arrangement according to the preamble is known from the article “Voice source localization for automatic camera pointing system in videoconferencing” by Hong Wang and Peter Chu in IEEE, ASSP workshop on applications of signal processing to audio and acoustics, 1997.
Signal localization arrangements are used in several applications. A first example of such applications is automatic camera pointing in video conferencing systems or in security systems. Another application is the determination of the position of a user of an audio system, in order to be able to optimize the reproduction of the audio at said position.
Signal localization arrangements using a plurality of receivers are often based on the determination of a delay difference between the signals at the outputs of the receivers. If the positions of the receivers and a delay difference between the propagation paths between the source and the different receivers are known, the position of the source can be determined. If two receivers are used, it is possible to determine the direction of the source with respect to the baseline between the receivers. If three receivers are used, it becomes possible to determine the position of the source in a 2-D plane. If more than three receivers are used, and they do not all lie in a single plane, it becomes possible to determine the position of the source in three dimensions.
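The two-receiver case can be illustrated by a minimal sketch. It assumes a far-field source (plane wavefront), an acoustic signal travelling at roughly 343 m/s, and an angle measured from the normal to the baseline; the function name and parameters are hypothetical and do not appear in the cited prior art.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def bearing_from_delay(delay_s, spacing_m, speed=SPEED_OF_SOUND):
    """Return the source angle in radians, measured from the normal to the
    baseline between two receivers, under a far-field assumption.

    delay_s   -- delay difference between the two received signals (s)
    spacing_m -- distance between the two receivers (m)
    """
    # Path-length difference divided by the receiver spacing is sin(angle).
    ratio = speed * delay_s / spacing_m
    # Measurement noise can push the ratio slightly outside [-1, 1]; clamp it.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)
```

A delay difference of zero corresponds to a source on the normal to the baseline, and a delay equal to the spacing divided by the propagation speed corresponds to a source on the extension of the baseline.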
In the prior art signal localization arrangements, the delay difference is determined by calculating a cross-correlation function between the signals received by the different receivers. The delay difference is then equal to the delay value in the cross-correlation function at which the highest correlation value occurs.
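The cross-correlation approach described above can be sketched as follows. This is an illustrative, unoptimized implementation on short sample lists, not the method of the cited article; the function names and the sign convention (a positive lag means the second signal arrives later) are assumptions made for the example.

```python
def cross_correlation(x, y, max_lag):
    """Return a dict mapping each lag in -max_lag..max_lag to
    sum over n of x[n] * y[n + lag], treating samples outside y as zero."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n, x_n in enumerate(x):
            m = n + lag
            if 0 <= m < len(y):
                total += x_n * y[m]
        result[lag] = total
    return result


def estimate_delay(x, y, max_lag):
    """Return the lag (in samples) at which the cross-correlation peaks."""
    corr = cross_correlation(x, y, max_lag)
    return max(corr, key=corr.get)


# Example: y is x delayed by two samples.
x = [0, 1, 2, 1, 0, 0, 0, 0]
y = [0, 0, 0, 1, 2, 1, 0, 0]
delay_samples = estimate_delay(x, y, max_lag=4)
```

In practice the correlation would be averaged over many samples, which is the averaging time referred to in the following paragraph.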
A problem with the prior art signal localization arrangement is that its operation depends heavily on the properties of the signal generated by the source. Voiced speech signals in a reverberant environment, in particular, can disturb the operation. To reduce this large influence of the signal properties, a long averaging time has to be used in determining the cross-correlation function of the received signals.
The object of the present invention is to provide a signal localization arrangement in which the adverse influence of the signal properties has been reduced.
To achieve said purpose, the signal localization arrangement is characterized in that the signal source localization arrangement comprises impulse response determining means for determining a plurality of functions representing the impulse responses of the paths between the signal source and the receivers, and in that the delay estimation means are arranged for determining the delay difference from said functions.
A function representing the impulse response is a function that represents an important aspect of the impulse response, but it may differ substantially in other aspects from the real impulse response of the paths between signal sources and receivers.
By determining the delay difference from functions representing the impulse responses of the paths between the signal source and the receivers instead of from the received signals themselves, the influence of the properties of the signals on the determination of the delay difference is strongly reduced. Experiments have shown that the averaging time to be used in the determination of the delay difference can be strongly reduced.
Preferably, the delay difference is determined by calculating a cross correlation function of the functions representing the impulse responses.
An embodiment of the invention is characterized in that the impulse response determining means comprise adjustable filters for deriving filtered signals from the signals provided by the receivers, the signal source localization arrangement comprising combining means for deriving a combined signal from the filtered signals, in that the impulse response determining means comprises control means for controlling the adjustable filters in order to maximize a power measure of the combined signal, and in that the control means are arranged for limiting a combined power gain measure of the filtered audio signals to a predetermined value.
By combining a plurality of filtered signals and adjusting the filters to maximize a power measure of the combined signal under the constraint of a limited combined power gain measure, the filters converge to transfer functions that give the filtered signals a maximum degree of coherence before they are added. This means that the delay differences between the impulse responses of the adjustable filters correspond to the delay differences between the signals at the outputs of the receivers.
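The effect of the constrained maximization can be illustrated for a single frequency component. The sketch below is a simplified stand-in, not the control means of the embodiment: it assumes that each receiver path is described by one complex gain, it uses a normalized gradient step as the adaptation rule, and all names (adapt_weights, delay_difference, mu, steps) are hypothetical. The weights are adjusted to maximize the power of the combined signal while the sum of their power gains is held at one; they converge to the normalized conjugates of the path gains, so that their phase difference reveals the delay difference.

```python
import cmath
import math


def adapt_weights(h, steps=50, mu=1.0):
    """Adjust complex weights w to maximize |sum(w_i * h_i)|^2 while
    keeping sum(|w_i|^2) equal to 1 (the limited combined power gain)."""
    n = len(h)
    w = [complex(1.0 / math.sqrt(n))] * n
    for _ in range(steps):
        # Combined signal for a unit-amplitude source component.
        y = sum(w_i * h_i for w_i, h_i in zip(w, h))
        # Gradient step that increases the power of the combined signal.
        w = [w_i + mu * h_i.conjugate() * y for w_i, h_i in zip(w, h)]
        # Renormalize so the combined power gain stays at 1.
        norm = math.sqrt(sum(abs(w_i) ** 2 for w_i in w))
        w = [w_i / norm for w_i in w]
    return w


def delay_difference(w, omega):
    """Delay of path 1 relative to path 0, read from the weight phases.
    Unambiguous only while |omega * delay| is below pi."""
    return cmath.phase(w[1] * w[0].conjugate()) / omega


# Example: second path is weaker and 0.25 ms later, at 500 Hz.
omega = 2 * math.pi * 500.0
h = [1 + 0j, 0.8 * cmath.exp(-1j * omega * 0.00025)]
w = adapt_weights(h)
estimated_delay = delay_difference(w, omega)
```

Because the converged weights depend only on the path gains and not on the amplitude of the source component, the delay estimate in this simplified setting is independent of the signal properties.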
A further embodiment of the invention is characterized in that the control means comprise a plurality of further adjustable filters having a transfer function being the conjugate of the transfer function of the adjustable filters, said further adjustable filters being arranged for deriving from the combined audio signal filtered combined audio signals, and in that the control means are arranged for maximizing the power measure of the combined audio signal, and for restricting a combined power gain measure of the processed audio signals to a predetermined value by controlling the transfer functions of the adjustable filters and the further adjustable filters in order to minimize a difference measure between the input audio signals and the filtered combined audio signal corresponding to said input audio signals.
Experiments have shown that by using two sets of adjustable filters, the quality of the speech signal can be further enhanced. Minimizing a difference measure between the input audio signal and the corresponding filtered combined audio signal has the effect that a power measure of the combined audio signal is maximized under the constraint that, per frequency component, the sum of the power gains of the adjustable filters is equal to a predetermined constant. The correspondence between the two criteria mentioned above will be shown in the detailed description of the drawings by using a simplified example.