Determination of presence- and position-related information is of interest in many audio applications, including for example hands-free communication and smart entertainment systems. Knowledge of user locations and their movement may be employed to localize audio-visual effects at the user locations for a more personalized experience in entertainment systems. Such knowledge may also be employed to improve the performance of hands-free (voice) communication, e.g. by attenuating sound from directions other than the estimated direction of the desired user.
In particular, such applications may use directional audio rendering or capture to provide improved effects. Such directionality can for example be derived from audio arrays comprising a plurality of audio drivers or sensors. Accordingly, acoustic beamforming is relatively common in many applications, e.g. in teleconferencing systems. In such systems, weights are applied to the signals of the individual audio elements, thereby resulting in the generation of a beam pattern for the array. The array may be adapted to the user positions in accordance with various algorithms. For example, the weights may be continually updated to maximize the signal level or the signal-to-noise ratio. However, such conventional approaches require the audio source to be present, and consequently the weights of an acoustic array can be adapted only after a source becomes active.
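The weighting described above can be illustrated with a minimal narrowband sketch. It is not taken from any particular system: it assumes a uniform line array of four elements, far-field plane waves, a single frequency, and simple delay-and-sum weights, and all function names and parameter values are hypothetical choices made for the illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(n_elements, spacing_m, freq_hz, angle_rad):
    """Relative phases of a far-field plane wave arriving from angle_rad
    (measured from broadside) at each element of a uniform line array."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    return [
        cmath.exp(-1j * k * i * spacing_m * math.sin(angle_rad))
        for i in range(n_elements)
    ]


def delay_and_sum_weights(n_elements, spacing_m, freq_hz, look_rad):
    """Weights that phase-align a wave from look_rad, normalized so that
    the response in the look direction is 1."""
    a = steering_vector(n_elements, spacing_m, freq_hz, look_rad)
    return [x / n_elements for x in a]


def array_response(weights, spacing_m, freq_hz, angle_rad):
    """Magnitude of the weighted sum of the element signals for a
    unit-amplitude plane wave from angle_rad (one beam pattern sample)."""
    a = steering_vector(len(weights), spacing_m, freq_hz, angle_rad)
    return abs(sum(w.conjugate() * x for w, x in zip(weights, a)))


# Hypothetical array: 4 elements, 5 cm spacing, evaluated at 2 kHz.
N, D, F = 4, 0.05, 2000.0
weights = delay_and_sum_weights(N, D, F, math.radians(30))

# Sample the beam pattern in 1 degree steps from -90 to +90 degrees.
angles_deg = list(range(-90, 91))
pattern = [array_response(weights, D, F, math.radians(a)) for a in angles_deg]
peak_deg = angles_deg[pattern.index(max(pattern))]
```

In this sketch the weights are fixed for a known look direction. An adaptive system of the kind described would instead have to estimate that direction, or the weights themselves, from the captured signals, which is only possible while the source is producing sound.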
This is disadvantageous in many scenarios. For example, user tracking tends to become inaccurate when there are only short bursts of acoustic activity. Such a scenario is typical for many applications, including for example speech applications, where the speaker typically talks only in intervals. Furthermore, beamforming can only be employed effectively after a certain duration of acoustic activity, as the weight adaptation takes some time to become sufficiently accurate. Also, false detections can occur in the presence of other acoustic sources. For example, if a radio or computer is producing sound in the room, the system may adapt to this sound source rather than to the intended sound source, or the adaptation may be compromised by the noise source.
In order to address such issues, it has been proposed to use video cameras to perform position determination and to use the video signal to control the adaptation of the weights. However, such approaches tend to be complex, expensive and resource demanding in terms of both computational resource and power consumption.
Hence, an improved audio system would be advantageous, and in particular a system allowing increased flexibility, reduced resource usage, reduced complexity, improved adaptation, improved reliability, improved accuracy and/or improved performance.