The invention relates to a device and a method for decorrelating loudspeaker signals by altering the reproduced acoustic scene.
For a three-dimensional hearing experience, the aim may be to give the listener of an audio piece, or the viewer of a movie, a more realistic auditory impression by means of three-dimensional sound reproduction, for example by giving the listener or viewer the acoustic impression of being located within the reproduced acoustic scene. Psychoacoustic effects may also be exploited for this purpose. Wave field synthesis or higher-order ambisonics algorithms may be used to generate a specific sound field within a playback or reproduction space using a large number of loudspeakers. The loudspeakers may be driven such that they jointly generate wave fields which completely or partly correspond to those of acoustic sources placed at nearly any location within the reproduced acoustic scene.
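To make the idea of driving loudspeakers so that they reproduce the wave field of a virtual source more concrete, the following Python sketch computes a per-loudspeaker delay and gain for a virtual point source behind a linear array. It is a heavily simplified, WFS-style approximation assumed for illustration only: the delay follows the source-to-loudspeaker distance r_n, the gain follows cos(phi_n)/sqrt(r_n), and the frequency-dependent pre-filter, reference-line normalization and tapering window of a complete WFS driving function are omitted. The function name `simplified_wfs_driving`, the array geometry and the speed of sound of 343 m/s are hypothetical choices, not part of the described invention.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def simplified_wfs_driving(source, loudspeakers, fs=48000):
    """Return (delay_in_samples, gain) per loudspeaker for a virtual point source.

    Hypothetical, simplified WFS-style sketch: linear array on the line y = 0,
    radiating towards +y, virtual source behind it (y < 0). The delay follows
    the source-to-loudspeaker distance r_n; the gain follows cos(phi_n)/sqrt(r_n).
    The sqrt(jk) pre-filter, reference-line factor and tapering are omitted.
    """
    xs, ys = source
    driving = []
    for xl, yl in loudspeakers:
        r = math.hypot(xl - xs, yl - ys)
        # cosine of the angle between the array normal (+y) and the
        # direction from the virtual source to this loudspeaker
        cos_phi = (yl - ys) / r
        delay = round(r / SPEED_OF_SOUND * fs)
        # loudspeakers that would radiate "towards" the source stay silent
        gain = max(cos_phi, 0.0) / math.sqrt(r)
        driving.append((delay, gain))
    return driving


# Usage: 11 loudspeakers at 0.2 m spacing, virtual source 1 m behind the centre.
ARRAY = [((n - 5) * 0.2, 0.0) for n in range(11)]
driving = simplified_wfs_driving((0.0, -1.0), ARRAY)
for (x, _), (delay, gain) in zip(ARRAY, driving):
    print(f"x = {x:+.1f} m: delay = {delay} samples, gain = {gain:.3f}")
```

Loudspeakers closer to the virtual source are driven earlier and more strongly, so the superposition of their outputs approximates a curved wavefront that appears to emanate from the source position.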
Wave field synthesis (WFS) and higher-order ambisonics (HOA) provide the listener with a high-quality spatial hearing impression by using a large number of propagation channels to spatially represent virtual acoustic source objects. To achieve a more immersive user experience, these reproduction systems may be complemented by spatial recording systems, which enables further applications, such as interactive applications, or improves the reproduction quality. The combination of the loudspeaker array, the enclosing space or volume (for example, a playback room) and the microphone array is referred to as a loudspeaker enclosure microphone system (LEMS); in many applications, the LEMS is identified by simultaneously observing the loudspeaker signals and the microphone signals. However, it is already known from stereophonic acoustic echo cancellation (AEC) that the typically strong cross-correlations between the loudspeaker signals may prevent sufficient system identification, as described, for example, in [BMS98]. This is referred to as the non-uniqueness problem. In this case, the result of the system identification is only one of an infinite number of solutions, and which one is obtained is determined by the correlation characteristics of the loudspeaker signals. The result of this incomplete system identification nevertheless describes the behavior of the true LEMS for the current loudspeaker signals and may thus be used for various adaptive filtering applications, for example AEC or listening room equalization (LRE). However, this result no longer holds when the cross-correlation characteristics of the loudspeaker signals change, which may cause the behavior of a system based on these adapted filters to become unstable. This lack of robustness is a major obstacle to the applicability of many technologies, such as AEC or adaptive LRE.
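The non-uniqueness problem can be illustrated with a deliberately reduced model, assumed here for clarity: two loudspeakers, one microphone, and each loudspeaker-to-microphone path reduced to a single gain instead of an impulse response. Both loudspeaker signals are derived from the same source, so the 2x2 correlation matrix of the normal equations is singular, and an alternative gain pair explains the microphone signal exactly as well as the true one. Once the correlation between the loudspeaker signals changes, the alternative solution fails. All names (`h_true`, `h_alt`, `mse`, and so on) are illustrative.

```python
import random

rng = random.Random(0)
N = 5000
s = [rng.gauss(0.0, 1.0) for _ in range(N)]  # single source signal

# Two loudspeaker signals derived from the same source: fully correlated.
x1 = s
x2 = [0.5 * v for v in s]

# "True" LEMS, simplified to one gain per loudspeaker-to-microphone path.
h_true = (1.0, 2.0)
y = [h_true[0] * a + h_true[1] * b for a, b in zip(x1, x2)]


def normal_matrix(x1, x2):
    """2x2 correlation matrix of the loudspeaker signals (normal equations)."""
    r11 = sum(a * a for a in x1)
    r12 = sum(a * b for a, b in zip(x1, x2))
    r22 = sum(b * b for b in x2)
    return [[r11, r12], [r12, r22]]


def mse(h, x1, x2, y):
    """Mean squared error between microphone signal and model prediction."""
    return sum((yy - h[0] * a - h[1] * b) ** 2 for a, b, yy in zip(x1, x2, y)) / len(y)


R = normal_matrix(x1, x2)
det = R[0][0] * R[1][1] - R[0][1] ** 2  # zero: the normal equations are singular

# Any h with h1 + 0.5 * h2 == 2.0 reproduces y, e.g. h_alt = (2.0, 0.0).
h_alt = (2.0, 0.0)
err_now = mse(h_alt, x1, x2, y)  # zero error for the current signals

# The inter-channel correlation changes: x2 is now inverted.
x2_new = [-0.5 * v for v in s]
y_new = [h_true[0] * a + h_true[1] * b for a, b in zip(x1, x2_new)]
err_after = mse(h_alt, x1, x2_new, y_new)  # large error: h_alt was never the true LEMS

print(f"det = {det:.3e}, error before = {err_now:.3e}, error after = {err_after:.3f}")
```

The true gains `h_true` fit both situations, whereas `h_alt` only fits the correlation structure it was "identified" under, which mirrors the loss of robustness described above.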
Identification of a loudspeaker enclosure microphone system (LEMS) is required for many applications in the field of acoustic reproduction. With a large number of propagation paths between loudspeakers and microphones, as is the case, for example, in wave field synthesis (WFS), this task may be particularly challenging due to the non-uniqueness problem, i.e. due to an underdetermined system. The non-uniqueness problem may arise when fewer virtual sources are represented in an acoustic playback or reproduction scene than the reproduction system has loudspeakers. In such a case, the system can no longer be identified uniquely, and methods involving system identification suffer from low robustness and stability with respect to varying correlation characteristics of the loudspeaker signals. A common countermeasure against the non-uniqueness problem is to modify the loudspeaker signals (i.e. to decorrelate them) so that the LEMS can be identified uniquely and/or the robustness is increased under certain conditions. However, most known approaches may reduce audio quality and, when applied in wave field synthesis, may even interfere with the synthesized wave field.
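The link between the number of virtual sources and the number of loudspeakers can be checked numerically. In the following sketch, which is an illustrative assumption and not the described method, eight loudspeaker signals are formed as random instantaneous mixtures of only two source signals; the rank of their 8x8 correlation matrix then cannot exceed two, so the normal equations of a system identification are underdetermined. Driving each loudspeaker with its own independent signal restores full rank. Real WFS driving signals are filtered and delayed rather than instantaneously mixed, and the helper names `matrix_rank` and `correlation_matrix` are hypothetical.

```python
import random


def matrix_rank(rows, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    tol = rel_tol * max(abs(v) for r in m for v in r)
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            f = m[i][col] / m[rank][col]
            for j in range(col, n_cols):
                m[i][j] -= f * m[rank][j]
        rank += 1
    return rank


def correlation_matrix(signals):
    """Zero-lag correlation matrix of equally long signals."""
    n = len(signals[0])
    return [[sum(a * b for a, b in zip(p, q)) / n for q in signals] for p in signals]


rng = random.Random(1)
N_SAMPLES, N_LOUDSPEAKERS, N_SOURCES = 2000, 8, 2

sources = [[rng.gauss(0.0, 1.0) for _ in range(N_SAMPLES)] for _ in range(N_SOURCES)]
# Hypothetical driving gains: each loudspeaker plays a weighted sum of the sources.
gains = [[rng.uniform(-1.0, 1.0) for _ in range(N_SOURCES)] for _ in range(N_LOUDSPEAKERS)]
mixed = [[sum(g[k] * sources[k][t] for k in range(N_SOURCES)) for t in range(N_SAMPLES)]
         for g in gains]
# For comparison: one independent signal per loudspeaker.
independent = [[rng.gauss(0.0, 1.0) for _ in range(N_SAMPLES)] for _ in range(N_LOUDSPEAKERS)]

rank_mixed = matrix_rank(correlation_matrix(mixed))
rank_independent = matrix_rank(correlation_matrix(independent))
print(rank_mixed, rank_independent)
```

A rank-deficient correlation matrix means that infinitely many path estimates satisfy the normal equations equally well, which is exactly the underdetermined situation that decorrelation is meant to resolve.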
Three approaches are known for decorrelating loudspeaker signals in order to increase the robustness of system identification, i.e. the identification or estimation of the real LEMS:
[SMH95], [GT98] and [GE98] suggest adding noise to the loudspeaker signals, with the noise being independent across the different loudspeaker signals. [MHB01] and [BMS98] suggest a different non-linear pre-processing for every reproduction channel. In [Ali98] and [HBK07], a different time-varying filtering is suggested for each loudspeaker channel. Although ideally these techniques should not impair the perceived sound quality, they are generally not well suited for WFS: since the loudspeaker signals for WFS are determined analytically, time-varying filtering may significantly interfere with the reproduced wave field. When high-quality audio reproduction is sought, a listener may not accept added noise signals or non-linear pre-processing, both of which may reduce audio quality. [SHK13] suggests an approach suitable for WFS in which the loudspeaker signals are pre-filtered such that the alteration of the loudspeaker signals corresponds to a time-varying rotation of the reproduced wave field.
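The effect of the first two decorrelation approaches can be illustrated on two identical channels. The sketch below is an assumption-based illustration rather than any of the cited methods verbatim: it measures the broadband correlation coefficient after (a) adding independent Gaussian noise to each channel and (b) applying a half-wave rectifier nonlinearity, a form commonly used in the stereophonic AEC literature, with the positive half-wave added on one channel and the negative half-wave on the other. The helper names and the values of the noise level `sigma` and the nonlinearity strength `alpha` are hypothetical and chosen for illustration only.

```python
import math
import random


def corr(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def add_independent_noise(x, sigma, rng):
    """Approach (a): add white Gaussian noise, drawn independently per call."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def half_wave_nonlinearity(x, alpha, positive):
    """Approach (b): x + alpha/2 * (x + |x|) or x + alpha/2 * (x - |x|)."""
    sign = 1.0 if positive else -1.0
    return [v + 0.5 * alpha * (v + sign * abs(v)) for v in x]


rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(20000)]
left, right = s[:], s[:]  # identical channels: correlation coefficient 1

rho_orig = corr(left, right)
rho_noise = corr(add_independent_noise(left, 0.3, rng),
                 add_independent_noise(right, 0.3, rng))
rho_nl = corr(half_wave_nonlinearity(left, 0.5, positive=True),
              half_wave_nonlinearity(right, 0.5, positive=False))
print(f"original {rho_orig:.3f}, noise {rho_noise:.3f}, nonlinearity {rho_nl:.3f}")
```

For unit-variance signals, adding noise reduces the correlation coefficient to roughly 1/(1 + sigma^2), about 0.92 here, while the nonlinearity with alpha = 0.5 yields about 0.97 for a Gaussian input. A broadband coefficient is only a coarse proxy; the frequency-dependent coherence between channels is what actually governs how well conditioned the identification problem is.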