The present invention is in the field of audio processing, especially spatial audio processing and conversion of different spatial audio formats.
Directional Audio Coding (DirAC) is a method for the reproduction and processing of spatial audio. Conventional systems apply DirAC in two-dimensional and three-dimensional high-quality reproduction of recorded sound, teleconferencing applications, directional microphones, and stereo-to-surround upmixing, cf. V. Pulkki and C. Faller, Directional audio coding: Filterbank and STFT-based design, in 120th AES Convention, May 20-23, 2006, Paris, France; V. Pulkki and C. Faller, Directional audio coding in spatial sound reproduction and stereo upmixing, in AES 28th International Conference, Piteå, Sweden, June 2006; V. Pulkki, Spatial sound reproduction with directional audio coding, Journal of the Audio Engineering Society, 55(6):503-516, June 2007; and Jukka Ahonen, V. Pulkki and Tapio Lokki, Teleconference application and B-format microphone array for directional audio coding, in 30th AES International Conference.
Other conventional applications using DirAC are, for example, the universal coding format and noise canceling. In DirAC, some directional properties of sound are analyzed in frequency bands as a function of time. The analysis data is transmitted together with audio data and synthesized for different purposes. The analysis is commonly done using B-format signals, although theoretically DirAC is not limited to this format. B-format, cf. Michael Gerzon, Surround sound psychoacoustics, in Wireless World, volume 80, pages 483-486, December 1974, was developed within the work on Ambisonics, a system developed by British researchers in the 1970s to bring the surround sound of concert halls into living rooms. B-format consists of four signals, namely w(t), x(t), y(t), and z(t). The first corresponds to the pressure measured by an omnidirectional microphone, whereas the latter three are pressure readings of microphones having figure-of-eight pickup patterns directed towards the three axes of a Cartesian coordinate system. The signals x(t), y(t) and z(t) are proportional to the components of the particle velocity vector directed towards x, y and z, respectively.
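As an illustration of the four B-format signals described above, the following minimal sketch encodes a mono signal as a single plane wave arriving from a given direction. The function name `encode_plane_wave_bformat` and the angle convention (azimuth measured from the x axis towards the y axis, elevation from the horizontal plane, both in radians) are assumptions made for this example. The scaling factor that some B-format conventions apply to w(t) is deliberately omitted.

```python
import math


def encode_plane_wave_bformat(s, azimuth, elevation=0.0):
    """Encode mono samples `s` as one plane wave in B-format.

    Hypothetical helper for illustration; angles are in radians.
    Returns the four sample lists (w, x, y, z).
    """
    # Unit vector pointing from the microphone towards the source.
    dx = math.cos(azimuth) * math.cos(elevation)
    dy = math.sin(azimuth) * math.cos(elevation)
    dz = math.sin(elevation)

    # Omnidirectional pickup: the signal is passed without directional weighting.
    w = list(s)
    # Figure-of-eight pickups aligned with the x, y and z axes:
    # each one weights the signal by the cosine of the angle to its axis.
    x = [dx * v for v in s]
    y = [dy * v for v in s]
    z = [dz * v for v in s]
    return w, x, y, z
```

For a source on the y axis (azimuth of 90 degrees, zero elevation), the sketch places the whole signal in w(t) and y(t), while x(t) and z(t) vanish up to rounding.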
The DirAC stream consists of one to four channels of audio with directional metadata. In teleconferencing and in some other cases, the stream consists of only a single audio channel with metadata, called a mono DirAC stream. This is a very compact way of describing spatial audio, as only a single audio channel needs to be transmitted together with side information, which, for example, gives good spatial separation between talkers. However, in such cases some sound types, such as reverberant or ambient sound scenarios, may be reproduced with limited quality. To yield better quality in these cases, additional audio channels need to be transmitted.
The conversion from B-format to DirAC is described in V. Pulkki, A method for reproducing natural or modified spatial impression in multichannel listening, Patent WO 2004/077884 A1, September 2004. Directional Audio Coding is an efficient approach to the analysis and reproduction of spatial sound. DirAC uses a parametric representation of sound fields based on the features which are relevant for the perception of spatial sound, namely the direction of arrival (DOA) and the diffuseness of the sound field in frequency subbands. In fact, DirAC assumes that interaural time differences (ITD) and interaural level differences (ILD) are perceived correctly when the DOA of a sound field is correctly reproduced, while interaural coherence (IC) is perceived correctly if the diffuseness is reproduced accurately. These parameters, namely DOA and diffuseness, represent side information which accompanies a mono signal in what is referred to as a mono DirAC stream.
FIG. 7 shows a DirAC encoder 200, which computes a mono audio channel and side information, namely the diffuseness Ψ(k,n) and the direction of arrival e_DOA(k,n), from proper microphone signals. The DirAC encoder 200 comprises a P/U estimation unit 210, where P(k,n) represents a pressure signal and U(k,n) represents a particle velocity vector. The P/U estimation unit 210 receives the microphone signals as input information, on which the P/U estimation is based. An energetic analysis stage 220 enables estimation of the direction of arrival and the diffuseness parameter of the mono DirAC stream.
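The energetic analysis can be illustrated with a minimal sketch for one frequency bin. It uses the formulation commonly given in the DirAC literature, which is an assumption here since the text above does not spell out the equations: the active intensity is averaged over several time frames, the direction of arrival is taken opposite to the mean intensity, and the diffuseness is one minus the ratio of the mean intensity magnitude to the speed of sound times the mean energy density. The function name `energetic_analysis`, the constants `RHO0` and `C`, and the handling of silent bins are choices made for this example only.

```python
import math

RHO0 = 1.2   # assumed air density in kg/m^3
C = 343.0    # assumed speed of sound in m/s


def energetic_analysis(P, U):
    """Estimate diffuseness and DOA for one frequency bin.

    P: complex pressure values, one per time frame.
    U: particle velocity vectors, one (ux, uy, uz) tuple of complex
       values per time frame.
    Returns (psi, doa); doa is a unit vector, or None if undefined.
    """
    n = len(P)
    mean_i = [0.0, 0.0, 0.0]   # time-averaged active intensity vector
    mean_e = 0.0               # time-averaged energy density
    for p, u in zip(P, U):
        for i in range(3):
            mean_i[i] += 0.5 * (p * u[i].conjugate()).real / n
        u_sq = sum(abs(comp) ** 2 for comp in u)
        mean_e += (RHO0 / 4.0 * u_sq + abs(p) ** 2 / (4.0 * RHO0 * C ** 2)) / n

    norm_i = math.sqrt(sum(v * v for v in mean_i))
    if mean_e == 0.0:
        # Silent bin: no direction; the diffuseness value is an arbitrary choice.
        return 1.0, None

    # Clamp to [0, 1] to absorb rounding errors.
    psi = min(1.0, max(0.0, 1.0 - norm_i / (C * mean_e)))
    # Sound arrives from the direction opposite to the net energy flow.
    doa = None if norm_i == 0.0 else tuple(-v / norm_i for v in mean_i)
    return psi, doa
```

Under these assumptions, a single plane wave yields a diffuseness close to zero, whereas two equally strong waves travelling in opposite directions cancel in the mean intensity and yield a diffuseness of one.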
The DirAC parameters, such as a mono audio representation W(k,n), a diffuseness parameter Ψ(k,n) and a direction of arrival (DOA) e_DOA(k,n), can be obtained from a time-frequency representation of the microphone signals. Therefore, the parameters are dependent on time and on frequency. At the reproduction side, this information allows for an accurate spatial rendering. To recreate the spatial sound at a desired listening position, a multi-loudspeaker setup is required. However, its geometry can be arbitrary. In fact, the loudspeaker channels can be determined as a function of the DirAC parameters.
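How loudspeaker channels might follow from the DirAC parameters can be sketched for a horizontal loudspeaker layout of arbitrary geometry. The sketch splits the energy of the mono signal into a direct part, weighted by the square root of one minus the diffuseness and panned towards the direction of arrival, and a diffuse part, weighted by the square root of the diffuseness and spread evenly over all loudspeakers. The function name `loudspeaker_gains` is hypothetical, and the raised-cosine panning law is only a stand-in for a proper amplitude panning method. The decorrelation usually applied to the diffuse part is omitted.

```python
import math


def loudspeaker_gains(doa_azimuth, psi, speaker_azimuths):
    """Per-loudspeaker gains for one time-frequency tile.

    doa_azimuth: direction of arrival in the horizontal plane, radians.
    psi: diffuseness, expected in the range [0, 1].
    speaker_azimuths: loudspeaker directions in radians, any layout.
    Returns (direct_gains, diffuse_gains).
    """
    n = len(speaker_azimuths)

    # Stand-in panning law (assumption): loudspeakers closer to the DOA
    # receive more of the direct sound.
    raw = [0.5 * (1.0 + math.cos(a - doa_azimuth)) for a in speaker_azimuths]
    # Normalize so that the direct gains carry unit energy before weighting;
    # if every raw gain is zero, the direct part is dropped.
    norm = math.sqrt(sum(g * g for g in raw)) or 1.0

    direct = [math.sqrt(1.0 - psi) * g / norm for g in raw]
    # Diffuse energy is shared equally between all loudspeakers.
    diffuse = [math.sqrt(psi / n)] * n
    return direct, diffuse
```

Each loudspeaker signal for the tile would then be W(k,n) multiplied by its direct gain, plus a decorrelated copy of W(k,n) multiplied by its diffuse gain. With the normalization chosen here, the squared direct and diffuse gains sum to one, so the energy of the tile is preserved.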
There are substantial differences between DirAC and parametric multichannel audio coding, such as MPEG Surround, cf. Lars Villemoes, Juergen Herre, Jeroen Breebaart, Gerard Hotho, Sascha Disch, Heiko Purnhagen, and Kristofer Kjörling, MPEG surround: The forthcoming ISO standard for spatial audio coding, in AES 28th International Conference, Piteå, Sweden, June 2006, although they share similar processing structures. While MPEG Surround is based on a time/frequency analysis of the different loudspeaker channels, DirAC takes as input the channels of coincident microphones, which effectively describe the sound field in one point. Thus, DirAC also represents an efficient recording technique for spatial audio.
Another system which deals with spatial audio is SAOC (Spatial Audio Object Coding), cf. Jonas Engdegård, Barbara Resch, Cornelia Falch, Oliver Hellmuth, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev, Jeroen Breebaart, Jeroen Koppens, Erik Schuijers, and Werner Oomen, Spatial audio object coding (SAOC) the upcoming MPEG standard on parametric object based audio coding, in 124th AES Convention, May 17-20, 2008, Amsterdam, The Netherlands, which is currently under standardization in ISO/MPEG. It builds upon the rendering engine of MPEG Surround and treats different sound sources as objects. This audio coding offers very high efficiency in terms of bitrate and gives unprecedented freedom of interaction at the reproduction side. This approach promises new compelling features and functionality in legacy systems, as well as several other novel applications.