Higher Order Ambisonics (HOA) is one way to represent three-dimensional sound, alongside techniques such as wave field synthesis (WFS) and channel-based approaches such as 22.2. In contrast to channel-based methods, the HOA representation has the advantage of being independent of a specific loudspeaker set-up. This flexibility comes at the expense of a decoding process, which is required for playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only a few loudspeakers. A further advantage of HOA is that the same representation can be employed without any modification for binaural rendering to headphones.
HOA is based on a representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. In the following, these time domain functions are referred to as HOA coefficient sequences or as HOA channels.
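The relation between the number O of HOA coefficient sequences and the maximum order N is not stated explicitly above. For a full three-dimensional expansion it is commonly O = (N+1)^2, because each order n contributes 2n+1 spherical harmonics. The following short sketch makes that count concrete; the function name is illustrative only, and the full-sphere expansion is an assumption.

```python
def hoa_channel_count(order: int) -> int:
    """Number of HOA coefficient sequences O for a 3D expansion of order N.

    Assumption: the full set of spherical harmonics up to order N is used,
    i.e. 2n + 1 functions for each order n = 0..N.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    # Summing 2n + 1 over n = 0..N gives (N + 1) ** 2.
    return sum(2 * n + 1 for n in range(order + 1))


# Example: a fourth-order representation has 25 coefficient sequences.
channels_order_4 = hoa_channel_count(4)
```

Under this assumption a first-order representation has 4 HOA channels and a fourth-order representation has 25.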
HOA has the potential to provide a high spatial resolution, which improves with a growing maximum order N of the expansion. It offers the possibility of analysing the sound field with respect to dominant sound sources.
Invention
One application is to identify, from a given HOA representation, the independent dominant sound sources constituting the sound field, and to track their temporal trajectories. Such operations are required e.g. for the compression of HOA representations by decomposition of the sound field into dominant directional signals and a remaining ambient component, as described in patent application EP 12305537.8. A further application of such a direction tracking method would be a coarse preliminary source separation. The estimated direction trajectories could also be used in the post-production of HOA sound field recordings in order to amplify or attenuate the signals of particular sound sources.
In EP 12305537.8 it is proposed to successively perform the following three operations:
    The number of currently present dominant sound sources within a time frame is identified and the corresponding directions are searched for. The number of dominant sound sources is determined from the eigenvalues of the HOA channel cross-correlation matrix. For the search of the dominant sound source directions the directional power distribution corresponding to a frame of HOA coefficients for a fixed high number of predefined test directions is evaluated. The first direction estimate is obtained by looking for the maximum in the directional power distribution. Then, the remaining identified directions are found by consecutively repeating the following two operations: the test directions in the spatial neighbourhood are eliminated from the remaining set of test directions and the resulting set is considered for the search of the maximum of the directional power distribution.
    The estimated directions are assigned to the sound sources deemed to be active in the last time frame.
    Following the assignment, an appropriate smoothing of the direction estimates is performed in order to obtain a temporally smooth direction trajectory.
However, although with such processing the temporal smoothing of the direction estimates is accomplished in principle by computing the exponentially-weighted moving average, this technique has the disadvantage of not being able to accurately capture abrupt direction changes or onsets of new dominant sounds.
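The lag of an exponentially-weighted moving average after an abrupt direction change can be illustrated with a minimal sketch. It smooths a scalar azimuth in degrees, which is a simplification (a real implementation would have to handle angular wrap-around and elevation), and the smoothing factor `alpha` is a hypothetical value, not one taken from EP 12305537.8.

```python
def ewma_track(estimates, alpha):
    """Exponentially-weighted moving average of per-frame azimuth estimates (degrees)."""
    smoothed = []
    state = estimates[0]  # initialise with the first raw estimate
    for value in estimates:
        state = alpha * value + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


# A source that jumps abruptly from 0 to 90 degrees in the fourth frame.
raw = [0.0, 0.0, 0.0, 90.0, 90.0, 90.0]
smoothed = ewma_track(raw, alpha=0.2)
```

With these assumed values the smoothed estimate is still far below 90 degrees three frames after the jump, which is the behaviour criticised above.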
To overcome this problem, patent application EP 12306485.9 suggests introducing a simple statistical source movement prediction model, which is employed for a statistically motivated smoothing implemented by the Bayesian learning rule. However, EP 12306485.9 and EP 12305537.8 compute the likelihood function for the sound source directions only from the directional power distribution. This distribution represents the power of a high number of general plane waves from directions specified by nearly uniformly distributed sampling points on the unit sphere. It does not provide any information about the mutual correlation between general plane waves from different directions. In practice, the order N of the HOA representation is usually limited, resulting in a spatially band-limited sound field. In particular, this means that the contribution of a directional sound source to the directional power distribution is smeared around the true direction of incidence to directions in the neighbourhood. This smearing effect is mathematically described by a ‘dispersion function’, see the section ‘Spatial resolution of Higher Order Ambisonics’ below. Its extent grows with a decreasing order of the HOA representation. The EP 12306485.9 and EP 12305537.8 direction tracking methods take this effect into account to a certain degree by constraining the search of directions to areas outside the neighbourhood of previously found directions. However, the specification of the neighbourhood assumes that all sound sources are encoded with the full order N of the HOA representation. This assumption is violated for HOA representations of order N which contain general plane waves encoded in a lower order than N. Such general plane waves of lower order than N may be the result of artistic creation in order to make sound sources appear wider. However, they also occur with the recording of HOA sound field representations by spherical microphones.
The EP 12306485.9 and EP 12305537.8 direction tracking methods would identify more than a single sound source in case the sound field consists of a single general plane wave of lower order than N, which is an undesired property.
A problem to be solved by the invention is to improve the determination of dominant sound sources in an HOA sound field, such that their temporal trajectories can be tracked. This problem is solved by the methods disclosed in claims 1, 2 and 6. An apparatus that utilises the method of claim 6 is disclosed in claim 7.
The invention improves the EP 12306485.9 processing. The inventive processing looks for independent dominant sound sources and tracks their directions over time. The expression ‘independent dominant sound sources’ means that the signals of the respective sound sources are uncorrelated. While the state-of-the-art methods EP 12305537.8 and EP 12306485.9 search for all potential candidates for dominant sound source directions by looking at the directional power distribution of the original HOA representation only, the inventive processing described below, before searching for each direction candidate, removes from the original HOA representation all the components which are correlated with the signals of previously found sound sources. This operation avoids erroneously detecting many sound sources instead of the single correct one in case its contributions to the sound field are highly directionally dispersed. As mentioned above, such an effect would occur for HOA representations of order N which contain general plane waves encoded in an order lower than N.
As in EP 12306485.9, the candidates found for the dominant sound source directions are then assigned to previously found dominant sound sources and are finally smoothed according to a statistical source movement model. Hence, as in EP 12306485.9, the inventive processing provides temporally smooth direction estimates, and is able to capture abrupt direction changes or onsets of new dominant sounds.
The inventive processing determines estimates of dominant sound source directions for successive frames of an HOA representation in two successive processing stages:
From a current time frame k of an HOA representation, candidates or estimates for dominant sound source directions are successively searched, and the components of the HOA representation which are supposed to be created by the respective sound sources are determined. In each iteration of this search process each further direction candidate is computed from a residual HOA representation which represents the original HOA representation from which all the components correlated with the signals of previously found sound sources have been removed. The current direction candidate is selected out of a number of predefined test directions, such that the power of the related general plane wave of the residual HOA representation, impinging from the chosen direction on the listener position, is maximal compared to that of all other test directions.
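The successive search on a residual HOA representation can be sketched as follows. This is a simplified illustration and not the processing of the invention itself. It assumes a first-order representation with the unnormalised mode vector [1, x, y, z], extracts the general plane wave signal by a plain weighted sum of the HOA channels, removes correlated components by a least-squares projection, and stops on a hypothetical relative energy threshold. All function and parameter names are illustrative.

```python
import math

def mode_vector(azimuth, elevation):
    """First-order mode vector [1, x, y, z] without normalisation (assumed convention)."""
    ce = math.cos(elevation)
    return [1.0, ce * math.cos(azimuth), ce * math.sin(azimuth), math.sin(elevation)]

def plane_wave_signal(frame, direction):
    """Weighted sum of the HOA channels for one (azimuth, elevation) test direction."""
    y = mode_vector(*direction)
    return [sum(y[o] * frame[o][t] for o in range(len(y))) for t in range(len(frame[0]))]

def energy(signal):
    return sum(s * s for s in signal)

def remove_correlated(frame, signal):
    """Subtract from every HOA channel its least-squares projection onto `signal`."""
    e = energy(signal)
    if e == 0.0:
        return frame
    residual = []
    for channel in frame:
        gain = sum(c * s for c, s in zip(channel, signal)) / e
        residual.append([c - gain * s for c, s in zip(channel, signal)])
    return residual

def search_candidates(frame, test_directions, max_sources, rel_threshold=1e-3):
    """Pick direction candidates one by one, each from the current residual."""
    total = sum(energy(ch) for ch in frame)
    residual, candidates = frame, []
    for _ in range(max_sources):
        # Hypothetical stopping rule: residual energy is negligible.
        if sum(energy(ch) for ch in residual) <= rel_threshold * total:
            break
        powers = [energy(plane_wave_signal(residual, d)) for d in test_directions]
        best = max(range(len(test_directions)), key=powers.__getitem__)
        signal = plane_wave_signal(residual, test_directions[best])
        candidates.append(test_directions[best])
        residual = remove_correlated(residual, signal)
    return candidates

# Eight horizontal test directions and one source encoded in order 0 only,
# i.e. a lower order than the order-1 representation that contains it.
test_directions = [(k * math.pi / 4, 0.0) for k in range(8)]
s = [1.0, -2.0, 0.5, 3.0, -1.5, 0.25]
wide_source_frame = [s, [0.0] * 6, [0.0] * 6, [0.0] * 6]
found_wide = search_candidates(wide_source_frame, test_directions, max_sources=3)
```

In this toy case the directional power is the same for every test direction, yet only one candidate is returned, because the residual is empty once the components correlated with the first directional signal have been removed.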
Next, the selected direction candidates for the current time frame are assigned to dominant sound sources found in the previous time frame k−1 of HOA coefficients. Thereafter the final direction estimates, which are smoothed with respect to the resulting time trajectory, are computed by carrying out a Bayesian inference process, wherein this Bayesian inference process exploits on one hand a statistical a priori sound source movement model and, on the other hand, the directional power distributions of the dominant sound source components of the original HOA representation. That a priori sound source movement model statistically predicts the current movement of individual sound sources from their direction in the previous time frame k−1 and movement between the previous time frame k−1 and the penultimate time frame k−2.
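The interplay between the a priori movement model and the directional power distribution can be illustrated with a minimal maximum-a-posteriori sketch over a set of test directions given as unit vectors. The concrete forms used here are assumptions made for illustration only: a prior with an exponential-of-cosine shape centred on the previous direction, a concentration that decreases with the previous movement angle, and a likelihood proportional to the directional power. The invention's actual model may differ.

```python
import math

def smooth_direction(test_directions, powers, prev_direction, prev_move_angle):
    """Return the test direction that maximises prior times likelihood."""
    # Hypothetical movement model: the less the source moved between frames
    # k-2 and k-1, the more sharply the prior is concentrated around the
    # direction of frame k-1.
    kappa = 1.0 / max(prev_move_angle, 0.1)

    def log_posterior(i):
        cos_angle = sum(a * b for a, b in zip(test_directions[i], prev_direction))
        log_prior = kappa * cos_angle
        # Assumed likelihood: proportional to the directional power.
        log_likelihood = math.log(max(powers[i], 1e-12))
        return log_prior + log_likelihood

    best = max(range(len(test_directions)), key=log_posterior)
    return test_directions[best]

# Eight horizontal unit vectors; the power peaks at index 2 (90 degrees),
# while the source was at index 0 (0 degrees) in the previous frame.
dirs = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4), 0.0) for k in range(8)]
powers = [1.0, 0.1, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1]
static = smooth_direction(dirs, powers, dirs[0], prev_move_angle=0.0)
moving = smooth_direction(dirs, powers, dirs[0], prev_move_angle=math.pi / 2)
```

With these assumed numbers, a source that was static before stays at its previous direction, whereas a source that was already moving quickly follows the new power maximum.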
The assignment of direction estimates to dominant sound sources found in the previous time frame (k−1) of HOA coefficients is accomplished by a joint minimisation of the angles between pairs of a direction estimate and the direction of a previously found sound source, and maximisation of the absolute value of the correlation coefficient between the pairs of the directional signals related to a direction estimate and to a dominant sound source found in the previous time frame.
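One way to picture this joint criterion is a cost that combines the angle and the correlation for each pair and is minimised over all one-to-one assignments. The sketch below is an assumption-laden illustration: the weighted-sum cost, the `weight` parameter, and the exhaustive search over permutations are hypothetical choices, the signals are assumed to be zero-mean, and there must be at least as many previous sources as candidates.

```python
import math
from itertools import permutations

def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))

def abs_correlation(x, y):
    """Absolute correlation coefficient of two zero-mean signals."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return abs(num) / den if den > 0.0 else 0.0

def assign(cand_dirs, cand_sigs, prev_dirs, prev_sigs, weight=0.5):
    """Map each candidate index to the index of a previous source."""
    n = len(cand_dirs)
    # Small angle and high |correlation| both lower the cost of a pair.
    cost = [[weight * angle_between(cand_dirs[i], prev_dirs[j]) / math.pi
             + (1.0 - weight) * (1.0 - abs_correlation(cand_sigs[i], prev_sigs[j]))
             for j in range(len(prev_dirs))] for i in range(n)]
    # Exhaustive search; requires len(prev_dirs) >= n.
    best = min(permutations(range(len(prev_dirs)), n),
               key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return dict(enumerate(best))

# Two candidates listed in the opposite order to the previous sources.
cand_dirs = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
cand_sigs = [[0.0, 1.0, 0.0, -1.0], [1.0, 0.0, -1.0, 0.0]]
prev_dirs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
prev_sigs = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
assignment = assign(cand_dirs, cand_sigs, prev_dirs, prev_sigs)
```

Here candidate 0 is assigned to previous source 1 and candidate 1 to previous source 0, since both the angle and the correlation favour that pairing. The exhaustive search is only practical for a handful of sources.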
In principle, the inventive method is suited for determining directions of uncorrelated sound sources in a Higher Order Ambisonics representation denoted HOA of a sound field, said method including the steps:
    in a current time frame of HOA coefficients, searching successively preliminary direction estimates of dominant sound sources, and computing HOA sound field components which are created by the corresponding dominant sound sources, and computing the corresponding directional signals;
    assigning said computed dominant sound sources to corresponding sound sources active in the previous time frame of said HOA coefficients by comparing said preliminary direction estimates of said current time frame and smoothed directions of sound sources active in said previous time frame, and by correlating said directional signals of said current time frame and directional signals of sound sources active in said previous time frame, resulting in an assignment function;
    computing smoothed dominant source directions using said assignment function, said set of smoothed directions in said previous time frame, a set of indices of active dominant sound sources in said previous time frame, a set of respective source movement angles between the penultimate time frame and said previous time frame, and said HOA sound field components created by the corresponding dominant sound sources;
    determining indices and directions of the active dominant sound sources of said current time frame, using said smoothed dominant source directions, the frame delayed version of directions of the active dominant sound sources of said previous time frame and the frame delayed version of indices of the active dominant sound sources of said previous time frame,
wherein said directional signals of sound sources active in said previous time frame are computed from said frame delayed version of directions of the active dominant sound sources of said previous time frame and the HOA coefficients of said previous time frame using mode matching,
and wherein said set of source movement angles between said penultimate time frame and said previous time frame is computed from said frame delayed version of directions of the active dominant sound sources of said previous time frame and a further frame delayed version thereof.
In principle the inventive apparatus is suited for determining directions of uncorrelated sound sources in a Higher Order Ambisonics representation denoted HOA of a sound field, said apparatus including:
    means being adapted for searching successively in a current time frame of HOA coefficients preliminary direction estimates of dominant sound sources, and for computing HOA sound field components which are created by the corresponding dominant sound sources, and for computing the corresponding directional signals;
    means being adapted for assigning said computed dominant sound sources to corresponding sound sources active in the previous time frame of said HOA coefficients by comparing said preliminary direction estimates of said current time frame and smoothed directions of sound sources active in said previous time frame, and by correlating said directional signals of said current time frame and directional signals of sound sources active in said previous time frame, resulting in an assignment function;
    means being adapted for computing smoothed dominant source directions using said assignment function, said set of smoothed directions in said previous time frame, a set of indices of active dominant sound sources in said previous time frame, a set of respective source movement angles between the penultimate time frame and said previous time frame, and said HOA sound field components created by the corresponding dominant sound sources;
    means being adapted for determining indices and directions of the active dominant sound sources of said current time frame, using said smoothed dominant source directions, the frame delayed version of directions of the active dominant sound sources of said previous time frame and the frame delayed version of indices of the active dominant sound sources of said previous time frame,
wherein said directional signals of sound sources active in said previous time frame are computed from said frame delayed version of directions of the active dominant sound sources of said previous time frame and the HOA coefficients of said previous time frame using mode matching,
and wherein said set of source movement angles between said penultimate time frame and said previous time frame is computed from said frame delayed version of directions of the active dominant sound sources of said previous time frame and a further frame delayed version thereof.
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.