Audio rendering and sound virtualization have been growing areas in recent years. Several playback techniques exist, including mono, stereo, 5.1 surround and ambisonics. In addition to these playback techniques, apparatus, signal processing integrated within apparatus, and signal processing performed prior to the final playback apparatus have been designed to allow a virtual sound image to be created in many applications such as music playback, movie sound tracks, 3D audio, and gaming.
Until recently, the standard for commercial audio content, whether music or movie, was the stereo audio signal. Signals from different musical instruments, speech or voice, and other audio sources creating the sound scene were combined to form a stereo signal. Commercially available playback devices would typically have two loudspeakers placed at a suitable distance in front of the listener. The goal of stereo rendering was limited to creating phantom images at positions between the two loudspeakers, which is known as panned stereo. The same content could also be played on portable playback devices, as these relied on headphones or earphones, which use two channels. Furthermore, stereo widening and 3D audio applications have recently become more popular, especially for portable devices with audio playback capabilities. Various techniques for these applications give the user a sense of space and 3D audio content. These techniques employ various signal processing algorithms and filters. It is known that spatial audio is more effective over headphone playback.
Commercial audio today offers 5.1, 7.1 and 10.1 multichannel content, in which 5, 7 or 10 channels are used to generate a surrounding audio scene. An example of a 5.1 multichannel system is shown in FIG. 2, where the user 211 is surrounded by a front left channel speaker 251, a front right channel speaker 253, a centre channel speaker 255, a left surround channel speaker 257 and a right surround channel speaker 259. With this type of setup, phantom images can be created lying anywhere on the circle 271 shown in FIG. 2. Furthermore, a channel in multichannel audio is not necessarily unique. The audio signal for one channel, after frequency-dependent phase shifts and magnitude modifications, can become the audio signal for a different channel. This helps to create phantom audio sources around the listener, leading to a surround sound experience. However, such equipment is expensive, and many end users do not have the multi-loudspeaker equipment needed to replay multichannel audio content. To enable multichannel audio signals to be played on previous-generation stereo playback systems, the multichannel audio signals are matrix downmixed.
After the downmix, the original multichannel content is no longer available in its component form (each component being one channel of, say, a 5.1 signal).
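The loss of the component channels can be illustrated with a minimal sketch of a matrix downmix. The gain value 1/sqrt(2) for the centre and surround channels, the omission of the low-frequency channel, and the function name downmix_5_1_to_stereo are assumptions made for illustration only; the text above does not specify any particular downmix matrix.

```python
import math

# Assumed gain (illustrative only): centre and surround channels are
# attenuated by 1/sqrt(2) before being added to the left and right outputs.
G = 1.0 / math.sqrt(2.0)


def downmix_5_1_to_stereo(fl, fr, c, ls, rs):
    """Matrix-downmix one sample of the five main channels to a stereo pair.

    Hypothetical helper: the low-frequency channel is ignored here.
    """
    left = fl + G * c + G * ls
    right = fr + G * c + G * rs
    return left, right


# Two different five-channel samples that collapse to the same stereo pair.
sample_a = (1.0, 0.0, 0.0, 0.0, 0.0)      # energy only in front left
sample_b = (0.0, 0.0, 0.0, 1.0 / G, 0.0)  # energy only in left surround

stereo_a = downmix_5_1_to_stereo(*sample_a)
stereo_b = downmix_5_1_to_stereo(*sample_b)
```

Because both inputs map to (approximately) the same stereo pair, the stereo signal alone does not say which channel the energy came from, which is why the component channels cannot simply be read back after the downmix.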
Researchers have attempted to use various techniques to extract the multiple channels from stereo recordings. However, these techniques are typically both computationally intensive and highly dependent on a sparse distribution of the sources in a particular time-frequency domain. This is problematic, as sparsity of sources does not occur for certain sound scenes.
Some researchers have attempted to use a mathematical tool known as principal component analysis (PCA), which attempts to extract the principal component, or coherent sound source, from a stereo signal. The principal components are then passed through a decoder for the extraction of the various channels required.
However, PCA approaches for primary and ambient decomposition of the stereo signal rely on generating two weights from the principal vector computed from the singular value decomposition of the covariance matrix, which is computationally expensive. In such systems the singular value decomposition provides a low-rank approximation to the matrix using its dominant eigenvectors and eigenvalues. The low-rank approximation computed using the eigenvectors minimises the Euclidean norm cost function between the matrix and its low-rank version. Minimising the Euclidean norm as the cost function to obtain a low-rank approximation to a 2×2 covariance matrix only takes into account the minimum mean square error between the individual elements.
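The PCA step described above can be sketched for a single stereo frame as follows. This is a minimal illustration, not the method of any particular prior system: it assumes a zero-mean frame, uses the closed-form eigendecomposition of a symmetric 2×2 matrix in place of a general singular value decomposition routine, and all function names are hypothetical.

```python
import math


def covariance_2x2(left, right):
    """Return (a, b, d) describing [[a, b], [b, d]] for a zero-mean frame."""
    n = len(left)
    a = sum(x * x for x in left) / n
    b = sum(x * y for x, y in zip(left, right)) / n
    d = sum(y * y for y in right) / n
    return a, b, d


def principal_component(a, b, d):
    """Largest eigenvalue and unit eigenvector of [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    lam = mean + radius
    if b != 0.0:
        # (b, lam - a) satisfies (A - lam * I) v = 0 when b is non-zero.
        vx, vy = b, lam - a
    else:
        # Diagonal matrix: the principal axis is the larger diagonal entry.
        vx, vy = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam, (vx / norm, vy / norm)


def rank1_approximation(lam, v):
    """Rank-1 matrix lam * v * v^T, the best fit in the Euclidean norm."""
    return [[lam * v[0] * v[0], lam * v[0] * v[1]],
            [lam * v[1] * v[0], lam * v[1] * v[1]]]


# Fully correlated example frame: the right channel is half the left channel.
left = [1.0, -1.0, 2.0, -2.0]
right = [0.5, -0.5, 1.0, -1.0]

a, b, d = covariance_2x2(left, right)
lam, weights = principal_component(a, b, d)
approx = rank1_approximation(lam, weights)
```

In this sketch the two entries of weights play the role of the two weights taken from the principal vector. For the fully correlated frame the covariance matrix is already rank 1, so approx reproduces it; for less correlated frames the rank-1 matrix is only the closest fit in the Euclidean norm.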
This invention proceeds from the consideration that, by using non-negative matrix factorisation (NMF), it is possible to obtain a rank-1 approximation to the covariance matrix. Furthermore, it is also possible to obtain a low-rank approximation to the covariance matrix for cost functions other than the Euclidean norm, which further improves the accuracy of the audio channel identification and extraction process.
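As a hedged illustration of this idea, the sketch below computes a rank-1 non-negative factorisation of a 2×2 matrix using the widely known multiplicative update rules, once for the Euclidean cost and once for a generalised Kullback-Leibler divergence. It is not taken from the text above: the choice of update rules, the "kl" cost, the requirement that all matrix entries be non-negative, and the function names are assumptions made for illustration.

```python
def nmf_rank1(V, cost="euclidean", iterations=200, eps=1e-12):
    """Approximate a non-negative matrix V by the outer product of w and h.

    Hypothetical helper using multiplicative updates; assumes every entry
    of V is non-negative.
    """
    rows, cols = len(V), len(V[0])
    w = [1.0] * rows
    h = [1.0] * cols
    for _ in range(iterations):
        if cost == "euclidean":
            # h <- h * (w^T V) / (w^T w h)
            ww = sum(x * x for x in w)
            h = [h[j] * sum(w[i] * V[i][j] for i in range(rows))
                 / (ww * h[j] + eps) for j in range(cols)]
            # w <- w * (V h) / (w h^T h)
            hh = sum(x * x for x in h)
            w = [w[i] * sum(V[i][j] * h[j] for j in range(cols))
                 / (hh * w[i] + eps) for i in range(rows)]
        elif cost == "kl":
            # h <- h * (w^T (V / (w h^T))) / sum(w)
            sw = sum(w)
            h = [h[j] * sum(w[i] * V[i][j] / (w[i] * h[j] + eps)
                            for i in range(rows)) / (sw + eps)
                 for j in range(cols)]
            # w <- w * ((V / (w h^T)) h) / sum(h)
            sh = sum(h)
            w = [w[i] * sum(h[j] * V[i][j] / (w[i] * h[j] + eps)
                            for j in range(cols)) / (sh + eps)
                 for i in range(rows)]
        else:
            raise ValueError("cost must be 'euclidean' or 'kl'")
    return w, h


def outer(w, h):
    """Outer product w * h^T as a list of rows."""
    return [[wi * hj for hj in h] for wi in w]


# Example non-negative 2x2 matrix standing in for a covariance matrix.
V = [[4.0, 1.0],
     [1.0, 2.0]]

approx_euclidean = outer(*nmf_rank1(V, cost="euclidean"))
approx_kl = outer(*nmf_rank1(V, cost="kl"))
```

The two cost functions generally give different rank-1 approximations of the same matrix, which is the degree of freedom the paragraph above refers to. For the "kl" cost with rank 1, the result reduces to the product of the row sums and column sums divided by the total of all entries.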