1. Technical Field
The present invention relates generally to signal processing for audio applications and more specifically to a novel and improved audio upmixer and method for upmixing stereophonic audio channels.
2. Description of the Related Art
Audio applications have developed from standard 2-channel stereophonic playback systems into more complex systems in which different effects are achieved, and different sensations provided, through the use of a number of loudspeakers. Not only has the number of loudspeakers increased, but so has the number of features of each loudspeaker, with varying characteristics, yielding over the years increasingly varied professional and domestic loudspeaker systems.
These multichannel implementations have also evolved to include “surround-sound” effects. Such surround-sound loudspeaker audio systems are today found in theatres, music auditoria, automobiles, and domestic theatre and computer systems, amongst others. However, these implementations typically comprise a wide variety of individual full-range loudspeakers and sub-woofers, each with its own sound characteristics and input/output responses.
Additionally, a wide variety of audio signal types is reproduced, since music, film soundtrack and voice sources are all processed. However, providing the optimum mixing of input signals for a given loudspeaker configuration requires laborious manual signal processing operations, comprising filtering and mixing, by skilled technicians.
Audio upmix, or upmixer, systems have been proposed in order to upmix N original audio signals into M upmixed audio signals, where M>N. For instance, systems exist which generate at least two surround audio channels. Other prior art systems which produce two surround channels detect hard-panned sources and ensure that voice signals will always be located in the front channels even if they exist in only one input channel.
More commonly, however, upmixing systems for home or professional theatre systems are configured to generate 3 front loudspeaker signals, 2 surround signals, and a low frequency effects (LFE) signal to drive a sub-woofer loudspeaker, as represented in FIG. 1A. The 3 front loudspeaker signals are normally used for outputting all sound types, including voice, the 2 surround signals are used for producing ambient sounds, and the LFE signal is used to generate low frequency special effects. This combination results in an enhanced experience for the end user because the different sound components are generated in different loudspeakers. In particular, the sound imagery is enhanced because sound images are located around the listening area, giving a more natural, enveloping imagery compared with reproduction on two frontal loudspeakers.
These systems normally comprise audio matrix coding and decoding operations. Matrix decoding is a type of adaptive or non-adaptive audio upmixing whereby a higher number of output audio signals (e.g. 6 for a 5.1 system) is decoded from a smaller number (typically 2) of input signals. However, systems comprising non-matrix coding and decoding also exist.
A disadvantage of these prior art systems becomes apparent when the inputs to the upmixer are signals containing audio generated using phase effects, such as a low frequency component that is 180 degrees out of phase in one input channel relative to the other. Such phase inversion mixing is a very common audio technique used in music and film audio production to give a wide spatial imagery. These phase-inverted input signals are normally summed to form the LFE signal, and since the out-of-phase components cancel each other out, no signal is present in the LFE signal. Therefore the desired sub-woofer effect is not achieved.
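The cancellation described above can be illustrated with a minimal numerical sketch. The sample rate, the tone frequency and the helper names (`low_frequency_tone`, `sum_channels`, `peak`) are assumptions chosen for illustration only and do not correspond to any particular prior art system.

```python
import math

SAMPLE_RATE = 48000  # Hz, assumed for illustration
TONE_FREQ = 50.0     # Hz, an assumed low frequency component


def low_frequency_tone(n_samples):
    """Return n_samples of a unit-amplitude sine tone at TONE_FREQ."""
    return [math.sin(2.0 * math.pi * TONE_FREQ * n / SAMPLE_RATE)
            for n in range(n_samples)]


def sum_channels(left, right):
    """Sample-by-sample sum of two channels, as in a naive LFE derivation."""
    return [l + r for l, r in zip(left, right)]


def peak(signal):
    """Largest absolute sample value of a signal."""
    return max(abs(s) for s in signal)


left = low_frequency_tone(4800)
right = [-s for s in left]  # same tone, 180 degrees out of phase

lfe_sum = sum_channels(left, right)

print(peak(left))     # close to 1.0: the tone is present in each input
print(peak(lfe_sum))  # 0.0: the summed signal is silent
```

Although each input channel carries the full low frequency tone, the summed signal is identically zero, so a sub-woofer driven by it would reproduce nothing.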
A further disadvantage of existing systems is that sound components originally present in only one input channel are also output in the centre channel, therefore producing a non-realistic output sound image. For instance, consider a musical audio signal corresponding to a recorded musical instrument present on only the left input channel. If the upmixed centre channel is generated by summing the input left and right channels, then this upmixed centre channel will also contain the recorded musical instrument signal. This is an undesirable effect, as the instrument should be perceived only on the left when auditioned: that is, the spatial sound image quality of the auditioned upmixed signal will be poor.
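The leakage of a single-channel source into a summed centre channel can be shown with a small sketch. The function name `centre_by_sum`, the gain of 0.5 and the sample values are hypothetical and serve only to illustrate the effect.

```python
def centre_by_sum(left, right, gain=0.5):
    """Naive centre channel: scaled sum of the left and right inputs.

    The gain of 0.5 is an assumed value used only for this illustration.
    """
    return [gain * (l + r) for l, r in zip(left, right)]


# A recorded instrument present on the left input channel only.
left = [0.0, 0.8, -0.6, 0.4]
right = [0.0, 0.0, 0.0, 0.0]

centre = centre_by_sum(left, right)

print(centre)  # [0.0, 0.4, -0.3, 0.2]
```

The centre channel is a scaled copy of the left-only instrument, so on playback the instrument is pulled towards the centre rather than being perceived only on the left.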
Other implementations deal with generating a centre channel upmix signal; however, they are intentionally configured so that out-of-phase signals do not cancel each other out and are eventually present in the upmixed centre channel. Such designs are sub-optimal in that the out-of-phase sound is normally intended as sound for special effects, to be output from the surround loudspeakers or the LFE loudspeaker, but not from the centre channel. Since the special effect sound is not intended to be emitted from the centre channel, a degraded reproduction of the original sound results.
Another effect which audio signal processing equipment needs to take into account is time-smearing. It is very common for music recordings, or speech recordings from live conferences or with live dialogue in films and television, to use more than one microphone for the recording. The microphones are normally positioned at different corners of the room. In this scenario, the sound being recorded is physically closer to one microphone than to the others, resulting in signals containing time-delay effects, due to the fact that the sound arrives at one microphone before the other. This effect is termed time-delay panning or time-smearing. When such signals are summed, or summed after a gain is applied to one or both signals, the resulting summed signal will contain a time-smeared signal, or a signal with a temporally smeared image, which results in reduced sound quality due, in part, to out-of-phase sound artefacts. This effect can be readily understood if the signal to be recorded is simply a “click” sound. Since the click arrives in one channel before the other, if a non-zero gain is applied to both channels and the result is summed, two clicks will appear in the resulting summed channel. Again this results in a poor reproduction of the original sound image.
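The “click” example can be sketched numerically as follows. The signal length, the inter-channel delay of 5 samples, the gains of 0.7 and the helper names (`impulse`, `weighted_sum`) are assumptions made purely for illustration.

```python
def impulse(length, position):
    """A single-sample click at the given sample index."""
    return [1.0 if n == position else 0.0 for n in range(length)]


def weighted_sum(left, right, gain_left, gain_right):
    """Sum two channels after applying a gain to each."""
    return [gain_left * l + gain_right * r for l, r in zip(left, right)]


DELAY = 5  # samples, an assumed inter-microphone arrival delay

left = impulse(16, 3)           # click reaches the nearer microphone first
right = impulse(16, 3 + DELAY)  # and the farther microphone DELAY samples later

mixed = weighted_sum(left, right, 0.7, 0.7)

# Indices at which the summed channel is non-zero.
click_positions = [n for n, s in enumerate(mixed) if s != 0.0]
print(click_positions)  # [3, 8]
```

A single recorded click therefore appears twice in the summed channel, separated by the inter-channel delay, which is the temporally smeared image referred to above.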
Hence, prior art audio upmixing systems operating on two-channel audio material that comprises time-delay panned recordings suffer, at least in part, from a combination of these disadvantages: the original sound is not reproduced with fidelity, the reproduction of special effects is not optimally achieved, or the special effect is reproduced in the wrong loudspeaker. This combination results in an overall unnatural experience for the listener.