The present invention relates to audio processing and, in particular, to an apparatus and method for generating an output signal employing a decomposer.
The human auditory system senses sound from all directions. The perceived auditory environment (the adjective auditory denotes what is perceived, while the word sound will be used to describe physical phenomena) creates an impression of the acoustic properties of the surrounding space and the occurring sound events. The auditory impression perceived in a specific sound field can (at least partially) be modeled considering three different types of signals: the direct sound, early reflections, and diffuse reflections. These signals contribute to the formation of a perceived auditory spatial image.
Direct sound denotes the waves of each sound event that first reach the listener directly from a sound source without disturbances. It is characteristic of the sound source and provides the least-compromised information about the direction of incidence of the sound event. The primary cues for estimating the direction of a sound source in the horizontal plane are differences between the left and right ear input signals, namely interaural time differences (ITDs) and interaural level differences (ILDs). Subsequently, a multitude of reflections of the direct sound arrive at the ears from different directions and with different relative time delays and levels. With increasing time delay relative to the direct sound, the density of the reflections increases until they constitute a statistical clutter.
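The interaural cues named above can be illustrated numerically. The following is a minimal sketch, not part of the described apparatus: it assumes two already-recorded ear signals and estimates the ITD from the peak of their cross-correlation and the ILD from their energy ratio. The function name estimate_itd_ild, the sign conventions, and the test signal are assumptions made for illustration only.

```python
import numpy as np

def estimate_itd_ild(left, right, sample_rate):
    """Estimate ITD (seconds) and ILD (dB) from two ear signals.

    Hypothetical helper for illustration. Conventions assumed here:
    positive ITD means the right-ear signal lags the left-ear signal;
    positive ILD means the left-ear signal carries more energy.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)

    # Full cross-correlation; the peak position gives the lag in samples.
    xcorr = np.correlate(right, left, mode="full")
    lag = int(np.argmax(xcorr)) - (len(left) - 1)
    itd = lag / sample_rate

    # Level difference as the ratio of signal energies, in decibels.
    ild = 10.0 * np.log10(np.sum(left ** 2) / np.sum(right ** 2))
    return itd, ild

# Synthetic example: the right-ear signal is the left-ear signal
# delayed by 8 samples and attenuated to half amplitude.
rng = np.random.default_rng(0)
burst = rng.standard_normal(256)
left = np.concatenate([burst, np.zeros(32)])
right = 0.5 * np.concatenate([np.zeros(8), burst, np.zeros(24)])

itd, ild = estimate_itd_ild(left, right, sample_rate=48000)
print(f"ITD = {itd * 1e6:.1f} microseconds, ILD = {ild:.2f} dB")
```

In this synthetic case the delay and attenuation are known by construction, so the estimates can be checked against them. Real ear signals would additionally contain the reflections described above, which blur the correlation peak.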
The reflected sound contributes to distance perception, and to the auditory spatial impression, which is composed of at least two components: apparent source width (ASW) and listener envelopment (LEV). ASW is defined as a broadening of the apparent width of a sound source and is primarily determined by early lateral reflections. LEV refers to the listener's sense of being enveloped by sound and is determined primarily by late-arriving reflections. The goal of electroacoustic stereophonic sound reproduction is to evoke the perception of a pleasing auditory spatial image. This can have a natural or architectural reference (e.g. the recording of a concert in a hall), or it may be a sound field that is not existent in reality (e.g. electroacoustic music).
From the field of concert hall acoustics, it is well known that a strong sense of auditory spatial impression, with LEV being an integral part, is important for obtaining a subjectively pleasing sound field. The ability of loudspeaker setups to reproduce an enveloping sound field by means of reproducing a diffuse sound field is of interest. In a synthetic sound field it is not possible to reproduce all naturally occurring reflections using dedicated transducers. That is especially true for diffuse late reflections. The timing and level properties of diffuse reflections can be simulated by using “reverberated” signals as loudspeaker feeds. If those are sufficiently uncorrelated, the number and location of the loudspeakers used for playback determine whether the sound field is perceived as being diffuse. The goal is to evoke the perception of a continuous, diffuse sound field using only a discrete number of transducers, that is, to create sound fields where no direction of sound arrival can be estimated and especially no single transducer can be localized.
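The requirement that the loudspeaker feeds be sufficiently uncorrelated can be made concrete with a normalized correlation coefficient. The following is a minimal sketch under stated assumptions: the function name normalized_correlation, the use of the zero-lag coefficient as the measure, and the noise signals standing in for reverberated feeds are all illustrative choices, not taken from the text above.

```python
import numpy as np

def normalized_correlation(x, y):
    """Zero-lag normalized correlation of two signals, in [-1, 1].

    Hypothetical helper for illustration. A value near 0 indicates
    uncorrelated feeds; a magnitude of 1 indicates feeds that are
    identical up to a scale factor and sign.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Remove the mean so that a constant offset does not count as correlation.
    x = x - x.mean()
    y = y - y.mean()
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if denom == 0.0:
        return 0.0  # at least one signal is constant; treat as uncorrelated
    return float(np.sum(x * y) / denom)

# Two independent noise sequences stand in for two reverberated feeds.
rng = np.random.default_rng(1)
feed_a = rng.standard_normal(100_000)
feed_b = rng.standard_normal(100_000)

print("independent feeds:", normalized_correlation(feed_a, feed_b))
print("identical feeds:  ", normalized_correlation(feed_a, feed_a))
```

A single zero-lag value is only a coarse indicator. A fuller check would evaluate the correlation over a range of lags and per frequency band, which this sketch omits.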
Stereophonic sound reproduction aims at evoking the perception of a continuous sound field using only a discrete number of transducers. The most desired features are directional stability of localized sources and realistic rendering of the surrounding auditory environment. The majority of formats used today to store or transport stereophonic recordings are channel-based. Each channel conveys a signal that is intended to be played back over an associated loudspeaker at a specific position. A specific auditory image is designed during the recording or mixing process. This image is accurately recreated if the loudspeaker setup used for reproduction resembles the target setup that the recording was designed for.
Surround systems comprise a plurality of loudspeakers. Ordinary surround systems may, for example, comprise five loudspeakers. If the number of transmitted channels is smaller than the number of loudspeakers, the question arises which signals are to be provided to which loudspeakers. For example, a surround system may comprise five loudspeakers, while the transmitted signal is a stereo signal having only two channels. On the other hand, even if a surround signal is available, it may have fewer channels than the number of speakers of a user's surround system. For example, a surround signal having 5 surround channels may be available, while the surround system intended to play back the surround signal may have, e.g., 9 loudspeakers.
In car surround systems in particular, the surround system may comprise a plurality of loudspeakers, e.g. 9 loudspeakers. Some of these speakers may be arranged at a horizontal position with respect to a listener's seat, while other speakers may be arranged at an elevated position with respect to the seat of the listener. Upmix algorithms may have to be employed to generate additional channels from the available channels of the input signal. For a surround system having a plurality of horizontal and a plurality of elevated speakers, the particular problem arises of which sound portions are to be played back by the elevated speakers and which sound portions are to be played back by the horizontal speakers.