The present application claims priority under 35 U.S.C. § 119 of Swiss Patent Application No. 2248/97 filed Sep. 24, 1997, the disclosure of which is expressly incorporated by reference herein in its entirety.
1. Field of the Invention
The present invention relates to a process and a device for mixing sound signals.
2. Discussion of the Background Information
Devices of the type described above are generally referred to as audio mixing consoles and provide parallel processing of a plurality of sound signals. In the wake of integrating new media (HDTV, home theater, DVD), stereo technology will be replaced by multi-channel, i.e., “surround” playback processes. Surround-sound mixing consoles currently available on the market generally contain a bus matrix that is expanded to several output channels. For example, N input channels (e.g., N=8-265) are generated by mono-microphones and are processed in the individual channels, i.e., 1-N, weighted with factors, and wired to a bus bar. Control of these factors, for achieving acoustic positioning of the sound source within the room, is provided through panorama potentiometers (or “panpots”). In this context, “phantom sound sources” are created in which the listener experiences the illusion that the sound in the room is created outside the loudspeaker.
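The panpot weighting described above can be illustrated with a minimal sketch for a two-channel bus bar. The constant-power (sine/cosine) pan law and the function names used here are assumptions chosen for illustration; the text does not specify which pan law a given console applies.

```python
import math

def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [0, 1].

    0 is fully left, 1 is fully right. The sine/cosine law is an
    assumed, commonly used pan law, not one taken from the text.
    """
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)

def pan_to_bus(sample, position):
    """Weight one input sample with the pan factors for a stereo bus bar."""
    left_gain, right_gain = constant_power_pan(position)
    return left_gain * sample, right_gain * sample

# A centered source feeds both loudspeakers with equal weight,
# which produces a phantom source between them.
left, right = pan_to_bus(1.0, 0.5)
```

Note that every frequency is scaled by the same factor and no time offset is introduced, which is the limitation of amplitude panning discussed in this section.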
Psycho-acoustic research and experience of recent years have shown that the process mentioned above, known as “amplitude panning”, achieves only an insufficient room mapping or playback of a sound field in a room in two dimensions. Thus, the phantom sound sources can only occur on connecting lines between loudspeakers, and they are not very stable. In particular, the location of the phantom sound sources changes with the specific position of the listener. However, a much more natural playback is perceived by the listener if, e.g., the following two aspects are considered:
a) Loudspeaker signals are created such that the listener receives the same relative transit time differences and frequency-dependent damping processes in the left and right ear signal, i.e., as when listening to natural sound sources. Ear signals have to be correlated in a similar fashion. At low frequencies, the transit time differences are effective for localizing sound occurrences, while at higher frequencies (e.g., greater than 1000 Hz), amplitude (intensity) differences are for the most part effective. In conventional amplitude panning, all frequencies are substantially equally damped and transit time differences are not considered. If one substitutes the weight factors with variable filters designed in the appropriate dimensions, both localization mechanisms can be satisfied. This process is generally referred to as a panoramic setting with the aid of filtering (i.e., “pan-filtering”).
b) If a sound source is located in a room, the first reflections and those arriving up to a maximum of 80 msec after the direct sound aid in localizing the sound source. Distance perception particularly depends on the proportion of the reflections relative to the direct sound. Such reflections can be simulated in an audio mixing console or synthesized by delaying the signal several times and then assigning the signals created in this manner into different directions through the pan-filters described above.
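The synthesis of reflections by delaying the signal several times, as described in aspect b), can be sketched as follows. The tap times, the 48 kHz sample rate, and all function names are hypothetical values chosen for illustration; only the 80 msec upper bound comes from the text.

```python
def ms_to_samples(ms, sample_rate=48000):
    """Convert a delay in milliseconds to a whole number of samples."""
    return int(round(ms * sample_rate / 1000.0))

def delayed_copies(signal, delays_in_samples):
    """Return one zero-padded, delayed copy of `signal` per requested delay."""
    total = len(signal) + max(delays_in_samples)
    copies = []
    for d in delays_in_samples:
        copy = [0.0] * total
        for i, s in enumerate(signal):
            copy[d + i] = s
        copies.append(copy)
    return copies

# Hypothetical taps: the direct sound plus two reflections within 80 msec.
taps_ms = [0.0, 12.5, 40.0]
taps = [ms_to_samples(t) for t in taps_ms]

impulse = [1.0, 0.0, 0.0, 0.0]
copies = delayed_copies(impulse, taps)
```

Each copy would then be assigned its own direction, so the number of pan-filters grows with the number of simulated reflections.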
Thus, the prior art sought to provide an audio mixing console that includes the above-mentioned features a) and b) while ensuring an affordable, i.e., a comparatively more economical, technical expenditure.
One of the first digital constructions was introduced by F. Richter and A. Persterer in “Design and Application of a Creative Audio Processor” at the 86th AES Convention in Hamburg, Germany in 1989 and published in preprint 2782. In this device, direct pairs of “head related transfer functions” (HRTF), i.e., filter functions measured with the right or left ear when a test signal is sent in a certain room direction, are used as pan-filters. An appropriate HRTF-pair is provided in accordance with an appropriate room direction to each output channel signal and to its echo that is created by delaying the signal. The stereo signals thus created are then connected to a two-channel bus bar. However, this device has the following disadvantages:
a) The playback of a single HRTF is very costly if satisfactory precision is to be achieved, i.e., non-recursive digital filters of degree 50-150 and recursive digital filters of degree 10-30 are required. Thus, this process occupies a significant portion of the available computing capacity of a modern digital signal processor (DSP). Further, because several echoes have to be simulated, e.g., between 5-30, for a natural playback, the entire system (with a large number of channels) becomes nearly unaffordable due to the large number of filters necessary.
b) The binaural audio mixing console only supplies a stereo signal at the output that is suitable for headphone playback. While an adaptation to loudspeaker, multi-channel technology may be made by modifying the filters and increasing the number of bus bars, the expenditure would be significant.
D. S. McGrath and A. Reilly introduced another device in “A Suite of DSP Tools for Creation, Manipulation and Playback of Soundfields in the Huron Digital Audio Convolution Workstation” at the 100th AES Convention held in 1996 in Copenhagen and published in the preprint 4233. In this device, the number of bus bars is reduced by using an intermediate format, independent of the number or arrangement of loudspeakers, to display the sound field. The translation to the respective output format is provided through a decoder at the bus bar output. A “B-format” decoder is suggested for reproducing the sound field, in the two-dimensional case including three channels. The signal is weighted with the factors w, x=sin φ and y=cos φ and transferred onto the bus bar, in which w represents the signal level and φ the room direction. The B-format decoder controls the loudspeakers such that a sound field is optimally reconstructed at one point in the room in which the listener is located. However, this process has the disadvantage that the achievable localization focus is too low, i.e., neighboring and opposing loudspeakers radiate the same signal with only slight differences in the sound level. To achieve “discrete effects”, an accurate, high channel separation is required. In a film mix, e.g., a sound should come exactly from a certain direction. This problem can be traced back to the selected sound field format (e.g., an insufficient number of channels) or to the design of the decoder, which was optimized for reproducing the sound field and not for channel separation. A further drawback is that only a passive matrix circuit is designed in the decoder. Thus, implementation of the direction-dependent “pan-filters” required at the outset would demand a significantly higher number of discretely transferred directions, as is mentioned in the following in more detail.
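The three-channel weighting of the two-dimensional B-format described above can be sketched as follows. The function names are hypothetical, and applying the factors w, sin φ and cos φ directly to the sample (without additional scaling of the x and y channels by w) is an assumption about how the stated factors are used.

```python
import math

def encode_2d(sample, level, phi):
    """Weight one sample with the factors w, x = sin(phi), y = cos(phi).

    `level` plays the role of w (signal level) and `phi` is the room
    direction in radians. Returns the three bus-bar contributions.
    """
    w = level
    x = math.sin(phi)
    y = math.cos(phi)
    return (w * sample, x * sample, y * sample)

def add_to_bus(bus, contribution):
    """Accumulate one source's contribution onto the three-channel bus bar."""
    return tuple(b + c for b, c in zip(bus, contribution))

# Two hypothetical sources at phi = 0 and phi = 90 degrees share one bus.
bus = (0.0, 0.0, 0.0)
bus = add_to_bus(bus, encode_2d(1.0, 1.0, 0.0))
bus = add_to_bus(bus, encode_2d(1.0, 1.0, math.pi / 2.0))
```

Because only three channels carry the whole sound field, any decoder working from this bus has limited information for separating nearby directions, which is consistent with the low localization focus noted above.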
The present invention provides a process and device for producing the most natural sound playback over a number of loudspeakers when a different number of sound sources are present while also using a minimal amount of technical expenditure.
The present invention provides mixing 1-N sound signals to 1-M output signals by separating the sound signal from each input channel and selectively delaying the separated sound signal, selectively weighting each separated and selectively delayed sound or input signal, adding these signals to appropriate additional input signals from other input channels to form one intermediate signal 1-K, separating each intermediate signal into output channels 1-M, filtering the separated intermediate signals, and summing them together with the other intermediate signals. The summed-up intermediate signals together produce an output signal for a loudspeaker.
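The signal flow described above (separate and delay, weight, sum into K intermediate signals, then filter and sum into M outputs) can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout, the function names, and the use of plain FIR taps as stand-ins for the decoder filters are all hypothetical and are not taken from the text.

```python
def delay(signal, d, length):
    """Return `signal` delayed by d samples, truncated or padded to `length`."""
    out = [0.0] * length
    for i, s in enumerate(signal):
        if d + i < length:
            out[d + i] = s
    return out

def fir(signal, taps):
    """Direct-form FIR filter, used here as a stand-in decoder filter."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(taps):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def mix(inputs, partials, decoder, length):
    """Mix N inputs to M outputs through K intermediate signals.

    partials[i] lists (delay_samples, weights) for each partial channel
    of input i, where weights has K entries. decoder[k][m] holds the
    filter taps from intermediate signal k to output m.
    """
    K, M = len(decoder), len(decoder[0])
    intermediate = [[0.0] * length for _ in range(K)]
    for sig, parts in zip(inputs, partials):
        for d, weights in parts:
            delayed = delay(sig, d, length)
            for k in range(K):
                for t in range(length):
                    intermediate[k][t] += weights[k] * delayed[t]
    outputs = [[0.0] * length for _ in range(M)]
    for k in range(K):
        for m in range(M):
            filtered = fir(intermediate[k], decoder[k][m])
            for t in range(length):
                outputs[m][t] += filtered[t]
    return outputs

# One input with a direct path and one reflection, K = 2, M = 2.
inputs = [[1.0, 0.0, 0.0, 0.0]]
partials = [[(0, [1.0, 0.0]), (2, [0.0, 0.5])]]
decoder = [[[1.0], [0.0]], [[0.0], [1.0]]]
outputs = mix(inputs, partials, decoder, 4)
```

In this toy configuration the direct sound reaches only the first output and the delayed, attenuated copy reaches only the second; the filters are applied once per intermediate signal rather than once per source.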
The device of the present invention for mixing sound signals from input channels E1-EN to output channels A1-AM shows each intermediate channel Z1-ZK coupled with an accumulator S and a multiplier M, each with 1-n partial channels of each input channel, and coupled with a decoder D that produces output channels A1-AM. In decoder D, each intermediate channel is separated into a number of filter channels with filters equivalent to the number of output channels and each filter channel is coupled to a filter channel of each of the other intermediate channels through an accumulator.
The achieved advantages of the present invention are especially apparent in view of the fact that the task-description defined at the outset is solved in all aspects. That is, the expenditure in particular is minimal, since the computing-intensive filters are needed only once in the system, i.e., at the output. The proposed sound field format is extremely useful for archiving music-material, since all available multi-channel formats can be created by choosing the appropriate decoders. Moving sources can also be simulated in a simple way, since no switching of filters is needed.
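The saving in filter count can be made concrete with a short calculation. The channel counts below are hypothetical example values, and the counting model (one filter pair per source and per echo in the per-source approach, versus one filter per intermediate channel and output channel in the decoder) is a simplified reading of the arrangements described in this document.

```python
def per_source_filter_count(n_inputs, n_echoes, filters_per_direction=2):
    """Filters needed when every source and every echo gets its own filter pair."""
    return n_inputs * (1 + n_echoes) * filters_per_direction

def output_decoder_filter_count(n_intermediate, n_outputs):
    """Filters needed when filtering happens only once, in the output decoder."""
    return n_intermediate * n_outputs

# Hypothetical console: 32 inputs, 10 echoes each, 8 intermediate
# channels, 5 loudspeaker outputs.
per_source = per_source_filter_count(32, 10)
at_output = output_decoder_filter_count(8, 5)
```

With these assumed numbers the per-source arrangement needs 704 filters while the output decoder needs 40, and the decoder count does not grow when further inputs or echoes are added.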
The present invention is directed to a process for mixing a plurality of sound signals. The process includes separating each sound signal and selectively delaying each separated sound signal. The process also includes selectively weighting each separated and selectively delayed sound signal and adding corresponding ones of the selectively weighted signals to an intermediary signal. The process also includes separating and filtering each intermediary signal, and adding the intermediary signals to form an output signal.
In accordance with another feature of the present invention, the process further includes modeling inter-aural transit time differences during the filtering. Further, the process includes modeling the intensity differences and transit time differences independently of each other.
In accordance with another feature of the present invention, the process further includes modeling inter-aural intensity differences during the filtering. Further, the process includes modeling the intensity differences and transit time differences independently of each other.
The present invention is directed to a device for mixing sound signals of a plurality of input channels into a plurality of output channels. The device includes each input channel having a plurality of partial channels, a decoder providing the plurality of outputs, and a plurality of intermediary channels coupled to the plurality of partial channels and to the decoder.
In accordance with another feature of the present invention, each intermediary channel includes a plurality of filter channels with filters. The plurality of filter channels corresponds with the number of output channels. The device also includes an accumulator and at least one filter channel of each of the intermediary channels being coupled through the accumulator.
In accordance with a further feature of the present invention, the device includes a multiplier such that the intermediary channels are coupled to the partial channels through the accumulator and the multiplier.
In accordance with a still further feature of the present invention, the filters may include IIR-filters and FIR-filters that are switched in series.
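A series connection of a non-recursive (FIR) and a recursive (IIR) filter can be sketched as follows. The coefficient values and function names are hypothetical, and a first-order recursive section is used only to keep the example short; the text does not prescribe these values.

```python
def fir_filter(x, b):
    """Non-recursive filter: y[n] = sum over k of b[k] * x[n - k]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        for k, coeff in enumerate(b):
            if n - k >= 0:
                y[n] += coeff * x[n - k]
    return y

def iir_filter(x, feedback):
    """First-order recursive filter: y[n] = x[n] + feedback * y[n - 1]."""
    y = []
    previous = 0.0
    for sample in x:
        previous = sample + feedback * previous
        y.append(previous)
    return y

def filter_channel(x, b, feedback):
    """One filter channel: the FIR stage followed in series by the IIR stage."""
    return iir_filter(fir_filter(x, b), feedback)

# Impulse response of a hypothetical filter channel.
response = filter_channel([1.0, 0.0, 0.0, 0.0], [0.5, 0.5], 0.5)
```

One plausible division of labor, stated here as an assumption, is that the short FIR stage handles delay-like behavior while the low-degree IIR stage shapes the frequency-dependent level.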
The present invention is directed to a process for mixing a plurality of sound signals. The process includes separating each sound signal, selectively delaying each separated sound signal, selectively weighting each separated and selectively delayed sound signal in accordance with a number of channels, adding the selectively weighted signals corresponding to a same channel to form a plurality of intermediary signals, and decoding each intermediary signal to produce a plurality of output signals.
In accordance with another feature of the present invention, the decoding includes separating each intermediary signal into a plurality of signals to be filtered, the plurality of signals corresponding in number to a number of the plurality of output signals, filtering each separated intermediary signal, and adding corresponding filtered signals together to form the plurality of output signals.
In accordance with still another feature of the present invention, the filtering includes utilizing head related transfer functions normalized for each output direction.
In accordance with a further feature of the present invention, the filtering includes selecting a reference direction for normalization, determining a filter pair for each angle of incidence, approximating each filter pair by transfer functions of recursive filters of between approximately 1 and 6 degrees, processing the signal in a non-recursive filter, and processing the signal in a recursive filter.
In accordance with a still further feature of the present invention, the selective weighting includes multiplying the separated and selectively delayed sound signals for a particular channel by a weighting factor.
In accordance with another feature of the present invention, the separation of the sound signals includes separating each sound signal into a number of signals corresponding to a number of the plurality of sound signals to be mixed.
The present invention is directed to a device for mixing sound signals. The device includes a plurality of input channels, each input channel including a plurality of partial channels, a plurality of output channels, a decoder having a plurality of outputs corresponding to the plurality of output channels, and a plurality of intermediary channels coupled to the plurality of partial channels and to the decoder.
In accordance with another feature of the present invention, the plurality of partial channels corresponds in number to the plurality of input channels.
In accordance with another feature of the present invention, the device includes a plurality of multipliers corresponding in number to the plurality of intermediary channels, and each multiplier weighting the signal associated with each partial channel. Further, the device includes a plurality of accumulators coupled to add the weighted signals to each intermediary channel.
In accordance with yet another feature of the present invention, the decoder includes a plurality of filter channels for each intermediary channel corresponding to the decoder outputs, and an accumulator coupled to a filter channel associated with each intermediary channel and arranged to output a decoded signal. Further, each filter channel includes a finite duration impulse response filter and an infinite duration impulse response filter.
Other exemplary embodiments and advantages of the present invention may be ascertained by reviewing the present disclosure and the accompanying drawing.