Digital audio compression has been a very active field of research and commercial application, and recent improvements have consequently shown diminishing returns. Such work, however, has focused primarily on compressing monophonic signals. A stereo signal, on the other hand, comprises two monophonic signals, and the assumption has persisted that stereo requires twice the bit rate of a single compressed monophonic channel. The connection had simply not been made that the two signals of a stereo program are not only strongly related, but that much of the difference between the two channels is of little consequence to the ear.
Referring to FIGS. 1 and 2, FIG. 1 depicts a conventional stereo field 1, typically generated by left and right channels 10, 12 as perceived by the observer 14. As shown in FIG. 2, these two stereo channels 10, 12 are often electronically split into a sum channel 16 and a difference channel 18 by adding the two signals (shown functionally by adder 20) and by subtracting the two signals (shown functionally by subtracter 22), respectively. The former is the monophonic component, and the latter is the pure stereo difference component, which is 0 for a monophonic signal. Averaged across many types of music, the difference signal 18 was found empirically to be typically 3 dB lower than the sum signal 16 at most frequencies, and has further been found to contain very little deep bass because of the nature of acoustic stereo pickup 5.
Still referring to FIG. 2, at the receiving end similar sum and difference functions 24, 26, respectively, were provided to sum and to take the difference between the monophonic sum signal 16 and the stereo difference signal 18, the outputs of which were again the desired left and right channels 28, 30 (corresponding to channels 10, 12 of FIG. 1, respectively). Vinyl records, FM broadcasts, and stereo TV all typically encoded a sum and difference signal in the manner just described. In part this was for purposes of compatibility, but it was also found that the lower magnitude and reduced bass of the difference signal better matched the "weaker" channel, which is the vertical motion in a record or the 38 kHz signal in an FM broadcast.
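The sum and difference arrangement of FIG. 2 can be illustrated with a minimal sketch. The function names and the 0.5 scaling factor at the receiving end are assumptions made for illustration only; the description above states only that the channels are added and subtracted.

```python
def encode_sum_difference(left, right):
    """Split two equal-length channels into sum and difference signals."""
    total = [l + r for l, r in zip(left, right)]  # monophonic component (adder)
    diff = [l - r for l, r in zip(left, right)]   # stereo difference (subtracter)
    return total, diff


def decode_sum_difference(total, diff):
    """Recover left and right from the sum and difference signals.

    (L + R) + (L - R) = 2L and (L + R) - (L - R) = 2R, so a gain of 0.5
    (an assumed normalization) restores the original amplitudes.
    """
    left = [0.5 * (s + d) for s, d in zip(total, diff)]
    right = [0.5 * (s - d) for s, d in zip(total, diff)]
    return left, right


# A stereo pair survives the round trip unchanged.
left_in = [0.5, -0.25, 1.0]
right_in = [0.25, 0.75, 1.0]
sum_sig, diff_sig = encode_sum_difference(left_in, right_in)
left_out, right_out = decode_sum_difference(sum_sig, diff_sig)

# A monophonic signal (identical channels) yields a difference signal of 0.
_, mono_diff = encode_sum_difference(left_in, left_in)
```

The round trip is lossless; any saving comes only from how the difference signal is subsequently treated.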
In yet another attempt to encode stereo source information efficiently, a technique was developed and referred to in the art as Carver FM noise reduction, as shown in FIG. 3. It was found in the course of research on frequency modulated signals that in FM reception the difference signal was characteristically far noisier than the sum signal. Accordingly, some manufacturers began selling FM tuners in which a difference signal was synthesized from the sum signal by a random phasing technique employed in stereo synthesizers. In such a tuner the FM receiver 32 provided a sum channel 34 and a difference channel 36 in the conventional manner. In addition, however, a synthesizer circuit 38 was provided which synthesized the difference signal at appropriate times, e.g. during quiet passages wherein the noise of the "true" difference signal 36 was most noticeable. A switch 35 was provided for switching between the true difference signal 36 and the synthesized signal 42 out of the synthesizer 38, after which the sum signal 34 and the difference signal selected by switch 35 were added and subtracted in the conventional manner by the adder and subtracter functions 44, 46, respectively, yielding the desired left and right channels 48, 50. In this technique some separation information was lost in order to effect the desired benefit of reduced noise. However, it was found that, due to psychoacoustic phenomena associated with the listener, the artificial stereo ambiance was accepted without a perceived loss of quality.
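The switching step of FIG. 3 can be sketched as follows. This is an illustration under stated assumptions, not the circuit actually used: the RMS level test, the `quiet_threshold` parameter, and the function names are hypothetical, and the random phasing synthesizer itself is not modeled (its output is simply passed in).

```python
import math


def rms(block):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def select_difference(sum_block, true_diff, synth_diff, quiet_threshold):
    """Play the role of the switch: pick the synthesized difference signal
    during quiet passages, where noise in the true difference signal is most
    noticeable, and the true difference signal otherwise.

    The RMS test and quiet_threshold are assumptions for illustration.
    """
    if rms(sum_block) < quiet_threshold:
        return synth_diff
    return true_diff


def to_left_right(sum_block, diff_block):
    """Conventional adder and subtracter, with an assumed 0.5 normalization."""
    left = [0.5 * (s + d) for s, d in zip(sum_block, diff_block)]
    right = [0.5 * (s - d) for s, d in zip(sum_block, diff_block)]
    return left, right


# Loud passage: the true difference signal is kept.
loud_sum = [0.5, -0.5]
true_diff = [0.25, 0.25]
synth_diff = [0.0, 0.0]
chosen_loud = select_difference(loud_sum, true_diff, synth_diff, 0.1)
left, right = to_left_right(loud_sum, chosen_loud)

# Quiet passage: the synthesized difference signal is substituted.
quiet_sum = [0.0625, -0.0625]
chosen_quiet = select_difference(quiet_sum, true_diff, synth_diff, 0.1)
```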
There are several aural characteristics of airwaves which are not reproduced with stereo signals unless recorded and reproduced in binaural fashion. In like fashion there are several aural characteristics of a stereo signal which are not present in monophonic signals, a few of which have been found to be most important for conveying the stereo experience as reproduced with two speakers.
The most important dimension added by stereo over monophonic sound is the distinction between a "center" signal 15 that is equally phased between the two speaker sources 10 and 12 of FIG. 1, and a "surround" signal 52 which is randomly phased between the two speaker sources. It is this interplay between the center and surround signals when switching from mono to stereo which provides the ambiance causing the perception of such stereo sound as being beautiful and dimensional.
The second most important dimension added by stereo is the left-right separation which, although receiving much attention, has actually been found to be less important than the "surround" aspect. Unlike earlier stereo recordings, modern recordings use left-right separation in greater moderation, reserving its full impact for special effects and concentrating instead on the center-surround aspect. Although there are other dimensions of a stereo signal, they are not readily discernible on a small stereo system such as a television with two speakers. There are also aspects of binaural sound, such as up-down or front-back, which are typically not discernible with two-speaker stereo systems.
The perception of surround sound, FIG. 1, has recently been utilized in movie theaters, and in homes when viewing movies, to recreate four channels of audio from two channels of stereo.
Referring to FIG. 4, a linear matrix as shown therein provides 3 dB of separation, e.g. a soloist mixed equally into the left and right channels 54, 56 will appear in the front speaker 38 3 dB stronger than in the left or right speakers 54, 56. This corresponds to only 30% or 50% of full separation, depending upon whether it is determined in terms of pressure or power, respectively. Such separation has been found to be inadequate because of the overriding Haas effect, and consequently true decoders in the art were developed which add steering logic to electronically increase the volume of the four channels at predetermined times in order to obtain more separation. Such steering logic detected phase effects only in frequencies of a limited bandwidth, for example between about 500 Hz and 5 kHz. This detected information in turn was utilized to change the volume of all frequencies equally, with a relatively slow response on the order of tens to hundreds of milliseconds, and typically was not even time-aligned with the signal.
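The 3 dB figure and the corresponding percentages can be checked with a short calculation. The sketch below makes two assumptions not stated in the text: that the front output of the linear matrix is formed as (L + R) divided by the square root of 2, and that the fraction of full separation is one minus the ratio of the weaker level to the stronger level.

```python
import math


def level_db(amplitude_ratio):
    """Convert an amplitude (pressure) ratio to decibels."""
    return 20 * math.log10(amplitude_ratio)


def matrix_front(left, right):
    """Front output of a linear matrix, with an assumed 1/sqrt(2) coefficient."""
    return (left + right) / math.sqrt(2)


def separation_fractions(separation_db):
    """Fraction of full separation in terms of pressure and of power.

    Assumed definition: 1 minus the weaker-to-stronger ratio.
    """
    pressure_ratio = 10 ** (-separation_db / 20)
    power_ratio = 10 ** (-separation_db / 10)
    return 1 - pressure_ratio, 1 - power_ratio


# A soloist mixed equally into left and right at unit amplitude.
front = matrix_front(1.0, 1.0)
front_gain_db = level_db(front / 1.0)  # about 3 dB above either side speaker

# Roughly 0.29 in terms of pressure and 0.50 in terms of power.
pressure_fraction, power_fraction = separation_fractions(3.0)
```

Under these assumptions the pressure figure comes out slightly below the rounded 30% quoted above, and the power figure matches 50%.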
Notwithstanding the relative simplicity of such a system, it was found to be remarkably effective in fooling the human ear into perceiving a surround sound field. It has been found that the ear bases directional sensing on transient peaks whereby, for example, if two people are talking, their voice peaks will occur at differing times and the human "logic" will steer the signal in the direction of the perceived peak. During moments when both voices are of equal amplitude however, the steering logic cannot operate, but the human ear nevertheless does not mind because it could not have distinguished direction very well under such conditions in any event. Accordingly, it "remembers" where each voice was and fills in direction for the hearer.
From the foregoing, due to the properties of the ear, it was found that effectively four channels of sound might be encoded into two channels. It was an object of the invention to seek a way to provide for two channels of sound within effectively one channel.
It was a further object of the invention to provide for encoding of a digital stereo signal so as to achieve digital audio compression of stereo in half the normal bandwidth.
It was yet another object of the invention to create the effect of a stereo system in the bandwidth of a monophonic system plus a very small co-channel.
It was yet another object of the invention to do so such that with small systems in most cases the perceived signal would be indistinguishable from a true stereo signal. These and other objects are met by the present invention, a description of which may be understood with reference to the accompanying figures wherein: