The present invention relates generally to audio noise reduction and signal processing systems. In particular, the invention relates to the problem of encoding audio signals in such a way that substantially full complementarity is obtained when using decoders designed for one type of complementary encode/decode system, while permitting compatible decoding with decoders designed for use with another type of complementary encode/decode system, or without any special decoders at all, and while minimizing audibly objectionable side-effects.
In its preferred embodiments, the invention is directed to the processing of audio signals recorded in motion picture sound tracks. However, the principles of the invention are not limited to the motion picture sound environment and also may be applied to other audio recording and reproduction environments.
Within the past ten years or so an increasing number of motion pictures have been made with sound tracks encoded with A-type noise reduction, a complementary system developed by Dolby Laboratories which requires a decoder to obtain its full benefits. The use of A-type noise reduction for motion picture sound tracks is described in "The Production of Wide-Range, Low-Distortion Optical Soundtracks Utilizing the Dolby Noise Reduction System," by Ioan Allen, J. SMPTE, September 1975, Vol. 84, No. 9, pp. 720-729. Over ten thousand motion picture theaters around the world are equipped with A-type noise reduction decoders. Currently, about forty percent of all motion pictures produced in the United States have A-type encoded sound tracks.
The basic elements of A-type noise reduction are described in "An Audio Noise Reduction System," by Ray M. Dolby, J. Audio Eng. Soc., October 1967, Vol. 15, No. 4, pp. 383-388. Various A-type noise reduction products (encoders, decoders, encoder/decoders) are manufactured and sold by Dolby Laboratories. A-type noise reduction employs four frequency bands: band 1, 80 Hz low-pass; band 2, 80 Hz to 3 kHz band-pass; band 3, 3 kHz high-pass; and band 4, 9 kHz high-pass.
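As an illustrative sketch only, the four A-type bands listed above can be written as a small lookup table. The names `A_TYPE_BANDS` and `bands_containing` are hypothetical, and the band edges are treated here as ideal brick-wall limits, which real filters are not.

```python
# Nominal band edges in Hz as listed in the text; None means unbounded on that side.
# Assumption: edges are treated as ideal limits, not as real filter slopes.
A_TYPE_BANDS = {
    1: (None, 80.0),     # band 1: 80 Hz low-pass
    2: (80.0, 3000.0),   # band 2: 80 Hz to 3 kHz band-pass
    3: (3000.0, None),   # band 3: 3 kHz high-pass
    4: (9000.0, None),   # band 4: 9 kHz high-pass
}


def bands_containing(freq_hz):
    """Return the band numbers whose nominal passband includes freq_hz."""
    hits = []
    for band, (lo, hi) in A_TYPE_BANDS.items():
        above_lo = lo is None or freq_hz >= lo
        below_hi = hi is None or freq_hz < hi
        if above_lo and below_hi:
            hits.append(band)
    return hits
```

Note that bands 3 and 4 overlap above 9 kHz, so a frequency in that region falls within both nominal passbands.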
Recently, the originators of A-type noise reduction introduced and began marketing an improved audio signal processing system, spectral recording. This new system is described in "The Spectral Recording Process," by Ray Dolby, J. Audio Eng. Soc., Vol. 35, No. 3, March 1987, pp. 99-118. Various spectral recording products (encoders, decoders, and encoder/decoders) are manufactured and sold by Dolby Laboratories. Spectral recording employs two frequency ranges with a broadly defined crossover frequency of 800 Hz, such that there is a substantial overlap.
Spectral recording bears some similarities to A-type noise reduction. For example, both are complementary systems in which a main signal path is primarily responsible for conveying high level signals and a side chain or side path signal with the system characteristic (A-type or spectral recording, respectively) is additively combined with the main signal in the encoding mode and subtractively in the decoding mode, whereby an overall complementary action is obtained.
In spectral recording, a multi-stage series arrangement is used with staggered regions of dynamic action. The high-level and mid-level stages have both high frequency and low frequency sub-stages with a crossover frequency of 800 Hz. The low-level stage has only a high frequency sub-stage, with an 800 Hz high-pass characteristic. In the spectral recording encoder, each stage has a low-level gain of about 8 dB, such that when the outputs of the stages are combined with the main signal path a total dynamic effect of about 16 dB is obtained at low frequencies and 24 dB at high frequencies. The reciprocal characteristic is provided in the spectral recording decoder.
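The 16 dB and 24 dB totals follow from counting the stages active in each frequency region. A minimal sketch of that arithmetic is given below; it assumes the approximate 8 dB per-stage gains simply add in decibels, and the names `STAGES` and `total_low_level_boost_db` are hypothetical.

```python
STAGE_GAIN_DB = 8.0  # approximate low-level gain per stage, per the text

# Sub-stages present in each series stage, as described in the text.
STAGES = {
    "high-level": {"low", "high"},
    "mid-level": {"low", "high"},
    "low-level": {"high"},  # high frequency sub-stage only
}


def total_low_level_boost_db(region):
    """Sum the per-stage gain over the stages active in region ('low' or 'high')."""
    return sum(STAGE_GAIN_DB for subs in STAGES.values() if region in subs)
```

Two stages act at low frequencies (2 x 8 dB = 16 dB) and three at high frequencies (3 x 8 dB = 24 dB).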
In the A-type system, a single stage is used in which the outputs of the four bands are combined with the main signal path in such a way as to produce a low-level output from the encoder which is uniformly 10 dB higher than the input signal up to about 5 kHz, above which the level increases smoothly to 15 dB higher at 15 kHz. The reciprocal characteristic is provided in the A-type decoder.
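The low-level A-type encoder characteristic just described can be sketched as a function of frequency. The text fixes only the end points (10 dB up to about 5 kHz, 15 dB at 15 kHz); the straight-line rise on a logarithmic frequency axis and the flat response above 15 kHz used below are assumptions made purely for illustration, and `a_type_low_level_boost_db` is a hypothetical name.

```python
import math


def a_type_low_level_boost_db(freq_hz):
    """Approximate low-level encoder boost in dB at freq_hz.

    Assumption: the rise between 5 kHz and 15 kHz is linear in log-frequency,
    and the boost is held at 15 dB above 15 kHz. The source states only that
    the level "increases smoothly".
    """
    if freq_hz <= 5000.0:
        return 10.0
    if freq_hz >= 15000.0:
        return 15.0
    frac = math.log(freq_hz / 5000.0) / math.log(15000.0 / 5000.0)
    return 10.0 + 5.0 * frac
```

The decoder's low-level cut would be the negative of this value at each frequency, since the decoder provides the reciprocal characteristic.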
A further difference between A-type noise reduction and spectral recording is the manner in which dynamic action is provided. In the A-type system, the dynamic action in each of the four frequency bands is provided by a fixed band circuit in which the signal gain varies essentially uniformly across each particular band in response to signals within the frequency band. In other words, in an A-type expander, the dynamic action within each band is a variable, but flat, low level cut across the entire band.
In the spectral recording system, the dynamic action is provided by an action substitution technique that combines, in a synergistic manner, the characteristic actions of fixed band and sliding band (variable filter) circuits operating in each of the sub-stages. The action substitution technique, together with the use of single-pole filters to allow a broad overlapping of action above and below the 800 Hz crossover frequency, provides an overall dynamic action that is highly conformable to signals virtually anywhere in the frequency band. In other words, the spectral recording encoding action is highly frequency selective and adaptive by virtue of its action substitution of fixed band and sliding band elements operating in broadly overlapping frequency bands; the overall effect is essentially that of variable width and variably positioned frequency bands, an almost infinitely variable characteristic that adapts itself to both the level and frequency content of the signal. In contrast, the A-type system, which employs non-varying frequency bands, each having fixed band dynamic action, has a characteristic that adapts itself only in a limited way to signal level and frequency content.
Another difference in the characteristics of spectral recording and the A-type system is that spectral recording employs level dependent low- and high-frequency anti-saturation, providing in the encoded signal a gentle roll-off in the low and high frequency regions that increases as the signal level rises in order to reduce the possibility of overloading the medium on which a spectral recording encoded signal is recorded or transmitted at frequency extremes where the ear is less sensitive to noise. Also, spectral recording employs low- and high-frequency spectral skewing, an abrupt and deep reduction in the low- and high-frequency extremes of the encoded signal, primarily for the purpose of reducing the susceptibility of the spectral recording decoder to any uncertainties in the low- and high-frequency extreme regions of the recording or transmission medium. Both anti-saturation and spectral skewing are complementary in the spectral recording system; complementary de-anti-saturation and spectral de-skewing are provided in the decoder.
In order to benefit from the improved performance and characteristics of spectral recording, such as its improved dynamic range, lower noise modulation, improved transient response, and greatly reduced low- and high-frequency saturation, it would be desirable to employ that new system rather than A-type noise reduction in the production and playback of motion picture sound tracks. Of particular benefit for use on motion picture optical (photographic) sound tracks is the substantially improved low- and high-frequency overload margin provided by spectral recording.
Adoption of spectral recording for motion picture sound tracks would present no problems if spectral recording encoded sound track motion picture films were supplied only to motion picture theaters having spectral recording decoders. However, two factors severely restrict that approach: (1) motion picture producers prefer, whenever possible, to release a film in "single inventory" (for example, all prints of a specific film are A-type encoded, even those prints supplied to theaters not having A-type decoding equipment), and (2) in view of the very large number of theaters having A-type decoders, single inventory films must be compatible with those A-type decoders.
In view of the various differences between the A-type and spectral recording systems, it would appear, upon first analysis, that the systems are not compatible, in the sense that reproduction of spectral recording encoded audio signals by A-type decoders would likely result in subjectively annoying audible effects. It would also appear that the reproduction of spectral recording encoded audio signals without any special decoding (non-decoded playback) would likely also result in subjectively annoying audible effects. However, in accordance with the teachings of the invention such apparent incompatibility can be overcome.
Listening tests indicate that audible effects relating to system compatibility problems may be characterized as: (1) apparently steady-state effects, namely, changes in frequency content of the reproduced signal as, for example, low frequency or high frequency emphasis, and (2) dynamic effects, usually referred to as "pumping," whereby signals and/or noise in one part of the frequency spectrum vary in level in accordance with the level of a signal in another part of the spectrum. The extent to which the ear tolerates such effects is, of course, level dependent: if the effect is at a sufficiently high level it is not acceptable.
Preferably, dynamic effects should be eliminated or minimized because such instability in the reproduced signal is more readily perceived by the listener than are steady-state effects. Steady-state effects are less likely to be noticed by most listeners because there is no changing sound to attract the ear's attention. Even to critical listeners steady-state effects may seem attributable to differences in the sound mix. Of course, a direct A/B comparison between fully complementary encoding/decoding and a partially complementary "compatible" arrangement would reveal some differences in the reproduced signal. However, in a practical situation, such a comparison is not available and the only audible cues are those in the reproduced signal itself. For example, it has been found that balanced steady-state low- and high-frequency response effects tend to be overlooked by the ear. Thus, to the extent that the spectral recording roll-off of the low- and high-frequency regions of the encoded signal is not restored by A-type or non-decoded playback, the balanced or symmetrical effect on the frequency spectrum makes the resulting playback acceptable to most listeners.
In cases where dynamic stability is not achieved and there are low level dynamic effects in the presence of primary signals, it has been found that the most subjectively uncomfortable audible effects are those resulting from signal deficiencies rather than excesses in the portion of the reproduced signal suffering such low level dynamic effects. Such signal deficiencies are often referred to as a "suck-out" effect. Under such conditions level variations or pumping that causes low level signals to drop further in level, as from an audible to an inaudible level, are particularly disturbing to the ear. Thus, a generalized statement of an underlying principle of the present invention is that if the encoder always provides at least as much or a surplus of signal at each frequency when the signal is decoded as in the original signal before encoding, the ear tends to be satisfied. This principle may be referred to as the principle of signal sufficiency. In other words, if there is any decode error, it should be positive so as to provide an excess of signal; the ear is more tolerant of an excess rather than a deficiency of signal.
The encoding characteristics of the spectral recording system provide an excellent starting point for generating an encoded signal that meets the requirements of the principle of signal sufficiency. This is because spectral recording provides highly frequency selective compression during encoding: the compressor tends to keep all signal components fully boosted at all times; when the boosting must be cut back at a particular frequency, reduction in boost essentially is not extended to low-level signal components at other frequencies. The audible effect of this type of compression is that the signal appears to be enhanced and brighter but without any apparent dynamic compression effects (the ear detects dynamic action primarily by the effect of a gain change due to a signal component at one frequency on a signal component at some other frequency, somewhat removed). As a consequence, spectral recording encoded signals reproduced with no special decoding whatsoever are free of pumping effects for nearly all signal conditions because of the compressor's frequency adaptiveness (dynamic action occurs substantially only at frequencies requiring such action and nowhere else) and thus are discernible to a critical listener as compressed signals only because there are changes in the low frequency and high frequency emphasis (a maximum of 16 dB compression at low frequencies and 24 dB at high frequencies).
In accordance with the underlying principle of this invention, it has been found that signal deficiencies in the order of a few decibels, say 2, 3, or 4 dB of low level signals, such as low level ambiences, in the presence of primary or dominant signals, are audibly acceptable to most listeners, but that larger deficiencies, in the order of 6 or 12 dB are not acceptable to most listeners. In contrast, it has been found that signal surpluses of 10, 12, 15 dB or even more of such low level signals in the presence of primary signals are generally acceptable to most listeners. Thus, in accordance with the principle of signal sufficiency, the playback arrangement may be considered to be compatible with an encoder if at any frequency or time the low level signals or ambiences in the presence of primary signals in the reproduced signal are no less than a few dB below the original signal and are no more than some 10 to 15 dB above the original signal. This question of compatibility relates to low level signals in the presence of primary or dominant signals because it is such low level signals that are primarily manipulated by the A-type and spectral recording systems. In those systems high level primary or dominant signals are substantially unaffected.
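The compatibility criterion stated above can be expressed as a simple range check on the decode error of low level signals. The following is a minimal sketch, not a definitive test: the limits of 4 dB deficiency and 15 dB surplus are taken from the upper ends of the ranges quoted in the text, and the names `MAX_DEFICIENCY_DB`, `MAX_SURPLUS_DB`, and `is_compatible` are hypothetical.

```python
# Assumed limits, chosen from the ranges quoted in the text.
MAX_DEFICIENCY_DB = 4.0  # "a few dB": the text cites 2, 3, or 4 dB as acceptable
MAX_SURPLUS_DB = 15.0    # the text cites "some 10 to 15 dB" as the upper bound


def is_compatible(decode_errors_db):
    """Check the principle of signal sufficiency over a set of measurements.

    decode_errors_db: iterable of (reproduced level - original level) in dB
    for low level signals in the presence of primary signals, one value per
    frequency or time point. Positive values are surpluses.
    """
    return all(
        -MAX_DEFICIENCY_DB <= err <= MAX_SURPLUS_DB for err in decode_errors_db
    )
```

For example, `is_compatible([0.0, 6.0, 14.0, -3.0])` holds, whereas a single 6 dB deficiency, as in `is_compatible([0.0, -6.0])`, fails the check.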
If a spectral recording encoded signal is played back using an A-type decoder, the results conform generally to the above stated principle except for certain signal conditions that cause audible pumping and/or signal suck-out in the 80 Hz to 3 kHz A-type band 2.
Consider, for example, the application to a spectral recording encoder of input signals including signals in the 80 Hz to 3 kHz range. For low level signals below its threshold, the spectral recording encoder (compressor) provides 16 dB of boost at low frequencies and 24 dB of boost at high frequencies. Boosting of low level signals narrows the dynamic range, resulting in signal compression. The amount of signal compression is reduced in accordance with the spectral recording encoder's compression law in a relatively narrow frequency range at and near the frequency of signals exceeding the encoder threshold. The highly frequency selective or adaptive nature of the spectral recording encoder assures that the encoder's dynamic action is restricted to a relatively narrow frequency range in which dynamic action is required due to the presence of signals above threshold in that frequency range.
With respect to A-type playback, for low level signals below its threshold, the A-type decoder (expander) in band 2 provides about 10 dB of signal reduction uniformly across the 80 Hz to 3 kHz band. Reducing the level of low level signals widens the dynamic range, resulting in signal expansion. The amount of signal expansion throughout band 2 is reduced in accordance with the A-type decoder's expansion law in response to the presence of signals above threshold anywhere within band 2. Since band 2 is relatively wide, it is likely that signals above threshold located in one part of band 2 that control the dynamic expansion action will cause audible pumping of other low level signals and noise contained within the original signal in the frequency range of band 2 prior to encoding, as the gain throughout band 2 varies uniformly across the band. Such pumping, in the direction of signal suck-out, is likely to be audible for certain signal conditions because band 2 is so wide in frequency that the signal controlling the dynamic action cannot effectively mask the modulation of other signals and noise in band 2 as the gain of the entire band varies.
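For the static case in which all signals are below both thresholds, the net error in band 2 is the spectral recording boost less the A-type cut. The sketch below covers only that case and does not model the pumping described above. It assumes a hard 800 Hz crossover, whereas the actual spectral recording bands overlap broadly, and all function names are hypothetical.

```python
A_TYPE_BAND2_CUT_DB = 10.0  # approximate low-level cut across band 2, per the text


def sr_encode_boost_db(freq_hz):
    """Low-level spectral recording boost in dB.

    Assumption: a hard 800 Hz crossover, for illustration only.
    """
    return 16.0 if freq_hz < 800.0 else 24.0


def net_low_level_error_db(freq_hz):
    """Net low-level surplus in dB after A-type decoding, within band 2 only."""
    if not 80.0 <= freq_hz <= 3000.0:
        raise ValueError("this sketch covers only A-type band 2 (80 Hz to 3 kHz)")
    return sr_encode_boost_db(freq_hz) - A_TYPE_BAND2_CUT_DB
```

Under these assumptions the static error is a surplus of about 6 dB below 800 Hz and about 14 dB above it, so it is a surplus rather than a deficiency, in keeping with the principle of signal sufficiency. The objectionable cases arise only when the band 2 gain varies dynamically.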
As mentioned above, many A-type sound track encoded motion pictures have been released in single inventory, despite the fact that not all theaters have A-type decoding equipment. Such non-equipped theaters typically employ sound systems designed to play films conforming to the so-called "Academy" monophonic (mono) format, developed in the 1930's. As is well known, a considerable amount of treble cut is applied when optical sound tracks are played back in such theaters. This high frequency roll-off, referred to as the Academy characteristic, produces an attenuation of at least 20 dB at 9 kHz. Subjectively, the roll-off provided by the Academy characteristic brings the tonal character of an A-type encoded track essentially back to normal. The low level boosting of high frequency components in the encoding process adequately compensates for the high frequency Academy roll-off. Low frequency, low level signals will be left in the boosted condition, but this effect is noticeable only when a track is switched directly from A-type decoding to Academy replay. Consequently, A-type encoded films sound acceptable to most listeners when played in "Academy mono" theaters and film-makers often make the judgment that a single inventory release is appropriate.
Spectral recording encoded films, although having a greater amount of compression than A-type encoded films, also sound acceptable to most listeners when played in "Academy mono" theaters. In fact, the greater compression may be beneficial in the environment of most "Academy mono" theaters in that such theaters tend to have high ambient noise levels from noisy air conditioning systems and/or are part of a multi-screen theater complex in which sound from adjacent auditoria is audible. In addition, the highly frequency selective encoding in the spectral recording compressor results in less likelihood of audible pumping than with an A-type encoded sound track.