1. Field of the Invention
The disclosed invention relates to systems for stereo sound reproduction, and is particularly directed to systems that synthesize pseudo-stereophonic output signals from a monophonic input signal.
2. Description of the Related Art
Monophonic reproduction of sound is the reproduction of sound through a single channel. When a sound source such as an orchestra is recorded and reproduced monophonically (i.e., reproduced by a single loudspeaker), much of the color and depth of the recording is lost in the reproduction. Even if the monophonic recording is reproduced through two spatially separated loudspeakers, the orchestral sounds will still appear to emanate from essentially a point somewhere between the loudspeakers.
Stereophonic reproduction occurs when the orchestra is recorded on two different sound channels by two separate microphones. Upon reproduction by a pair of loudspeakers, the orchestra does not appear to emanate from a single point between the loudspeakers, but instead appears to be distributed throughout and behind the plane of the two loudspeakers. The two-channel recording provides for the reproduction of a sound field which enables a listener to both locate various sound sources (e.g., individual instruments or voices) and to sense the acoustical character of the recording room or concert hall.
True stereophonic reproduction is characterized by two distinct qualities that distinguish it from single-channel reproduction. The first quality is the directional separation of sound sources to produce the sensation of width. The second quality is the sensation of depth and presence that it creates. The sensation of directional separation has been described as that which gives the listener the ability to judge the selective location of various sound sources, such as the position of the instruments in an orchestra. The sensation of presence, on the other hand, is the feeling that the sounds seem to emerge, not from the reproducing loudspeakers themselves, but from positions in between and usually somewhat behind the loudspeakers. The latter sensation gives the listener an impression of the size, acoustical character, and the depth of the recording location. The term “ambience” has been used to describe the sensation of width, depth, and presence. In other words, the term ambience is often used to describe width, depth and presence when directional separation is excluded.
Two-channel stereophonic sound reproduction preserves both qualities of directional separation and ambience. Synthesized stereophonic sound reproduction, also known as pseudo-stereophonic reproduction, typically does not attempt to recreate stereo directionality, but only the sensation of ambience that is a characteristic of true two-channel stereo.
When a two-channel stereophonic sound reproduction system is used in combination with a visual medium, such as television or motion pictures, the two qualities of directional separation and ambience create in the listener a sense of immersion in the audio-visual scene. The sensation of ambience will recreate the acoustical properties of the recording studio or location, and the directional sensation will make various sounds appear to emanate from their respective locations in the visual image. In addition, since the ambience produces the feeling that sounds are coming from positions behind the plane of the loudspeakers, a certain three-dimensional effect is also produced.
It is also possible for the synthesized stereo system to create a disturbing separation sensation in the mind of the listener if the frequency spectrum is improperly divided between the two loudspeakers. The synthesized stereo system achieves its intended effect by controlling the relative amplitudes and/or phases of the sound signals as a function of the audible frequency spectrum at the reproducing loudspeakers. Listeners are naturally very familiar with the sound of a human voice and can easily distinguish a human voice from among a number of instruments or other background noise. Thus, it can be very disconcerting to a listener if a voice appears to wander back and forth across a soundstage. By contrast, listeners are generally less able to pick out a particular instrument from a group of instruments. Thus, it is generally less disturbing to a listener if the sound from one particular instrument appears to wander across the soundstage. Many prior art stereo synthesizers use time delays or other broadband signal processing elements to manipulate a monophonic signal to produce a pseudo-stereophonic signal in a way that adds an unnatural ambience to human voices and causes the voice to appear to wander unnaturally about the soundstage.
Embodiments of the invention solve these and other problems by using sound enhancement signal processing designed to manipulate a monophonic signal to produce a pseudo-stereophonic signal in a manner that is pleasing to the ear. The signal processing adds relatively more ambience to the musical instruments in the monophonic signal and relatively less ambience to the human voices in the monophonic signal.
More generally, the sound enhancement signal processing can be used to produce multiple output channels from a single input channel, such that the output channels have more ambience than the input channel. For example, the input channel may be a monophonic input channel, and the outputs may be amplified and used to drive left and right stereophonic loudspeakers.
One embodiment is a synthesizer which provides more output channels than input channels. In one embodiment, the synthesizer develops two or more filtered output signals from a single input signal. The input signal is applied to a perspective filter that produces a differential-mode output signal. The input signal is also applied to an equalizer filter that produces a common-mode output signal. The differential-mode and the common-mode signals are combined to produce output channels.
The two-channel synthesizer is desirably used as a stereophonic synthesizer that generates left and right pseudo-stereophonic output channels from a single monophonic input channel. The left output channel is produced by a left channel combiner, and the right output channel is produced by a right channel combiner.
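The signal flow described above can be sketched in software as follows. This is only an illustrative sketch, not the disclosed circuit: it assumes that the left channel combiner adds the differential-mode signal to the common-mode signal and that the right channel combiner subtracts it, which the text does not state. The names `synthesize_stereo`, `one_pole_highpass`, and `identity` are hypothetical, and the two filters are placeholders standing in for the perspective filter and the equalizer filter.

```python
from typing import Callable, List, Tuple

Signal = List[float]
Filter = Callable[[Signal], Signal]


def synthesize_stereo(
    mono: Signal,
    perspective_filter: Filter,
    equalizer_filter: Filter,
) -> Tuple[Signal, Signal]:
    """Return (left, right) pseudo-stereophonic channels from one mono input.

    Assumption (not stated in the text): the left combiner forms
    common + differential and the right combiner forms common - differential.
    """
    diff = perspective_filter(mono)   # differential-mode signal
    common = equalizer_filter(mono)   # common-mode signal
    left = [c + d for c, d in zip(common, diff)]
    right = [c - d for c, d in zip(common, diff)]
    return left, right


def one_pole_highpass(x: Signal, a: float = 0.5) -> Signal:
    """Placeholder perspective filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    y: Signal = []
    prev_x, prev_y = 0.0, 0.0
    for s in x:
        prev_y = a * (prev_y + s - prev_x)
        prev_x = s
        y.append(prev_y)
    return y


def identity(x: Signal) -> Signal:
    """Placeholder equalizer filter that passes the input through unchanged."""
    return list(x)
```

Under this assumed combiner arrangement, half the sum of the two outputs recovers the common-mode signal and half their difference recovers the differential-mode signal, so the two filters can be designed independently.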
The synthesizer may be constructed using analog components such as operational amplifiers (op-amps). Alternatively, the synthesizer may be implemented in software on a computer, such as, for example, a microprocessor or a Digital Signal Processor (DSP).
The synthesizer phase-equalizes the outputs such that the output channels are substantially in phase in a frequency band corresponding to human voice, including the formant frequencies of the human voice, so as to avoid unwanted ambience in the human voice while enhancing the ambience effect of other, more randomly distributed sound signals. When the synthesizer is used as a stereophonic synthesizer to generate left and right pseudo-stereophonic outputs from a monophonic input, the phase-equalization centers the human voices on a sound stage and also provides increased quality in the reproduction of speech sounds.
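One way to check whether two output channels are "substantially in phase" at a given frequency is to compare their phases in a single DFT bin. The sketch below is an illustrative measurement aid only, not part of the disclosed synthesizer; the function names `bin_phasor` and `interchannel_phase` are hypothetical.

```python
import cmath
import math
from typing import List


def bin_phasor(x: List[float], k: int) -> complex:
    """Return DFT bin k of the real sequence x."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def interchannel_phase(left: List[float], right: List[float], k: int) -> float:
    """Return the phase of left relative to right at DFT bin k, in radians."""
    return cmath.phase(bin_phasor(left, k) * bin_phasor(right, k).conjugate())
```

A result near zero at bins inside the voice band would indicate that the two channels are in phase there, while larger magnitudes at other bins would indicate the phase differences that contribute to ambience.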
In accordance with one embodiment of the invention, a wider stereo sound image and listening area are achieved by generating common-mode and differential-mode signals from a monophonic input signal by selectively altering the relative amplitudes and phases of the monophonic signal frequencies and the relative amplitudes of the sum signal frequencies, and combining the common-mode and differential-mode signals to produce pseudo-stereophonic left and right channel signals.
To produce the common-mode signal, selected frequency components of the monophonic signal are boosted relative to other signal frequency components of the monophonic input signal. Moreover, selected phase components of the monophonic signal are shifted relative to other phase components of the monophonic input signal to further shape the common-mode signal. The selective boosting and phase shifting used to produce the common-mode signal prevent the common-mode signal from being overwhelmed by the differential-mode signal.
To produce the differential-mode signal, selected frequency components of the monophonic signal are attenuated (de-emphasized) relative to other monophonic signal frequency components, which amounts to a selective emphasis, or boost, of the remaining components. This selective emphasis provides a wider stereo image and a wider listening area, and the harshness and image-shifting problems associated with an indiscriminate increase of the differential-mode signal are substantially reduced by the equalization provided by the equalizer.
The selective emphasis or boost of selected components in the differential-mode signal further enhances the stereo image because it provides the perception of ambient sounds that are heard at a live performance but often masked in recordings. For example, a listener at a live indoor musical performance hears the sounds that radiate directly from the instruments, sounds reflected from walls and other objects, and reverberant sounds created by the enclosed nature of an auditorium. At a live performance the ambient sounds (e.g., reflected and reverberant sounds) are readily perceived and are not masked by the direct sounds. In a recorded performance, however, the ambient sounds are masked by the direct sounds, and are not perceived at the same level as at a live performance. The ambient sounds generally tend to be in the quieter frequencies of the difference signal, and boosting the quieter frequencies of the difference signal unmasks the ambient sounds, thereby simulating the perception of ambient sounds at a live performance.
The selective emphasis of the differential-mode signal also provides for a wider listening area for the following reasons. The louder frequency components of the differential-mode signal tend to be outside the mid-range, which includes frequencies corresponding to human voices and frequencies having wavelengths comparable to the ear-to-ear distance around the head of a listener. As a result of the selective emphasis provided by one embodiment of the invention, components at frequencies where a listener has increased phase sensitivity are not inappropriately boosted. Therefore, the stereophonic image-shifting problem resulting from indiscriminate increase of the difference signal (discussed above) is substantially reduced, and the listener is able to localize human voices on the soundstage.
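The idea of de-emphasizing the mid-range in the differential-mode signal can be illustrated with a simple frequency-domain gain mask. This is a minimal sketch under stated assumptions: the band edges and gains below are hypothetical values chosen only for illustration (the text gives no numbers), the mask changes amplitude only and applies no phase shaping, and a direct DFT is used for clarity rather than efficiency. The names `band_gain` and `differential_mode` are likewise hypothetical.

```python
import cmath
import math
from typing import List

# Hypothetical illustration values; the text does not specify any of these.
VOICE_BAND_HZ = (300.0, 3400.0)   # assumed mid-range band to de-emphasize
VOICE_BAND_GAIN = 0.25            # assumed gain inside that band
OTHER_GAIN = 1.0                  # assumed gain everywhere else


def band_gain(freq_hz: float) -> float:
    """Return the gain applied to a component at freq_hz."""
    low, high = VOICE_BAND_HZ
    return VOICE_BAND_GAIN if low <= freq_hz <= high else OTHER_GAIN


def differential_mode(mono: List[float], sample_rate: float) -> List[float]:
    """Shape a mono block into a differential-mode signal with a gain mask."""
    n = len(mono)
    # Forward DFT of the input block.
    spectrum = [
        sum(mono[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Scale each bin by the gain at its folded frequency, so that mirrored
    # bins receive the same gain and the output stays real.
    shaped = [
        value * band_gain(min(k, n - k) * sample_rate / n)
        for k, value in enumerate(spectrum)
    ]
    # Inverse DFT back to the time domain.
    return [
        (sum(shaped[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

With these assumed values, a component inside the band is reduced to one quarter of its amplitude in the differential-mode signal while components outside it pass unchanged, so voice-range content contributes comparatively little to the difference between the two output channels.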
In providing the selective boosting of the differential-mode signal, the amount of enhancement, which is determined by the level of the selectively boosted difference signal that is mixed, is set so that the amount of ambience provided is relatively consistent and pleasing to the ear.
Embodiments of the invention are also directed to playback of monophonic phonograph records, magnetic tapes, radio and television broadcasts, movie soundtracks, and digital discs through a conventional sound reproducing system. Embodiments of the invention are also applicable for making pseudo-stereophonic recordings on any medium, including, for example, phonograph records, digital discs, or magnetic tape, which recordings can be played on a conventional sound reproducing system to produce left and right stereo output signals providing the advantageous effects described above.