1. Technical Field
This invention relates generally to the reproduction of stereophonic sound and, more particularly, to the reproduction of stereophonic sound associated with a video image so that dialog is localized to the video image and ambience and sound effects are reproduced in a manner that immerses the listener in a realistic, three-dimensional sound field.
2. Discussion
In the past, numerous monophonic and stereophonic sound systems have been developed in an attempt to achieve high fidelity sound reproduction. Initial efforts restricted the concept of high fidelity to reproducing monophonic audio signals. These early efforts focused on producing a speaker enclosure meeting performance criteria defined by measurable acoustic characteristics such as frequency response, distortion, and dynamic range. The speakers included an enclosure containing one or a number of acoustic transducers and crossover networks intended to reproduce the full frequency range of audibility. As an example of such a multiple transducer and crossover configuration, a three-way speaker design includes a woofer transducer to reproduce low frequencies, a mid-range transducer to reproduce middle frequencies, and a tweeter transducer to reproduce high frequencies.
The typical crossover network described above blends the acoustic output of speaker transducers to achieve good tonal balance characterized by a smooth transition in acoustic output from one transducer to another. One way to accomplish this is a symmetrical crossover network that functions as a filter to ensure that the response drop-off of one transducer as frequency increases through the transition region is a mirror image of the response increase of a companion transducer reproducing the adjacent higher frequency band of sound. Proper implementation of this design approach requires that the combination of transducers and crossover networks not introduce audible artifact (an unnatural sound quality) resulting from frequency response irregularities or phase cancellation effects that potentially result from housing a multiplicity of transducers in one speaker enclosure.
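The mirror-image behavior described above can be illustrated with a minimal sketch. It assumes ideal first-order low-pass and high-pass sections sharing one crossover frequency; the filter order, the 3000 Hz crossover point, and the function names are illustrative assumptions, not details taken from any particular speaker design.

```python
import math


def low_pass_gain(freq_hz, crossover_hz):
    """Magnitude of an ideal first-order low-pass section (assumed model)."""
    ratio = freq_hz / crossover_hz
    return 1.0 / math.sqrt(1.0 + ratio * ratio)


def high_pass_gain(freq_hz, crossover_hz):
    """Magnitude of the companion first-order high-pass section (assumed model)."""
    ratio = freq_hz / crossover_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


CROSSOVER_HZ = 3000.0  # hypothetical mid-range to tweeter crossover point

# One octave above the crossover, the low-pass section has dropped by the same
# amount that the high-pass section has dropped one octave below it.
low_above = low_pass_gain(2.0 * CROSSOVER_HZ, CROSSOVER_HZ)
high_below = high_pass_gain(0.5 * CROSSOVER_HZ, CROSSOVER_HZ)

# At the crossover frequency both sections sit at the same level.
low_at_fc = low_pass_gain(CROSSOVER_HZ, CROSSOVER_HZ)
high_at_fc = high_pass_gain(CROSSOVER_HZ, CROSSOVER_HZ)
```

In this simplified model the two responses are mirror images about the crossover frequency on a logarithmic frequency axis. Practical networks often use higher-order sections, and a real design must also account for transducer response and phase.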
The early attempts at high fidelity through monophonic audio signals and three-way crossover networks eventually gave way to stereophonic sound reproduction. Early stereophonic systems employed a pair of identical, spatially distributed high-fidelity speakers to reproduce two channels of audio signal. This spatial distribution of two speaker enclosures is fundamental to the concept of stereo sound reproduction. A stereo image results when the acoustic output from the pair of speakers fuses and is perceived as a horizontal panorama of sound. This panorama of sound creates for the listener a stereo sound image that spans the space between the two speaker locations. A proper stereo perspective results for a listener positioned along an axis between the two speakers and perpendicular to the plane of the speakers.
Most speakers employed in stereophonic systems project sound in a direct path from the speaker to the listener, referred to as direct-radiation. In an attempt to broaden the stereo image, designers have employed speaker pairs which radiate a combination of direct and reflected sound. Such a configuration expands the stereo image beyond the space between the two speakers.
Some more contemporary stereophonic sound system designs utilize three-piece sub-satellite speaker systems in which a combination of a sub-woofer bass unit and a pair of satellite speakers replaces the pair of conventional full-range speaker enclosures described above. In such three-piece speaker systems, the satellite speakers reproduce a broad spectrum of mid and high frequency sounds, while the bass unit reproduces only very low frequency sounds. Restricting bass reproduction to the sub-woofer unit allows the satellite speakers to be of relatively small size compared to traditionally large stereo speaker boxes, whose large size is dictated by the large transducers and enclosures needed to achieve good bass response. Many consumers prefer this smaller satellite speaker arrangement over the more traditional pair of full-range speaker enclosures. The bass unit can be placed out of sight, and the satellite speakers are more easily blended in with the room decor. However, other consumers still view these somewhat smaller satellite speaker boxes as unsightly and difficult to incorporate in the home setting in an unobtrusive manner.
Despite the improvements in the overall sound quality provided by even the most sophisticated systems, whether a pair of stereo speakers or a three-piece sub-satellite system, many consumers believe contemporary sound systems lack the sense of sonic realism associated with live sound. Each sound reproduction system, while meeting quantitative acoustic performance criteria relative to frequency response, distortion, and dynamic range, can subjectively evoke a wide range of listener perceptions of sonic realism from a qualitative point of view. Some systems determined to sound more realistic have also been found to create a sense of spaciousness in the reproduced sound. This determination has provided the basis for extensive developments in the field of acoustics in order to achieve an enhanced spatial quality to reproduced sound, while avoiding the introduction of sonic artifact that would detract from the overall sonic experience.
The three-piece sub-satellite speaker system described above extends the concept of spatially distributing speaker components such as a stereo pair of speakers. The concept can be yet further extended by spatially distributing a substantial number of point sources for reproducing sound in a listening environment to further increase the perceived spaciousness. While adding a multiplicity of spatially distributed point sources of sound can increase the perception of spaciousness, it also can produce an exaggerated, overblown spatial presentation that lacks realism. Such unnatural sound reproduction often causes the listener to experience acoustic fatigue. Thus, enhanced spaciousness must balance with the perceived acoustic realism of the resulting sound field in order to completely satisfy the listener.
This balance is particularly important in home theater sound systems where the acoustic requirements for this application differ from those for sound reproduction of stereo music. The key objectives for a home-theater sound system are to (1) establish a convincing surround sound acoustic atmosphere based on ambience and sound effect audio signals captured in the soundtrack; (2) maintain a stereo image panorama of sound in front of the viewer; and (3) reproduce dialog that remains localized to the video screen for all viewers in the room. In essence, satisfactory acoustic performance results when the listener is immersed in a sound field having a three-dimensional spatial quality perceived as authentic in relation to the visual presentation on the video screen.
Initial attempts to produce home theater sound included placing a pair of traditional speakers on either side of a centrally located video display. Such systems improved upon the sound of speakers included within the typical television set. However, the performance of such systems was determined to be unacceptable in the marketplace for at least two reasons. First, listeners located off the centerline between the two speakers will not localize dialog to the screen (i.e., perceive the dialog to be coming solely from the screen). Dialog is typically recorded equally in both the left and right channel signals. For a listener on the centerline between the speakers, dialog will be localized to a point equidistant between the two speakers. As a listener moves off the centerline, he will move closer to one speaker and farther away from the other. Localization of dialog will shift to the direction from which the first arriving signal originates, which is the closest speaker. Dialog thus collapses to the near speaker as a listener moves off axis. For off-axis listeners, the localization of dialog will be displaced from the location of the video image, and the illusion that the characters on screen are actually speaking will be destroyed. Second, a pair of stereo speakers located on either side of the visual display confines the sound field to the space in front of the listener, in the plane of the speakers. There is, thus, no sense of immersion--a sense that sound events occur to the side or behind the listener as well as in front of the listener.
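The geometry behind the first problem can be sketched numerically. The following is a minimal illustration only: the room coordinates, the 343 m/s speed of sound, and the function name are assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at room temperature


def arrival_times_ms(listener, left_speaker, right_speaker):
    """Return (left, right) arrival times in milliseconds for one listener."""
    t_left = math.dist(listener, left_speaker) / SPEED_OF_SOUND_M_PER_S
    t_right = math.dist(listener, right_speaker) / SPEED_OF_SOUND_M_PER_S
    return t_left * 1000.0, t_right * 1000.0


# Hypothetical layout in meters: speakers 2 m apart on either side of the screen.
LEFT_SPEAKER = (-1.0, 0.0)
RIGHT_SPEAKER = (1.0, 0.0)

# A listener on the centerline hears both speakers at the same instant.
on_axis = arrival_times_ms((0.0, 3.0), LEFT_SPEAKER, RIGHT_SPEAKER)

# A listener seated 1 m to the right hears the right speaker first.
off_axis = arrival_times_ms((1.0, 3.0), LEFT_SPEAKER, RIGHT_SPEAKER)
lead_ms = off_axis[0] - off_axis[1]  # how much earlier the near speaker arrives
```

For the off-axis seat the near (right) speaker leads, which is the condition under which dialog recorded equally in both channels is perceived as coming from the near speaker rather than from the screen.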
Many systems have been designed in an attempt to remedy these deficiencies. For example, U.S. Pat. No. 3,697,692, issued to Hafler, discusses using ambience-recovery technology. Hafler utilized the fact that surround sound information resides in virtually all stereo audio signals, whether music recordings or the soundtrack of video program material, and can be recovered. Recovery results from obtaining the difference signal between the left and right channel (L-R) leaving substantially only the ambience portion of the signal. This left minus right (L-R) difference signal reproduced by speakers placed in the rear of the listening room provides the recovered surround sound information.
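The difference-signal idea can be shown with a minimal sketch. The sample values and the function name are hypothetical; the only point is that content recorded identically in both channels cancels in (L-R), while content that differs between channels remains.

```python
def sum_and_difference(left, right):
    """Return the (L+R) and (L-R) signals, sample by sample."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


# Hypothetical samples: dialog is identical in both channels, ambience is not.
dialog = [0.5, -0.25, 0.125]
ambience_left = [0.25, 0.0, -0.125]
ambience_right = [-0.25, 0.0, 0.125]

left = [d + a for d, a in zip(dialog, ambience_left)]
right = [d + a for d, a in zip(dialog, ambience_right)]

total, diff = sum_and_difference(left, right)
# diff holds only the ambience difference; the dialog has cancelled.
```

In this toy case the ambience is fully out of phase between channels, so it survives intact in the difference signal. Real program material is only partly decorrelated, so the recovered ambience is an approximation.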
Another alternative early home-theater sound system added an additional center channel to reproduce a left plus right (L+R) sum signal to improve the quality of dialog sound reproduction. The center channel was combined with rear surround speakers that reproduce a left minus right (L-R) difference signal, similar to the ambience recovery speakers described above. An example of such a system has been developed by Dolby Laboratories under the name DOLBY SURROUND.
The center speaker for reproducing the (L+R) signal, as embodied in DOLBY SURROUND systems, improved upon the desirable localization effect of dialog for off-axis listeners. However, the (L+R) center channel reproduction did not completely solve the problem of displacement between the auditory and visual images for off axis listeners. Those systems still suffer from localization errors for dialog (and other signals encoded in the sum signal) because passive decoding schemes such as DOLBY SURROUND are only capable of achieving a maximum adjacent channel separation of 3 dB (where adjacent channels are defined as center and right, center and left, left and surround, right and surround). A 3 dB difference in level between dialog in the center channel and dialog in the left and right channels is not sufficient to confine localization to the location of the center channel speaker for all listening positions throughout a typical listening room. Localization still shifts to the near speaker for off axis listeners. Having dialog collapse to the near speaker is common to all prior art passive decoder systems.
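The 3 dB figure can be checked with a short calculation. This sketch assumes a commonly described passive matrix in which center-panned dialog is encoded at a gain of 1/sqrt(2) in each of the two transmitted channels, and the decoder feeds each transmitted channel directly to its own speaker while the center speaker receives the scaled sum. These coefficients are illustrative assumptions, not values taken from the text above.

```python
import math


def separation_db(wanted_gain, leak_gain):
    """Level difference in dB between the wanted and the leaked signal."""
    return 20.0 * math.log10(wanted_gain / leak_gain)


MATRIX_GAIN = 1.0 / math.sqrt(2.0)  # assumed encode/decode coefficient

# Encode: unit-level dialog appears equally in both transmitted channels.
dialog_level = 1.0
left_total = MATRIX_GAIN * dialog_level
right_total = MATRIX_GAIN * dialog_level

# Passive decode: left and right pass straight through, center gets the scaled sum.
center_out = MATRIX_GAIN * (left_total + right_total)
left_out = left_total

center_to_left_db = separation_db(center_out, left_out)
```

Under these assumptions the dialog in the left (or right) speaker is only about 3 dB below the dialog in the center speaker, consistent with the separation limit described above.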
In an alternative approach to DOLBY SURROUND systems, a T-configuration arrangement proposed by U.S. Pat. No. 4,612,663, issued to Holbrook, provides surround sound by passively decoding the stereo signals. The T-configuration includes left and right speakers reproducing the respective left and right signals, a third speaker reproducing the difference (L-R) signal positioned midway between and in the plane of the left and right speakers, and a fourth speaker reproducing the difference signal positioned behind the listener. However, this approach fails to maintain a rational sonic image in situations where the stereo signal temporarily has predominantly left or right channel energy and also fails to prevent the perception of dialog emanating from the near left or near right speaker.
Another system using (L-R) and (R-L) difference signals may be found in U.S. Pat. No. 5,027,403, issued to Short et al. Short discusses using forward facing left and right channels to provide sound output in the direction of the listener. Short also discusses directing (L+R) bass signals rearwardly from the general plane of the video viewing area. Short further discusses directing (L-R) and (R-L) signals rearwardly or sidewardly from the general vicinity of the video image. However, Short suffers from the disadvantage that all sounds emanating from the speakers emanate from the video image. Such substantially planar sound radiation does not fully provide the ambience and surround sound effect.
Another example of a system having speakers arranged in a generally planar configuration can be found in U.S. Pat. No. 4,497,064, issued to Polk. Polk also discusses arranging main left and right speakers and additional sub-speakers, disposed in proximity to the main speakers, to provide the listener with an expanded acoustic image during stereophonic sound reproduction. However, Polk maintains specific, limiting system requirements, including that the speakers be equidistant from the listener in order to assure the arrival of sound at the listener within a predetermined time period. Polk further discusses high pass filtering an inverted version of a main speaker signal for output from the opposite side sub-speaker. The high pass filtering cancels the opposite side main speaker component which would otherwise reach the ear of the listener on the side which is filtered. However, the high pass filters are not directed to cancelling low frequency components to maintain localization of voice information to a video image. Polk also specifically requires that all system speakers remain located in substantially the same plane and radiate in the direction of the listener. The system of Polk will also not be able to maintain localization of program material equally recorded in the left and right channels to the area centered between the two speakers for off-axis listeners. Localization of such signals will shift toward the near speaker for off-axis listeners.
Examples of non-planar speaker configurations include U.S. Pat. No. 4,443,889, issued to Norgaard. Norgaard discusses the use of a left front speaker and a right front speaker to reproduce the respective left and right channel stereo signals. Norgaard also discusses the use of a (L-R) difference signal through a rear speaker to create an ambience signal. However, among other things, Norgaard does not consider combining a (L+R) summation signal through a center speaker to better localize dialog to the video image.
U.S. Pat. No. 5,181,247, issued to Holl, discusses similar concepts regarding the use of (L-R) and (R-L) difference signals. However, Holl does not teach the use of a single speaker to output a (L+R) summation signal. Nor does Holl suggest bandlimiting the signal input to the ambience speakers.
U.S. Pat. No. 4,819,269, issued to Klayman, discusses radiating sound based on a summation signal in a limited dispersion pattern and radiating sound based on a difference signal in a wide dispersion pattern. The radiated signals combine acoustically with the intent of improving the stereo sound in the listening area. However, Klayman specifically requires specialized, wide dispersion horns or arrays of multiple transducers to achieve the desired effect described. Further, Klayman does not discuss excluding the primary frequency range of vocal energy from the output of any of the speakers to better localize dialog to the center speaker.
Other surround sound type systems use complex signal processing in an attempt to improve the apparent separation between each of the left, center, right, and surround channels. The most common system of this type in use today is the DOLBY PRO-LOGIC decoding system. This system improves upon solutions to the basic problems of many prior art passive decoding systems previously described. Active electronic circuits are used to decode matrix-encoded audio signals, introduce time delays, and accomplish steering between channels through auto-gain control circuitry. However, the improved performance requires a substantially greater expense because DOLBY PRO-LOGIC requires a minimum of four separate amplification channels.
Further, by their very nature, active electronic signal processing systems potentially introduce sonic artifact (an unnatural sound quality that can destroy the sense of realism) in their response. One such form of artifact in the DOLBY PRO-LOGIC system results from the active steering circuits that vary the amount of adjacent channel signal subtracted from a signal. For example, when dialog is present and it is desired for it to be localized to the center, the center channel signal is subtracted from the left and right channel signals to remove dialog energy from these channels. This variable subtraction is dynamically varying channel separation to maintain primary localization in a particular direction. Listeners frequently can hear the ambience (which creates atmosphere in the audio-video presentation) come and go as dialog enters or leaves the scene. The shrinking down and growing back of the ambience that accompanies the introduction and cessation of dialog distracts the listener and proves to be a clear disadvantage of this particular active electronics approach to home-theater sound reproduction.
Another drawback to the DOLBY PRO-LOGIC system is that it only works properly with encoded program material. Unencoded material, or material that has been degraded in some way, can confuse the logic circuits and cause strange, extreme spatial effects to occur when the decoder steers localization in a way that was not intended. Further major disadvantages of the active DOLBY PRO-LOGIC decoding system are its high cost to the consumer and its inherent complexity, which makes it difficult for the consumer to install and use the system properly.
More recently, there has been a return to attempts to provide less complex, inexpensive, passive surround sound systems. An example of such systems is described in U.S. Pat. No. 5,386,473, issued to Harrison. Harrison is directed to the use of a transformer that passively decodes line level stereo television output signals that require further amplification to produce the high level signal necessary to drive speakers. The transformer receives input left and right channel signals and provides left front, right front, left rear (L-R), right rear (R-L), center (L+R), and sub-woofer channels. Harrison resorts to transforming low level signals specifically to solve perceived problems resulting from the use of speakers connected to high level amplifier outputs to obtain a surround sound effect. However, Harrison cites disadvantages in operating a passive surround sound system satisfactorily on high level signals. The present invention is directed specifically to using high level signals to provide surround sound while alleviating the problems mentioned regarding high level systems discussed in Harrison, such as the expense of high-powered components, balance problems, and the like.
Other recent attempts at passive decoding include the QD-1 Series II decoder manufactured by Dynaco. The QD-1 Series II decoder receives signals from the stereo amplifier. The decoder then produces four (or five) signals--two front speakers, two rear speakers, and an optional center channel speaker. A second, similar decoder is the HTS-1 Decoder manufactured by Chase Technologies. Similar to QD-1, the Chase Decoder receives signals from the amplifier and then generates signals for a pair of front and a pair of rear speakers. The Chase Decoder also produces a signal for an optional, amplified center channel speaker.
These latter two passive decoders suffer from two primary disadvantages. First, the resistor network used to produce a (L+R) signal for the center channel dissipates energy thus requiring a stereo amplifier or receiver of sufficiently high power to overcome this energy loss. It is preferable to provide a system in which all speakers of the system are driven by a relatively low-power amplifier, such as is found in a television or a portable boom-box wherein no power is wasted in signal summing resistor networks. In one of the previous systems, the center channel speaker must be powered in order to generate the desired function of maintaining dialog localization at the physical location of the center speaker. Second, because a certain amount of (L+R) signal is fed to the rear surround speakers, artifact can occur in terms of dialog emanating from the rear surround speaker thus disturbing the realism of the intended ambience effect.
Thus, there remains a need for a home theater surround sound speaker system which operates using relatively simple, passive electronics in order to limit its cost and thus provide a system having mass market appeal at a reasonable cost. Of particular importance in these systems is the desirability that they present a consistent ambient sound field while maintaining dialog localized to the video image for all positions in the listening and viewing area. The dialog and visual images also preferably coincide at the video image and preferably are not displaced from each other in a direction of a particular speaker.
Further, audio designers have paid substantial and particular attention to designing speaker systems which reproduce left and right channel audio signals of a stereophonic signal to create a three-dimensional surround sound sonic effect. However, audio designers have largely ignored the monophonic sound market. Many consumers still have monophonic television sets which output only a single monophonic channel, rather than left and right channel components of a stereo signal. This presently relegates the consumer owning a monophonic television to having sound emanate solely from the television set location. In addition, while AM stereo continues to be discussed and may be employed by a few limited stations, the majority of AM broadcasts continue to be monophonic. Finally, many programs available on television, VCR, cable, satellite, and other stereo audio/video signal delivery systems have monophonic soundtracks.
Some stereo and home theater audio/visual receivers apply signal processing techniques to the monophonic sound signal to produce simulated stereo or an enhanced spatial sound effect. Such signal processing typically involves additional and complex phase shift, filtering, and digital signal processing circuitry. The consumer thus must absorb the expense of purchasing such a receiver, a surround sound decoder, or other sound processing electronic device and a suitable network of speakers to achieve a simulated stereo or three-dimensional spatial effect from a monophonic audio signal. Therefore, there exists a need to provide a low-cost system for effectively reproducing monophonic audio signals in a manner that creates a convincing three-dimensional sonic effect.
In addition to the obvious desirability of a home theater surround sound system which provides all of the above-described benefits, a more practical logistical problem exists in home theater systems. Namely, as home theater systems continue to evolve, they typically require an ever increasing number of additional components. Such components often include active electronic controllers, numerous speaker connections, ancillary control modules, and separate audio system interconnects. This morass of components often confuses the average consumer during installation. Despite numerous attempts by manufacturers to make installation more user-friendly and to facilitate the installation procedure, many users experience difficulties in properly installing the system. The most recent attempts to facilitate the installation process have involved color coding the connections at the speaker and at the audio signal source, labeling the connection jacks for the user to view, and providing detailed and complete installation instructions. For many reasons, these measures have failed to provide the consumer with a sufficiently easy way to install a home theater sound system correctly, and many consumers are faced with the expense of a professional installation.
Thus, it is further desirable to provide a home theater surround sound system which greatly facilitates installation so that the consumer may relatively quickly, easily, and correctly install and operate the system, thus enhancing mass market appeal.