The invention relates generally to sound reproduction. More specifically, the invention relates to multiple channel sound reproduction systems having improved listener perceived characteristics.
Multiple channel sound reproduction systems which include a surround-sound channel (often referred to in the past as an "ambience" or "special-effects" channel) in addition to left and right (and optionally, center) sound channels are now relatively common in motion picture theaters and are becoming more and more common in the homes of consumers. A driving force behind the proliferation of such systems in consumers' homes is the widespread availability of surround-sound home video software, mainly surround-sound motion pictures (movies) made for theatrical release and subsequently transferred to home video media (e.g., videocassettes, video-discs, and broadcast and cable television).
When a motion picture is transferred from film to home video media, the soundtrack of the motion picture film is transferred essentially unaltered: the soundtrack on the home video medium is essentially an exact duplicate of the soundtrack on the film. Where reference is made below to playing a motion picture soundtrack in the home, it is to be understood that what is actually played in the home is some form of home video medium onto which the motion picture soundtrack has been transferred in an essentially unaltered form.
Although home video media have two-channel stereophonic soundtracks, those two channels carry, by means of amplitude and phase matrix encoding, four channels of sound information--left, center, right, and surround, usually identical to the two-channel stereophonic motion-picture soundtracks from which the home video soundtracks are derived. As is also done in the motion picture theater, the left, center, right, and surround channels are decoded and recovered by consumers with a matrix decoder, usually referred to as a "surround-sound" decoder. In the home environment, the decoder is usually incorporated in or is an accessory to a videocassette player, videodisc player, or television set/video monitor.
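The matrix encoding and decoding described above can be illustrated with a minimal sketch. The source does not give the matrix coefficients, so everything numeric here is an assumption: the -3 dB gain `G`, the function names `encode` and `decode`, and the simplification of adding the surround in opposite polarity to the two channels in place of the phase shifts a practical encoder would apply. It is not a description of any particular commercial decoder.

```python
import math

G = 1 / math.sqrt(2)  # assumed -3 dB mixing gain; not taken from the source


def encode(left, center, right, surround):
    """Fold four channel samples into a two-channel (lt, rt) pair.

    Simplification (assumption): the surround is added in opposite polarity
    to the two channels instead of using the phase shifts of a real encoder.
    """
    lt = left + G * center + G * surround
    rt = right + G * center - G * surround
    return lt, rt


def decode(lt, rt):
    """Passive decode: the sum recovers center, the difference recovers surround."""
    return {
        "left": lt,
        "right": rt,
        "center": G * (lt + rt),
        "surround": G * (lt - rt),
    }


if __name__ == "__main__":
    # A center-only signal decodes to full center and no surround,
    # but also leaks into the left and right outputs at reduced level.
    print(decode(*encode(0.0, 1.0, 0.0, 0.0)))
```

In this simplified model a center-only or surround-only input is recovered at full level in its own output, while crosstalk into adjacent outputs remains; that crosstalk is a property of the passive sketch, not a claim about the decoders referred to in the text.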
Motion picture theaters equipped for surround sound typically have at least three sets of loudspeakers, located appropriately for reproduction of the left, center, and right channels, at the front of the theater auditorium, behind the screen. The surround channel is usually applied to a multiplicity of speakers located other than at the front of the theater auditorium.
It is the recommended and common practice in the industry to align the sound system of large auditoriums, particularly a motion picture theater's loudspeaker-room response, to a standardized frequency response curve or "house curve." The current standardized house curve for movie theaters is a recommendation of the International Standards Organization designated as curve X of ISO 2969-1977(E), commonly known as the X-curve.
The X-curve is a curve having a significant high-frequency rolloff. The curve is the result of subjective listening tests conducted in large (theater-sized) auditoriums. A basic rationale for such a curve is given by Robert B. Schulein in his article In Situ Measurement and Equalization of Sound Reproduction Systems, J. Audio Eng. Soc., April 1975, Vol. 23, No. 3, pp. 178-186. Schulein explains that the requirement for high-frequency rolloff is apparently due to the free field (i.e., direct) to diffuse (i.e., reflected or reverberant) sound field diffraction effects of the human head and ears. A distant loudspeaker in a large listening room is perceived by listeners as having greater high frequency output (i.e., to sound brighter) than a closer loudspeaker aligned to measure the same response. This appears to be a result of the substantial diffuse field to free field ratio generated by the distant loudspeaker; a loudspeaker close to a listener generates such a small diffuse to direct sound ratio as to be insignificant.
More recently the rationale has been carried further by Gunther Theile (On the standardization of the Frequency Response of High-Quality Studio Headphones, J. Audio Eng. Soc., December 1986, Vol. 34, No. 12, pp. 956-969) who hypothesized that perceptions of loudness and tone color (timbre) are not completely determined by sound pressure and spectrum in the auditory canal. Theile relates this hypothesis to the "source location effect" or "sound level loudness divergence" ("SLD") which occurs whenever auditory events with differing locations are compared: a nearer loudspeaker requires more sound level (sound pressure) at the ear drums to cause the same perceived sound loudness as a more distant loudspeaker and the effect is frequency dependent.
It has also been recognized that the sound pressure level in a free (direct) field exceeds that in a diffuse field for equal loudness. A standard equalization, currently embodied in ISO 454-1975 (E) of the International Standards Organization, is intended to compensate for the differences in perceived loudness and, by extension, timbre due to frequency response changes between such sound fields.
Perceived sound loudness and timbre thus depend not only on the location at which sound fields are generated with respect to the listener but also on the relative diffuse (reflected or reverberant) field component to free (direct) field component ratio of the sound field at the listener.
The use of the standardized X-curve in motion picture theatres is significant because in the final steps of mixing motion picture soundtracks, the soundtracks are almost always monitored in large (theater-sized) auditoriums ("mixing" and "dubbing" theaters) whose loudspeaker-room responses have been aligned to the standardized response curve. This is done, of course, with the expectation that such motion picture films will be played in large (theater-sized) auditoriums that have been aligned to the same standardized response curve. Aligning both the sound system of the dubbing theatre and the sound system of the public motion picture theatre to the X-curve ensures that a film sounds in the public theatre very similar to the way it sounded in the dubbing theatre, and, in particular, that the timbre of the film sounds neutral (i.e., neither overly bright nor overly dull) in both the dubbing theatre and in the public motion picture theatre.
Although aligning theatre sound systems to the X-curve enables films to have a neutral timbre in both the dubbing theatre and the public motion picture theatre, it does not necessarily allow a film to have the same neutral timbre when transferred to another medium, such as a home video tape or disk. This is because the X-curve overcorrects the tendency of a loudspeaker to sound bright in a large room. A large-room loudspeaker system aligned to the X-curve therefore sounds dull. Thus, when dubbing the film sound track in a dubbing theatre aligned to the X-curve, the mixing engineer will boost the level of the high-frequency parts of the program material to compensate for the dulling effect of the X-curve aligned dubbing theatre (and also the X-curve aligned public motion picture theatre) so that the timbre of the program material sounds neutral as heard by the mixing engineer in the dubbing theatre. Consequently, motion picture soundtracks inherently carry a built-in high-frequency response boost that takes into account or compensates for playback in large (theater-sized) auditoriums whose loudspeaker-room responses are aligned to the standardized curve.
The loudspeaker arrangement in a typical domestic surround sound system mimics that of the motion picture theatre. The outputs of the surround-sound decoder are fed, via suitable power amplifiers, to normal domestic loudspeakers arranged one to the left and one to the right of the video monitor, and to at least two normal domestic loudspeakers arranged behind or to the sides of the main listening/viewing area. Additionally, a center channel signal may be fed to a center channel loudspeaker arranged above or below the video monitor. Although standard in motion picture theater environments, the center loudspeaker is often omitted in home systems. A phantom center sound image is created by feeding the center channel signal equally to the left and right loudspeakers.
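The phantom center arrangement just described amounts to a simple mix. The following is a minimal sketch; the function name `phantom_center` and the default -3 dB gain are assumptions chosen for illustration, since the text says only that the center signal is fed equally to the two loudspeakers.

```python
import math


def phantom_center(left, center, right, gain=1 / math.sqrt(2)):
    """Return (left_feed, right_feed) with the center mixed equally into both.

    The default gain of about 0.707 (-3 dB) is an assumed value; the only
    requirement stated in the text is that both feeds receive the same amount.
    """
    return left + gain * center, right + gain * center


if __name__ == "__main__":
    # A center-only sample appears at equal level in both loudspeaker feeds.
    print(phantom_center(0.0, 1.0, 0.0))
```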
One major difference between the home listening environment and the motion picture theater listening environment is in the relative sizes of the rooms--the typical home living room, of course, being much smaller than the typical motion picture theatre. This size difference means that a typical loudspeaker does not sound overly bright in a home living-room sized room. Consequently, there is no need to apply the high-frequency rolloff X-curve applicable to large auditoriums to the considerably smaller home living room sized room because of the above-mentioned effects.
Recorded consumer sound media (e.g., vinyl phonograph records, cassette tapes, compact discs, etc.) are monitored when they are made in relatively small (home living room sized) monitoring studios using loudspeakers which are the same or similar to those typically used in homes. In particular, the sound systems used in the mixdown rooms of music recording studios sound relatively neutral, and do not sound dull like the sound systems in film dubbing theatres. Relative to the room-loudspeaker systems in theatres, the response of a typical modern home room-loudspeaker system or a small studio room-loudspeaker system can be characterized as substantially neutral, particularly in the high-frequency region in which the X-curve applies excessive rolloff in the large auditorium. A consequence of this is that motion pictures transferred to home video media have too much high frequency sound when reproduced by a home system. Consequently, the musical portions of motion picture soundtracks played on home systems tend to sound "bright." In addition, other undesirable results occur--"Foley" sound effects, such as the rustling of clothing, etc., which tend to have substantial high-frequency content, are over-emphasized. Also, the increased high-frequency output when motion picture soundtracks are played on home systems often reveals details in the makeup of the soundtrack that are not intended to be heard by listeners; for example, changes in soundtrack noise level as dialogue tracks are cut in and out. These same problems, of course, occur when a motion picture soundtrack is played back in any small listening environment having consumer-type loudspeakers, such as small monitoring studios.
It should also be understood that the above remarks regarding motion picture soundtracks generally do not apply to the soundtracks of motion pictures originating in the music industry, for example, music videos. The music industry usually mixes its motion picture soundtracks in small, home-sized, studios, so that its soundtracks do not have the timbre errors of soundtracks originating in the film industry.
Also, in both home and theater systems, including the above-mentioned high-quality theater sound systems, no compensation has been employed for the differences in listener-perceived timbre between the main channels and the surround channel. For example, sounds which move from the main channels to the surround channel or vice-versa (sounds "panned" off or onto the viewing screen) undergo timbral shifts. Such shifts in timbre can be so severe as to harm the ability of the listener to believe that the sound is coming from the same sound source as the sound is panned.
The inventor has discovered that the above-mentioned equalization standard, currently embodied in ISO 454-1975 (E) of the International Standards Organization, which is a measure of the timbre difference between a direct sound field and a diffuse sound field, cannot be used as a basis to compensate properly for the listener-perceived timbre differences between the main and surround channels.
The inventor believes that there are two main causes for the listener-perceived timbral shift between the main and surround channels. The first is timbre changes due to comb filtering. Comb filtering may arise from the operation of multiple surround loudspeakers or from deliberately added electronic comb filters used to simulate a surround array with only two loudspeakers. The second cause is frequency response differences due to the human head-related transfer function (i.e., the difference between the frequency response measured by a microphone alone and the frequency response measured by a microphone at the bottom of the ear canal, close to the eardrum; the difference being caused by the presence of the head in the sound field and the effects of the pinna and the ear canal). The difference in character between the direct sound field generated by the main channel loudspeakers and the diffuse sound field generated by the surround channel loudspeakers may be an additional factor.
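The comb filtering named as the first cause can be illustrated with the simplest textbook model: one signal summed with an equal-level copy of itself delayed by a time tau, as when two loudspeakers carrying the same channel sit at different distances from the listener. The sketch below is an idealization under stated assumptions (two sources only, equal levels, no room reflections); the function names and the 1 ms delay are hypothetical and do not come from the source.

```python
import cmath
import math


def comb_magnitude_db(freq_hz, delay_s, floor_db=-120.0):
    """Magnitude in dB of 1 + exp(-j*2*pi*f*tau): a signal plus a delayed copy."""
    h = 1 + cmath.exp(-2j * math.pi * freq_hz * delay_s)
    mag = abs(h)
    if mag <= 10 ** (floor_db / 20):
        return floor_db  # clamp deep nulls so log10 never sees zero
    return 20 * math.log10(mag)


def notch_frequencies(delay_s, max_hz):
    """Frequencies where the delayed copy arrives in antiphase: (2k+1)/(2*tau)."""
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return notches


if __name__ == "__main__":
    tau = 0.001  # assumed 1 ms arrival-time difference
    print(notch_frequencies(tau, 3000.0))
    print(comb_magnitude_db(1000.0, tau))  # in-phase frequency: reinforcement
    print(comb_magnitude_db(500.0, tau))   # antiphase frequency: deep null
```

In this idealized model the response alternates between reinforcement of about 6 dB and deep nulls at regular frequency intervals. A 1 ms delay corresponds to a path difference of roughly 34 cm at a sound speed of 343 m/s.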