When enjoying audiovisual media, a listener may find himself or herself sitting closer to the audiovisual media device, either literally or in a psychological sense, than was the norm in connection with traditional audiovisual media systems. Referring to FIG. 1, in a traditional audiovisual media scenario, a listener 10 is sitting a distance d away from a visual media screen 12, which may be a television screen or a movie theater screen. One or more audio speakers 14 produce sound to accompany the display on visual media screen 12. By way of example, some of the sound produced by speakers 14 may consist of the speech of actors in the foreground while other sounds may represent background sounds far in the distance.
There are various cues that can naturally occur in the recorded sound to convey to listener 10 a sense of how near or far the sound source is to the listener 10. For example, speech recorded close to a microphone in a room will ordinarily tend to have less reverberation from the room than speech recorded farther away from the microphone in a room. Also, sounds occurring at a distance will tend to be “muffled” by attenuation of higher frequencies. The listener 10 psychoacoustically factors in the perceived distance between the listener 10 and the objects portrayed on visual media screen 12 when listening to these cues in the recorded media reproduced by audio speakers 14. This perceived (or apparent) distance between listener 10 and the objects portrayed on visual media screen 12 is both a function of the techniques which went into producing the video and audio tracks, and the playback environment of the listener 10. The difference between 2D and 3D video and differences in audio reproduction systems and acoustic listening environment can have a significant effect on the perceived location and perceived distance between the listener 10 and the object on the visual media screen 12.
Consumers seeking to enjoy audiovisual media are faced with selecting between a wide range of formats and a variety of devices. With increasing frequency, for example, consumers watch audiovisual media on computers or laptops, where the actual distance d′ between listener 10 on the one hand and visual media screen 12 and audio speakers 14 on the other hand is drastically reduced, as is illustrated in FIG. 2. Even in the context of television viewing, the dimensions of home theater visual media screens have been increasing, while the same content is increasingly being enjoyed on vastly smaller mobile handheld screens and headphones.
Movie theaters have employed increasingly sophisticated multichannel audio systems that, by their very nature, help create the feel of the moviegoer being in the midst of the action rather than observing from a distance. 3D movies and 3D home video systems also, by their nature, create the same effect of the viewer being in the midst of the field of view, and in certain 3D audio-visual systems it is even possible to change the parallax setting of the 3D audio-visual system to accommodate the actual location of the viewer relative to the visual media screen. Often a single audio soundtrack mix must serve for various video release formats: 2D, 3D, theatrical release, and large and small format home theater screens. The result can be a mismatch between the apparent depth of the visual and audio scenes, and a mismatch in the sonic and visual location of objects in the scene, leading to a less realistic experience for the viewer.
It is known in the context of stereo sound systems that the perceived width of the apparent sound field produced by stereo speakers can be modified by converting the stereo signal into a Mid/Side (or “M/S”) representation, scaling the mid channel, M, and the side channel, S, by different factors, and converting the signal back into a Left/Right (“L/R”) representation. The L/R representation is a two-channel representation containing a left channel (“L”) and a right channel (“R”). The M/S representation is also a two-channel representation but contains a mid channel and a side channel. The mid channel is half the sum of the left and right channels, or M=(L+R)/2. The side channel is half the difference of the left and right channels, or S=(L−R)/2.
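By way of illustration only, the known M/S width adjustment described above can be sketched in a few lines of Python. The function names (lr_to_ms, ms_to_lr, adjust_width) and the parameters mid_gain and side_gain are hypothetical names chosen for this sketch and do not come from any particular product or library. The sketch assumes the channels are given as equal-length lists of samples and uses the definitions M=(L+R)/2 and S=(L−R)/2, for which the inverse conversion is L=M+S and R=M−S.

```python
def lr_to_ms(left, right):
    """Convert L/R sample lists to M/S using M=(L+R)/2 and S=(L-R)/2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid, side):
    """Invert the conversion above: L=M+S and R=M-S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def adjust_width(left, right, mid_gain=1.0, side_gain=1.0):
    """Scale M and S by different factors, then convert back to L/R.

    mid_gain and side_gain are hypothetical parameter names. A side_gain
    of 0 collapses the signal to mono; a side_gain larger than mid_gain
    tends to widen the apparent stereo image.
    """
    mid, side = lr_to_ms(left, right)
    mid = [mid_gain * m for m in mid]
    side = [side_gain * s for s in side]
    return ms_to_lr(mid, side)
```

For example, adjust_width(left, right, side_gain=0.0) returns identical left and right channels, while unity gains return the input unchanged.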
By changing the ratio of M versus S, it is possible to cause the reconstructed stereo signal to appear to have a wider or narrower stereo image. Nevertheless, a listener's overall perception of the dynamic range of depth is not purely dependent on the relationship between L and R signals, and stereo versus mono sound is not itself a spatial depth parameter. In general, the dynamic range is a ratio between the largest and smallest values in an audio signal. Moreover, the perceived loudness of an audio signal can be compressed or expanded by applying a non-linear gain function to the signal. This is commonly known as “companding” and allows the dynamic range of a signal having a large dynamic range to be reduced (“compression”) and then expanded back to its original dynamic range (“expansion”). Nevertheless, perceived depth of an auditory scene or object is not purely dependent on the loudness of the audio signal.
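As a hedged illustration of companding, the following Python sketch uses the well-known mu-law curve as one example of a non-linear gain function. The choice of mu-law, the value MU=255, and the function names compress, expand, and dynamic_range_db are assumptions made for this sketch only; the discussion above does not specify any particular companding curve. Samples are assumed to be normalized to the range −1 to 1.

```python
import math

MU = 255.0  # assumed companding constant, chosen only for illustration


def compress(x, mu=MU):
    """Mu-law compression of one sample in [-1, 1]; the sign is preserved."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def expand(y, mu=MU):
    """Inverse of compress: restores the original sample value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def dynamic_range_db(samples):
    """Ratio of largest to smallest non-zero magnitude, expressed in dB."""
    mags = [abs(s) for s in samples if s != 0.0]
    return 20.0 * math.log10(max(mags) / min(mags))
```

In this sketch, compressing a signal reduces the ratio between its largest and smallest magnitudes, and applying expand to the compressed samples recovers the original values up to floating-point rounding.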
The different formats and devices that consumers use for playback can cause the listener's perceived audible and visual location of objects on the visual media screen 12 to become misaligned, thereby detracting from the listener's experience. For example, the range of visual depth of an object on the visual media screen 12 can be quite different when played back in a 3D format as compared to a 2D format. This means that the listener 10 may perceive a person to be a certain distance away based on audio cues but may perceive that person to be a different distance away based on visual cues. In this case the listener's perceived distance to an object displayed on the visual media screen 12 is different based on audio cues than based on visual cues. In other words, the object may sound closer than it appears, or vice versa.