This invention pertains to a method and apparatus for reproducing sound from stereophonic source signals in which the reproduced sound has a greatly expanded acoustic image.
The present invention can best be understood and appreciated by setting forth a generalized discussion of the manner in which stereophonic signals originate, as well as a generalized discussion of the manner in which sound is conventionally reproduced from a stereophonic signal source.
When live music is performed, for example, the listener perceives the sounds of the instruments and performers as coming from the general direction of each instrument or performer. The sonic qualities of the acoustic environment in which the music is performed are also perceived as surrounding the listener. Conventional stereophonic recording and reproducing techniques limit the sound field to an area between two speakers, thereby losing much of the stereo information.
The human auditory system localizes position through two mechanisms. Direction is perceived due to an interaural time delay or phase shift. Distance is perceived due to the time delay between an initial sound and a similar reflected sound. A third, poorly understood mechanism causes the ear to perceive only the first of two similar sounds when they are separated by a very short delay. This is called the precedence effect. Through these mechanisms the listener perceives the direct sound reflected from the walls of the hall as a multitude of secondary sounds arriving from different directions and distances.
Referring to FIG. 1, there is schematically illustrated a listener P situated in a room having walls W1, W2, W3 and W4, and containing a sound source S. In addition to the direct sound path DP from source S to the listener, there are a multitude of reflected sound paths, and exemplary reflected paths are shown in FIG. 1 as RP1 through RP6. The floor and ceiling reflections are not shown for the sake of clarity, but reflected sounds arrive at the listener's ears from nearly every direction.
Being immersed in this reverberant field, the listener will perceive the direct sound from the source S, and will also form a subliminal impression of the size and shape of the hall where the performance is taking place based on the arrivals of the reflected sounds.
Turning now to FIG. 2, there is schematically illustrated the process of normal stereophonic recording. A source S is spaced from a listener P in an environment which includes a plurality of walls W1, W2, W3. In such an environment the listener will of course perceive sounds from the source S along a direct path DP1. Also, the listener will perceive sounds reflected from the walls of the environment, as illustrated in FIG. 2 by the path RP1 to a point P1 on the wall W1 and thence along path RP2 to the listener P. In a stereophonic recording, microphones ML and MR are situated in front of the source S as shown in FIG. 2. If the source S is equidistant from the microphones, then both microphones will pick up sounds from the source S at the same time along direct paths DP2 and DP3. The hall ambience information will be recorded by the left and right microphones ML and MR in addition to the direct sound from the source. This is illustrated by the reflected paths RP3 and RP4 from the point P1 on wall W1.
Turning now to FIG. 3, there is illustrated what happens when the sounds recorded by the microphones as in FIG. 2 are reproduced by loudspeakers LS and RS positioned in the same position relative to the listener P as the recording microphones. In FIG. 3 the listener P is shown as having a left ear Le and a right ear Re. If the sound source recorded as in FIG. 2 was equidistant from the two microphones, the sound will have reached each microphone at the same time. Accordingly, in reproducing the sound, a listener equidistant from the two speakers LS and RS will hear the reproduced direct sound from the left speaker in the left ear (path A) at the same time as the same sound from the right speaker is heard in the right ear (path B). The precedence effect will tend to reduce perception of interaural crosstalk paths a and b. The listener P, hearing the same sound in both ears at once, will localize the sound as being directly in front of and between the speakers, as shown in FIG. 4.
Referring again for a moment to FIG. 2, consider a sound reflected from the point P1 on the wall W1 of the hall. The reflected sound from the secondary source reaches the left microphone ML first via the path RP3. This sound is delayed relative to the direct sound along path DP2, partially preserving the distance information about the reflection from P1. The sound from P1 at some time thereafter reaches the right microphone MR along path RP4 after a further delay and further reduction in loudness. In this case, the delay corresponds approximately to the distance MD between the microphones.
Turning now to FIG. 5, there is illustrated what the listener P will hear with respect to both the direct and reflected sound illustrated in FIG. 2. When reproduced by the loudspeakers LS and RS the listener will first hear the direct sound from the source at the same time in both ears, corresponding to the apparent source shown in FIG. 5. The listener will then hear the delayed sound corresponding to the reflection from P1 being recorded by the left microphone and reproduced by the left speaker first in the left ear Le and then in the right ear Re. The initial delay caused by the longer path taken by the reflection in reaching the left microphone ML gives the listener an impression of the distance between the original source, P1, and himself. However, the interaural delay Δt (corresponding to the time it takes sound to travel between a listener's ears) gives the impression that the reflected sound has come from a point behind and in the same direction as the left speaker, illustrated as the first apparent point P1 in FIG. 5. For reference, the location of the actual point P1 is also shown in FIG. 5. After a further delay, the listener will hear the reflected sound reproduced by the right speaker RS. Since the additional delay (corresponding to the distance MD in FIG. 1) is much greater than any possible interaural delay (except for the case of a very small microphone spacing), this sound will create a second apparent point P1 behind and in the same direction as the right speaker, as illustrated in FIG. 5. However, it has been observed in experiments that the listener mainly perceives the direction information of the first apparent point source P1, largely ignoring the second. Thus the listener perceives the sound as coming primarily from the direction of the left speaker, or slightly inside the left speaker if the loudness of the second apparent point source P1 is significant compared to the first. This analysis describes the effect on any other sound sources recorded by the two microphones such that the difference in arrival times at the two microphones is greater than the maximum possible interaural time delay.
Referring to FIG. 6, for some reflected sounds the path lengths to the two microphones ML and MR will be such that the differences in arrival times of the reflected sound at the two microphones will be comparable to a possible value of interaural time delay. Thus, the path length d' of the reflected sound from point P2 to the left microphone ML would be approximately equal to the path length c' to the right microphone MR plus the distance sound travels during the interaural time delay Δt. Expressing the path lengths as travel times, assume that d' equals c'+Δt. When this occurs, the arrival of the reproduced sound from the two speakers at the corresponding ears at slightly different times will have the same effect as an interaural time delay, giving the listener a definite impression of the direction and distance of the reflected sound. Referring to FIG. 7, as there illustrated, each possible value of interaural time delay corresponds to an angle of incidence for the perceived sound within a 180° arc. As the difference in arrival times at the microphones approaches the maximum possible value of the interaural delay, the apparent direction of the sound would swing rapidly to the right or left. In practice this is limited by the listening angle of the loudspeakers. When the time difference of the sounds arriving at the respective ears approaches the interaural delay corresponding to the listening angle of the speakers, the interaural crosstalk signal of the opposite speaker gradually takes precedence, effectively limiting the apparent sound sources to within the listening angle of the speakers.
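The correspondence between interaural time delay and angle of incidence described with reference to FIG. 7 can be sketched numerically. The following is only an illustration under stated assumptions: a simple far-field model in which the delay equals the ear spacing times the sine of the angle of incidence divided by the speed of sound. The model, the ear spacing of 0.18 m, the speed of sound of 343 m/s, and the function names are hypothetical and are not taken from this disclosure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
EAR_SPACING = 0.18      # m, assumed effective spacing between the ears


def max_interaural_delay(ear_spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """Largest possible interaural delay in seconds (sound from directly to one side)."""
    return ear_spacing / c


def angle_from_delay(delay, ear_spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """Angle of incidence in degrees for a given interaural delay.

    0 is straight ahead; -90 and +90 are the two ends of the 180 degree arc.
    Assumes the simple model delay = ear_spacing * sin(angle) / c.
    """
    ratio = delay * c / ear_spacing
    if abs(ratio) > 1.0 + 1e-9:
        raise ValueError("delay exceeds the maximum possible interaural delay")
    # Clamp to absorb floating-point rounding at the extremes.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

Under this assumed model the angle changes slowly for small delays and quickly as the delay nears its maximum, which is consistent with the apparent direction swinging rapidly to one side near the limit.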
It should be apparent at this point that all sound sources, ambient or otherwise, whose signals arrive at the respective microphones with a time difference greater than the interaural time delay corresponding to the listening angle of the reproducing speakers will appear to the listener as apparent sources behind and in the same general direction as one of the speakers as shown in FIG. 5. The delayed signal appearing in the other channel, being lower in loudness, will have only slight effect in drawing the apparent source inside the speakers. This has been confirmed by experiments which show that, in fact, the apparent sound source remains substantially within the listening angle defined by the speakers.
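The distinction drawn above, between sources whose microphone arrival-time difference stays within the limiting interaural delay and those that exceed it, can be illustrated with a short geometric sketch. The coordinate layout (microphones on the x axis, symmetric about the origin), the speed of sound, and the function names are assumptions made for illustration; the limiting delay is supplied by the caller because the text does not give a numerical value for it.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_time_difference(source_xy, mic_spacing, c=SPEED_OF_SOUND):
    """Arrival time at the right microphone minus arrival time at the left, in seconds.

    Assumes the microphones sit at (-mic_spacing/2, 0) and (+mic_spacing/2, 0).
    A positive result means the sound reaches the left microphone first.
    """
    x, y = source_xy
    half = mic_spacing / 2.0
    d_left = math.hypot(x + half, y)
    d_right = math.hypot(x - half, y)
    return (d_right - d_left) / c


def localization_regime(time_difference, limiting_delay):
    """Classify a source by comparing its time difference with a limiting delay."""
    if abs(time_difference) <= limiting_delay:
        return "between speakers"
    # The channel that receives the sound first dominates the perceived direction.
    return "at left speaker" if time_difference > 0 else "at right speaker"
```

For example, with an assumed microphone spacing of 2 m, a source at (0, 3) gives a zero time difference and falls between the speakers, while a source at (-3, 1) gives a difference of several milliseconds and is assigned to the left speaker for any limiting delay of a fraction of a millisecond.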
The existence of interaural crosstalk has long been known and discussed at some length in the literature. Additionally, there are several recent patents which have disclosed methods and techniques for enhancing the acoustic image of a stereophonic reproduction system through the manipulation of interaural crosstalk signals, without, however, making a complete analysis of the consequence of these manipulations.
One such prior art patent is U.S. Pat. No. 4,058,675 to Kobayashi et al. This patent discloses a means for cancelling interaural crosstalk by applying inverted and delayed versions of the left and right stereo signals respectively to a second pair of left and right speakers respectively positioned near the left and right main speakers so as to produce the correct geometry. It will be seen later that this method is effective only for certain special cases of the left and right input signals.
Carver discloses in U.S. Pat. No. 4,218,585 an electronic device for cancelling interaural crosstalk. This device inverts one stereo signal, splits it into several components, delays each component separately by a different amount and recombines these with a modified version of the other stereo signal. Performing this operation on both stereo signals, Carver claims to effect a cancellation of interaural crosstalk and to create a "dimensionalized effect."
U.S. Pat. No. 4,199,658 to Iwahara also discloses a technique for performing the interaural crosstalk cancellation for the special case of a binaural signal input. Iwahara uses a second pair of speakers to reproduce the cancellation signal, which is composed of a frequency and phase compensated version of the inverted main signal. This cancellation signal is fed to a speaker just outside the main speaker on the opposite side from which the cancellation signal was derived. The necessary delay is accomplished acoustically by the placement of the sub-speakers and detailed consideration is given to the phase and frequency compensation required to accomplish the cancellation. As previously mentioned, a binaural signal input is specified.
The methods or techniques disclosed in the prior art involve to a certain extent the cancellation of interaural crosstalk. It should be examined in detail what effect each of these would have on the listener's perception of the reproduced sound.
U.S. Pat. No. 4,058,675 to Kobayashi proposes a method for cancelling interaural crosstalk. This method will be discussed in reference to FIG. 8 labelled "Prior Art", and corresponding to FIG. 5 of U.S. Pat. No. 4,058,675.
It can be seen that there is a left speaker system consisting of a main speaker left, MSL, and a sub-speaker left, SSL. There is also a right speaker system consisting of a main speaker right, MSR, and a sub-speaker right, SSR. The left and right main speakers respectively receive the left and right stereo signals. The sub-speaker left is fed by the left stereo signal after passing through an attenuator, a delay, and a phase shift. The attenuation, delay and phase shift are selected such that the signal from the SSL will arrive at the left ear, El, simultaneously and out-of-phase with the signal from the right main speaker, MSR. If the left and right stereo signals are equal, the signals from the SSL and MSR will effectively cancel at the left ear, El. Conversely, the same will occur for the sub-speaker right, SSR, and the main speaker left, MSL, at the right ear, Er. Thus only when the left and right stereo signals are equal will the crosstalk paths be cancelled.
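The limitation just described can be illustrated with an idealized sketch. It assumes that the attenuation and delay are perfectly matched, so that at the left ear the crosstalk from the right main speaker (carrying the right signal) simply sums with an inverted copy of the left signal from the left sub-speaker. The function name, the sample-list representation, and the single `gain` parameter are hypothetical simplifications, not a model of the actual circuit in the cited patent.

```python
def residual_crosstalk(left, right, gain=1.0):
    """Signal remaining at the left ear on the crosstalk path, sample by sample.

    Idealized assumption: the right-channel crosstalk and the inverted
    left-channel cancellation signal arrive time-aligned with equal gain,
    so the residual is gain * (right - left).
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same number of samples")
    return [gain * (r - l) for l, r in zip(left, right)]
```

When the two channels carry identical samples the residual is zero everywhere. When they differ, for example a signal present only in the left channel, the residual is an inverted copy of the left signal and the crosstalk path is not cancelled.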
Assuming that a method or technique is successful in cancelling the interaural crosstalk, it should be examined what effect this would have on the listener's perception of the reproduced sound. Referring to FIG. 3, if the interaural crosstalk cancellation were successful, paths a and b to the opposite ears would be eliminated. This would help the localization of sources equidistant from the recording microphones (FIGS. 1 and 3). As the sources moved off center, however, the difference in arrival times at the two microphones increases, corresponding to larger values of interaural time delay and hence greater angles of incidence as illustrated in FIG. 7. Since the crosstalk paths from the speakers have been cancelled out, the speakers give no directional information about themselves. The perceived direction of the apparent sound source will depend only on the difference in arrival times of the signal at the two recording microphones and, to a much lesser degree, the relative loudness. FIG. 9, for example, shows an off-axis source whose signal arrives at the right microphone Δt later than at the left microphone. In this example Δt is equal to the maximum possible interaural time delay. When reproduced, with crosstalk cancelled, the right channel signal will arrive at the right ear Δt later than the left signal at the left ear. FIG. 10 shows the apparent source displaced far to the left of the listener, which is where it would appear to the listener in such a circumstance.
It should be clear that for microphones spaced far apart only a small displacement off the equidistant axis will be required to create an arrival time difference at the microphones equal to the maximum possible interaural time delay. This will result in a rather dramatic expansion of the center of the stereo stage. For sound sources further displaced and corresponding to time delays greater than the maximum possible interaural time delay, which will include most of the ambience information, the listener will have difficulty localizing the apparent source. In effect, the listener will perceive sounds as if he had ears placed at the recording microphone spacing and may perceive apparent sound sources within his own head when the microphone spacing is large. An accurate prediction of the effects of this situation is beyond the current state of the art of psychoacoustics and beyond the scope of this discussion. It is apparently because of this potential difficulty that U.S. Pat. No. 4,199,658 to Iwahara specifies a binaural signal input, that is to say, a recording made with a microphone spacing equal to the ear spacing. However, recordings made in this manner are extremely rare. U.S. Pat. No. 4,218,585 to Carver, however, describes the effect that might result if crosstalk cancellation were successfully applied to the reproduction of commonly available recordings:
"The overall effect of this is a rather startling creation of the impression that the sound is `totally dimensionalized`, in that the hearer somehow appears to be `within the sound` or in some manner surrounded by the various sources of the sound." (U.S. Pat. No. 4,218,585, column 9, lines 35-39).
Although this effect that Carver describes may be an interesting aural effect, it is not believed to give a realistic impression of the original performance, particularly in the reproduction of ambient information which constitutes the majority of far-off axis signals.
In addition, the methods referenced above fail to adequately consider the consequences of large scale cancellation of acoustic energy at low frequencies. Cancellation of acoustic energy occurs whenever the acoustic signals from two or more sources interfere destructively. This interference creates a complicated pattern of nodes and antinodes spaced corresponding to the wavelength. When the spacing between nodes is small, less than one foot, the interference is normally not noticeable when listening to music. When the spacing is several feet or more, the interference can be noticeable to a listener as a change in frequency balance of the sound as the listener moves from an area of constructive interference (antinode) to an area of destructive interference (node). A pair of speakers operating with the same signal, in phase, would produce constructive interference (an antinode) at the normal listening positions equidistant from the two speakers. If the phase of one speaker is reversed, the antinode at the listening position would become a node (cancellation). The extent of the node would be comparable to the wavelengths involved. It is well known that low frequency sounds are mostly perceived through the conversion of acoustical energy to mechanical or vibrational energy which is felt rather than heard by the listener. Thus a listener positioned at such a node would perceive a considerable reduction of lower frequencies. At the lowest audio frequencies, where wavelengths are comparable to or larger than room dimensions, the extent and magnitude of the reduction would be greatest.
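The dependence of the node's extent on wavelength can be illustrated with a simple two-source sketch. It assumes two equal-amplitude sources radiating a steady tone in free field, ignores room reflections and distance attenuation, and uses an assumed speed of sound of 343 m/s; the function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in meters of a tone at the given frequency."""
    return c / frequency_hz


def combined_amplitude(frequency_hz, path_difference_m, inverted=False, c=SPEED_OF_SOUND):
    """Relative amplitude (0 to 2) of two equal tones summed at a listening point.

    path_difference_m is the extra distance traveled by the second source's sound.
    inverted=True reverses the polarity of the second source.
    """
    phase = 2.0 * math.pi * frequency_hz * path_difference_m / c
    second = cmath.exp(-1j * phase)
    if inverted:
        second = -second
    return abs(1.0 + second)
```

At the equidistant position (zero path difference) the in-phase pair sums to 2 and the polarity-reversed pair sums to 0. With one source inverted and a path difference of 0.3 m, a 40 Hz tone is still strongly cancelled, whereas a 4 kHz tone is not, which is consistent with the node being broad at low frequencies and narrow at high frequencies.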
The apparatus and technique disclosed in U.S. Pat. No. 4,199,658 to Iwahara, for example, would suffer from this problem. Although the apparatus would create the desired sound pressure at each of the two ear locations, the presence of the inverted versions of both the left and right signals would cause a substantial cancellation of low frequency energy throughout the listening area. The effect could be compared to that of listening to headphones where although the listener "hears" low frequency sounds there is very little low frequency energy to `feel`. As a result the sound has no physical impact and lacks realism.