This invention is directed to an automatic stereophonic image enhancement system and apparatus where the electronic signal which corresponds to the audio signal is electronically treated by amplitude and phase control to produce enhanced perception of the stereophonically reproduced music.
Sound is vibration in an elastic medium, and acoustic energy is the additional energy in the medium produced by the sound. Sound in the medium is propagated by compression and rarefaction of the energy in the medium. The medium oscillates, but the sound travels. A single cycle is a complete single excursion of the medium, and the frequency is the number of cycles per unit time. Wavelength is the distance between wave peaks, and the amplitude of motion (related to energy) is the oscillatory displacement. In fluids, the unobstructed wavefront spherically expands.
Hearing is the principal response of the subject to sound. The ear, its mechanism and nerves receive and transmit the hearing impulse to the brain which receives it, compares it to memory, analyzes it, and translates the impulse into a concept which evokes a mental response. The final step in the process of hearing takes place in the brain; the ear is only a receiver. Thus, sound is objective and hearing is subjective. Since the system and apparatus of this invention are for automatic stereophonic image enhancement for human listening, the listening process and perceptions described in this specification are those of human subjects. Because a subject has two ears, laterally spaced from each other, the sound at each eardrum is nearly always different. Some of the differences are due to the level, amplitude or energy, while others are due to timing or phase differences. Each ear sends a different signal to the brain, and the brain analyzes and compares both of the signals and extracts information from them, including information for determining the apparent position and size of the source, and the acoustic space surrounding the listener.
The first sound heard from a source is the direct sound which comes by line-of-sight from the source. The direct sound arrives unchanged and uncluttered and lasts only as long as the source emits it. The direct sound is received at the ear with a frequency response (tonal quality) more true to the sound produced by the source because it is subject only to losses in the fluid medium (air). The important transient characteristics such as timbre, especially in the high registers, are conveyed by direct sound.
The interaural differences at each eardrum are found in time, amplitude and spectral differences. The physical spacing of the ears causes one ear to hear after the other, except for sound originating from a source on the median plane. The time delay difference is a function of the direction from which the sound arrives, and the delay is up to about 0.8 millisecond. The 0.8 millisecond time delay is about equal to the period of 1 cycle at 1,100 hertz. Above this frequency, the acoustic wavelength of arriving sounds becomes smaller than the ear-to-ear spacing, and the interaural time difference decreases in significance so that it is useful only below about 1,400 hertz to locate the direction of sound. The difference in amplitude between the sound arriving at the two ears results principally from the diffracting and scattering effect of the head and external ear. These effects are greater above 400 hertz and become the source of information the brain interprets to determine the direction of the source for higher frequencies. Other clues to elevation and direction of the sound derive from our practice of turning our head during the sound direction evaluation process. This changes the relative amplitude and time difference to provide further data for mental processing to evaluate direction. Both processes are frequency dependent, but it has been shown that the time difference is more useful with transient portions of sound while both are used for evaluation of the source direction of continuous signals.
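As a rough check on the figures quoted above, the relationship between a delay and the frequency whose period matches it can be worked out directly. The sketch below is illustrative only; the function names are hypothetical and do not come from this specification.

```python
def period_ms(frequency_hz: float) -> float:
    """Return the duration of one cycle, in milliseconds."""
    return 1000.0 / frequency_hz


def frequency_for_period_hz(period_in_ms: float) -> float:
    """Return the frequency, in hertz, whose single cycle lasts period_in_ms."""
    return 1000.0 / period_in_ms


MAX_INTERAURAL_DELAY_MS = 0.8  # maximum interaural delay stated in the text

# Period of one cycle at the 1,100 hertz figure quoted in the text.
print(f"period at 1100 Hz: {period_ms(1100.0):.3f} ms")

# Frequency whose period equals the maximum interaural delay.
print(f"frequency for 0.8 ms: {frequency_for_period_hz(MAX_INTERAURAL_DELAY_MS):.0f} Hz")
```

One cycle at 1,100 hertz lasts roughly 0.91 millisecond, and a 0.8 millisecond period corresponds to roughly 1,250 hertz, so the two figures agree only approximately, in keeping with the "about equal" wording above.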
In human hearing, memory plays an important role in the evaluation of sound. The brain compares the interaural temporal difference, interaural amplitude difference, interaural spectral difference, as well as the precedence effect, and temporal fusion, to be described later, with memories of the same factors. The brain is constantly comparing present perceptions with stored impressions so that those signals which are currently being received are compared with memory to provide a conception of the surrounding activity. When we hear sound, the combination of the sound as perceived and the memory together produce a mental image of a conceptual geometrical framework around us associated with the sources of sound to become thus a conceptual image space. In the conceptual image space, what is real and what seems to be real are the same. The present system and apparatus is directed toward generating a conceptual image space which seems to be real but, from an objective evaluation, is an illusion.
In a system where there are two, spaced loudspeaker sound sources in front of the observer, with the observer centered between them, the production of substantially the same sound from each speaker, in phase and of the same amplitude, will present to the observer a virtual sound image midway between the two speakers. Since the sound source is in phase, this virtual sound image will be called a "homophasic image." By changing the relative amplitude, the homophasic image can be moved to any point between the two speakers. In professional processing of sound signals, this moving action is called "panning" and is controlled by a pan pot.
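The panning action described above can be sketched as a pair of gains applied to one signal. The text states only that the relative amplitude is changed; the constant-power (sine/cosine) law used here is an assumption chosen for illustration, and the names `pan_gains` and `pan` are hypothetical.

```python
import math


def pan_gains(position: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan position.

    position: 0.0 is fully left, 0.5 is centered, 1.0 is fully right.
    Assumption: a constant-power law, so left**2 + right**2 stays at 1.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def pan(mono: list[float], position: float) -> tuple[list[float], list[float]]:
    """Feed one signal to both speakers, in phase, with unequal amplitude."""
    left_gain, right_gain = pan_gains(position)
    left = [left_gain * sample for sample in mono]
    right = [right_gain * sample for sample in mono]
    return left, right
```

At position 0.5 both gains are equal, which corresponds to the homophasic image midway between the speakers; moving the position toward 0.0 or 1.0 shifts the image toward the left or right speaker.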
An equally convincing virtual sound image can be heard if the polarity is reversed on one of the signals sent to one of the same two loudspeakers. This results in a 180 degree phase shift for the sound from that speaker reaching the ears. For simplification, the direct 0 degree phase shift from the (for example) left speaker first reaches the left ear and later reaches the right ear, and the 180 degree retarded phase-shifted signal from the right speaker first reaches the right ear and later the left ear, providing information to the ear-brain mechanism which manifests a virtual sound image to the rear of the center point of the listener's head. This virtual image is the "antiphasic" image. Since it is a virtual image created by mental processes, the position is different for different listeners. Most listeners hear the antiphasic image as external and to the rear of the skull. The antiphasic image does not manifest itself as a point source, but is diffused and forms the rear of the listener's conceptual image space. By changing the phase relationship and/or amplitude of the left and right signals, virtual images can be generated along an arc or semicircle from the back of the observer's head toward the left or right speakers.
Another factor which influences the perception of sound is the "precedence effect" wherein the first sound to be heard takes command of the ear-brain mechanism, and sound arriving up to 50 milliseconds later seems to arrive as part of and from the same direction as the original sound. As outlined above, by delaying the signal sent to one speaker, as compared to the other, the apparent direction of the source can be changed. As part of the precedence effect, the apparent source direction is operative through signal delay for up to 30 milliseconds. The effect is dependent upon the transient characteristics of the signal.
An intrinsic part of the precedence effect, yet an identifiably separate phenomenon, is known as "temporal fusion" which fuses together the direct and delayed sounds. The ear-brain mechanism blends together two or more very similar sounds arriving at nearly the same time. After the first sound is heard, the brain suppresses similar sounds arriving within about the next 30 milliseconds. It is this phenomenon which keeps the direct sound and room reverberation all together as one pleasing and natural perception of live listening. Since the directional hearing mechanism works on the direct sound, the source of that sound can be localized even though it is closely followed by multiple waves coming from different directions.
The walls of the room are reflection surfaces from which the direct sound reflects to form complex reflections. The first reflection to reach the listener is known as a first order reflection; the second, as second order, etc. An acoustic image is formed which can be considered as coming from a virtual source situated on the continuation of a line linking the listener with the point of reflection. This is true of all reflection orders. If we generate signals which produce virtual images, boundaries are perceived by the listener. This is a phenomenon of conditioned memory. The position of the boundary image can be expanded by amplitude and phase changes within the signal generating the virtual images. The apparent boundary images broaden the perceived space.
Audio information affecting the capability of the ear-brain mechanism to judge location, size, range, reverberation, spatial identity, and ambience can be extracted from the difference between the left and right sources. Modification of this information through frequency shaping and linear delay is necessary to produce the perception of phantom image boundaries when this information is mixed back with the original stereo signal at the antiphasic image position.
The common practice of the recording industry, for producing a stereo signal, is to use two or more microphones near the sound source. These microphones, no matter how many are used, are always electrically polarized in-phase. When the program source is produced under these conditions (which are industry standard), the apparatus herein generates a "synthetic" conditioning signal for establishment of a third point with its own time domain. This derivation is called synthetic because there is a separation, alteration and regrouping to form the new whole.
To further help establish a point with a separate time domain, a third microphone may be used to define the location of the third point in relation to the stereo pair. Contrary to the normal procedure of adding the output of a third microphone to both the left and right sides of the stereo microphone pair, the output of the third microphone is added to the left channel of the stereo pair and subtracted from the right channel. This arrangement provides a 2-channel stereo signal which is composed of a left signal, a right signal, and a recoverable signal which had its source at a related but separate position in the space being recorded. This is called organic derivation and compares to the synthetic situation, where the ratios are proportional to the left minus the right (from which it was derived), but is based on its own time reference, which is related to the spacing between the three microphones. The timing of the organic conditioning signal is contingent upon the position of the original sound source with respect to the three microphones. The information derived more closely approximates the natural model than that of the synthetically derived conditioning signal.
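The organic arrangement can be illustrated with a minimal sketch in which the third microphone is added to one channel and subtracted from the other. The sample-by-sample model and the function names are assumptions made for illustration; real microphone signals would also differ in timing, which is not modeled here.

```python
def encode_organic(left, right, third):
    """Add the third microphone to the left channel and subtract it from the right."""
    encoded_left = [l + c for l, c in zip(left, third)]
    encoded_right = [r - c for r, c in zip(right, third)]
    return encoded_left, encoded_right


def sum_and_difference(left, right):
    """Return the (L + R) and (L - R) signals of a 2-channel pair."""
    total = [l + r for l, r in zip(left, right)]
    difference = [l - r for l, r in zip(left, right)]
    return total, difference


# A centered source (identical in both channels) plus a third microphone.
left = [0.5, 0.25]
right = [0.5, 0.25]
third = [0.125, -0.25]

encoded_left, encoded_right = encode_organic(left, right, third)
total, difference = sum_and_difference(encoded_left, encoded_right)

print(total)       # the third microphone cancels in L + R
print(difference)  # the third microphone appears doubled in L - R
```

In this model the third signal is absent from the sum and appears only in the difference, which is what makes it recoverable from the 2-channel signal without disturbing the monophonic combination.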
Control over either the organic or synthetic situations, the processing thereof, and the generation of a conditioning signal therefrom will produce an expanded listening experience.
All sources of sound recorded with two or more microphones in normal or organic situations contain the original directional cues. When acted upon by the apparatus of this invention, a portion of the original directional cues is isolated, modified, reconstituted and added, in the form of a conditioning signal, to the original, forming a new whole. The new whole is in part original and in part synthetic. The control of the original-to-synthetic ratio is under the direction of the operator via two operating modes: (1) Space, in which the ratio is constant; the synthetic portion is directly proportional to the original and, therefore, enhancement depends upon the amount of original information present in the stereo program material; and (2) AutoSpace, in which the ratio is electrically varied; the synthetic portion is inversely proportional to the original and, therefore, the enhancement is held at a constant average regardless of program material.
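The distinction between the two operating modes can be sketched as two ways of choosing the gain applied to the difference signal. This is an illustration under stated assumptions, not the circuit of the apparatus: the level is measured as a block RMS value, the AutoSpace gain is taken as a target level divided by the measured level, and the names `space_gain`, `autospace_gain`, `target_rms` and `floor` are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a block of samples."""
    if not signal:
        return 0.0
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def space_gain(ratio):
    """Space mode: the gain is a constant chosen by the operator."""
    return ratio


def autospace_gain(difference, target_rms, floor=1e-6):
    """AutoSpace mode: the gain varies inversely with the difference level.

    Assumption: gain = target_rms / measured level, with a small floor
    to avoid dividing by zero when the program has no difference signal.
    """
    return target_rms / max(rms(difference), floor)


def conditioning(difference, gain):
    """Scale the difference signal to form the conditioning signal."""
    return [gain * s for s in difference]


quiet = [0.1, -0.1, 0.1, -0.1]  # little difference information
loud = [0.5, -0.5, 0.5, -0.5]   # much difference information

for block in (quiet, loud):
    fixed = conditioning(block, space_gain(0.5))
    auto = conditioning(block, autospace_gain(block, target_rms=0.2))
    print(rms(fixed), rms(auto))
```

With a constant gain the conditioning level follows the program material, while with the inversely varied gain it stays near the target level for both blocks. A practical implementation would presumably smooth the gain over time rather than recompute it per block.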
When a stereo recording is reproduced monophonically, it is said to be compatible if the overall musical balance does not change. The dimensionality of the stereo recording will disappear when reproduced monophonically but the inner instrumental balance should remain stable with L+R (left plus right sources) combining.
The compatibility problem arises because monophonic or L+R does not contain the total information present in the left and right sources. When combined as such, it contains only the information of similarity in vectorial proportion. The differential information is lost. Unfortunately, it is possible for the differential signal to contain as much identity about the musical content of a source as does the summation signal.
Since differential information will be lost in left plus right combining, directional elements should comprise most of the differential signal. Directional information will be of little use in monophonic reproduction and its loss will be of no consequence with respect to musical balance. Therefore, additional dimensional or spatial producing elements must be introduced in such a way that their removal in L+R combining will not destroy the musical balance established in the original stereophonic production.
Insertion of the conditioning signal at the antiphasic image position produces enhancement to and generation of increased spatial density in the stereo mode but is completely lost in the mono mode where the directional information will be unused. Information which can be lost in the mono mode without upsetting the inner instrument musical balance includes clues relating to size, location, range, and ambience but not original source information.
To accomplish this, directional information is obtained exclusively from the very source which is lost in the monophonic mode, namely, left signal minus right signal.
Whether in the synthetic or organic model derivation of a conditioning signal, subtracting the right signal from the left signal and reinserting the result at the antiphasic position will not challenge mono/stereo compatibility, provided that the level of the conditioning signal does not cause the total RMS difference energy to exceed the total RMS summation energy at the output.
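The compatibility condition can be checked numerically. The sketch below inserts a scaled difference signal at the antiphasic position (added to the left, subtracted from the right) and then compares the RMS level of the output difference with that of the output sum. The plain gain stands in for the frequency shaping and delay described earlier, and the function names are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def enhance(left, right, gain):
    """Insert gain * (L - R) antiphasically: add to left, subtract from right."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        c = gain * (l - r)  # conditioning signal (assumption: plain gain only)
        out_left.append(l + c)
        out_right.append(r - c)
    return out_left, out_right


def is_mono_compatible(left, right):
    """True when the RMS of (L - R) does not exceed the RMS of (L + R)."""
    total = [l + r for l, r in zip(left, right)]
    difference = [l - r for l, r in zip(left, right)]
    return rms(difference) <= rms(total)


left = [1.0, 0.5, -0.5, 0.25]
right = [0.5, 0.5, -0.25, 0.0]

for gain in (0.5, 2.0):
    out_left, out_right = enhance(left, right, gain)
    mono = [l + r for l, r in zip(out_left, out_right)]
    print(gain, mono, is_mono_compatible(out_left, out_right))
```

For both gains the monophonic sum equals the original L + R, since the conditioning signal cancels in the sum. Only the moderate gain keeps the difference level below the summation level; the larger gain violates the condition stated above.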