A. Goal of High-Fidelity Reproduction
A goal of high-fidelity reproduction of recorded or transmitted sounds is to present, at another time or location, a faithful representation of an "original" sound field. A sound field is defined as a collection of sound pressures that are a function of time and space. High-fidelity reproduction therefore attempts to recreate, in a region about a listener, the acoustic pressures that existed in the original sound field.
Ideally, differences between the original sound field and the reproduced sound field are inaudible, or if not inaudible at least relatively unnoticeable to most listeners. Two general measures of fidelity are "sound quality" and "sound field localization."
Sound quality includes characteristics of reproduction such as frequency range (bandwidth), accuracy of relative amplitude levels throughout the frequency range (timbre), range of sound amplitude level (dynamic range), accuracy of harmonic amplitude and phase (distortion level), and amplitude level and frequency of spurious sounds and artifacts not present in the original sound (noise). Although most aspects of sound quality are susceptible to measurement by instruments, in practical systems characteristics of the human hearing system (psychoacoustic effects) render inaudible or relatively unnoticeable certain measurable deviations from the "original" sounds.
Sound field localization is one measure of spatial fidelity. The preservation of the apparent direction (both azimuth and elevation) and of the apparent distance of a sound source is sometimes known as angular localization and depth localization, respectively. In the case of certain orchestral and other recordings, such localization is intended to convey to the listener the actual physical placement of the musicians and their instruments. In other recordings, particularly multitrack recordings produced in a studio, the angular directionality and depth may bear no relationship to any "real-life" arrangement of musicians and their instruments, and the localization is merely part of the overall artistic impression intended to be conveyed to the listener. In any case, one purpose of high-fidelity multi-channel reproduction systems is to reproduce the spatial aspects of an ongoing sound field, whether real or synthesized. As with sound quality, in practical systems measurable changes in localization are, under certain conditions, inaudible or relatively unnoticeable because of characteristics of human hearing.
Even with respect to those recordings in which the localization is intended to convey the impression of being present at the original recording, the producer must choose among various philosophies of microphone placement and sound mixing and recording, each of which results in the capturing of sound fields that differ from one another. Apart from variations introduced by artistic and technical judgments and preferences, the capture of a sound field is at best an approximation of the original sound field because of the inherent technical and practical limitations in recording, transmission and reproducing equipment and techniques.
Numerous decisions, adjustments, and combinations available to a sound field producer will be obvious to one skilled in the art. It is sufficient to recognize that a producer may develop recorded or transmitted signals which, in conjunction with a reproduction system, will present to a human listener a sound field possessing specific characteristics in sound quality and sound field localization. The sound field presented to the listener may closely approximate the ideal sound field intended by the producer or it may deviate from it depending on many factors including the reproduction equipment and acoustic reproduction environment.
In most, if not all, cases the sound field producer works in a relatively well-defined system in which there are known playback or presentation configurations and environments. For example, a two-channel stereophonic recording is expected to be played back or presented by either a stereophonic or a monophonic playback or presentation system. The recording is usually optimized to sound good to most listeners having a wide variety of stereophonic and monophonic equipment ranging from the very simple to the very sophisticated. As another example, a recording in stereo with surround sound for motion pictures is made with the expectation that motion picture theaters will have either a known, generally standardized arrangement for reproducing the left, center, right, bass and surround channels or, alternatively, a classic "Academy" monophonic playback. Such recordings are also made with the expectation that they will be presented in home listening environments with equipment ranging from a television with one small loudspeaker to relatively sophisticated home surround sound systems which closely replicate a theater surround sound experience.
A sound field captured for transmission or reproduction is at some point represented by one or more electrical signals. Such signals usually constitute one or more channels at the point of sound field capture ("capture channels"), at the point of sound field transmission or recording ("transmission channels"), and at the point of sound field presentation ("presentation channels"). Within some limits, the ability to reproduce complex sound fields increases as the number of these channels increases; practical considerations, however, impose limits on the number of such channels.
Early sound recording and reproducing systems relied on single transmission and presentation channels. Later, multichannel systems came into use, the most popular of which for music continues to be the stereophonic system, comprising two transmission and presentation channels. Motion picture and home video sound systems commonly employ four or more presentation channels. Techniques such as audio matrixing have been used to reduce the number of transmission channels, particularly for carrying audio information for four presentation channels in the two-track media of motion picture optical soundtracks and home video. Such matrixing techniques permit an approximate reproduction of the sound field that would be produced from four presentation channels carried by four transmission channels. Existing matrix techniques, however, degrade the reproduced sound field, particularly with respect to the separation between presentation channels, even when matrix enhancement circuits are employed in the recovery of matrixed sound signals.
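The loss of separation described above can be illustrated with a simplified passive 4:2 matrix. The sketch below is an assumption-laden illustration, not any particular commercial system: it folds center and surround into the two transmission channels with real-valued gains of 1/sqrt(2) (about -3 dB), feeds the surround in antiphase, and omits the 90-degree phase shifts, band limiting and steering (enhancement) logic used in practical matrix decoders. The helper names `encode`, `decode` and `leakage_db` are hypothetical.

```python
import math

# Assumed mixing gain for center and surround (about -3 dB).
G = 1 / math.sqrt(2)
NAMES = ("L", "C", "R", "S")


def encode(l, c, r, s):
    """Fold four presentation-channel samples into two transmission channels."""
    lt = l + G * c + G * s  # surround added in phase on the left ...
    rt = r + G * c - G * s  # ... and in antiphase on the right
    return lt, rt


def decode(lt, rt):
    """Passive (non-steered) decode back to four presentation channels."""
    l = lt
    r = rt
    c = G * (lt + rt)  # sum recovers center, cancels surround
    s = G * (lt - rt)  # difference recovers surround, cancels center
    return l, c, r, s


def leakage_db(x, ref):
    """Level of output x relative to the intended output ref, in dB."""
    if abs(x) < 1e-12:
        return float("-inf")  # perfect cancellation
    return 20 * math.log10(abs(x) / abs(ref))


if __name__ == "__main__":
    # Feed each presentation channel alone and measure leakage into the others.
    for i, name in enumerate(NAMES):
        src = [0.0] * 4
        src[i] = 1.0
        out = decode(*encode(*src))
        leaks = {n: round(leakage_db(out[j], out[i]), 2)
                 for j, n in enumerate(NAMES) if j != i}
        print(name, leaks)
```

With this matrix, a source fed only to the left channel reappears in the decoded center and surround outputs about 3 dB below the left output, while the right output stays silent; a center-only source cancels in the surround output but leaks into left and right at about -3 dB. Adjacent-channel separation of only about 3 dB is the kind of degradation that matrix enhancement circuits attempt to reduce.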
Accordingly, despite the inefficiency of doing so, it is sometimes necessary to maintain many transmission channels throughout the recording and transmitting process in order to achieve desirable levels of spatial fidelity.
Aside from the choices mentioned above, the representation of a sound field by one or more channels also involves additional artistic and technical choices. A sound field producer may choose how many capture channels to employ and how the sound field is to be "mapped" onto the capture channels. The sound field transmitter may choose the number of transmission channels and how the audio information is coded for recording or transmission. The listener may choose the number of presentation channels, or the choice may be dictated by the listener's reproducing equipment, requiring, for example, that a sound field recorded in a two-channel stereophonic format be played back or presented through a single-channel monophonic system. The listener may also choose where the transducers or loudspeakers reproducing the channels are placed in a listening environment, and whether to "enhance" or modify the sound by boosting or cutting portions of the sound spectrum or by adding reverberation or ambience. In some cases, such as in motion picture theaters, the listener has little control.
The number of channels employed by the system, however, should not be a source of concern to the listener once the system is set up and operating. The listener's attention should not be audibly attracted by such technical details of the sound system any more than a viewer should be visibly aware that color television uses only three colors rather than the entire visible spectrum.
Deviations between the desired sound field and the actually reproduced sound field often arise because of a desire to minimize the amount of information required to achieve high-fidelity reproduction. One example, mentioned above, is the use of a matrix to convey four channels of sound information on two-track media. There is a desire among workers in the audio art, however, to preserve the original sound field more exactly while at the same time further reducing the amount of information required to represent the sound field during the transmission and recording process. By reducing the amount of required information, signals may be conveyed by transmission channels with reduced information capacity, such as lower-bandwidth or noisier transmission paths, or lower-capacity recording media. Ideally, such an arrangement with reduced information requirements should allow the reproduction of a sound field audibly indistinguishable or nearly indistinguishable from the originally intended sound field.
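To give a sense of scale for the "amount of information" at stake, the sketch below computes the raw bit rate of uncoded linear PCM transmission channels, which grows linearly with the number of channels. The sample rate (48 kHz) and word length (16 bits) are illustrative assumptions, not values taken from the text, and the function name `pcm_bit_rate` is hypothetical; framing, error correction and other overhead are ignored.

```python
def pcm_bit_rate(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second for uncoded linear PCM, ignoring any overhead."""
    return channels * sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Assumed parameters: 48 kHz sampling, 16-bit samples.
    for channels in (1, 2, 4, 6):
        rate = pcm_bit_rate(channels, 48_000, 16)
        print(f"{channels} channel(s): {rate / 1e6:.3f} Mbit/s")
```

Under these assumptions, two channels need 1.536 Mbit/s and four discrete channels need 3.072 Mbit/s. A 4:2 matrix halves the four-channel figure by carrying only two transmission channels, but at the cost of separation; the aim described above is to reduce the information rate further without such audible penalties.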