Although humans have only two ears, we hear sound as a three dimensional entity, relying upon a number of localization cues, such as head related transfer functions (HRTFs) and head motion. Full fidelity sound reproduction therefore requires the retention and reproduction of the full 3D soundfield, or at least the perceptual cues thereof. Unfortunately, sound recording technology is not oriented toward capture of the 3D soundfield, nor toward capture of a 2D plane of sound, nor even toward capture of a 1D line of sound. Current sound recording technology is oriented strictly toward capture, preservation, and presentation of zero dimensional, discrete channels of audio.
Most of the effort on improving fidelity since Edison's original invention of sound recording has focused on ameliorating the imperfections of his original analog modulated-groove cylinder/disc media. These imperfections included limited, uneven frequency response, noise, distortion, wow, flutter, speed accuracy, wear, dirt, and copying generation loss. Although there were any number of piecemeal attempts at isolated improvements, including electronic amplification, tape recording, noise reduction, and record players that cost more than some cars, the traditional problems of individual channel quality were arguably not finally resolved until the singular development of digital recording in general, and specifically the introduction of the audio Compact Disc. Since then, aside from some effort at further extending the quality of digital recording to 24 bits/96 kHz sampling, the primary efforts in audio reproduction research have been focused on reducing the amount of data needed to maintain individual channel quality, mostly using perceptual coders, and on increasing the spatial fidelity. The latter problem is the subject of this document.
Efforts on improving spatial fidelity have proceeded along two fronts: trying to convey the perceptual cues of a full sound field, and trying to convey an approximation to the actual original sound field. Examples of systems employing the former approach include binaural recording and two-speaker-based virtual surround systems. Such systems exhibit a number of unfortunate imperfections, especially in reliably localizing sounds in some directions, and in requiring the use of headphones or a fixed single listener position.
For presentation of spatial sound to multiple listeners, whether in a living room or a commercial venue like a movie theatre, the only viable alternative has been to try to approximate the actual original sound field. Given the discrete channel nature of sound recording, it is not surprising that most efforts to date have involved what might be termed conservative increases in the number of presentation channels. Representative systems include the panned-mono three-speaker film soundtracks of the early 50's, conventional stereo sound, quadraphonic systems of the 60's, five channel discrete magnetic soundtracks on 70 mm films, Dolby surround using a matrix in the 70's, AC-3 5.1 channel sound of the 90's, and recently, Surround-EX 6.1 channel sound. “Dolby”, “Pro Logic” and “Surround EX” are trademarks of Dolby Laboratories Licensing Corporation. To one degree or another, these systems provide enhanced spatial reproduction compared to monophonic presentation. However, mixing a larger number of channels incurs larger time and cost penalties on content producers, and the resulting perception is typically one of a few scattered, discrete channels, rather than a continuum soundfield. Aspects of Dolby Pro Logic decoding are described in U.S. Pat. No. 4,799,260, which patent is incorporated by reference herein in its entirety. Details of AC-3 are set forth in “Digital Audio Compression Standard (AC-3),” Advanced Television Systems Committee (ATSC), Document A/52, Dec. 20, 1995 (available on the World Wide Web of the Internet at www.atsc.org/Standards/A52/a—52.doc). See also the Errata Sheet of Jul. 22, 1999 (available on the World Wide Web of the Internet at www.dolby.com/tech/ATSC_err.pdf.
Once the sound field is characterized, it is possible in principle for a decoder to derive the optimal signal feed for any output loudspeaker. The channels supplied to such a decoder will be referred to herein variously as “cardinal,” “transmitted,” and “input” channels, and any output channel with a location that does not correspond to the position of one of the input channels will be referred to as an “intermediate” channel. An output channel may also have a location coincident with the position of an input channel.