This application claims the priority of provisional application 60/455,497, filed 18 Mar. 2003, which is hereby incorporated herein by reference. The inventor's paper entitled “Scalable Tri-play Recording for Stereo, ITU 5.1/6.1 2D, and Periphonic 3D (with Height) Compatible Surround Sound Reproduction”, presented at the 115th convention of the Audio Engineering Society in October of 2003, is hereby incorporated herein by reference in its entirety.
Lifelike reproduction of sound has long been a subject of scientific exploration and experimentation. While we may not have completed this exploration, we now know enough to record and reproduce a very good approximation of lifelike sound, for example that of a musical performance in an acoustic space, among other applications. We do know that it is essential to preserve true three-dimensionality of the arrivals at the ear of both direct and reflected sounds, or close approximations of their directions of arrival. We say “true three-dimensionality” (“3D”) because the term is much misused. For example, methods are often termed 3D where reproducers (e.g., loudspeakers) are arranged only in the horizontal plane. These methods can reliably preserve only the horizontal angles of sound arrivals, and only where the listener is at the center of a horizontal circle. However, in live listening in an acoustic space, reflections also arrive from above and below, at vertical angles of elevation referred to as “height”, resulting in truly natural “periphonic” hearing.
For lifelike reproduction, there are both (a) important reasons why the most reliable way to reproduce height is by locating loudspeakers above and below the listener, who is now at the center of a sphere, not just a circle, and (b) important reasons why height must also be preserved in the first place.
Regarding point (a) above, in the past, less reliable methods have attempted to generalize an important aspect of human Head-Related Transfer Functions (“HRTF”) using generalized filters or so-called “dummy-head” microphones, intended to deliver into the two ear canals of the listener what was recorded at the two ear canals of the dummy head. The problem is that the human mechanism for determining sound arrivals from above or below is the pinna, or outer ear. Folds of the pinna cause reflections of higher-frequency sounds either partially to reinforce or partially to cancel, or attenuate, depending on both the frequency and the direction of the sound, both horizontal and vertical. But each individual's pinnae are as unique as a fingerprint, so generalized filters or generalized “dummy pinna” work more or less poorly for each listener. Miniature microphones placed within the ear canals of the recordist/listener result in more lifelike reproduction, but only when that one person does the recording and/or listening.
For lifelike reproduction for a group of listeners (such as in listening to recorded music in a home theater, training in a simulator, experiencing virtual reality in computer multi-media, or riding an amusement ride), loudspeakers must be located above and below as well as around the listeners. Each listener's pinnae, in “agreement” with other aspects of that listener's individual HRTF, will determine both the azimuth and the elevation of each sound, just as each listener has learned these complex relationships since childhood.
Regarding point (b) above, why must true 3D (i.e., with height) be preserved in the first place? The reason is that humans learn sound directionality by relating seeing sources of sound with the hearing mechanisms described above. Through a complex ear-brain response the listener knows the direction of a sound—above or below as well as horizontally—even when facing another way or with eyes closed. In acoustic spaces, unseen reflections arrive at different times, building up to steady state, then collapse in the same order when the source of the sound stops. Each arrival and “departure” from each direction is tonally “colored” by the pinna. Musicians hear this same complex interplay and form each note, phrase, even pause, to be “musically correct”, playing the acoustic as an extension of their instrument. The “tonality” or timbre of their guitar, piano, or violin would sound very different in a different space. They will play differently in a different hall to be musically correct in that hall, such as playing faster or more legato in a small space and slower and more pizzicato in a large one. Listeners in the same space learn this “musical language” and appreciate the music more when they agree it is correct. But take away height reflections from the ceiling or acoustic clouds above the stage and the timbre changes dramatically.
So for lifelike reproduction of natural sounds such as music, spherically positioned reproducers of sound are a requirement.
Numerous approaches termed “three-dimensional” are in fact only two-dimensional since they use speakers only in the horizontal plane. If the listener perceives any height sounds, they can only be due to the acoustics of the listening environment, which are invalid in reproducing the space where the music was recorded. Other approaches attempt to simulate height auditory “cues”, or signals, to the ear-brain system. However, these cannot be generalized reliably to a lifelike degree for all listeners because their pinnae are as individual as their fingerprints, as described above. If the goal is to believably reproduce the recorded space, then the listener will believe he has been “transported” to that space and is no longer in the listening space. If the recorded space is an acoustic one with reflective ceiling and floor elements, lifelike believability requires vertically-arriving sounds to be preserved. Since we cannot successfully generalize pinna colorations (e.g., by using filters and/or dummy heads) that connote height, we can best reproduce height cues by using loudspeakers above and below the listeners. But an infinite number of loudspeakers and channels, as in real life, would be infinitely impractical.
Prior art systems, such as 1st Order Ambisonics, create a reasonable approximation of three-dimensionality using four channels and a minimum of eight loudspeakers. Ambisonics has not succeeded in the marketplace for a variety of reasons, not the least of which is the fact that Ambisonics does not produce a lifelike reproduction of sound in front of the listener, where the ear-brain “perceptualization” is most acute.
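By way of illustration only, the four channels of 1st Order Ambisonics mentioned above may be sketched as follows. The sketch assumes the traditional B-format convention (an omnidirectional W component scaled by 1/√2, plus X, Y, and Z figure-of-eight components); the function name `encode_first_order` and its parameters are hypothetical and are not taken from this disclosure or from any particular prior art product.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into traditional B-format (W, X, Y, Z).

    Assumed convention: azimuth is measured counter-clockwise from
    straight ahead, elevation upward from the horizontal plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down (height) component
    return w, x, y, z


# A source straight ahead drives X only; a source overhead drives Z only.
front = encode_first_order(1.0, 0.0, 0.0)
overhead = encode_first_order(1.0, 0.0, 90.0)
```

In this sketch the height information is carried entirely by the Z channel, which is the component absent from horizontal-only systems.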
Another prior art system, called Ambiophonics, uses a two-channel binaural-based approach that precisely positions sounds across a 120 degree arc in front of a listener, where such localization is most important for lifelike hearing. In order to localize frontal sounds widely yet accurately, Ambiophonics uses two closely-spaced speakers, called a “stereo dipole” or “Ambiopole”, and transaural crosstalk cancellation. However, Ambiophonics is inherently two-dimensional and incapable of producing three-dimensional sound with height.
Prior art monaural systems sounded correct tonally but had a “stage door” effect: the sound was localized at a single point, as if coming through a narrow opening in, say, an orchestra shell wall. Prior art stereo systems, while providing spaciousness in sound in two dimensions, suffer from a lack of localization because the speakers are typically placed at the front left and front right positions, thereby leaving a large gap between the speakers. Other prior art systems, such as ITU 5.1/6.1 and stereo, favor spaciousness and simulating tonality at the price of accurate localization, as though the two were mutually exclusive. ITU 5.1/6.1 systems extend the stereo concept to envelop listeners, but only in two dimensions. A height component is lacking.
Another prior art system is WaveField Synthesis (“WFS”). The WFS system is limited to two dimensions and therefore lacks the directionality of height and the natural timbral quality achievable by systems and methods exercising the present invention. Furthermore, WFS requires upwards of 36 speakers and is impractical at present because it needs as many channels for distribution and digital signal processing as for reproduction.
Yet other prior art systems, known collectively as Higher Order Ambisonics (“HOA”), likewise have deficiencies. Along with the deficiencies previously noted for Ambisonic systems, HOA systems require nine or more channels for Ambisonic components, for a total of 11 or more distribution channels. Six full-range channels is the current limitation of distribution media such as DVD-A, SACD, and DTS-CD.
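The channel-count mismatch noted above may be illustrated with a minimal sketch. It assumes the standard relationship that a full-sphere Ambisonic representation of order N uses (N + 1)² component channels, and it takes the six-channel media limit from the text above. The names `ambisonic_channel_count`, `fits_on_media`, and `MEDIA_CHANNEL_LIMIT` are hypothetical, and the sketch counts Ambisonic components only, not the additional distribution channels mentioned above.

```python
MEDIA_CHANNEL_LIMIT = 6  # full-range channels on DVD-A, SACD, DTS-CD (per the text)


def ambisonic_channel_count(order):
    """Number of component channels for full-sphere Ambisonics of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def fits_on_media(order, limit=MEDIA_CHANNEL_LIMIT):
    """True if the Ambisonic components alone fit within the media channel limit."""
    return ambisonic_channel_count(order) <= limit


for order in (1, 2, 3):
    print(order, ambisonic_channel_count(order), fits_on_media(order))
```

Under these assumptions, first order (four channels) fits within six-channel media, while second order (nine channels) and above do not.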
No prior art system has yet been able to reproduce accurate 3D sound, that is, sound with height and accurate spaciousness, tonality, and localization. The present invention produces lifelike 3D sound with correct spatial impression, timbre (tonality), and localization. Furthermore, embodiments of the present invention play compatibly in stereo, ITU 5.1/6.1, full 3D using available 6-channel media, and full 3D using 10 or more speakers in a home theater or height-modified cinema.
It is an object of the present disclosure to provide a novel system and method for accurately reproducing a 3D sound field.
It is another object of the present disclosure to provide a novel system and method for combining accurate reproduction of “front stage sound” with accurate three-dimensional localization of sound to produce a sound field with height and accurate spaciousness, tonality, and localization.
It is yet another object of the present disclosure to provide a novel system and method for producing a signal which accurately reproduces a 3D sound field and which is also capable of playback on current 2D surround sound systems without the use of a decoder or the need to add additional speakers.
It is still another object of the present disclosure to provide a novel system and method for providing a transformation matrix for mapping a 3D sound field into a signal for providing a 2D sound field without the need for a decoder.
It is still yet another object of the present disclosure to provide a novel system and method for providing a reconstitution matrix for accurately reproducing a 3D sound field.
It is a further object of the present disclosure to provide a novel system and method for a microphone array capable of capturing a sound field in three dimensions.