In recent years, many digital television broadcast receivers and DVD players capable of playing back 5.1ch audio content items have been developed and brought to market. Here, “5.1ch” is a channel configuration consisting of front left and right channels, a front center channel, left and right surround channels, and a low-frequency effects (subwoofer) channel. Some recent Blu-ray (registered trademark) players have a 7.1ch configuration in which left and right back surround channels are added.
On the other hand, with further increases in the sizes of image screens and in the definitions of images, virtual surround of audio objects has been vigorously studied. For example, virtual surround for the case where 22.2ch speakers are arranged has been studied. FIG. 14 illustrates the speaker arrangement for 22.2ch audio playback that is currently being researched and developed by Japan Broadcasting Corporation (Nippon Hoso Kyokai, NHK). Unlike a conventional speaker arrangement in which speakers are arranged only on a two-dimensional plane (the middle plane in FIG. 14), this arrangement is a three-dimensional configuration in which speakers are also arranged on the floor (the lowermost plane in FIG. 14) and on the ceiling (the uppermost plane in FIG. 14).
In addition, efforts to differentiate movie theaters by means of three-dimensional acoustic effects have been vigorously made (Non-patent Literature 2). In this case, speakers are also arranged on the ceiling in a three-dimensional (3D) configuration. Here, content items are coded as audio objects. An audio object is an audio signal with playback position information indicating, in a three-dimensional space, the position at which a sound image is to be localized. Specifically, an audio object is a coded signal that pairs (i) playback position information indicating, in the form of coordinates (x, y, z) along three axes, the position at which a sound source (sound image) is localized with (ii) the audio signal of the sound source.
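The pairing described above can be pictured as a simple data structure. The following is a minimal sketch only; the class name `AudioObject` and its field names are hypothetical and are not taken from any coding standard or from the cited literature, and the samples are shown already decoded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioObject:
    """Hypothetical container pairing a playback position with an audio signal."""
    position: Tuple[float, float, float]  # (x, y, z) at which the sound image is localized
    samples: List[float]                  # audio signal of the sound source (decoded)


# Example: a source fixed at one point in the three-dimensional space.
fixed_source = AudioObject(position=(1.0, 2.0, 0.5), samples=[0.0, 0.8, -0.6, 0.2])
```

A renderer would read `position` to decide where to localize the sound image and `samples` to decide what to play there.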
For example, when creating an audio object for a bullet, an airplane, or the call of a flying bird, the position indicated by the playback position information is made to change from moment to moment over time. In this case, the playback position information may be vector information indicating the direction of the transition. For a sound generated at a fixed position, such as an explosion, the playback position information is naturally constant.
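As an illustration of time-varying playback position information, the sketch below derives the position at time t from a start position and a transition vector. It assumes, purely for illustration, that the vector is a constant velocity in meters per second; the function name `position_at` is hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def position_at(start: Vec3, velocity: Vec3, t: float) -> Vec3:
    """Return the playback position after t seconds.

    Assumption: the transition vector is a constant velocity (m/s).
    """
    return tuple(s + v * t for s, v in zip(start, velocity))


# A moving source (e.g. an airplane) versus a stationary one (e.g. an explosion).
moving = position_at((0.0, 0.0, 0.0), (1.0, 2.0, 3.0), 2.0)
stationary = position_at((4.0, 5.0, 6.0), (0.0, 0.0, 0.0), 2.0)
```

With a zero vector the position does not change, which corresponds to the constant playback position information of a stationary sound.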
In this way, playback of audio signals with playback position information has been researched and developed on the premise that speakers are arranged three-dimensionally. However, in many cases it is impossible to arrange speakers three-dimensionally for actual home use or personal use.
As techniques for enabling audio playback with the highest possible sense of realism in an environment where speakers cannot be arranged freely, methods using a head-related transfer function (HRTF), wavefront synthesis, beam forming, and the like have been researched and developed.
The HRTF is a transfer function that simulates the propagation characteristics of sound around the head of a listener. The perception of the arrival direction of a sound is said to be affected by the HRTF. As illustrated in FIG. 15, this perception is mainly affected by the sound pressure difference between the two ears and by the difference in the arrival times of sound waves at the two ears. Conversely, it is possible to control the perceived arrival direction of a sound by artificially controlling these differences through signal processing. Details are described in Non-patent Literature 3. Cues for localization in the front-back and vertical directions are said to be contained in the HRTF amplitude spectra. Details are described in Non-patent Literature 1.
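The control of the two interaural differences can be sketched as follows. This is not an HRTF implementation. As a simplification, the time difference is approximated by the well-known Woodworth spherical-head formula, ITD = (a / c)(θ + sin θ), and the level difference by a single broadband gain that grows with sin θ. The head radius, the 6 dB maximum level difference, and all function names are assumptions chosen for illustration; measured HRTFs are frequency dependent.

```python
import math
from typing import List, Tuple


def woodworth_itd(azimuth_rad: float, head_radius_m: float = 0.0875,
                  speed_of_sound: float = 343.0) -> float:
    """Interaural time difference in seconds (spherical-head approximation).

    Intended for azimuths between 0 and pi/2 radians.
    """
    return head_radius_m / speed_of_sound * (azimuth_rad + math.sin(azimuth_rad))


def render_direction(mono: List[float], azimuth_rad: float,
                     sample_rate: int = 48000,
                     max_ild_db: float = 6.0) -> Tuple[List[float], List[float]]:
    """Return (left, right) signals; a positive azimuth places the source on the right."""
    side = abs(azimuth_rad)
    delay = round(woodworth_itd(side) * sample_rate)        # time difference in samples
    gain = 10.0 ** (-max_ild_db * math.sin(side) / 20.0)    # assumed level difference
    near = list(mono) + [0.0] * delay                       # ear facing the source
    far = [0.0] * delay + [gain * s for s in mono]          # delayed, attenuated ear
    return (far, near) if azimuth_rad >= 0 else (near, far)


# A click placed 90 degrees to the right: the left ear hears it later and quieter.
left, right = render_direction([1.0], math.pi / 2)
```

For a source directly to the side, the formula gives a time difference of roughly 0.66 ms with the assumed head radius, and for a source straight ahead both channels are identical.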
The basic operation principle of wavefront synthesis is illustrated in (a) of FIG. 16. Sound waves diffuse concentrically from a sound source, so natural sound waves cannot be generated in a space unless a speaker is arranged at the position of the sound source. However, by arranging a plurality of speakers in a line (to form a speaker array) and appropriately controlling their sound pressures and phases, it is possible to generate, in the space, a part of the concentric wavefronts of sound waves that virtually diffuse from the sound source. Details are described in Non-patent Literature 4.
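The control of sound pressures and phases in a speaker array can be sketched in a heavily simplified form. The sketch assumes a linear array on the line y = 0 and a virtual point source behind it, and gives each speaker a delay proportional to its distance from the virtual source and a gain inversely proportional to that distance. Practical wavefront synthesis driving functions include further filtering and weighting terms that are omitted here, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def virtual_source_delays_gains(speaker_xs: List[float],
                                source_xy: Tuple[float, float],
                                speed_of_sound: float = 343.0
                                ) -> List[Tuple[float, float]]:
    """Return a (delay_seconds, gain) pair for each speaker located at (x, 0).

    Simplified model: delay = distance / c and gain = 1 / distance.
    """
    sx, sy = source_xy
    result = []
    for x in speaker_xs:
        r = math.hypot(x - sx, 0.0 - sy)  # distance from virtual source to speaker
        if r == 0.0:
            raise ValueError("virtual source coincides with a speaker")
        result.append((r / speed_of_sound, 1.0 / r))
    return result


# Three speakers one meter apart, virtual source one meter behind the center speaker.
params = virtual_source_delays_gains([-1.0, 0.0, 1.0], (0.0, -1.0))
```

The center speaker, being closest to the virtual source, fires first and loudest, and the outer speakers follow, so that the combined wavefront in front of the array approximates part of a circle centered on the virtual source.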
The basic operation principle of beam forming is illustrated in (b) of FIG. 16. As with wavefront synthesis, beam forming uses a speaker array. By appropriately controlling the sound pressures and phases, it is possible to make the sound pressure level at a certain position higher than in the surrounding area, which reproduces a state in which the sound source is virtually present at that position. Details are described in Non-patent Literature 5.
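One common way to raise the sound pressure level at a chosen position is delay-and-sum focusing, sketched below. It is offered as an illustrative assumption and is not necessarily the method of the cited literature. Each speaker is delayed so that the sound from every speaker arrives at the focus position at the same instant and adds in phase there. The function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence


def focusing_delays(speaker_positions: List[Sequence[float]],
                    focus: Sequence[float],
                    speed_of_sound: float = 343.0) -> List[float]:
    """Return per-speaker delays (seconds) that align all arrivals at the focus.

    The farthest speaker gets zero delay; nearer speakers wait for it.
    """
    distances = [math.dist(p, focus) for p in speaker_positions]
    farthest = max(distances)
    return [(farthest - d) / speed_of_sound for d in distances]


# Three speakers on a line, focus two meters in front of the center speaker.
speakers = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
focus_point = (0.0, 2.0)
delays = focusing_delays(speakers, focus_point)

# Arrival time at the focus = applied delay + propagation time.
arrivals = [d + math.dist(p, focus_point) / 343.0 for d, p in zip(delays, speakers)]
```

All entries of `arrivals` coincide, so the contributions add constructively at the focus, whereas at other positions the arrival times differ and the summed level is lower.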