The prior art discloses various methods for recording and reproducing a three dimensional auditory scene for individual listeners. All of these methods use one or more microphones to record the sound.
Some of the prior methods for recording and reproducing a three dimensional auditory scene for individual listeners use a custom arrangement of microphones that depends on the acoustic environment and the particular auditory scene to be recorded. Some of these methods involve setting up “room” or “ambience” microphones away from the direct sound source and playing the sound recorded from these microphones to the listening audience using “surround loudspeakers” placed to the side or back of the listening audience.
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use a specific arrangement of microphones. Some of these methods use an M/S (Mid-Side or Mono-Stereo) microphone arrangement, in which a forward-facing microphone (the Mid/Mono signal) and a laterally oriented bi-directional or figure-eight microphone (the Side/Stereo signal) are used to record the sound. Another of these methods, often referred to as the ORTF recording technique, uses two first-order cardioid microphones spaced approximately 17 cm apart and angled approximately 110° to each other in the shape of the letter ‘X’. Yet another of these methods, often referred to as the Blumlein technique, uses two bi-directional microphones located at the same point and angled at 90° to each other. Another of these methods, often referred to as the XY recording technique, uses two first-order cardioid microphones located at the same point and angled at 90° to each other.
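An M/S recording is usually converted to left/right channels by sum and difference. The sketch below is a minimal illustration of that conversion, assuming the positive lobe of the figure-eight (Side) capsule faces left; the function names and the `width` parameter are hypothetical, and other scaling conventions (e.g. a 1/√2 factor) are also in use.

```python
import numpy as np


def ms_encode(left: np.ndarray, right: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Convert a left/right pair into Mid (sum) and Side (difference) signals."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side


def ms_decode(mid: np.ndarray, side: np.ndarray, width: float = 1.0) -> tuple[np.ndarray, np.ndarray]:
    """Convert Mid/Side signals back into left/right channels.

    Assumes the Side capsule's positive lobe faces left, so left = M + S and
    right = M - S. `width` scales the Side signal to widen (> 1) or
    narrow (< 1) the stereo image.
    """
    scaled_side = width * side
    return mid + scaled_side, mid - scaled_side


if __name__ == "__main__":
    mid = np.array([1.0, 0.5, -0.25])
    side = np.array([0.25, -0.5, 0.0])
    left, right = ms_decode(mid, side)
    m2, s2 = ms_encode(left, right)
    print(np.allclose(m2, mid) and np.allclose(s2, side))  # True
```

Setting `width` to 0 collapses the image to the Mid signal alone, which is one reason M/S recordings are valued for their mono compatibility.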
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use four separate microphone elements arranged in a tetrahedron inside a single capsule. Three of the four elements are arranged as M/S pairs and are often referred to as the microphones recording the X, Y, and Z Cartesian directions. The fourth microphone element is an omni-directional microphone often referred to as the W channel. The four microphones are usually positioned at the same location, and this microphone arrangement is often referred to as a SoundField microphone or a B-format microphone. The sound recorded from the four microphone elements is often played over loudspeakers or headphones using a mixing matrix that combines the four recorded signals; such a playback system is often referred to as an Ambisonic surround sound system.
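One simple form of such a mixing matrix is a bank of first-order "virtual microphones", one aimed at each loudspeaker. The sketch below assumes the traditional B-format convention in which W is scaled by 1/√2 and X points forward, Y left, Z up; the function names and the square loudspeaker layout are hypothetical, and practical Ambisonic decoders are usually more elaborate than this.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def unit_vector(azimuth: float, elevation: float) -> np.ndarray:
    """Cartesian unit vector (x forward, y left, z up) for angles in radians."""
    return np.array([
        np.cos(azimuth) * np.cos(elevation),
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
    ])


def encode_bformat(signal: np.ndarray, azimuth: float, elevation: float) -> np.ndarray:
    """Encode a mono plane-wave source as B-format rows W, X, Y, Z (assumed W = s/sqrt(2))."""
    gains = np.concatenate(([1.0 / SQRT2], unit_vector(azimuth, elevation)))  # (4,)
    return np.outer(gains, signal)                                           # (4, N)


def decode_matrix(speaker_angles, p: float = 0.5) -> np.ndarray:
    """Mixing matrix whose rows are first-order virtual microphones.

    Each row aims a virtual microphone (p = 1 omni, 0.5 cardioid,
    0 figure-eight) at one loudspeaker given as (azimuth, elevation).
    """
    rows = [np.concatenate(([p * SQRT2], (1.0 - p) * unit_vector(az, el)))
            for az, el in speaker_angles]
    return np.array(rows)                                                    # (L, 4)


if __name__ == "__main__":
    # Hypothetical square layout at +-45 and +-135 degrees in the horizontal plane.
    layout = [(np.radians(a), 0.0) for a in (45, 135, -135, -45)]
    b = encode_bformat(np.ones(3), np.radians(45), 0.0)
    feeds = decode_matrix(layout) @ b                                        # (4, 3)
    print(np.round(feeds[:, 0], 3))
```

With cardioid virtual microphones, a source at 45° drives the loudspeaker at 45° at full gain, the two adjacent loudspeakers at half gain, and the opposite loudspeaker not at all.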
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use two microphones, usually embedded at opposite ends of a sphere and often flush-mounted with its surface; this arrangement is often referred to as a sphere microphone.
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use two microphones, usually embedded at opposite ends of a sphere and often flush-mounted with its surface, together with two bi-directional microphones, usually facing forward, that are mounted beside the microphones on the sphere. The signals recorded by each flush-mounted microphone and the bi-directional microphone positioned next to it are often added and subtracted to produce the playback signals. Such a system of microphones is often referred to as a KFM 360 or Bruck system.
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use a five-channel microphone array and a binaural dummy head. Three of the microphones are often mounted on a single support bar with 17.5 cm between adjacent microphones. These microphones are often positioned 124 cm in front of the binaural dummy head. The two outer microphones often have a supercardioid polar characteristic and are often angled 30° off centre. The centre microphone often has a cardioid polar characteristic and faces directly to the front. The other two microphones, often referred to as the surround microphones, are often omni-directional microphones placed in the ears of a dummy head that is often attached to a torso.
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use five matched dual-diaphragm microphone capsules mounted on a star-shaped bracket assembly. The arrangement of the microphones on the bracket often matches the conventional five-loudspeaker set-up, with three closely spaced microphones at the front for the left, centre, and right channels and two microphones at the back for the rear left and rear right channels. The polar pattern of each of the five capsules can often be adjusted independently, from omni-directional through cardioid to figure-of-eight. Some of these methods are referred to as the ICA 5 or the Atmos 5.1 system.
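A variable polar pattern of this kind is commonly modelled as two ideal back-to-back cardioid diaphragms whose outputs are mixed. The sketch below illustrates that model only; it assumes ideal first-order diaphragms, the function names are hypothetical, and it makes no claim about the internal circuitry of the ICA 5 or Atmos 5.1 products. Mixing the front diaphragm with k times the rear one yields the first-order pattern a + (1 − a)cos θ with a = (1 + k)/2.

```python
import numpy as np


def first_order_gain(a: float, theta: np.ndarray) -> np.ndarray:
    """Gain a + (1 - a) cos(theta) of a first-order polar pattern.

    a = 1 is omni-directional, 0.5 cardioid, about 0.366 supercardioid,
    0.25 hypercardioid and 0 figure-of-eight.
    """
    return a + (1.0 - a) * np.cos(theta)


def dual_diaphragm_gain(k: float, theta: np.ndarray) -> np.ndarray:
    """Back-to-back cardioid diaphragms combined as front + k * back.

    Sweeping k from 1 through 0 to -1 moves the pattern from omni-directional
    through cardioid to figure-of-eight.
    """
    front = 0.5 * (1.0 + np.cos(theta))
    back = 0.5 * (1.0 - np.cos(theta))
    return front + k * back


if __name__ == "__main__":
    theta = np.radians([0, 90, 180])
    for k in (1.0, 0.0, -1.0):
        print(k, np.round(dual_diaphragm_gain(k, theta), 3))
```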
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use eight hypercardioid microphones spaced equally around the circumference of an ellipsoidal or egg-shaped surface in a horizontal plane. Some of these methods use additional microphones with a hemispherical pick-up pattern, mounted on the top of the ellipsoid facing upward and on the bottom facing downward. Some of these methods play back the recorded sounds using loudspeakers positioned in the directions in which the microphones pointed. Some of these methods are referred to as a Holophone system.
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use seven microphones mounted on a sphere. Some of these methods use five hypercardioid microphones spaced at equal angles in the horizontal plane plus two highly directional microphones aimed vertically up and down. Some of these methods play the recorded sound to the listening audience using a 7-to-5 mixdown, with five loudspeakers positioned in the directions in which the five equally spaced microphones pointed. Some of these methods are referred to as the ATT apparatus for perceptual sound field reconstruction.
Some of the prior art methods for recording and reproducing a three dimensional auditory scene for individual listeners use two pairs of microphones mounted on opposite sides of a sphere in the horizontal plane. Some of these methods place the microphones at ±80° and ±110° on the sphere. Some of these methods play the recorded sound to the listening audience using loudspeakers positioned at ±30° and ±110° in the horizontal plane. Some of these methods employ inverse filtering in order to best approximate, with the loudspeakers, the sound recorded at the microphones.
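A common way to pose such inverse filtering is as a regularized least-squares problem solved independently in each frequency bin. The sketch below is a generic illustration under that assumption (Tikhonov regularization with a hypothetical parameter `beta`); it is not the filter design of the cited method, and in practice the loudspeaker-to-microphone transfer functions `H` would be measured rather than random.

```python
import numpy as np


def inverse_filters(H: np.ndarray, beta: float = 1e-3) -> np.ndarray:
    """Regularized least-squares inverse filters, one matrix per frequency bin.

    H: (F, M, L) transfer functions from L loudspeakers to M microphone positions.
    Returns C: (F, L, M) with C[f] = (H[f]^H H[f] + beta I)^-1 H[f]^H, so that
    loudspeaker spectra C[f] @ d[f] reproduce the recorded spectra d[f] at the
    M positions as closely as the regularization allows.
    """
    F, M, L = H.shape
    C = np.empty((F, L, M), dtype=complex)
    for f in range(F):
        Hf = H[f]
        Hh = Hf.conj().T
        C[f] = np.linalg.solve(Hh @ Hf + beta * np.eye(L), Hh)
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 4 x 4 system (four microphones, four loudspeakers), 8 bins.
    H = rng.normal(size=(8, 4, 4)) + 1j * rng.normal(size=(8, 4, 4))
    d = rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))  # recorded spectra
    C = inverse_filters(H, beta=1e-9)
    speakers = np.einsum("flm,fm->fl", C, d)
    reproduced = np.einsum("fml,fl->fm", H, speakers)
    print(np.max(np.abs(reproduced - d)))  # worst-case reproduction error
```

Larger `beta` trades reproduction accuracy for smaller, better-behaved filters at frequencies where `H` is nearly singular.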
All of these prior art methods have disadvantages. With the exception of the last method, which uses inverse filtering, none of the methods described above determines the directional acoustic transfer functions of the microphone array as they would be measured under anechoic conditions. Likewise, with the exception of the last method, none of them incorporates the directional acoustic transfer functions of the microphone array into a method for correcting or determining the directions of the recorded sound. None of the methods described above uses the head-related transfer functions of the individual listener to modify the recorded sound so that it is perceptually optimized for that listener. This last point is critical to the present application. Every listener has external ears that acoustically filter the sound field in a manner slightly different from that of any other listener's external ears, and psychoacoustic research has shown that these small differences are perceptually discernible to human listeners. This patent therefore describes an invention that takes these individual differences into account and modifies the recorded sound for the individual listener, improving the perceptual fidelity of the match between the original and reproduced sounds. In summary, none of the methods described above attempts to individualize the sound recording and generation process for the individual listener.
Several terms related to this invention are defined here.
A microphone mount refers to a physical structure that can support or “mount” several microphones.
A microphone array consists of a microphone mount together with the several microphones it supports. A microphone array may also consist of several separate microphone mounts and their corresponding microphones; the collective structure is still referred to as a microphone array.
A directional acoustic receiver is an acoustic recording device (such as a microphone) that has directional acoustic properties. That is to say, the acoustic impulse response of the acoustic recording device varies with the direction in space of the sound source with respect to the acoustic recording device. A typical example of a directional acoustic receiver is a microphone whose directional properties arise from two contributions: (i) the microphone itself may have directional properties (e.g., a hypercardioid microphone), and (ii) physical structures near the microphone acoustically filter the incoming sound (e.g., by acoustic refraction and diffraction) in a manner that depends on the direction of the sound source relative to the microphone. Another example of a directional acoustic receiver is the human external ear, whose directional acoustic properties arise from its acoustic filtering properties.
A directional acoustic transfer function refers to the impulse response and/or frequency response of a directional acoustic receiver; the impulse response and/or frequency response describes the pressure transformation from a location in space to the directional acoustic receiver. Generally, there is a directional acoustic transfer function for each direction and/or location in space relative to the directional acoustic receiver. In addition, the directional acoustic transfer function depends on the environment (walls, tables, people, empty space, etc.) that surrounds the directional acoustic receiver. The term directional acoustic transfer function may refer to an acoustic transfer function recorded in any environment. Often, however, the term refers to an impulse response and/or frequency response measured in the free field (i.e., under anechoic conditions with no echoes).
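In practice, a set of directional acoustic transfer functions is often stored as a table of impulse responses indexed by direction. The sketch below assumes a hypothetical table measured on a horizontal azimuth grid and uses nearest-neighbour lookup for simplicity; a real system would typically also cover elevation and interpolate between measured directions. Filtering a source signal through the selected impulse response gives the signal the receiver would record from that direction, and the FFT of the impulse response gives the corresponding frequency response.

```python
import numpy as np


def nearest_dtf(dtfs: dict[float, np.ndarray], azimuth_deg: float) -> np.ndarray:
    """Return the impulse response measured closest to azimuth_deg (wrapping at 360)."""
    def angular_distance(measured: float) -> float:
        d = abs(measured - azimuth_deg) % 360.0
        return min(d, 360.0 - d)
    return dtfs[min(dtfs, key=angular_distance)]


def receive(source: np.ndarray, dtfs: dict[float, np.ndarray], azimuth_deg: float) -> np.ndarray:
    """Signal the receiver would record from a source at azimuth_deg (free field)."""
    return np.convolve(source, nearest_dtf(dtfs, azimuth_deg))


def frequency_response(impulse_response: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """One-sided complex frequency response of a measured impulse response."""
    return np.fft.rfft(impulse_response, n=n_fft)


if __name__ == "__main__":
    # Hypothetical three-direction table of short impulse responses.
    table = {0.0: np.array([1.0, 0.0]), 90.0: np.array([0.0, 0.5]), 180.0: np.array([0.2, 0.2])}
    print(receive(np.array([1.0, 2.0]), table, 80.0))
```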
A directional microphone array is defined as a microphone array in which some of the individual microphones in the microphone array are directional acoustic receivers. The group of microphones (in the microphone array) that are directional acoustic receivers may collectively describe the directional properties of the sound field (e.g., the incoming direction of acoustic energy in a given frequency band).
Primary microphones refer to directional acoustic receivers (microphones) that form part of a directional microphone array. The primary microphones are typically selected on the basis of specific signal processing issues related to the recording and reproduction of three-dimensional sound. As an example, the primary microphones may be microphones that correspond in some way to the hypothetical external ears of an individual listener.
Secondary microphones refer to directional acoustic receivers (microphones) that form part of a directional microphone array. The secondary microphones generally form a collective set of directional acoustic receivers whose recorded signals characterize the directional aspects of a recorded sound field. For example, the secondary microphones of the directional microphone array may be used collectively to determine the incoming direction of the acoustic energy in narrow frequency bands above approximately 1 kHz and up to the high-frequency limit of human hearing, e.g., 16 to 20 kHz.
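As an illustration of how secondary microphones might collectively indicate an incoming direction, the sketch below computes each microphone's energy in one frequency band and forms the energy-weighted mean of the microphones' look directions. This is a deliberately crude estimator chosen for clarity, assuming each secondary microphone is most sensitive along a known unit vector; it is not the method of this invention, which may instead use the directional acoustic transfer functions of the array. All names are hypothetical.

```python
import numpy as np


def band_energies(frame: np.ndarray, fs: float, lo_hz: float, hi_hz: float) -> np.ndarray:
    """Per-microphone energy of one (M, N) frame in the band [lo_hz, hi_hz)."""
    spectrum = np.fft.rfft(frame, axis=1)
    freqs = np.fft.rfftfreq(frame.shape[1], d=1.0 / fs)
    mask = (freqs >= lo_hz) & (freqs < hi_hz)
    return np.sum(np.abs(spectrum[:, mask]) ** 2, axis=1)


def band_direction(energies: np.ndarray, look_directions: np.ndarray) -> np.ndarray:
    """Energy-weighted estimate of the incoming direction in one band.

    energies:        (M,) band energy recorded by each secondary microphone.
    look_directions: (M, 3) unit vectors along each microphone's axis of
                     maximum sensitivity.
    Returns a unit vector; raises ValueError if the weighted directions cancel.
    """
    v = energies @ look_directions
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("weighted directions cancel; direction is undefined")
    return v / norm
```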
A pair of source and target directional acoustic receivers refers to two directional acoustic receivers with a specific and defined geometrical arrangement in space. The geometrical relationship can be hypothetical or can correspond to a real physical structure. The geometrical relationship ensures that once the location and orientation of the source directional acoustic receiver are defined, the location and orientation of the target directional acoustic receiver are also defined. Generally, the pair of source and target directional acoustic receivers will also have a specific and defined geometrical relationship to a directional microphone array. Therefore, the pair of source and target directional acoustic receivers and the directional microphone array are typically positioned, either hypothetically or in reality, in a sound field such that their geometrical relationship is defined. Either or both of the source and target directional acoustic receivers may also form part of the directional microphone array. In any of the above cases, the essential point is that all three objects (the source and target directional acoustic receivers and the directional microphone array) have a defined geometrical relationship to each other. The geometrical arrangement of the target directional acoustic receiver with respect to the source directional acoustic receiver, and also with respect to the directional microphone array, may vary with time. Nonetheless, for any given short time window, the geometrical arrangement of the target directional acoustic receiver with respect to the source directional acoustic receiver is fixed. The manner in which the pair of source and target directional acoustic receivers is used forms an integral part of their definition; therefore, a brief description of their method of use is given.
Generally, the source directional acoustic receiver and the directional microphone array are used to simultaneously record a three-dimensional sound field. The signal recorded by the source directional acoustic receiver is referred to as the recorded source signal. The recorded source signal is then modified or transformed using the information provided by the sound signals recorded by the directional microphone array. The objective of this signal transformation is generally to generate a signal that matches (hypothetically or in reality) the signal that would have been recorded by the target directional acoustic receiver, had the target directional acoustic receiver been present in the original sound field and recording simultaneously with the source directional acoustic receiver.
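One way such a transformation could be realized, assuming a single dominant plane wave per time-frequency bin whose direction has already been estimated from the directional microphone array, is to scale each bin of the recorded source spectrum by the ratio of the target and source free-field transfer functions for that direction. The sketch below illustrates only that assumption; the function name, the shared direction grid, and the regularization constant `eps` are hypothetical and are not taken from the specification.

```python
import numpy as np


def source_to_target(source_spectrum, direction_index, h_source, h_target, eps=1e-8):
    """Estimate the target receiver's spectrum from the recorded source spectrum.

    source_spectrum:    (F,) complex spectrum of one frame of the recorded source signal.
    direction_index:    (F,) integer index, per bin, of the direction estimated
                        from the directional microphone array.
    h_source, h_target: (D, F) complex free-field transfer functions of the source
                        and target receivers on a shared grid of D directions.
    """
    bins = np.arange(source_spectrum.shape[0])
    hs = h_source[direction_index, bins]
    ht = h_target[direction_index, bins]
    # Regularized form of H_target / H_source; eps guards near-zero responses.
    ratio = ht * np.conj(hs) / (np.abs(hs) ** 2 + eps)
    return ratio * source_spectrum
```

For a plane wave with spectrum P arriving from the estimated direction, the recorded source spectrum is H_source·P and the output is approximately H_target·P. In a complete system this would be applied frame by frame to a short-time Fourier transform and resynthesized by overlap-add.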
The recorded source signal refers to a signal recorded by the source directional acoustic receiver as defined above.
A directional acoustic receiving array is distinct from a directional microphone array: it refers to a subset of the microphones of the directional microphone array. The directional acoustic receiving array is primarily used to determine the sound arriving from a single direction in space, whereas the directional microphone array is used to determine the sound for every direction in space. By using a subset of the microphones of the directional microphone array as a directional acoustic receiving array and applying standard acoustic beam-forming methods, the directional information derived from the secondary microphones can be improved.
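The simplest standard beam-forming method is delay-and-sum: each microphone in the subset is time-aligned for a chosen direction and the aligned signals are averaged. The frequency-domain sketch below assumes free-field plane-wave propagation, a speed of sound of 343 m/s, and known microphone positions; the function name is hypothetical, and it ignores the acoustic scattering by the microphone mount that the directional acoustic transfer functions would capture.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def delay_and_sum(spectra, freqs, positions, direction, c=SPEED_OF_SOUND):
    """Frequency-domain delay-and-sum beamformer.

    spectra:   (M, F) complex spectra from the M microphones in the subset.
    freqs:     (F,) bin frequencies in Hz.
    positions: (M, 3) microphone positions in metres, relative to the array origin.
    direction: (3,) unit vector pointing from the array towards the steered source.
    """
    # A plane wave from `direction` reaches microphone m earlier by (p_m . u) / c.
    advances = positions @ direction / c                          # (M,)
    # Delay each spectrum by its advance so all microphones line up in time.
    steering = np.exp(-2j * np.pi * np.outer(advances, freqs))    # (M, F)
    return np.mean(steering * spectra, axis=0)                    # (F,)
```

Sound arriving from the steered direction adds coherently and passes unchanged, while sound from other directions adds with mismatched phases and is attenuated, increasingly so at higher frequencies.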
High-frequency and low-frequency sub-bands of acoustic signals relating to three-dimensional audio refer to the frequency ranges in which spectral cues (produced by the listener's external ears) and timing cues, respectively, play an important role in human externalisation and localisation of the acoustic signal. Low-frequency sub-bands refer to the frequency bands in which acoustic timing cues are important for human sound externalisation and localisation. High-frequency sub-bands refer to the frequency bands in which spectral cues are important for human sound externalisation and localisation. Nominally, the low-frequency sub-bands are frequency bands below approximately 5 kHz and the high-frequency sub-bands are frequency bands above approximately 5 kHz.
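The nominal 5 kHz division can be illustrated by splitting a signal into complementary low and high sub-bands. The sketch below uses a brick-wall FFT mask purely for clarity, so that the two sub-bands sum exactly to the original signal; the function name is hypothetical, and a practical system would more likely use a smooth crossover or a filter bank.

```python
import numpy as np


def split_bands(x: np.ndarray, fs: float, crossover_hz: float = 5000.0) -> tuple[np.ndarray, np.ndarray]:
    """Split x into a low sub-band (< crossover_hz) and a high sub-band (the remainder).

    Uses a brick-wall FFT mask, so low + high reconstructs x exactly.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(freqs < crossover_hz, spectrum, 0.0), n=len(x))
    high = x - low
    return low, high


if __name__ == "__main__":
    fs = 48000.0
    t = np.arange(4800) / fs
    x = np.sin(2 * np.pi * 1000.0 * t) + 0.5 * np.sin(2 * np.pi * 10000.0 * t)
    low, high = split_bands(x, fs)
    print(np.max(np.abs(low + high - x)))
```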