The present invention relates to headphones and in particular to headphones for reproducing a complete audio scene.
Typically, audio scenes are recorded using a set of microphones, each of which outputs a microphone signal. In an orchestra, for example, 25 microphones are used. An audio engineer then mixes the 25 microphone output signals, typically into a standardized format such as a stereo format, a 5.1 format, a 7.1 format, a 7.2 format, etc. In a stereo format, the audio engineer or an automatic mixing process generates two stereo channels. For a 5.1 format, mixing results in five channels and one subwoofer channel. Analogously, for a 7.2 format, for example, mixing results in seven channels and two subwoofer channels.
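The mixing step described above can be illustrated with a minimal sketch. It assumes that each microphone signal is a list of samples of equal length and that the mix is a simple constant-power pan of each microphone into two stereo channels. The function names `pan_gains` and `mix_to_stereo` and the pan law are assumptions chosen for illustration. They are not taken from the source, and they do not reflect how an audio engineer actually mixes.

```python
import math


def pan_gains(pan):
    """Constant-power pan law (an illustrative assumption).

    pan = -1.0 is fully left, 0.0 is centre, +1.0 is fully right.
    Returns (left_gain, right_gain).
    """
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def mix_to_stereo(mic_signals, pans):
    """Mix N microphone signals into a left and a right stereo channel.

    mic_signals: list of N sample lists, all of the same length.
    pans: list of N pan positions in [-1, 1], one per microphone.
    """
    if len(mic_signals) != len(pans):
        raise ValueError("one pan position is needed per microphone signal")
    num_samples = len(mic_signals[0]) if mic_signals else 0
    left = [0.0] * num_samples
    right = [0.0] * num_samples
    for signal, pan in zip(mic_signals, pans):
        gain_left, gain_right = pan_gains(pan)
        for i, sample in enumerate(signal):
            # Each stereo channel is a weighted sum of all microphone signals.
            left[i] += gain_left * sample
            right[i] += gain_right * sample
    return left, right
```

A 5.1 or 7.2 mix would follow the same pattern with more output channels, that is, one gain per microphone and output channel.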
When the audio scene is reproduced in a reproduction environment, the mixing result is applied to electrodynamic loudspeakers. In a stereo reproduction system, two loudspeakers exist, wherein the first loudspeaker receives the first stereo channel and the second loudspeaker receives the second stereo channel. In a 7.2 reproduction system, seven loudspeakers exist at predetermined positions, together with two subwoofers. The seven channels are applied to the respective loudspeakers and the two subwoofer channels are applied to the respective subwoofers.
In addition, there is headphone reproduction, for which different approaches exist. Typically, two channels are generated for headphone reproduction, namely a left stereo channel and a right stereo channel, wherein the left stereo channel is reproduced via the left earpiece of the headphones and the right stereo channel via the right earpiece of the headphones. Alternatively, in order to improve spatial perception, binaural processing is performed, wherein the stereo channels are preprocessed by using so-called head-related transfer functions (HRTFs) or binaural room impulse responses (BRIRs), such that the headphone user has not only a stereo experience but also a spatial experience.
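The binaural preprocessing mentioned above can be sketched as follows. The sketch assumes that each stereo channel is treated as a virtual loudspeaker and that a time-domain impulse response (the time-domain counterpart of an HRTF, or a BRIR) is available for every pair of virtual loudspeaker and ear. The function names, the dictionary keys and the direct-form convolution are assumptions made for illustration only. A practical system would typically use measured responses and a faster convolution method.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_signals(a, b):
    """Sample-wise sum; the shorter list is treated as zero-padded."""
    n = max(len(a), len(b))
    return [
        (a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
        for i in range(n)
    ]


def binaural_from_stereo(left_ch, right_ch, responses):
    """Preprocess two stereo channels into two earpiece signals.

    responses maps (speaker, ear) to an impulse response, with
    speaker in {"L", "R"} and ear in {"left", "right"} (hypothetical keys).
    """
    # Each ear receives a filtered contribution from both virtual loudspeakers.
    ear_left = add_signals(
        convolve(left_ch, responses[("L", "left")]),
        convolve(right_ch, responses[("R", "left")]),
    )
    ear_right = add_signals(
        convolve(left_ch, responses[("L", "right")]),
        convolve(right_ch, responses[("R", "right")]),
    )
    return ear_left, ear_right
```

In contrast to plain stereo reproduction, each earpiece signal here contains a filtered contribution from both stereo channels.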
The use of a single microphone system on the detection side and a single converter array in headphones on the reproduction side typically neglects the true nature of sound sources. For example, acoustic musical instruments and the human voice are to be differentiated according to how sound is generated and what the emission characteristics are like. Trumpets, trombones, horns and other wind instruments, for example, have strongly directed sound emission. These instruments thus emit in a preferred direction and have a high directivity or a high emission quality factor Q.
On the other hand, violins, cellos, double basses, guitars, grand pianos, pianos, gongs and similar acoustic musical instruments have a comparatively small directivity or a correspondingly small emission quality factor Q. These instruments use so-called acoustic short circuits when sound is generated. An acoustic short circuit is generated by communication between the front and the rear of the respective vibrating area or surface.
The human voice has a medium Q factor. Here, the air connection between mouth and nose effects an acoustic short circuit.
String or bow instruments, xylophones, triangles, etc. generate, for example, sound energy in a frequency range up to 100 kHz and additionally have a low emission directivity or a low emission quality factor. In particular, the tone of a xylophone or a triangle is clearly identifiable even within a loud orchestra, despite its low sound energy and low quality factor.
Thus, it becomes clear that sound generation by acoustic instruments or other instruments and also by the human voice differs greatly.
When sound energy is generated, air molecules, for example diatomic or triatomic gas molecules, are stimulated. Three different mechanisms are responsible for this stimulation. In this regard, reference is made to the German patent DE 198 19 452 C1. These three mechanisms are illustrated in FIG. 5. The first mechanism is translation. Translation describes the linear movement of the air molecules or atoms with respect to the centroid of the molecule, shown at 70 in FIG. 5. The second mechanism is rotation, where air molecules or atoms rotate around the centroid of the respective molecule, again indicated by 70. The third mechanism is vibration, where the atoms or molecules reciprocate in a specific direction with respect to the centroid 70 of the molecules.
Thus, the sound energy generated by acoustic musical instruments and by the human voice consists of individual mixing ratios of translation, rotation and vibration.
Typically, only translation is considered. In other words, rotation and vibration are normally not taken into account in the complete description of the sound energy, which results in clearly perceptible losses in sound quality.
On the other hand, the complete sound intensity is defined by the sum of the intensities originating from translation, rotation and vibration.
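Written as a formula, the statement above reads as follows. The symbols are assumptions introduced here for illustration and do not appear in the source: I_total denotes the complete sound intensity, and I_trans, I_rot and I_vib denote the intensities originating from translation, rotation and vibration, respectively.

```latex
I_{\mathrm{total}} = I_{\mathrm{trans}} + I_{\mathrm{rot}} + I_{\mathrm{vib}}
```

Considering only translation corresponds to approximating I_total by I_trans alone and omitting the remaining two terms.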
Furthermore, different sound sources have different sound emission characteristics. The sound emitted by musical instruments and by the voice generates a sound field, and this sound field reaches the listener via two paths. The first path is the direct sound, where the direct sound portion of the sound field allows exact localization of the sound source. The second path is the spatial emission. Sound energy emitted in all spatial directions generates the specific sound of an instrument or a group of instruments, since this spatial emission interacts with the room through attenuations, reflections, etc. A specific relationship between direct sound and spatially emitted sound is characteristic of all musical instruments and of the human voice.
WO 2012/120985 A1 discloses a method and an apparatus for detecting and reproducing an audio scene, where sound is detected with a first directivity by microphones arranged between the audio scene and the potential listener. Further, a second detection signal is detected with lower directivity by microphones arranged above or on the side of the audio scene. These two detection signals are separately mixed and processed but are not combined. On the reproduction side, the signals are then output by loudspeaker systems, such as a loudspeaker system in a standard format, where a loudspeaker system comprising both omnidirectional loudspeakers and directional loudspeakers is arranged at each predetermined position of the standard format.
This ensures that the listener can perceive the optimum audio quality, since not only translation and vibration are generated in the reproduction space, but also rotation, which is extremely important for a particularly high-quality sound perception.