The term “mixing”, or simply “mix”, refers to a set of processing operations applied to audio signals by software or by a device, at the end of which all the signals are combined into a unified sound by adjusting the sound level, the tone, the spatialization and other sound characteristics. In general, this sound is made up of several signals and is played back over several loudspeakers distributed in the listening space (or over headphones), in order to create the image of a sound scene in which localized sound sources can be perceived in angle and in depth (i.e. “stereo” in the broad sense). The mixing stage, carried out for example in a recording studio, is an integral part of the production of music, films, radio and television.
In a classic approach to capturing a sound scene, a primary microphone system is used to provide a sound image of the overall scene along with the “color” and the “volume” of the space. Often, each microphone of the system picks up a signal which is then reproduced on a dedicated loudspeaker. The resulting sound image and its localization depend on the amplitude and/or phase differences between the signals played by the different loudspeakers. To improve the perceived quality of important acoustic sources, the sound engineer uses spot microphones, placed in close proximity to the sources in question.
The most widespread way of capturing the sound field relies on microphone pairs for stereophonic reproduction over two loudspeakers. The principles of such a capture date back to the 1930s. The evolution of reproduction systems towards a greater number of loudspeakers (quadraphonic, multichannel), to add an immersive dimension, led to the creation of new sound recording systems that are rational and able to operate directly with several channels.
Today there are microphone systems composed of several capsules arranged to capture the sound field in several dimensions (typically 2 or 3) according to a so-called “ambisonic” technology. This technology is described, for example, in the article by M. A. Gerzon entitled “Periphony: With-Height Sound Reproduction”, published in J. Audio Eng. Soc., vol. 21, no. 1, pp. 2-10, February 1973.
The ambisonic approach is to represent the characteristics of a sound field using first-order spherical harmonics at a point which corresponds to the position of the microphone and which will correspond, on reproduction, to the position of the listener. Order 1 of this format describes the sound field using four components which contain spatial information (azimuth, elevation) as well as sound characteristics such as:
    the pitch, by which a sound is perceived as more or less high;
    the duration, that is, the time during which a sound resonates and is sustained;
    the intensity, that is, the volume or strength of a sound;
    the timbre (tone), that is, the “color” of a sound.
With reference to FIG. 1A, every point of three-dimensional Euclidean space is defined by the following 3 parameters:
    azimuth θ;
    elevation φ;
    radius r.
The Cartesian coordinates (x, y, z) of a point in space are expressed from its spherical coordinates (r, θ, φ) in the following manner:
$$
\begin{cases}
x = r \cdot \cos\theta \cdot \cos\varphi \\
y = r \cdot \sin\theta \cdot \cos\varphi \\
z = r \cdot \sin\varphi
\end{cases}
\qquad (1)
$$
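As an illustration, equation (1) can be sketched in a few lines of Python. The function name `spherical_to_cartesian` and the choice of radians for the angles are assumptions made for this example; they do not come from the text.

```python
import math


def spherical_to_cartesian(r, azimuth, elevation):
    """Apply equation (1): (r, theta, phi) -> (x, y, z).

    Angles are assumed to be in radians (an assumption of this sketch).
    """
    x = r * math.cos(azimuth) * math.cos(elevation)
    y = r * math.sin(azimuth) * math.cos(elevation)
    z = r * math.sin(elevation)
    return x, y, z


if __name__ == "__main__":
    # A point 2 m away, 90 degrees to the side, in the horizontal plane.
    print(spherical_to_cartesian(2.0, math.pi / 2, 0.0))
```

With zero elevation the point stays in the horizontal plane (z = 0), and with an elevation of π/2 it lies on the vertical axis whatever the azimuth.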
With reference to FIG. 1B, to capture first-order HOA, Michael Gerzon proposed using an omnidirectional microphone producing a so-called pressure component W, coupled with three bidirectional microphones producing the components X, Y, Z, which are oriented along orthogonal axes. The 3D sound space is then picked up by the combination of the “omni” microphone, providing the signal corresponding to the component W, and the bidirectional microphones, providing the signals corresponding to the components X, Y, Z. The set of four components captured by this type of device is called B-format, or in other words order 1 of the HOA (“Higher Order Ambisonics”) format. The HOA format is seen as a generalization of ambisonics to higher orders, which increases the spatial resolution of the sound field.
Other types of microphones exist that use capsules with other directivity patterns, and for which matrixing (with gains or filters) is necessary to obtain the ambisonic components. Note that a minimum of 3 capsules in 2 dimensions, and 4 capsules in 3 dimensions, is necessary. This is for example the case of the Soundfield® microphone, which uses 4 quasi-coincident cardioid capsules and provides, after matrixing, the 4 signals of B-format, or of the Eigenmike® microphone, which has 32 capsules distributed on a rigid sphere 8.4 cm in diameter and provides, after conversion, the 25 signals of the order-4 HOA format.
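The channel counts quoted above (3 in 2 dimensions and 4 in 3 dimensions at order 1, 25 at order 4) follow from the standard number of ambisonic components at order N: 2N + 1 in two dimensions and (N + 1)² in three. These formulas are not stated in the text; the short sketch below, with the hypothetical helper `hoa_channel_count`, only serves to check the figures.

```python
def hoa_channel_count(order, dimensions=3):
    """Number of ambisonic components at a given order.

    Uses the standard counts 2N + 1 (2D) and (N + 1)^2 (3D);
    the function name is chosen for this example only.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    if dimensions == 2:
        return 2 * order + 1
    if dimensions == 3:
        return (order + 1) ** 2
    raise ValueError("dimensions must be 2 or 3")


if __name__ == "__main__":
    for n in range(5):
        print(n, hoa_channel_count(n, 2), hoa_channel_count(n, 3))
```

The count is also a lower bound on the number of capsules, which is why a 32-capsule array is enough to deliver the 25 signals of order 4.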
The spot microphones are generally directional monophonic capsules pointed at the sources to be captured, but a stereophonic microphone (or a pair) can also be used. The advantage of a stereophonic microphone is that it captures a local sound space, for example the one formed by the different instruments of a section of a classical orchestra, while maintaining their relative positions, or the drum “overheads” (ambience microphones placed above the drummer's head, which capture the relative positions of the toms or cymbals).
In the remainder of this document we restrict ourselves, without loss of generality, to B-format, that is to say the order-1 HOA format, and to monophonic spot microphones.
We consider an acoustic source whose position with respect to the origin is described by the unit vector $\vec{u} = (u_x, u_y, u_z)$. Its 4 components in B-format are expressed in the following form:
$$
\begin{cases}
W = s \\
X = \eta \cdot s \cdot u_x \\
Y = \eta \cdot s \cdot u_y \\
Z = \eta \cdot s \cdot u_z
\end{cases}
\qquad (2)
$$
where η is a normalization factor introduced by Gerzon to retain the level of amplitude of each component.
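Equation (2) can be illustrated by a minimal encoding sketch for a monophonic signal s arriving from a given direction. The unit vector is obtained from equation (1) with r = 1. The function name `encode_b_format` and the default value η = √2 are assumptions of this example: √2 matches the traditional convention in which W sits 3 dB below X, Y and Z, but the text does not state the value of η.

```python
import math


def encode_b_format(signal, azimuth, elevation, eta=math.sqrt(2.0)):
    """Encode a mono signal into (W, X, Y, Z) following equation (2).

    signal: sequence of samples s.
    azimuth, elevation: source direction in radians (assumed unit).
    eta: normalization factor; sqrt(2) is an assumed default.
    """
    # Unit vector pointing at the source: equation (1) with r = 1.
    ux = math.cos(azimuth) * math.cos(elevation)
    uy = math.sin(azimuth) * math.cos(elevation)
    uz = math.sin(elevation)

    w = [s for s in signal]
    x = [eta * s * ux for s in signal]
    y = [eta * s * uy for s in signal]
    z = [eta * s * uz for s in signal]
    return w, x, y, z


if __name__ == "__main__":
    # A short test signal placed 45 degrees to the side, at ear height.
    w, x, y, z = encode_b_format([1.0, 0.5, -0.25], math.pi / 4, 0.0)
    print(w, x, y, z)
```

The omnidirectional component W carries the signal unchanged, while X, Y and Z carry the same signal weighted by the direction cosines of the source.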
The ambisonic technology adapts to different reproduction systems, allows manipulation of the sound field (rotation, focusing, etc.) and is based on a powerful mathematical formalism.
The combined use of “ambisonic” capsule microphones with spot microphones opens up new possibilities for sound recording, but calls for new tools that make it possible to manipulate the sound field in the HOA format and to integrate into the mixing process all the acoustic sources captured by spot microphones.
A “plug-in” software device, marketed under the name PanNoir by the company Merging Technologies, is able to perform spatial positioning (or “pan-pot”) of spot microphones before mixing the acoustic signals that they have picked up with those of a primary two-capsule microphone. The user must manually set the distance (and therefore the overall delay) and the position of the spot microphones relative to the primary microphone, as well as the characteristics of the latter (spacing, orientation and directivity of its 2 capsules); the “plug-in” then simply calculates the delays and gains to apply to each spot capsule. In the case of a coincident primary microphone, i.e. one with collocated capsules, and a monophonic spot microphone, the delay is not calculated automatically but is provided by the user.
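The link between the distance set by the user and the overall delay is simply the propagation time of sound. The sketch below illustrates that relation only; it is not the algorithm of the plug-in mentioned above. The function name `spot_delay`, the 343 m/s speed of sound (air at about 20 °C) and the rounding to whole samples are assumptions of this example.

```python
def spot_delay(distance_m, sample_rate_hz, speed_of_sound=343.0):
    """Propagation delay for a given spot-to-primary distance.

    Returns (delay in seconds, delay rounded to whole samples).
    speed_of_sound defaults to an assumed 343 m/s.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    delay_s = distance_m / speed_of_sound
    delay_samples = round(delay_s * sample_rate_hz)
    return delay_s, delay_samples


if __name__ == "__main__":
    # A spot microphone 5 m from the primary microphone, at 48 kHz.
    seconds, samples = spot_delay(5.0, 48000)
    print(f"{seconds * 1000:.2f} ms, {samples} samples")
```

An error of a few tens of centimetres in the distance entered by hand already shifts the delay by dozens of samples at 48 kHz, which gives an idea of how sensitive a manual setting is.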