Basic audio recording and reproduction technology is omnipresent and has become an integral part of our daily life. It covers the full sequence from recording to reproduction, for example, music recording in studios and concert halls and subsequent reproduction in home entertainment systems, telephony and office communications, public address systems, etc. The ultimate goal is to reproduce sound exactly as it sounded when first created. In other words, the listener should experience the same acoustic field in which the sound was originally recorded, in terms of factors such as amplitude, frequency content, depth perception, and spatial location.
In recent times, steps have been taken towards virtual video/acoustic recording and reproduction in this area. Virtual acoustic recording and reproduction refers to the recording of real-world acoustic concerts or performances in reverberant spaces and their subsequent acceptable reproduction in a virtual version of the original performance space, known as a Virtual Auditory Environment (VAE). Auditory scenes can be created using two main mechanisms: recording or auralization. In the first approach, recording for scene synthesis is usually implemented in a studio environment. For example, in popular music production, a multi-track approach is taken, whereby instruments are layered temporally, spectrally and spatially. Another example is in cinema, where Foley artists create the auditory scene by adding everyday sounds synchronized to the actors' movements. In the second approach, auditory scenes are created through auralization, which involves processing recorded audio (preferably anechoic) with acoustic responses measured in real rooms or computed with auralization software. Unlike virtual reality, which takes us all the way to a new reality, holograms and holographic sound create a three-dimensional image in our “own reality”. Thus, holographic sound is defined as a technique which enables sound to be created such that the brain processes it three-dimensionally (as a hologram): the brain detects a directional source of the sound and can determine different directions from which the sound emanates, despite, in embodiments of the disclosed technology, two sounds emanating from the same location in space. Accordingly, virtual reality is a reality which has been created via a computer. Augmented reality is virtual reality but with bits added. Holography is a way of showing “pictures” (which includes an audio scene or “picture”) in which one can walk around.
Although virtual holographic video devices are now widely available, virtual acoustic holographic recording and reproduction devices are still primitive at the time of this writing.
Further describing holography, this is a technique to record and reconstruct the complete information of wave fields. The word ‘holo-graph’ is derived from the Greek which means whole-drawing—which also describes the vast amount of information contained in a hologram. The basis of holography is spatial storage of the phase and/or amplitude profile of the desired wave-front, in a manner that allows that wave-front to be reconstructed by interference when the hologram is illuminated with a suitable coherent source. Optical holograms have been widely applied in virtual reality displays, data storage, sensing, and security printing.
Acoustic holograms, on the other hand, are less advanced than their electromagnetic counterparts in terms of present applications. One major restricting factor is the limited range of acoustic properties that natural or traditional materials can offer. Acoustic holography is the process in which sound waves are recorded to a tangible medium and arranged or reproduced in three dimensions using a processor. The sound field can be modeled to reconstruct its structure using three-dimensional (3D) images. An acoustic hologram could generate 3D sound fields about 100 times more detailed than ones produced by other techniques. To date, most acoustic holographic reconstruction techniques rely on phased arrays with large numbers of active elements, requiring sophisticated phase-shifting circuits, large power consumption, and careful calibration and tuning. Measuring techniques based on acoustic holography are becoming increasingly popular in various fields. The best-known techniques are based on Near-field Acoustic Holography (NAH). Near-field acoustic holography is a method for estimating the sound field near a source by measuring acoustic parameters away from the source by means of an array of pressure and/or particle velocity transducers. It makes it possible to reconstruct the three-dimensional sound field between the source and the measurement plane.
Holographic techniques are fundamental to applications such as volumetric displays, high-density data storage and optical tweezers that require spatial control of intricate optical or acoustic fields within a three-dimensional volume. A variety of sound field reproduction methods have been proposed, such as Ambisonics, Wave Field Synthesis, methods based on the solution of an inverse problem, and other techniques. Recently, NAH (Near-field Acoustic Holography) has also been considered for holographic systems. The major advantage of NAH is that, by measuring the acoustic pressure in the near-field of the target source surface, it enables reconstruction of all acoustic quantities, such as the acoustic pressure, particle velocity and acoustic intensity, not only at a measurement location but in 3D space and on the source surface. An NAH system includes a spherical array of a plurality of microphones, an analog-to-digital converter for digitizing pressure data from each microphone, and a processor for determining the acoustic intensity at each location, the processor having computer software adapted to apply a regularization filter to spherical wave equations for pressure and velocity. Overall, NAH requires a large number of recording and reproducing transducers, as many as 51 microphones and 51 speakers. The need for multiple transducers and the resulting system complexity are the major disadvantages of the NAH approach. Furthermore, acoustical holography is still limited by the Nyquist sampling theorem. To avoid spatial aliasing problems, the array microphone spacing must be somewhat less than half of the acoustic wavelength, which sets a serious limitation on the upper frequency. Also, the resulting Nyquist rate is so high that a very large number of samples must be used. The combination of the large amount of hardware (speakers, microphones) and the large amount of processing makes such systems cost prohibitive and difficult to implement.
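As a rough illustration of the half-wavelength spacing constraint described above, the following Python sketch computes the highest frequency an array can capture without spatial aliasing, using f_max = c / (2d). The function names and the speed of sound of 343 m/s (air at roughly 20 °C) are assumptions introduced here for illustration and are not taken from any particular NAH system.

```python
# Minimal sketch of the spatial Nyquist limit for a microphone array.
# Assumption: sound travels at 343 m/s (air at about 20 degrees C).
SPEED_OF_SOUND_M_S = 343.0


def max_alias_free_frequency(spacing_m, c=SPEED_OF_SOUND_M_S):
    """Frequency at which the spacing equals half a wavelength: f = c / (2 d).

    Frequencies below this value satisfy the half-wavelength rule.
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return c / (2.0 * spacing_m)


def max_spacing(frequency_hz, c=SPEED_OF_SOUND_M_S):
    """Spacing that equals half a wavelength at the given frequency: d = c / (2 f)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / (2.0 * frequency_hz)


if __name__ == "__main__":
    # A 5 cm spacing limits the array to a few kilohertz.
    print(max_alias_free_frequency(0.05))
    # Covering the full audible band would need sub-centimetre spacing.
    print(max_spacing(20000.0))
```

Under these assumptions, covering the whole audible band requires microphone spacing below one centimetre, which is why the transducer count grows so quickly with the upper frequency.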
Other, simpler methods are limited to binaural sound. Sound is a vibration that propagates as an audible mechanical wave of pressure and displacement through a medium such as air or water. In human hearing terms, sound is the reception of such waves and their perception by the brain. Many theorists earlier believed that only one ear was essential for correct hearing, but it has since been proven that two ears are essential for binaural hearing and, therefore, for our understanding of the world around us. In fact, the word “binaural” literally just means “using both ears.” The brain's hearing system is binaural, and its localization methods include relative phase shift for low-frequency sounds, relative intensity for sounds in the voice range, and relative time of arrival for sounds having fast rise times and high-frequency components. Binaural recording is a method of recording sound that uses two microphones, arranged with the intent to create a 3-D stereo sensation for the listener of actually “being in the room” with the performers or instruments. Binaural sound is usually recorded with two microphones spaced as if they were in place of the listener's ears, sometimes in a “Kunstkopf” (dummy head), where the microphones are placed where the ear canals would be. The result, when using good headphones for playback, is a realistic sense of the space where the recording was made, and often an uncanny sense of the movement of instruments or voices around that space, sometimes even seeming to come from above or behind the listener. However, sound reproduction through headphones often leads to ‘in-head localization’, such that good assessment of spatial cues becomes impossible.
On the other hand, normally mixed and panned multi-microphone studio recordings intended for loudspeaker reproduction often use an individual microphone on each instrument. Each track is panned on the mixing console to some location from far left to far right, with voices and often drums placed dead center, and other instruments moved left and right in the artificial image. When such recordings are listened to on headphones, the image often seems to be in the middle of the listener's head rather than in the original recording space. Various headphone and audio processor designs have been made over the years to compensate for this “inside the head” perception. It has been observed that while binaural recordings sound their best on headphones, recordings mixed from multiple tracks on studio loudspeakers usually sound their best when reproduced on loudspeakers.
Human hearing is three-dimensional. We can distinguish the direction and, to some degree, the distance of a sound source. In fact, there is a wealth of information in the sounds that reach our ears, and our brains do some very sophisticated processing of that information. The cochlea, and indeed the whole ear, is designed to convert sounds into nerve signals and convey sound information to the brain. The cochlea of the inner ear is the most critical structure in the auditory pathway, for it is there that the energy from acoustically generated pressure waves is transformed into neural impulses. The cochlea not only amplifies sound waves and converts them into neural signals, but also acts as a mechanical frequency analyzer, decomposing complex acoustical waveforms into simpler elements. The human cochlea is capable of exceptional sound analysis, in terms of both frequency and intensity. The cochlea allows the perception of sounds between 20 Hz and 20000 Hz (nearly 10 octaves), with a resolution of 1/230 octave (about 3 Hz at 1000 Hz). At 1000 Hz, the cochlea encodes acoustic pressures between 0 dB SPL (2×10⁻⁵ Pa) and 120 dB SPL (20 Pa).
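The figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: the helper names are hypothetical, and the 2×10⁻⁵ Pa reference pressure is the one stated in the text.

```python
import math

# Reference pressure for dB SPL, as quoted in the text (2e-5 Pa).
P_REF_PA = 2e-5


def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves spanned by a frequency range (one octave = factor of 2)."""
    return math.log2(f_high_hz / f_low_hz)


def resolution_hz(f_hz, fraction_of_octave):
    """Frequency step corresponding to a fraction of an octave above f_hz."""
    return f_hz * (2.0 ** fraction_of_octave - 1.0)


def spl_to_pascal(db_spl):
    """Convert a sound pressure level in dB SPL to pressure in pascals."""
    return P_REF_PA * 10.0 ** (db_spl / 20.0)


if __name__ == "__main__":
    print(octaves_between(20.0, 20000.0))      # just under 10 octaves
    print(resolution_hz(1000.0, 1.0 / 230.0))  # close to 3 Hz
    print(spl_to_pascal(0.0), spl_to_pascal(120.0))
```

The 120 dB span corresponds to a factor of one million in pressure, which gives a sense of the dynamic range the cochlea has to encode.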
The cochlea is a hydro-mechanical frequency analyzer located in the inner ear. Its principal role is to perform a real-time spectral decomposition of the acoustic signal, producing a spatial-frequency map. The cochlea uses a frequency-to-space transformation to perform audio spectral analysis. Upon impingement of an acoustic signal onto the fluid-filled cochlea, the basilar membrane undergoes an oscillatory motion at the frequency of the sound, resulting in a wave traveling toward its distal end. The wave is spatially confined along the length of the basilar membrane, and the location of its maximum amplitude is related to the frequency of the sound. The higher the frequency, the more the disturbance is restricted to the proximal end. Understanding of frequency analysis in the inner ear progressed through three main periods. The first was dominated by Helmholtz's suggestion that lightly damped, spatially ordered, mechanically resonant elements in the cochlea perform the spectral analysis. The second period, lasting from the late 1940s to the early 1970s, was dominated by von Bekesy's description of the traveling wave. In the third period, a fundamentally different paradigm has emerged. According to this paradigm, von Bekesy's traveling wave is boosted by a local electromechanical amplification process in which one of the ear's sensory cell groups, the outer hair cells, functions as both sensor and mechanical feedback element. This discovery helped to explain the cochlea's frequency selectivity. The differences between von Bekesy's and Johnston's observations were due to active biological mechanisms that act upon the vibration of the basilar membrane in living subjects.
In the cochlea, the basilar membrane interacts with the fluid, constrained by the shape of the channel, to make a transmission line that supports mechanical traveling waves. Positions along this transmission line correspond to a large number of outputs, with a progression of different frequency responses, analogous to the old Helmholtz resonance view of cochlear function.
For a pure tone sound, active mechanics amplify basilar membrane vibrations by around +50 dB at a very narrow section of the organ of Corti, which serves to increase the sensitivity of the cochlea at this site. Two similar frequencies can therefore activate two distinct cochlear regions, allowing them to be differentiated (a characteristic known as frequency selectivity). This frequency tuning is closely linked to the electro-motility of the outer hair cells (OHCs), and is defined by the fibers of the auditory nerve and the inner hair cells (IHCs) that generate the neural signal.
One of the most significant nonlinear behaviors of the cochlea is high sound-level compression. Sound signals at low intensities are amplified in a frequency-selective manner at certain cochlear positions, where the cochlea exhibits large gain, while high-level sound signals are barely amplified, and the cochlea exhibits small gain. The auditory system uses real-time spectral decomposition together with place coding to attain an impressive auditory range while maintaining real-time processing capabilities. It achieves this by acting as a hydro-mechanical frequency analyzer and by using compressive techniques to transmit data efficiently. The most inspiring property of the basilar membrane is its ability to perform real-time spectral decomposition. Activation of sub-sections of the basilar membrane results in sinusoidal vibrations of varying amplitude and phase, depending on the content of the input signal. Thus, in the inner ear a transformation takes place that maps frequency to location. This mechanism is fundamental to the frequency discrimination of the ear. The location on the basilar membrane of maximal amplitude can be described by:
$$f = 165.4\left(10^{0.06x} - 1\right), \qquad x = \frac{1}{0.06}\log_{10}\left(\frac{f + 165.4}{165.4}\right),$$
where:
f: frequency in [Hz]
x: position of maximum excursion of the basilar membrane in [mm].
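A minimal Python sketch of this frequency-to-place map follows, assuming the relation reads f = 165.4 (10^(0.06 x) − 1) with a base-10 logarithm in its inverse. The function and constant names are hypothetical and chosen only for illustration.

```python
import math

# Constants of the frequency-position relation given in the text.
A_HZ = 165.4         # scaling constant in Hz
ALPHA_PER_MM = 0.06  # exponent slope per millimetre


def place_to_frequency(x_mm):
    """Frequency (Hz) whose maximum excursion occurs at position x (mm)."""
    return A_HZ * (10.0 ** (ALPHA_PER_MM * x_mm) - 1.0)


def frequency_to_place(f_hz):
    """Position (mm) of maximum basilar membrane excursion for frequency f (Hz)."""
    if f_hz < 0:
        raise ValueError("frequency must be non-negative")
    return math.log10((f_hz + A_HZ) / A_HZ) / ALPHA_PER_MM


if __name__ == "__main__":
    for f in (100.0, 1000.0, 10000.0):
        x = frequency_to_place(f)
        # Converting back should recover the original frequency.
        print(f, round(x, 2), round(place_to_frequency(x), 3))
```

Because the map is logarithmic above a few hundred hertz, each octave occupies a roughly equal length of membrane, which is the spatial-frequency behavior the surrounding text relies on.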
The frequency-dependent filtering mechanism of the human cochlea thus motivates a spatial-frequency-dependent design using dispersive acoustic meta material (AMM) systems. The basilar membrane has often been compared to a bank of band-pass filters (BPFs) that simultaneously decompose a complex signal into its frequency components. A number of acousticians today think that the most realistic model of basilar membrane function is a resonator system or, even better, a system of frequency-tuned oscillators that can be regulated by the central nervous system (known as efferent feedback).
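The band-pass filter bank analogy can be sketched in a few lines of Python. The example below uses the Goertzel algorithm as a stand-in for one narrow band-pass channel per analysis frequency; this is an illustrative substitute chosen here, not a model of cochlear mechanics, and the function names, sample rate, and test signal are all assumptions.

```python
import math


def goertzel_power(samples, sample_rate_hz, freq_hz):
    """Squared magnitude of the signal component at freq_hz (Goertzel algorithm)."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def filter_bank_powers(samples, sample_rate_hz, center_freqs_hz):
    """Run one narrow-band channel per center frequency, like a parallel filter bank."""
    return {f: goertzel_power(samples, sample_rate_hz, f) for f in center_freqs_hz}


def two_tone(sample_rate_hz=8000, n_samples=800):
    """Hypothetical test signal: 440 Hz at amplitude 1.0 plus 1000 Hz at amplitude 0.5."""
    return [
        math.sin(2.0 * math.pi * 440.0 * n / sample_rate_hz)
        + 0.5 * math.sin(2.0 * math.pi * 1000.0 * n / sample_rate_hz)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    powers = filter_bank_powers(two_tone(), 8000, [250, 440, 1000, 2000])
    for freq, power in powers.items():
        print(freq, round(power, 3))
```

Only the channels tuned to 440 Hz and 1000 Hz respond strongly, which mirrors how two different frequencies excite two distinct regions of the basilar membrane.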
Musical audio signals contain a large amount of underlying structure, due to the process through which music is generated. Human hearing is usually very good at analyzing the structure of audio signals, a process known as auditory scene analysis. For music, it is not surprising that a musical audio signal would be generated from a small number of possible notes active at any one time, and hence allow a sparse representation. Compressed sensing (CS) seeks to represent a signal using a number of linear, non-adaptive measurements. Usually the number of measurements is much lower than the number of samples needed if the signal is sampled at the Nyquist rate. To reconstruct the original signal correctly, CS requires that the signal be sparse in some basis, in the sense that it is a linear combination of a small number of basis functions. The sinusoidally modeled part of an audio signal is clearly sparse, and it is thus natural to use CS to encode such signals. Due to its universality and lack of complexity on the sensor side, CS is an attractive compression scheme for multi-sensor systems. Recently, the sparseness of audio signals has been exploited with the aim of achieving even higher compression ratios than the current compression techniques used in the multimedia coding standards.
It is known that an impedance-matched surface has the property that an incident wave generates no reflection. A perfect acoustic absorber of deep-subwavelength scale is of great scientific and engineering interest. It can act as the exact time-reversed counterpart of a point source, with important implications for time-reversal wave technology. Traditional means of acoustic absorption make use of porous and fibrous materials and gradient index materials, or employ perforated or micro-perforated panels with tuned cavity depth behind the panels. They generally result in either imperfect impedance matching to the incoming wave, or very bulky structures with dimensions comparable to the wavelength. Active ‘absorbers’, on the other hand, require costly and sophisticated electrical designs. Recently, it was shown that, for electromagnetic waves, structuring the interface between two different materials can lead to meta surfaces with diverse functionalities such as phase discontinuity, anomalous refraction/reflection, and polarization manipulation. Acoustic meta material based systems not only can record with fewer sensors but also can reproduce the sound with fewer speakers. By combining acoustic meta materials and compressive sensing, a complete virtual acoustic holographic system is designed and presented, comprising a holographic recording device with fewer sensors that separates simultaneous overlapping sounds from different sources and a speaker array that can reproduce the holographic sound. Anisotropic acoustic meta materials can be designed to have a strong wave compression effect that directly amplifies pressure fields within the meta materials.
Thus, what is needed is a way to accurately reproduce holographic sound that is less expensive and of better quality than what is currently known in the art.