1. Field of the Invention
The invention relates to an audio processing system, and particularly to a real-time processing system that allows processing of ambient and supplemental audio content according to desired specifications.
2. Description of the Related Technology
WO 2016/090342 A2, published Jun. 9, 2016, the disclosure of which is expressly incorporated herein and which was made by the inventor of subject matter described herein, shows an adaptive audio spatialization system having an audio sensor array rigidly mounted to a personal speaker.
It is known to use microphone arrays and beamforming technology in order to locate and isolate an audio source. Personal audio is typically delivered to a user by one or more personal speakers, such as headphones or earphones. Headphones are a pair of small speakers designed to be held in place close to a user's ears. They may be electroacoustic transducers which convert an electrical signal to a corresponding sound in the user's ear. Headphones are designed to allow a single user to listen to an audio source privately, in contrast to a loudspeaker, which emits sound into the open air, allowing anyone nearby to listen. Earbuds or earphones are in-ear versions of headphones.
A sensitive transducer element of a microphone is called its element or capsule. Except in thermophone based microphones, sound is first converted to mechanical motion by a diaphragm, the motion of which is then converted to an electrical signal. A complete microphone also includes a housing, some means of bringing the signal from the element to other equipment, and often an electronic circuit to adapt the output of the capsule to the equipment being driven. A wireless microphone contains a radio transmitter.
The MEMS (MicroElectrical-Mechanical System) microphone is also called a microphone chip or silicon microphone. A pressure-sensitive diaphragm is etched directly into a silicon wafer by MEMS processing techniques and is usually accompanied by an integrated preamplifier. Most MEMS microphones are variants of the condenser microphone design. Digital MEMS microphones have built-in analog-to-digital converter (ADC) circuits on the same CMOS chip, making the chip a digital microphone that is more readily integrated with modern digital products. Major manufacturers producing MEMS silicon microphones are Wolfson Microelectronics (WM7xxx), Analog Devices, Akustica (AKU200x), Infineon (SMM310 product), Knowles Electronics, Memstech (MSMx), NXP Semiconductors, Sonion MEMS, Vesper, AAC Acoustic Technologies, and Omron.
A microphone's directionality or polar pattern indicates how sensitive it is to sounds arriving at different angles about its central axis. The polar pattern represents the locus of points that produce the same signal level output in the microphone if a given sound pressure level (SPL) is generated from that point. How the physical body of the microphone is oriented relative to the diagrams depends on the microphone design. Large-membrane microphones are often known as “side fire” or “side address” on the basis of the sideward orientation of their directionality. Small diaphragm microphones are commonly known as “end fire” or “top/end address” on the basis of the orientation of their directionality.
Some microphone designs combine several principles in creating the desired polar pattern. This ranges from shielding (meaning diffraction/dissipation/absorption) by the housing itself to electronically combining dual membranes.
An omni-directional (or non-directional) microphone's response is generally considered to be a perfect sphere in three dimensions. In the real world, this is not the case. As with directional microphones, the polar pattern for an “omni-directional” microphone is a function of frequency. The body of the microphone is not infinitely small and, as a consequence, it tends to get in its own way with respect to sounds arriving from the rear, causing a slight flattening of the polar response. This flattening increases as the diameter of the microphone (assuming it is cylindrical) reaches the wavelength of the frequency in question.
A unidirectional microphone is sensitive to sounds from only one direction.
A noise-canceling microphone is a highly directional design intended for noisy environments. One such use is in aircraft cockpits where they are normally installed as boom microphones on headsets. Another use is in live event support on loud concert stages for vocalists involved with live performances. Many noise-canceling microphones combine signals received from two diaphragms that are in opposite electrical polarity or are processed electronically. In dual diaphragm designs, the main diaphragm is mounted closest to the intended source and the second is positioned farther away from the source so that it can pick up environmental sounds to be subtracted from the main diaphragm's signal. After the two signals have been combined, sounds other than the intended source are greatly reduced, substantially increasing intelligibility. Other noise-canceling designs use one diaphragm that is affected by ports open to the sides and rear of the microphone.
Sensitivity indicates how well the microphone converts acoustic pressure to output voltage. A high-sensitivity microphone creates more voltage and so needs less amplification at the mixer or recording device. This is a practical concern but is not directly an indication of the microphone's quality. In fact, the term sensitivity is something of a misnomer; “transduction gain” (or simply “output level”) would be more meaningful, because true sensitivity is generally set by the noise floor, and too much “sensitivity” in terms of output level compromises the clipping level.
A microphone array is any number of microphones operating in tandem. Microphone arrays may be used in systems for extracting voice input from ambient noise (notably telephones, speech recognition systems, and hearing aids), in surround sound and related technologies, in binaural recording, and in locating objects by sound (acoustic source localization), e.g., military use to locate the source(s) of artillery fire, or aircraft location and tracking.
Typically, an array is made up of omni-directional microphones, directional microphones, or a mix of omni-directional and directional microphones distributed about the perimeter of a space, linked to a computer that records and interprets the results into a coherent form. Arrays may also have one or more microphones in an interior area encompassed by the perimeter. Arrays may also be formed using numbers of very closely spaced microphones. Given a fixed physical relationship in space between the different individual microphone transducer array elements, simultaneous DSP (digital signal processor) processing of the signals from each of the individual microphone array elements can create one or more “virtual” microphones.
Beamforming or spatial filtering is a signal processing technique used in sensor arrays for directional signal transmission or reception. This is achieved by combining elements in a phased array in such a way that signals at particular angles experience constructive interference while others experience destructive interference. A phased array is an array of antennas, microphones, or other sensors in which the relative phases of respective signals are set in such a way that the effective radiation pattern is reinforced in a desired direction and suppressed in undesired directions. The phase relationship may be adjusted for beam steering. Beamforming can be used at both the transmitting and receiving ends in order to achieve spatial selectivity. The improvement compared with omni-directional reception/transmission is known as the receive/transmit gain (or loss).
Adaptive beamforming is used to detect and estimate a signal-of-interest at the output of a sensor array by means of optimal (e.g., least-squares) spatial filtering and interference rejection.
To change the directionality of the array when transmitting, a beamformer controls the phase and relative amplitude of the signal at each transmitter, in order to create a pattern of constructive and destructive interference in the wavefront. When receiving, information from different sensors is combined in a way where the expected pattern of radiation is preferentially observed.
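The receive-side combining described above can be illustrated with a minimal delay-and-sum sketch, assuming a uniform linear array and far-field plane-wave arrival (the function name, geometry, and sound speed are illustrative assumptions, not part of any described system):

```python
import numpy as np

def delay_and_sum(signals, mic_positions, steer_angle_rad, fs, c=343.0):
    """Steer a linear microphone array toward steer_angle_rad (broadside = 0)
    by delaying each channel so the wavefront aligns, then averaging.
    signals: array of shape (n_mics, n_samples); mic_positions in meters."""
    # Plane-wave arrival delay at each microphone, relative to the array line.
    delays = mic_positions * np.sin(steer_angle_rad) / c          # seconds
    sample_shifts = np.round((delays - delays.min()) * fs).astype(int)
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, shift in zip(signals, sample_shifts):
        out[: n - shift] += sig[shift:]                           # advance each channel
    return out / len(signals)
```

Signals arriving from the steered direction add coherently, while signals from other directions are summed with misaligned phases and are attenuated.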
With narrow-band systems the time delay is equivalent to a “phase shift,” so in the case of a sensor array, each sensor output is shifted by a slightly different amount. This is called a phased array. A narrow-band system, typical of radars or wide microphone arrays, is one in which the bandwidth is only a small fraction of the center frequency. With wide-band systems, typical of sonars, this approximation no longer holds.
In the receive beamformer the signal from each sensor may be amplified by a different “weight.” Different weighting patterns (e.g., Dolph-Chebyshev) can be used to achieve the desired sensitivity patterns. A main lobe is produced together with nulls and sidelobes. As well as controlling the main lobe width (the beam) and the sidelobe levels, the position of a null can be controlled. This is useful to ignore noise or jammers in one particular direction, while listening for events in other directions. A similar result can be obtained on transmission.
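The effect of sensor weighting on the main lobe and sidelobes can be sketched by comparing the far-field response of a uniform linear array under uniform and tapered weights (a Hamming taper is used here purely for illustration, in place of the Dolph-Chebyshev weighting mentioned above, because it is built into NumPy):

```python
import numpy as np

def array_factor(weights, d_over_lambda, angles_rad):
    """Far-field response magnitude of a uniform linear array with the given
    element weights and spacing d (in wavelengths), normalized to 1 at broadside."""
    n = np.arange(len(weights))
    # Per-element phase for a plane wave arriving from angle theta (broadside = 0).
    phase = 2j * np.pi * d_over_lambda * np.outer(np.sin(angles_rad), n)
    af = np.exp(phase) @ weights
    return np.abs(af) / np.abs(weights.sum())

angles = np.linspace(-np.pi / 2, np.pi / 2, 721)
uniform = array_factor(np.ones(8), 0.5, angles)
tapered = array_factor(np.hamming(8), 0.5, angles)
# The taper trades a wider main lobe for substantially lower sidelobe levels.
```

Plotting `uniform` and `tapered` against `angles` shows the classic trade: the tapered pattern has a broader beam but much deeper suppression away from the look direction.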
Beamforming techniques can be broadly divided into two categories: i) conventional (fixed or switched beam) beamformers; and ii) adaptive (phased array) beamformers, which typically operate in a desired-signal maximization mode or an interference-signal minimization or cancellation mode.
Conventional beamformers use a fixed set of weightings and time-delays (or phasings) to combine the signals from the sensors in the array, primarily using only information about the location of the sensors in space and the wave directions of interest. In contrast, adaptive beamforming techniques generally combine this information with properties of the signals actually received by the array, typically to improve rejection of unwanted signals from other directions. This process may be carried out in either the time or the frequency domain.
As the name indicates, an adaptive beamformer is able to automatically adapt its response to different situations. Some criterion has to be set up to allow the adaptation to proceed, such as minimizing the total noise output. Because noise varies with frequency, in wide-band systems it may be desirable to carry out the process in the frequency domain.
Beamforming can be computationally intensive.
Beamforming can be used to try to extract sound sources in a room, such as multiple speakers in the cocktail party problem. This requires the locations of the speakers to be known in advance, for example by using the time of arrival from the sources to mics in the array, and inferring the locations from the distances.
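As a minimal sketch of inferring direction from time of arrival, a single microphone pair yields a bearing from the measured time difference, under a far-field (plane-wave) assumption; the function name is hypothetical:

```python
import numpy as np

def bearing_from_tdoa(tau, mic_spacing, c=343.0):
    """Estimate source bearing (radians from broadside) for a two-microphone
    pair from the measured time difference of arrival tau (seconds).
    Uses the far-field relation tau = (d / c) * sin(theta)."""
    # Clip guards against |tau| slightly exceeding d/c due to measurement noise.
    return np.arcsin(np.clip(tau * c / mic_spacing, -1.0, 1.0))
```

Multiple such pairs (or full arrays) allow the bearings or ranges to be intersected to estimate a source location.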
A Primer on Digital Beamforming by Toby Haynes, Mar. 26, 1998, http://www.spectrumsignal.com/publications/beamform_primer.pdf, describes beamforming technology.
According to U.S. Pat. No. 5,581,620, the disclosure of which is incorporated by reference herein, many communication systems, such as radar systems, sonar systems and microphone arrays, use beamforming to enhance the reception of signals. In contrast to conventional communication systems that do not discriminate between signals based on the position of the signal source, beamforming systems are characterized by the capability of enhancing the reception of signals generated from sources at specific locations relative to the system.
Generally, beamforming systems include an array of spatially distributed sensor elements, such as antennas, sonar phones or microphones, and a data processing system for combining signals detected by the array. The data processor combines the signals to enhance the reception of signals from sources located at select locations relative to the sensor elements. Essentially, the data processor “aims” the sensor array in the direction of the signal source. For example, a linear microphone array uses two or more microphones to pick up the voice of a talker. Because one microphone is closer to the talker than the other microphone, there is a slight time delay between the two microphones. The data processor adds a time delay to the nearest microphone to coordinate these two microphones. By compensating for this time delay, the beamforming system enhances the reception of signals from the direction of the talker, and essentially aims the microphones at the talker.
A beamforming apparatus may connect to an array of sensors, e.g., microphones, that can detect signals generated from a signal source, such as the voice of a talker. The sensors can be spatially distributed in a linear, two-dimensional, or three-dimensional array, with uniform or non-uniform spacing between sensors. A linear array is useful for an application where the sensor array is mounted on a wall or a podium; a talker is then free to move about a half-plane with an edge defined by the location of the array. Each sensor detects the voice audio signals of the talker and generates electrical response signals that represent these audio signals. An adaptive beamforming apparatus provides a signal processor that can dynamically determine the relative time delay between each of the audio signals detected by the sensors. Further, a signal processor may include a phase alignment element that uses the time delays to align the frequency components of the audio signals. The signal processor has a summation element that adds together the aligned audio signals to increase the quality of the desired audio source while simultaneously attenuating sources having different delays relative to the sensor array. Because the relative time delays for a signal relate to the position of the signal source relative to the sensor array, the beamforming apparatus provides, in one aspect, a system that “aims” the sensor array at the talker to enhance the reception of signals generated at the location of the talker and to diminish the energy of signals generated at locations different from that of the desired talker. The practical application of a linear array is limited to situations which are either in a half-plane or where knowledge of the direction to the source is not critical. The addition of a third sensor that is not co-linear with the first two sensors is sufficient to define a planar direction, also known as azimuth.
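The relative time delay that such a signal processor determines dynamically can be estimated, in the simplest case, from the peak of the cross-correlation between two channels (a sketch; practical systems often use more robust generalized cross-correlation methods such as GCC-PHAT):

```python
import numpy as np

def estimate_delay(x, y, fs):
    """Estimate how many seconds channel y lags channel x by locating the
    peak of their full cross-correlation."""
    corr = np.correlate(y, x, mode="full")
    # Index (len(x) - 1) corresponds to zero lag in 'full' mode.
    lag = np.argmax(corr) - (len(x) - 1)     # samples by which y trails x
    return lag / fs
```

Given the estimated delay and the known microphone spacing, the processor can compensate the nearer channel before summation, effectively aiming the pair at the talker.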
Three sensors do not provide sufficient information to determine elevation of a signal source. At least a fourth sensor, not co-planar with the first three sensors is required to obtain sufficient information to determine a location in a three dimensional space.
Although these systems work well if the position of the signal source is precisely known, the effectiveness of these systems drops off dramatically, and the computational resources required increase dramatically, with slight errors in the estimated a priori information. For instance, in some systems with source-location schemes, it has been shown that the data processor must know the location of the source within a few centimeters to enhance the reception of signals. Therefore, these systems require precise knowledge of the position of the source and precise knowledge of the position of the sensors. As a consequence, these systems require both that the sensor elements in the array have a known and static spatial distribution and that the signal source remains stationary relative to the sensor array. Furthermore, these beamforming systems require a first step for determining the talker position and a second step for aiming the sensor array based on the expected position of the talker.
A change in the position or orientation of the sensor array can produce the aforementioned dramatic effects even when the talker is not moving, because movement of the array changes the relative position and orientation. Knowledge of any change in the location and orientation of the array can be used to compensate for the resulting increase in required computational resources and decrease in the effectiveness of location determination and sound isolation.
U.S. Pat. No. 7,415,117 shows audio source location identification and isolation. Known systems rely on stationary microphone arrays.
A position sensor is any device that permits position measurement. It can either be an absolute position sensor or a relative one. Position sensors can be linear, angular, or multi-axis. Examples of position sensors include: capacitive transducer, capacitive displacement sensor, eddy-current sensor, ultrasonic sensor, grating sensor, Hall effect sensor, inductive non-contact position sensors, laser Doppler vibrometer (optical), linear variable differential transformer (LVDT), multi-axis displacement transducer, photodiode array, piezo-electric transducer (piezo-electric), potentiometer, proximity sensor (optical), rotary encoder (angular), seismic displacement pick-up, and string potentiometer (also known as a string encoder or cable position transducer). Inertial position sensors are common in modern electronic devices.
A gyroscope is a device used for measurement of angular velocity. Gyroscopes are available that can measure rotational velocity in 1, 2, or 3 directions. 3-axis gyroscopes are often implemented with a 3-axis accelerometer to provide a full 6 degree-of-freedom (DoF) motion tracking system. A gyroscopic sensor is a type of inertial position sensor that senses rate of rotation and may indicate roll, pitch, and yaw.
An accelerometer is another common inertial position sensor. An accelerometer may measure proper acceleration, which is the acceleration it experiences relative to freefall and is the acceleration felt by people and objects. Accelerometers are available that can measure acceleration in one, two, or three orthogonal axes. The acceleration measurement has a variety of uses. The sensor can be implemented in a system that detects velocity, position, shock, vibration, or the acceleration of gravity to determine orientation. An accelerometer having two orthogonal sensors is capable of sensing pitch and roll. This is useful in capturing head movements. A third orthogonal sensor may be added to obtain orientation in three-dimensional space. This is appropriate for the detection of pen angles, etc. The addition of three orthogonal gyroscopes to a three-axis accelerometer yields an inertial position sensor that can detect changes in six degrees of spatial freedom.
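The pitch and roll sensing described above can be sketched as a tilt computation from a static accelerometer reading of gravity (axis conventions vary by device; the ones below are an assumption, and the result is valid only when the device is not otherwise accelerating):

```python
import math

def pitch_roll_from_accel(ax, ay, az):
    """Compute (pitch, roll) in radians from a static 3-axis accelerometer
    reading of gravity. Assumes x forward, y left, z up; device at rest."""
    # Pitch: rotation about the y axis, from the forward-axis gravity component.
    pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    # Roll: rotation about the x axis, from the lateral and vertical components.
    roll = math.atan2(ay, az)
    return pitch, roll
```

A device lying flat reads gravity entirely on its z axis and reports zero pitch and roll; tipping it nose-down shifts gravity onto the x axis and the pitch angle grows accordingly.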
Magnetometers, sometimes referred to as magnometers, are devices that measure the strength and/or direction of a magnetic field. Because magnetic fields have both a strength and a direction (they are vector fields), magnetometers that measure only the strength are called scalar magnetometers, while those that measure both strength and direction are called vector magnetometers. Today, both scalar and vector magnetometers are commonly found in consumer electronics, such as tablets and cellular devices. In most cases, magnetometers are paired with accelerometers and gyroscopes to obtain directional information in three dimensions. The combined device is called an inertial measurement unit (“IMU”) or a 9-axis position sensor.
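As a sketch of the directional information a magnetometer contributes, the horizontal field components give a compass heading when the device is held level (sign conventions vary by platform and are an assumption here; when the device is tilted, the accelerometer is needed to project the field back onto the horizontal plane first):

```python
import math

def heading_from_magnetometer(mx, my):
    """Compass heading in radians (0 = magnetic north, increasing clockwise)
    from the horizontal field components of a vector magnetometer held level."""
    return math.atan2(-my, mx) % (2 * math.pi)
```

Pairing this heading with accelerometer tilt and gyroscope rate yields the full 9-axis orientation estimate described above.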
Advancements in hearing aid technology have resulted in numerous developments which have served to improve the listening experience for people with hearing impairments, but these developments have been fundamentally limited by an overriding need to minimize size and maximize invisibility of the device. Resulting limitations from miniaturized form factors include limits on battery size and life, power consumption and, thus, processing power, typically two or fewer microphones per side (left and right) and a singular focus on speech recognition and speech enhancement.
Hearing aid technology may use “beamforming” and other methods to allow for directional sound targeting to isolate and amplify just speech, wherever that speech might be located.
Hearing aid technology includes methods and apparatus to isolate and amplify speech and only speech, in a wide variety of environments, focusing on the challenge of “speech in noise” or the “cocktail party” effect (the use of directional sound targeting in combination with noise cancellation has been the primary approach to this problem).
Hearing aid applications typically ignore or minimize any sound in the ambient environment other than speech. Hearing devices may also feature artificial creation of sounds as masking to compensate for tinnitus or other unpleasant remnants of the assistive listening experience for those suffering from hearing loss.
Due to miniature form factors, hearing aids are constrained by a severe restriction on available power to preserve battery life, which results in limitations in signal processing power. Applications and devices not constrained by such limitations, but rather focused on providing the highest quality listening experience, are able to utilize the highest quality of signal processing, which, among other things, will maintain a high sampling rate, typically at least twice the highest frequency that can be perceived. Music CDs have a 44.1 kHz sampling rate to preserve the ability to process sound with frequencies up to about 20 kHz. Most hearing devices sample at rates significantly below 44.1 kHz, resulting in a much lower range of frequencies that can be analyzed for speech patterns and then amplified, further necessitating the use of compression and other compensating methodologies in an effort to preserve the critical elements of speech recognition and speech triggers that reside in higher frequencies.
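The sampling-rate relationship above follows the Nyquist criterion, sketched here (the 16 kHz comparison rate is an illustrative figure, not a claim about any particular device):

```python
def nyquist_limit_hz(sample_rate_hz):
    """Highest frequency representable without aliasing at a given sampling rate:
    half the sampling rate (the Nyquist criterion)."""
    return sample_rate_hz / 2.0

# CD audio at 44.1 kHz preserves content up to 22.05 kHz, covering the ~20 kHz
# limit of human hearing. A hypothetical 16 kHz device rate tops out at 8 kHz,
# losing the high-frequency content discussed above.
```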
Hearing aids have almost always needed to compensate for loss of hearing at very high frequencies. Given that equivalent volume requires much more amplification at very high and very low frequencies than at midrange frequencies, one strategy has been compression (wide dynamic range compression, or WDRC), whereby either the higher frequency ranges are compressed to fit within a lower frequency band or, less beneficially, higher frequency ranges are literally cut and pasted into a lower band, which requires a learning curve for the user.
For these reasons, hearing aid technologies do not adequately function within the higher frequency bands where a great deal of desired ambient sound exists for listeners, and hearing aids and their associated technologies have neither been developed to enhance, nor are capable as developed of enhancing, the listening experience for listeners who do not suffer from hearing loss but rather want an optimized listening experience.