Field of the Invention
The present invention generally relates to delivering a high quality bidirectional audio experience in a multi-user room, and more specifically to systems and methods for real-time scalable impulse response generation, sound masking, and measurement to implement dynamic microphone array adaptation and position determination while embedding a flexible data communication channel.
Description of Related Art
Establishing high quality bidirectional audio and video performance has always been a challenge for business applications. Supporting a plurality of users in a variety of situations and seating positions has proven to be a difficult problem to solve. In addition to the performance requirements, the system needs to deal with environmental, architectural and building issues, such as, but not limited to, noise from heating, ventilation, and air conditioning, external noise, and the irregular shapes and various sizes of multi-user rooms. The current art solves the problems through the use of custom solutions and complex system integration, which requires professional audio and video engineers, architectural, information technology, and other professional support services, making for costly, uniquely designed solutions that do not typically scale or adapt well without introducing a redesign phase.
Currently, multi-user rooms utilize many forms of audio/video conference systems to help obtain the best audio performance, using a microphone system for sound pick up and speakers for sound distribution to provide the required bidirectional audio quality. Current implementations combine individual solutions for sound masking, echo cancellation, and microphone arrays that usually are not tightly integrated. As a result, they usually gain no benefit from a combined signal and cannot accomplish a holistic system approach that is adaptable in real-time to changing system parameters, such as adding a microphone by determining its position and extending the array, and dynamic echo cancellation, among other benefits.
By the very nature of the complex requirements, a system that meets all of the needs and expectations of the users is usually designed for a specific room and application. This can be a complex and costly undertaking, resulting in installed solutions that usually are not easily adaptable to new rooms and/or environments without design changes and calibration tuning. Typically, these types of changes require the room to be put into maintenance mode to adjust for changes to the microphone array and speaker setup and locations. When changes occur that impact room properties, such as the sound propagation delay time between microphones and speakers, previous calibrations are effectively invalidated, for example, if a room becomes more reflective and/or damped.
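As an illustration of why a change in propagation delay invalidates a calibration, the following is a minimal sketch of one common way to estimate the delay between a speaker and a microphone: cross-correlating a known probe signal with the recorded signal. It is not taken from any of the cited systems. The sample rate, speed of sound, probe signal, and the function name `estimate_delay` are all assumptions chosen for illustration.

```python
import random

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C
SAMPLE_RATE_HZ = 48_000     # assumed sample rate


def estimate_delay(reference, recorded, max_lag):
    """Return the lag in samples that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[n] * recorded[n + lag]
            for n in range(len(reference))
            if n + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical measurement: a random probe played from the speaker arrives
# at the microphone 420 samples later, attenuated by half.
rng = random.Random(1)
probe = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
true_delay = 420
recorded = [0.0] * true_delay + [0.5 * s for s in probe]

lag = estimate_delay(probe, recorded, max_lag=600)
distance_m = lag / SAMPLE_RATE_HZ * SPEED_OF_SOUND_M_S  # path length in metres
```

If the microphone or speaker is moved, the estimated lag changes, and any calibration that assumed the old lag no longer matches the room.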
In the prior art, systems deploy microphones that are located in close proximity to participants' typical locations and/or they employ static microphone arrays. Both such systems are designed for audio sound pick up with the least noise, in the form of signal to noise ratio, and the best voice quality, thus giving an acceptable conference experience. But both such systems bring their own unique set of problems. Using closely located microphones creates clutter and necessitates complex installations, creating the need to run extra cabling and hardware, as the persons may not be seated or standing in a place that is optimal for microphone placement and hookup. A static microphone array cannot be adjusted for extra microphones and is preconfigured with design assumptions that may not be valid or may change through usage. This can limit the array's effectiveness, requiring additional microphones to be added to the system that are parallel to the array but not a part of the array, so the beam focusing, sound and noise management properties are greatly diminished. Complex static microphone arrays need to be designed and tuned to a particular application, so they are not suitable for scaling the array. To install a microphone array in a space, the array dimensions and parameters need to be determined, designed and installed to exacting specifications.
Current implementations of in-room audio systems usually deploy a specific sound mask for noise control by raising the noise floor in a benign manner so that unwanted noises are masked. Sound masks by their very nature are typically random pink noise signals, filtered and shaped, that are designed and tailored to a specific room and environmental needs, such as, but not limited to, heating, ventilation, air conditioning, privacy, in-room hardware and ambient noise considerations. They need to be non-obtrusive and they need to be perceived as non-correlated audio signals by the ear so they do not draw attention to the sound masks themselves. But this very property makes them unsuitable for relocating microphones and speakers due to the random non-correlated signal properties. Sound masks are usually engineered and installed to specific specifications and design criteria that take into account room idiosyncrasies, environmental noise conditions, and the business needs of the room. The room may have confidentiality concerns, multiple users, and uses with video and audio conference capabilities requiring a properly set up sound masking solution. The typical prior art solutions are a single purpose signal and as such are limited to a single application purpose.
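To make the notion of a random pink noise signal concrete, the following is a minimal sketch of a Voss-style pink noise approximation, in which several random sources refreshed at octave-spaced rates are summed. It is only an illustration of the signal type and is not the masking method of any cited reference. The function name `pink_noise`, the row count, and the seed are assumptions, and a deployed sound mask would additionally be filtered and shaped for the room.

```python
import random


def pink_noise(n_samples, n_rows=8, seed=0):
    """Voss-style approximation of pink (1/f) noise, bounded to [-1, 1]."""
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    out = []
    for i in range(n_samples):
        # Row k is refreshed every 2**k samples, so the slowly changing
        # rows contribute the extra low-frequency energy of pink noise.
        for k in range(n_rows):
            if i % (1 << k) == 0:
                rows[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(rows) / n_rows)
    return out


# Hypothetical usage: 4096 samples of unshaped masking noise.
mask = pink_noise(4096)
```

Because every refresh draws a fresh random value, the output has no repeating pattern for the ear to pick out, which is the same property that makes such a signal hard to use for locating microphones and speakers.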
In the prior art, impulse responses are used in establishing room properties, microphone and speaker relationships, and placements in relative and absolute positions. With the relationships known, echo cancellation can be achieved by subtracting the undesired signal from the speakers when picked up by the microphones, to remove feedback into the system which could cause large oscillations and distortions that can stress a system. The problem with signals used to obtain impulse responses, such as, but not limited to, claps and chirps, is that they are not easy to listen to and they can be correlated by the ear to form patterns. As a consequence, the room setup and calibration need to be performed when the room is offline and out of commission. If anything in the setup changes, such as, but not limited to, changes in systems, room structural dimensions, furniture and content changes, as well as acoustic properties whether they are reflective or absorptive in nature, the calibrations and setup need to be redone. This characteristic makes these signals ill-suited to live, in-person meetings, conferences, and presentations in a room that has auto-calibration functionality for adapting to changing room conditions and additional hardware, such as, but not limited to, microphones.
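The subtraction described above can be sketched as follows, assuming the impulse response between speaker and microphone is already known exactly. Practical echo cancellers estimate the response adaptively, so this is only an illustration of the principle. The impulse response values, signal lengths, and the names `convolve` and `cancel_echo` are hypothetical.

```python
import random


def convolve(signal, impulse_response):
    """Direct-form convolution, truncated to the length of the input signal."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def cancel_echo(mic, speaker, impulse_response):
    """Subtract the predicted speaker echo from the microphone signal."""
    echo = convolve(speaker, impulse_response)
    return [m - e for m, e in zip(mic, echo)]


# Hypothetical room: direct path after 3 samples plus one weaker reflection.
room_ir = [0.0, 0.0, 0.0, 0.6, 0.0, 0.25]

rng = random.Random(0)
speaker = [rng.uniform(-1.0, 1.0) for _ in range(200)]       # far-end audio
talker = [0.1 * rng.uniform(-1.0, 1.0) for _ in range(200)]  # local speech
mic = [t + e for t, e in zip(talker, convolve(speaker, room_ir))]

cleaned = cancel_echo(mic, speaker, room_ir)
residual = max(abs(c - t) for c, t in zip(cleaned, talker))  # near zero
```

The cancellation is only as good as the impulse response: if the room or the hardware changes and `room_ir` is not re-measured, the predicted echo no longer matches and the residual grows.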
U.S. Pat. No. 4,914,706A describes a random noise generator with multiple outputs that can be tailored through custom low pass filters.
U.S. Pat. No. 8,223,985B2 describes a method for masking pure tones within a sound mask. Pure tones are not suitable as an impulse signal because when there are correlations, the result is sine waves and not an impulse signal.
U.S. Patent Application Publication No. 2003/0103632A1 describes a method to sample undesired sound and generate white noise tailored to mask the undesired sound.
U.S. Pat. No. 7,526,078B2 describes a method for combining a modulated subcarrier onto an audio signal of a conference.
U.S. Pat. No. 8,804,984B2 describes spectrally shaping audio signal(s) for audio mixing.
U.S. Pat. No. 8,666,086B2 describes a technique for monitoring and/or controlling a sound masking system from a computer aided design drawing.
U.S. Patent Application Publication No. 2008/0147394A1 describes a speech processing system for improving a user's experience with a speech-enabled system using artificially generated white noise.
U.S. Patent Application Publication No. 2003/0107478A1 describes an architectural sound enhancement system for installation in a space having a suspended ceiling to provide integrated masking, background, and paging functions.
U.S. Pat. No. 8,477,958B2 describes a masking system for shaping the ambient noise level in a physical environment.
U.S. Pat. No. 5,781,640A describes a system for suppressing the effects of undesirable noise from an annoying noise source that contains a plurality of transformation sounds which, when combined with the noise, form a sound selection process.
U.S. Pat. No. 6,996,521B2 describes a method for embedding a data signal in an audio signal and determining the data embedded signal.
U.S. Patent Application Publication No. 2006/0109983A1 describes a method and corresponding apparatus of adaptively masking signals in an efficient effective manner, including providing a signal; generating a masking signal that adaptively corresponds to the signal; and inserting the masking signal into a channel corresponding to the signal at a location proximate to the source of the signal to facilitate masking the signal in the channel.
U.S. Patent Application Publication No. 2004/0068399A1 describes a technique for communicating an audio stream. A perceptual mask is estimated for an audio stream, based on the perceptual threshold of the human auditory system. A hidden sub-signal, or to concurrent services that can be accessed while the audio stream is being transmitted.
U.S. Pat. No. 6,208,735B1 describes digital watermarking of audio, image, video or multimedia data by inserting the watermark into perceptually significant components of the frequency spectral image.
U.S. Pat. No. 6,650,762B2 describes a new approach to data embedding within ITU G.722 and ITU G.711 based upon the method of types and universal classification.
U.S. Pat. No. 6,584,138B1 describes a coding method and a coder for introducing a non-audible data into an audio signal, which is first transformed to a spectral range and the signal is determined.
Chinese Patent No. CN102237093B describes an echo hiding method based on forward and backward echo kernels.
Chinese Patent Application Publication No. CN102148034A describes an echo hiding based watermark embedding and extracting method belonging to the technical field of information safety.
U.S. Patent Application Publication No. 2003/0002687A1 describes an apparatus and related method for acoustically improving an environment.
U.S. Pat. No. 8,212,854B2 describes a method and system with means for preventing unauthorized monitoring of a local conference room in which a local conferencing system is located comprising generation of a deterministic sound signal on a first loudspeaker connected to, or integrated in the conferencing system, detecting the deterministic signal picked up by a microphone connected to, or integrated in the conferencing system, and transferring the conference system into a security mode, if the deterministic.
Chinese Patent No. CN101354885B describes an active control of an unwanted noise signal that has an amplitude and/or frequency such that it is masked for a human listener at the listening site by the unwanted noise signal present at the listening site in order to adapt for the time-varying secondary path in a real time manner such that a user doesn't feel disturbed by an additional artificial noise source.
Japanese Patent Application Publication No. JP2008233672A describes a technique for generating a masking sound having sound characteristics most suitable for masking sound characteristic of a sound to be masked.
U.S. Pat. No. 6,674,876B1 describes methods and systems for time-frequency domain watermarking of media signals, such as audio and video signals.
U.S. Pat. No. 6,061,793A describes a technique for hiding of data, including watermarks, in human-perceptible sounds, that is, audio host data.
U.S. Patent Application Publication No. 2008/0215333A1 describes a method of embedding data into an audio signal, providing a data sequence for embedding in the audio signal and computing masking thresholds for the audio signal from a frequency domain transform of the audio signal.
European Patent Application Publication No. EP1722545A1 describes a method for reducing the total acoustic echo cancellation convergence time for all look directions in a microphone array based full-duplex system.
Chinese Utility Model No. CN201185081Y describes an echo eliminator that can eliminate different echoes, which comprises a parameter adjustable subtracter that can adjust the subtract time parameter according to the time difference of the echoes so as to eliminate the corresponding echoes corresponding to the inputted mixed audio, and a non-linear processing circuit that is connected with the parameter adjustable subtracter and is used for performing the non-linear processing of the audio signal with the echoes being eliminated that is outputted by the subtracter so as to reduce the non-linear distortion factor of the audio signal, so that different echoes that are produced at different video conference fields can be effectively removed, thereby effectively improving the quality of the audio signal.
U.S. Pat. No. 6,937,980B2 describes audio processing providing enhanced speech recognition. Audio input is received at a plurality of microphones. The multi-channel audio signal from the microphones may be processed by a beamforming network to generate a single-channel enhanced audio signal, on which voice activity is detected. Audio signals from the microphones are additionally processed by an adaptable noise cancellation filter having variable filter coefficients to generate a noise-suppressed audio signal.
U.S. Pat. No. 6,748,086B1 describes a cabin communication system for improving clarity of a microphone array including a first voice primarily in a first direction and for converting the spoken microphone, positioned at a second location within the cabin, for receiving the spoken voice into a second audio signal.
U.S. Pat. No. 9,171,551B2 describes a unified microphone pre-processing system that includes a plurality of microphones arranged within a vehicle passenger compartment, a processing circuit or system configured to receive signals from one or more of the plurality of microphones, and the processing circuit configured to enhance the received signals for use by at least two of a telephony processing application, an automatic speech recognition processing application, and a noise cancellation processing application.
U.S. Pat. No. 5,453,943A describes an “adaptive synchrophaser” for modifying the phase angle relationship between aircraft propellers to reduce cabin noise and/or vibration.
U.S. Pat. No. 6,760,449B1 describes a microphone array system that includes a plurality of microphones and a sound signal processing part. The microphones are arranged in such a manner that at least three microphones are arranged in a first direction to form a microphone row, at least three rows of the microphones are arranged so that the microphone rows do not cross each other so as to form a plane, and at least three layers of the planes are arranged three-dimensionally so that the planes do not cross each other, so that the boundary conditions for the sound estimation at each plane of the planes constituting the three dimensions can be obtained.