Various approaches have been taken to manage noise sources and to steer and switch microphone pickup devices in order to improve a multi-user room's capability for conferencing. Obtaining high-quality audio at both ends of a conference call is difficult because of factors including, but not limited to, variable room dimensions, dynamic seating plans, known steady-state noise sources, and unknown dynamic noise sources. Because of these complex needs and requirements, the problems have proven difficult to solve, and existing solutions have been insufficient.
Traditional methods typically approach the issue with distributed microphones to enhance sound pickup, as the microphones are generally located close to the participants and the noise sources are usually, but not always, more distant. This allows for good sound pickup; however, each participant needs a microphone for best results, which increases the complexity of the hardware and installation. Such systems usually employ microphone switching and post-processing, which can degrade the audio signal by adding unwanted artifacts that result from switching between microphones. Adapting to participants standing at whiteboards, projection screens, and other non-seated locations is usually not handled acceptably. Dynamic locations could be handled through wireless apparel or situational microphones; although the audio can be improved, such microphones provide only audio information and no positional information.
Another method of managing dynamic seating and participant positions is to use microphone beam arrays. The array is typically mounted on a wall or ceiling. The arrays can be steered to direct the microphones toward desired sounds, so that the sound sources can be tracked and, in theory, pickup can be optimized for dynamic participant locations.
In the current art, microphone beam forming arrays are arranged in specific geometries in order to create microphone beams that can be steered towards the desired sound. The advantage of the beam method is that there is a gain in sound quality with a relatively simple control mechanism. Beams can only be steered in one dimension (in the case of a line array) or in two dimensions (in the case of a 2-D array). The disadvantage of beam formers is that they cannot locate a sound precisely in a room; they can determine only its direction and magnitude. This means that the array can find the general direction of a sound, much like a compass, giving a direction vector from a known position, which yields only a relative position in the room. This method is prone to receiving direct signals and potential multi-path (reverberation) signals equally, resulting in false positives that can steer the array in the wrong direction.
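The direction-only limitation described above can be illustrated with a minimal delay-and-sum sketch for a line array. It is an illustration under stated assumptions, not the method of any cited reference: the function names `steering_delays` and `delay_and_sum`, the far-field (plane-wave) model, the 343 m/s speed of sound, and the convention that the angle is measured from the array's broadside are all hypothetical choices made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions_m, angle_deg):
    """Delay (seconds) to apply to each microphone of a line array so that a
    far-field plane wave arriving from angle_deg adds coherently.

    Microphones lie on the x-axis; the angle is measured from broadside."""
    theta = math.radians(angle_deg)
    # Plane-wave arrival time at each microphone relative to the array origin.
    arrival = [-(x * math.sin(theta)) / SPEED_OF_SOUND for x in mic_positions_m]
    # Delay every channel so that it lines up with the latest arrival.
    latest = max(arrival)
    return [latest - t for t in arrival]


def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Shift each channel by its delay (rounded to whole samples) and average."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, delay in zip(channels, delays_s):
        shift = round(delay * sample_rate_hz)
        for i in range(shift, n):
            out[i] += signal[i - shift]
    return [value / len(channels) for value in out]
```

Note that `steering_delays` takes only an angle. Under the far-field assumption, two sources at different distances along the same bearing produce the same delays, which is one way to see why such an array reports a direction rather than a position in the room.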
Another drawback is that the direction is only a general measurement, and the array cannot distinguish between desirable and undesirable sound sources in the same direction, so that all signals picked up have the same noise rejection and gain applied. If multiple participants are talking, it becomes difficult to steer the array to an optimal location, especially if the participants are on opposite sides of the room. The in-room noise and desired sound source levels will differ between pickup beams, requiring post-processing, which can add artifacts and processing distortion as the post-processor normalizes the different beams to try to account for variances and to minimize differences in the audio stream. Since the number of microphones used tends to be limited by cost and installation complexity, fewer microphones are available for sound pickup and location determination. Another constraint of the current art is that microphone arrays do not provide even coverage of the room, as all of the microphones are located in close proximity to each other because of the design considerations of typical beam forming microphone arrays. The installation of thousands of physical microphones is not typically feasible in a commercial environment, because of building, shared-space, hardware, and processing constraints, where traditional microphones are used with the normal methods established in the current art.
One approach in the prior art is to use frequency-domain delay estimation techniques to target the location of the maximum sound source. However, frequency-domain systems in this field require substantial memory resources and computational power, leading to slower and less exact solutions.
U.S. Pat. No. 6,912,178 discloses a system and method for computing a location of an acoustic source. The method includes steps of processing a plurality of microphone signals in frequency space to search a plurality of candidate acoustic source locations for a maximum normalized signal energy.
U.S. Pat. No. 4,536,887 describes microphone array apparatus and a method for extracting desired signals therefrom in which an acoustic signal is received by a plurality of microphone elements. The element outputs are delayed by delay means and weighted and summed up by weighted summation means to obtain a noise-reduced output. A “fictitious” desired signal is electrically generated and the weighting values of the weighted summation means are determined based on the fictitious desired signal and the outputs of the microphone elements when receiving only noise but no input signal. In this way, the adjustments are made without operator intervention. The requirement of an environment having substantially only noise sources, however, does not realistically reflect actual sound pickup situations where noise, reverberation and sound conditions change over relatively short time periods and the occurrence of desired sounds is unpredictable. It is an object of the '887 patent to provide improved directional sound pickup that is adaptable to varying environmental conditions without operator intervention or a requirement of signal-free conditions for adaptation.
The article, “A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays”, Joseph Hector DiBiase, May 2000, attempts to show that pairwise localization techniques yield inadequate performance in some realistic small-room environments. Unique array data sets were collected using specially designed microphone-array systems. Using this data, various localization methods were analyzed and compared. These methods are based on both the generalized cross-correlation (GCC) and the steered response power (SRP). The GCC techniques studied include the phase transform, which has been dubbed “GCC-PHAT”. The beam-steering methods are based on the conventional steered response power (SRP) and a new filter-and-sum technique dubbed “SRP-PHAT”.
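For readers unfamiliar with the phase transform, the following is a minimal sketch of a pairwise GCC-PHAT delay estimate. It is not taken from the cited article: the function names `dft` and `gcc_phat_delay` are hypothetical, and a naive O(n²) discrete Fourier transform is used only to keep the example free of external libraries (a practical implementation would use an FFT).

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig_a, sig_b, max_lag):
    """Estimate by how many samples sig_b lags sig_a (negative if it leads)."""
    n = len(sig_a) + len(sig_b)  # zero-pad to avoid circular wrap-around
    spec_a = dft(list(sig_a) + [0.0] * (n - len(sig_a)))
    spec_b = dft(list(sig_b) + [0.0] * (n - len(sig_b)))
    weighted = []
    for a, b in zip(spec_a, spec_b):
        cross = b * a.conjugate()
        mag = abs(cross)
        # Phase transform: discard magnitude, keep only the phase.
        weighted.append(cross / mag if mag > 1e-12 else 0j)
    correlation = dft(weighted, inverse=True)
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: correlation[lag % n].real)
```

Because only the phase of the cross-spectrum is kept, the correlation peak is sharpened, which is the property that makes the weighting attractive in reverberant rooms. The estimate here is limited to whole-sample resolution.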
U.S. Pat. No. 6,593,956 B1 describes a system, such as a video conferencing system, which includes an image pickup device, an audio pickup device, and an audio source locator. The image pickup device generates image signals representative of an image, while the audio pickup device generates audio signals representative of sound from an audio source, such as a speaking person. The audio source locator processes the image signals and audio signals to determine a direction of the audio source relative to a reference point. The system can further determine a location of the audio source relative to the reference point. The reference point can be a camera. The system can use the direction or location information to frame a proper camera shot that includes the audio source.
European Patent No. EP0903055 B1 describes an acoustic signal processing method and system using a pair of spatially separated microphones (10, 11) to obtain the direction (80) or location of speech or other acoustic signals from a common sound source (2). The description includes a method and apparatus for processing the acoustic signals by determining whether signals acquired during a particular time frame represent the onset (45) or beginning of a sequence of acoustic signals from the sound source, identifying received acoustic signals representative of the sequence of signals, and determining the direction (80) of the source based upon the received acoustic signals. The '055 patent has applications to videoconferencing, where it may be desirable to automatically adjust a video camera, such as by aiming the camera in the direction of a person who has begun to speak.
U.S. Pat. No. 7,254,241 describes a system and process for finding the location of a sound source using direct approaches having weighting factors that mitigate the effect of both correlated and reverberation noise. When more than two microphones are used, the traditional time-delay-of-arrival (TDOA) based sound source localization (SSL) approach involves two steps. The first step computes TDOA for each microphone pair, and the second step combines these estimates. This two-step process discards relevant information in the first step, thus degrading the SSL accuracy and robustness. In the '241 patent, direct, one-step, approaches are employed. Namely, a one-step TDOA SSL approach and a steered beam (SB) SSL approach are employed. Each of these approaches provides an accuracy and robustness not available with the traditional two-step approaches.
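The second step of the traditional two-step approach, combining pairwise TDOA estimates into a location, can be sketched as a least-squares search over candidate positions. This is an illustration only and is not the one-step method of the '241 patent: the function names, the candidate grid, the 343 m/s speed of sound, and the least-squares criterion are assumptions made for this example.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def predicted_tdoa(source, mic_a, mic_b):
    """Arrival-time difference (seconds) at mic_b relative to mic_a."""
    return (math.dist(source, mic_b) - math.dist(source, mic_a)) / SPEED_OF_SOUND


def locate_from_tdoas(mics, measured, candidates):
    """Pick the candidate position whose predicted pairwise TDOAs best match
    the measured ones in the least-squares sense.

    `measured` maps (i, j) microphone index pairs to TDOAs in seconds."""
    def error(position):
        return sum(
            (predicted_tdoa(position, mics[i], mics[j]) - tdoa) ** 2
            for (i, j), tdoa in measured.items()
        )
    return min(candidates, key=error)


# Usage: four microphones in the corners of a 4 m x 3 m room (2-D for brevity).
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (4.0, 3.0)]
true_source = (1.0, 2.0)
measured = {
    (i, j): predicted_tdoa(true_source, mics[i], mics[j])
    for i, j in itertools.combinations(range(len(mics)), 2)
}
grid = [(x * 0.5, y * 0.5) for x in range(9) for y in range(7)]
estimate = locate_from_tdoas(mics, measured, grid)
```

In this sketch the search sees only the pairwise delay values, not the microphone signals themselves, which is the loss of information in the first step that the passage above refers to.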
U.S. Pat. No. 5,469,732 B1 describes an apparatus and method in a video conference system that provides accurate determination of the position of a speaking participant by measuring the difference in arrival times of a sound originating from the speaking participant, using as few as four microphones in a 3-dimensional configuration. In one embodiment, a set of simultaneous equations relating the position of the sound source to each microphone, and relating the distances between the microphones, is solved off-line and programmed into a host computer. In one embodiment, the set of simultaneous equations provides multiple solutions and the median of such solutions is picked as the final position. In another embodiment, an average of the multiple solutions is provided as the final position.
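The final selection step described above, taking the median or the average of multiple candidate solutions, can be sketched as follows. The passage does not say how a median is defined for 3-dimensional positions, so the coordinate-by-coordinate median used here is an assumption, and the function name `combine_solutions` is hypothetical.

```python
import statistics


def combine_solutions(solutions, method="median"):
    """Combine several candidate (x, y, z) positions into a single estimate,
    one coordinate at a time (an assumed interpretation)."""
    pick = statistics.median if method == "median" else statistics.fmean
    return tuple(pick(axis) for axis in zip(*solutions))


# Usage: three candidate positions in metres, one of which is an outlier.
candidates = [(1.0, 2.0, 1.5), (1.2, 2.1, 1.4), (5.0, 2.0, 1.6)]
median_position = combine_solutions(candidates)             # (1.2, 2.0, 1.5)
mean_position = combine_solutions(candidates, "average")
```

With these sample values the median stays close to the two agreeing candidates, while the average is pulled toward the outlier in the x coordinate.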
The present invention is intended to overcome one or more of the problems discussed above.