There have been various approaches to solving sound issues in order to create an immersive remote participant experience, in which the audio and video are of sufficient quality to enable clear communication. While existing solutions tend to focus on parametric audio and video performance, they generally do not address a rich, immersive remote participant experience in which the remote participant has a high degree of control over their audio experience. Such control includes virtual spatial positioning within the space, sound source selection and rejection, and the ability to dynamically adjust the focus of the microphone pickup field to tailor the audio communication to their specific needs and points of interest.
The remote participant typically has no control over the sound source audio quality or the sound source microphone selection during a call, other than being able to mute their personal microphone and/or apply (as known in the prior art) simple audio post-processing to create various listening effects, such as stereo presentation, which are not truly representative of the sound source audio. Because the audio is controlled at the main location, the remote participant is subject to various inconsistent situations, such as, but not limited to: variable microphone-to-participant relationships resulting in sound quality issues; not being able to direct or select the microphone based on who is speaking at the time; not being able to confine or expand the microphone pickup field as needed; and not being able to isolate unwanted noise sources and/or other conversations in the space. By the very nature of the prior art source signals, they do not contain positional and/or spatial information that would allow the creation of a 3D sound field with individual left and right sound characteristics, which would give the remote participant a sense of direction and a spatial sense of the source space and of the participants' specific locations.
Traditional methods use multiple microphone placements and may or may not use the signal strength at each microphone to select the correct microphone; the selected microphone's signal then becomes the desired source signal that is passed to the conference system and sent to the remote participants. This is problematic because it results in a mono or basic audio-only signal that contains no other information that remote participants can use to tailor their experience. Remote participants have no control over the main location's microphone selection, so they are limited by the quality of the source equipment and the layout of the microphones and microphone arrays. No control information is passed back from the remote participant to the main conference system that would allow switching to the desired sound source. If there are multiple remote participants, they all get exactly the same experience and are forced to focus on and listen to the audio content determined by the source system. If there are noise sources or multiple people talking, the remote participants have no individual control over who or what they listen to and which sounds they want to defocus.
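The signal-strength selection described above can be illustrated with a minimal sketch, assuming each microphone delivers one block of floating-point samples per processing interval and that "signal strength" is measured as root-mean-square (RMS) level. The names `rms`, `select_microphone`, and `signal_for_remote` are hypothetical and illustrative only; they are not taken from any particular conference system.

```python
import math
from typing import Mapping, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one block of samples (0.0 for an empty block)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def select_microphone(frames: Mapping[str, Sequence[float]]) -> str:
    """Return the ID of the microphone whose current block has the highest RMS level."""
    if not frames:
        raise ValueError("no microphone frames supplied")
    return max(frames, key=lambda mic_id: rms(frames[mic_id]))


def signal_for_remote(frames: Mapping[str, Sequence[float]]) -> list[float]:
    """Hypothetical source-side output: a single mono block.

    Every remote participant receives this same list; no positional data and
    no control path from the remote participant are involved.
    """
    return list(frames[select_microphone(frames)])


if __name__ == "__main__":
    frames = {
        "mic_a": [0.01, -0.02, 0.01, 0.0],
        "mic_b": [0.3, -0.25, 0.28, -0.3],
        "mic_c": [0.05, -0.04, 0.06, -0.05],
    }
    print(select_microphone(frames))
```

Run on the sample blocks, the sketch picks `mic_b`, the loudest microphone, and forwards only its samples. The decision is made entirely at the source, which is the limitation the paragraph identifies: the output carries no spatial information, and nothing in the loop lets a remote participant request a different source.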
U.S. Pat. No. 6,961,439 describes a method and apparatus for producing virtual sound sources that are externally perceived and positioned at any orientation in azimuth and elevation from a listener. In this system, a set of speakers is mounted near the temples of the listener's head, for example on an eyeglass frame or inside a helmet, rather than in earphones. A head tracking system determines the location and orientation of the listener's head and provides the measurements to a computer, which processes audio signals from an audio source in conjunction with a head related transfer function (HRTF) filter to produce spatialized audio. The HRTF filter maintains the virtual location of the sound, thus allowing the listener to change location and head orientation without degradation of the audio signal.
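The head-tracking compensation mentioned above can be sketched generically. This is a minimal illustration of HRTF-style binaural rendering, not the method of U.S. Pat. No. 6,961,439. It assumes azimuth-only tracking, with angles measured clockwise from straight ahead. It also uses a hypothetical `HRIR_TABLE` of toy two-tap head-related impulse responses that encode only a level difference and a one-sample interaural delay; measured HRIRs are far longer. The source's world azimuth is converted to a head-relative azimuth by subtracting the tracked head yaw, and the mono signal is then convolved with the nearest stored left and right impulse responses.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy HRIR table: head-relative azimuth (degrees, clockwise from front)
# -> (left-ear impulse response, right-ear impulse response).
HRIR_TABLE: Dict[int, Tuple[List[float], List[float]]] = {
    0: ([1.0, 0.0], [1.0, 0.0]),    # straight ahead: equal at both ears
    90: ([0.0, 0.4], [1.0, 0.0]),   # to the right: left ear quieter and delayed
    180: ([0.6, 0.0], [0.6, 0.0]),  # behind: attenuated equally
    270: ([1.0, 0.0], [0.0, 0.4]),  # to the left: right ear quieter and delayed
}


def convolve(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form linear convolution (output length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def nearest_azimuth(rel_az: float) -> int:
    """Pick the stored azimuth closest to rel_az, measuring distance around the circle."""
    def circular_distance(a: int) -> float:
        d = abs(a - rel_az) % 360
        return min(d, 360 - d)
    return min(HRIR_TABLE, key=circular_distance)


def spatialize(mono: Sequence[float], source_az: float, head_yaw: float) -> Tuple[List[float], List[float]]:
    """Render a mono signal so it appears fixed at source_az regardless of head yaw."""
    relative_az = (source_az - head_yaw) % 360  # compensate for tracked head rotation
    left_h, right_h = HRIR_TABLE[nearest_azimuth(relative_az)]
    return convolve(mono, left_h), convolve(mono, right_h)


if __name__ == "__main__":
    print(spatialize([1.0], source_az=90, head_yaw=0))   # source heard on the right
    print(spatialize([1.0], source_az=90, head_yaw=90))  # listener now faces the source
```

With the head facing forward, a source at 90 degrees is rendered louder and earlier in the right channel. After the listener turns to yaw 90 degrees, the same source is rendered as straight ahead. The virtual location therefore stays fixed in the room while the head moves, which is the behavior the paragraph attributes to the HRTF filter combined with head tracking. A practical renderer would interpolate between measured HRIRs and also handle elevation.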
U.S. Pat. No. 5,337,363 describes a method for producing three-dimensional sound associated with an object that is moving from a first position to a second position with respect to the listener. The method accounts for Doppler shifting, head shadowing, the effect of distance on the frequency components of the sound, the volume of the sound, and the natural sensitivity of the human ear in the 7-8 kHz range. The method provides a sequence of digital sound samples, which are converted into analog waveforms, and produces audio signals that give the listener sound cues for the location of the sound in three-dimensional space.
EPO Patent Application No. EP0479604 A2 discloses an omnipresent sound system for use by a listener in an artificial reality system. The system couples sound with presented objects such that, as the sound moves with respect to the user, the user perceives the sound changing in both pitch and volume. The sound system comprises a series of piezoelectric elements spaced apart around the user's head. Each element is individually programmed to create the illusion of omnipresent three-dimensional sound in conjunction with images presented to the listener, which images define an artificial environment.
Patent Application No. WO1992009921 describes a method and apparatus for creating sounds in a virtual world. The system provides signal processing capabilities to convert monaural sounds into fully spatialized sound sources. A user of the system wearing a pair of stereo headphones perceives live, computer-generated, or recorded sounds as coming from specific locations in space, just as a listener does in the real world.
There is an opportunity for improvement in the current approaches to managing the desired source sound field. Because the current art gives the main location control over what is and is not heard, and thereby inherits the limitations of the system's implementation, the remote user is at the mercy of the main space's participants and system limitations. This is problematic because various noise sources that cannot be filtered out at the source may dominate the audio content, reducing the intelligibility of the audio signal. The prior art is further limited because multiple people may be speaking while only one conversation is germane to the conference. The remote users have no control to adjust and focus their experience on the relevant conversation, which leaves them to decipher it or lose it. In the art, the spatial and positional attributes of the sound source signal are not transmitted to the remote participant, who is left listening to a flat, typically mono, signal. This is limiting because it does not immerse the remote participant in the space and denies them both a rich experience and a sense of each sound source's direction and position relative to themselves. Moreover, because the signal is the same for every remote participant, they cannot adjust and tailor their listening experience to focus on their points of interest, which results in less effective remote participation.
The present invention is intended to overcome the limitations and drawbacks of the prior art.