Video teleconferencing systems are becoming ubiquitous for both business and personal applications. Most such prior art video teleconferencing systems make use of at least two audio speakers (e.g., either loudspeakers or headphone speakers) to provide the audio (i.e., the sound) which is to be played concurrently with the associated displayed video. Moreover, many video teleconferencing systems make use of a window-based display, including both “personal” (e.g., PC-based) teleconferencing systems and more sophisticated commercial teleconferencing systems for business (e.g., corporate) use.
However, such prior art systems rarely succeed in accurately matching the auditory space with the corresponding visual space, assuming that they even attempt to do so. That is, in general, a participant in a prior art video teleconferencing system who is watching a window-based video display while listening to the corresponding audio will often not hear the sound as if it were emanating from the proper physical (e.g., directional) location, such as the apparent physical location of a human speaker visible in a given video window on the display. Even when a stereo (i.e., two-or-more-channel) audio signal is provided, it will typically not match the appropriate corresponding visual angle, unless it happens to do so by chance. Therefore, a method and apparatus for accurately matching auditory space to visual space in video teleconferencing applications using window-based displays would be highly desirable.