Multi-party voice communication systems, such as digital or analogue voice conference or video conference systems, mix (e.g., combine, in particular by additive mixing) live signals originating from different system endpoints to approximate the sound that would have been heard if all the communicating parties had been present in one location. It is a common experience that voices in such a mixed signal are harder to separate and more difficult to understand than in a real-life conversation, partly because the parties are able to interact via sound or limited view angles only. In particular, talker collisions may be more frequent.
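By way of illustration only, additive mixing as mentioned above may be sketched as a sample-wise sum of the endpoint signals. The function name `mix_additive` and the representation of each signal as an equally long list of samples are assumptions made for this sketch; a practical system would additionally deal with clipping, differing sample rates and timing alignment.

```python
def mix_additive(signals):
    """Return the sample-wise sum of several equally long signals.

    `signals` is a list with one list of samples per endpoint
    (a hypothetical representation chosen for this sketch).
    """
    if not signals:
        return []
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    # zip(*signals) groups the samples that share the same time index.
    return [sum(samples) for samples in zip(*signals)]


# Two endpoints contributing three samples each.
mixed = mix_additive([[0.5, 0.25, 0.0], [0.25, 0.25, -0.5]])
```

Because the sum discards which endpoint contributed which sample, simultaneous talkers overlap directly in the mixed signal, which is one way to picture a talker collision.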
US 2008/144794 is directed to the problem of separating speakers in an online conference. According to that application, the problem can be alleviated by conceptually locating the speakers in a virtual environment and simulating their distances, azimuth angles and elevation angles with respect to the listener by adding spatial cues to the voice signals in accordance with their points of origin in the virtual environment. The spatial cues discussed in US 2008/144794 include total intensity, inter-ear intensity ratio, ratio of direct and reflected sound, head-shadow azimuthal effects, pinna-induced frequency filtering and similar monaural and binaural effects. It is well known that the human sense of hearing resolves speech collisions more easily if the speakers are (seemingly) separated in space.
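Two of the cues listed above, total intensity and inter-ear intensity ratio, may be illustrated by the following minimal sketch. It is not taken from US 2008/144794: the constant-power pan law, the inverse-distance attenuation, the azimuth range of -90 to +90 degrees and the function name `spatialize` are all assumptions chosen for illustration.

```python
import math


def spatialize(samples, azimuth_deg, distance=1.0):
    """Turn a mono signal into (left, right) sample pairs.

    Assumptions of this sketch: azimuth runs from -90 (fully left)
    to +90 (fully right) degrees, the inter-ear intensity ratio
    follows a constant-power pan law, and total intensity falls off
    as 1/distance beyond a reference distance of 1.0.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must lie in [-90, 90]")
    # Map the azimuth onto a pan angle between 0 and pi/2.
    theta = math.radians(azimuth_deg + 90.0) / 2.0
    attenuation = 1.0 / max(distance, 1.0)
    left_gain = math.cos(theta) * attenuation
    right_gain = math.sin(theta) * attenuation
    return [(s * left_gain, s * right_gain) for s in samples]


# A talker placed 45 degrees to the right at twice the reference distance.
stereo = spatialize([1.0, 0.5], azimuth_deg=45.0, distance=2.0)
```

Giving each talker a different azimuth before mixing yields a different left/right balance per voice, which is the kind of apparent spatial separation referred to above.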
It would be desirable to develop further techniques enhancing the intelligibility of speech in a mixed voice signal.
All the figures are schematic and generally only show parts which are necessary in order to elucidate the invention, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.