Artificial processing of speech typically uses a digital representation of the data because of its robustness against distortion. Digital processing further allows streaming of data. Streaming enables audio data, such as speech data, to be compressed on the fly so that real-time communication is possible, instead of requiring the user to wait for a file, or a portion of it, to download before it can be accessed. For an introduction to speech processing, see, e.g., Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal, Elsevier, 1995, especially pp. 1-47, incorporated herein by reference.
Mixing of speech streams is required at a receiver when multiple speech streams must be rendered and played out through a single audio device. Mixing of speech streams is also desired at an intermediate point in the transmission path (e.g., at a server in a client-server architecture) when multiple speech streams are available that are to be combined into a single stream or into a reduced number of streams for retransmission to a particular receiver or to a group of receivers.
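The mixing operation described above can be sketched as sample-wise addition of the decoded streams. The following is a minimal illustration, assuming each decoded stream is a sequence of 16-bit PCM samples; the function name and the saturation strategy are illustrative assumptions, not part of the source.

```python
def mix_streams(streams):
    """Mix several decoded 16-bit PCM streams into a single stream.

    Illustrative sketch: sums corresponding samples and saturates the
    result to the 16-bit range to avoid wrap-around distortion.
    """
    if not streams:
        return []
    # Mix over the length common to all streams.
    length = min(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams)
        # Clamp to the 16-bit PCM range [-32768, 32767].
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

The same routine could run either at a receiver (mixing all decoded streams before playout) or at an intermediate server (mixing a subset of streams for retransmission as a single stream).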
Mixing of multiple streams at the receiver requires the decoded streams to be rendered to produce the signals that are played out through the loudspeakers. The rendering function for each stream is defined by the application and can range from simple duplication, for monophonic reproduction through a set of two loudspeakers, to a complicated transfer function that provides loudspeaker compensation and spatial localization of each sound source.
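The simplest rendering function mentioned above, duplication of a mono stream onto two loudspeakers, can be sketched as follows. The per-channel gain parameters are an illustrative assumption; a full renderer would instead apply an application-defined transfer function for loudspeaker compensation or spatial localization.

```python
def render_mono_to_stereo(samples, left_gain=1.0, right_gain=1.0):
    """Render a decoded mono stream to two loudspeaker channels.

    Illustrative sketch: with equal gains this is plain duplication;
    unequal gains give a crude form of spatial positioning (panning).
    """
    left = [left_gain * s for s in samples]
    right = [right_gain * s for s in samples]
    return left, right
```

For example, equal gains reproduce the source identically on both loudspeakers, while a larger right gain shifts the perceived position of the source toward the right.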