Electronic devices are used to provide remote conferencing among multiple users. Typically, an audio stream is generated for each user to capture that user's speech, while the audio streams of the other users are combined to provide sound for the user to listen to. For example, the combined stream may be a monophonic stream for a single speaker. For stereo speakers, the same monophonic stream is reproduced on both the left and right speakers. Unlike in an in-person meeting, the monophonic stream conveys no spatial sense of where the different participants are in the sound field, and thus voice differentiation and intelligibility are reduced.
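As an illustration only, the conventional mixing described above can be sketched as follows. The function names `mix_for_listener` and `mono_to_stereo` are hypothetical, and combining the streams by sample-wise averaging is an assumption made for the sketch; actual conferencing systems may combine streams differently.

```python
from typing import Dict, List, Tuple


def mix_for_listener(streams: Dict[str, List[float]], listener: str) -> List[float]:
    """Combine every participant's samples except the listener's into one mono stream.

    Averaging is an assumed mixing rule used here for illustration.
    """
    others = [samples for name, samples in streams.items() if name != listener]
    if not others:
        return []
    # Truncate to the shortest stream so every index is valid for all streams.
    length = min(len(samples) for samples in others)
    return [sum(samples[i] for samples in others) / len(others) for i in range(length)]


def mono_to_stereo(mono: List[float]) -> List[Tuple[float, float]]:
    """Reproduce the same mono sample on the left and right channels."""
    return [(sample, sample) for sample in mono]


# Hypothetical three-party call, two samples per participant.
streams = {
    "a": [1.0, 0.0],
    "b": [0.5, 0.5],
    "c": [0.0, 1.0],
}
mono = mix_for_listener(streams, "a")
stereo = mono_to_stereo(mono)
```

Because the left and right channels are identical for every sample, the output carries no cue about which participant is speaking from which direction, which is the limitation noted above.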