A video conference session may involve a first video conference endpoint that transmits video and “mono” sound of participants engaged in a roundtable discussion to a second video conference endpoint. A participant local to the second video conference endpoint (i.e., a remote participant) may have difficulty discerning which participant local to the first video conference endpoint is talking at any given time, because the mono sound provides no indication of who is talking. Thus, the remote participant has to rely on visual cues in the transmitted video that might indicate who is talking, but such visual cues may be absent or incomplete. As a result, the remote participant does not feel fully present or immersed in the roundtable discussion.