A video conference system includes an endpoint that uses cameras to capture video of participants in a room and then transmits the video to a conference server or to another endpoint. Different cameras may be set up to capture video of participants positioned in different areas of the room. Typically, an operator has to manually select which of the cameras is to capture video of the talking participants, who change over time, in respective ones of the different areas. Such manual selection of different cameras to capture video of different talking participants is cumbersome and inconvenient.