Video conferencing systems use audio and video telecommunications to allow participants in one location to interact with participants in another location. Some video conferencing systems may capture and transmit a view of multiple participants for display at another system. To help viewers at one location follow a conversation at another location, a video conferencing system may attempt to determine which person at the other location is speaking. However, challenges exist in accurately identifying an active speaker. The technological solutions described herein offer the promise of addressing such challenges.