Video conferencing systems (e.g., multi-way video conferencing systems) may use scaling, layering, or multicasting of real-time video, such that the focal point of the video conference for a given user (e.g., the user who is speaking) may be displayed in a larger image, while users who are not speaking may be displayed in smaller images. The speaker in the larger image may be shown with a higher-quality video feed, whereas the non-speakers in the smaller images may be shown with lower-quality video feeds (e.g., to conserve network and system resources). When a non-speaking user becomes the new speaking user and transitions from the smaller image (with the lower-quality video feed) to the larger image (with the higher-quality video feed), there may be a time delay (e.g., 3-4 seconds) between when the new speaking user is first shown in the larger image and when that image switches from the lower-quality video feed to the higher-quality video feed.
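The transition described above may be illustrated with a minimal sketch. It is not taken from any particular conferencing system: the names `TileState` and `new_speaker_tile` are hypothetical, and the default delay of 3.5 seconds is merely an assumed value within the 3-4 second example range. The sketch shows that the image size may change as soon as the user becomes the speaker, while the feed quality may lag behind.

```python
from dataclasses import dataclass


@dataclass
class TileState:
    size: str     # "large" or "small"
    quality: str  # "high" or "low"


def new_speaker_tile(t, became_speaker_at, quality_switch_delay=3.5):
    """Return the assumed state of the new speaker's image at time t (seconds).

    quality_switch_delay is a hypothetical parameter modeling the lag
    between the layout change and the arrival of the higher-quality feed.
    """
    if t < became_speaker_at:
        # Still a non-speaker: smaller image, lower-quality feed.
        return TileState("small", "low")
    if t < became_speaker_at + quality_switch_delay:
        # Shown in the larger image, but still on the lower-quality feed.
        return TileState("large", "low")
    # The higher-quality feed has taken over.
    return TileState("large", "high")


if __name__ == "__main__":
    for t in (9.0, 11.0, 14.0):
        print(t, new_speaker_tile(t, became_speaker_at=10.0))
```

In this sketch, the interval during which the state is large with low quality corresponds to the time delay noted above, in which the new speaker may appear enlarged but at reduced quality.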