The use of immersive augmented reality, display walls, head-mounted displays, and video conferencing has increased in recent years. For example, a video conference is an online meeting that takes place between two or more parties, where each party can hear the voice and see the images of the other. In a video conference between two parties, each party participates through a terminal at its site, e.g., a desktop computer system, a tablet computer system, a TV screen, a display wall, or a smart phone. A terminal typically comprises a microphone to capture audio, a webcam to capture images, a set of hardware and/or software to process the captured audio and video signals, a network connection to transmit data between the parties, a speaker to play the voice, and a display to show the images. In such a traditional setup, a viewer can see only a fixed perspective of his counterparty and her scene. In particular, the viewer can see only what is captured by the counterparty's webcam. Further, as the viewer moves from one location to another during the conference, his point of view (POV) may change. However, due to limitations of the image capture at the counterparty's site, the viewer continues to see images from the same perspective regardless of where he moves.