The current generation of real-time augmented reality (AR) based communication systems and methods is severely defective due to poor image quality or slow processing speed. For example, Apple Photobooth™ allows users to create photos or videos of themselves in a virtual environment. Google Hangout™, an audio and video conference platform, allows users to select a background during a video conference session and to wear exchangeable virtual items such as hats, glasses, and mustaches. However, such existing systems are crude and primitive from a visual perspective. In particular, the holographic quality of human objects is very poor because the existing methods for extracting physical objects are insufficient to capture the more intricate characteristics and features of humans. For similar reasons, such methods also fail to integrate extracted human objects with a virtual environment. Often there are obvious and sometimes significant gaps and numerous imperfections at the edges around an extracted human object. Such defects are more pronounced where a virtual environment includes moving elements or when users are moving.
At the same time, human observers are much more sensitive to extraction errors or inaccuracies in human bodies (in particular, faces) than in other objects or scenes, especially when the images are of the observers themselves. Furthermore, the existing systems and methods do not allow users to interact naturally with the virtual world. Such defects severely compromise the user experience in real-time AR based communications. Ways of overcoming these defects are needed.