The present disclosure relates in general to video transmission. Specifically, the present disclosure relates to apparatus and methods for alleviating bandwidth limitations of video transmission and enhancing the quality of videos at a receiver. More specifically, improved video transmission systems and methods are provided for generating high-resolution videos at a receiver based on independently encoded background and background updates.
Real-time video communications systems and the emerging field of telepresence face an intrinsic challenge as they seek to simulate, for remote users, the experience of being present in another physical space. This is because the human eye, with its ability to fixate its high-resolution fovea on objects of interest, remains vastly superior across its field of view to commercially available single-lens cameras at their current state-of-the-art resolution. See, http://www.clarkvision.com/imagedetail/eye-resolution.html (estimating the resolution of the human eye to be 576 megapixels over 120 degrees). In addition, telepresence systems are limited in practice by the network bandwidth available to most users. It is not surprising, therefore, that telepresence has seen limited uptake outside of single person-to-person video chat using the narrow field of view cameras found in most tablets, phones, and laptops.
Automated and manual pan-tilt-zoom (PTZ) cameras in commercial telepresence systems have attempted to overcome the limitation of single-lens camera resolution by optically and mechanically fixating the field of view on select parts of interest in a scene. This partially alleviates the resolution limitations, but has several drawbacks. For example, only one mechanical fixation is possible at a given time; as a result, multiple remote users with different interests may not be satisfactorily served. In addition, the zoom lens and mechanical pan-tilt mechanism drive up the cost of the camera system and pose new challenges to the reliability of the entire system. That is, an automated PTZ system places higher demands on the mechanics than a manual system, which typically sustains fewer move cycles over its lifetime. Compared to a stationary camera, the bandwidth demand for high-quality video encoding also increases significantly. Similarly, digital PTZ in some existing systems shares many of the drawbacks discussed above, including for example the inability to be controlled by multiple users on the far end and the higher bitrate requirement for video encoding.
Panoramic and ultra-wide-angle video cameras may meet the resolution requirements of telepresence systems to deliver a desirable user experience. These cameras have the potential for growth in sensor resolution and pixel rate well beyond current standards. This can, for instance, be enabled by curved sensor surfaces and monocentric lens designs. See, http://www.jacobsschool.ucsd.edu/news/news_releases/release.sfe?id=1418 (discussing a 120-degree FOV imager capable of resolutions up to at least 85 megapixels); http://image-sensors-world.blogspot.co.il/2014/04/vlsi-symposia-sony-presents-curved.html (a sensor manufacturer announcing prototypes of curved image sensors). However, such designs will put a great strain on the capacity of current networks and on video encoding efficiency, and thereby render them impractical for broad real-world deployment. For example, a video camera of 85 megapixels at 30 frames per second produces about 2.55 billion pixels per second and would require compression down to roughly 0.004 bit/pixel to fit into a 10 Mbit/s link. This is generally out of reach today, considering that current video compression standards such as H.264 operate at 0.05 bit/pixel under good conditions.
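As a rough check on the bit budget in the example above, the following Python sketch divides the link capacity by the raw pixel rate. The function and variable names are illustrative only and are not part of the disclosure; the inputs (85 megapixels, 30 frames per second, a 10 Mbit/s link, and 0.05 bit/pixel for H.264) are the figures quoted in the text.

```python
def required_bits_per_pixel(megapixels, fps, link_mbit_per_s):
    """Return the bits available per pixel if the full pixel rate must fit in the link.

    Hypothetical helper for illustration; assumes 1 megapixel = 1e6 pixels
    and 1 Mbit/s = 1e6 bit/s.
    """
    pixels_per_second = megapixels * 1e6 * fps
    bits_per_second = link_mbit_per_s * 1e6
    return bits_per_second / pixels_per_second


# Figures quoted in the text.
budget = required_bits_per_pixel(megapixels=85, fps=30, link_mbit_per_s=10)
h264_bits_per_pixel = 0.05  # operating point cited for H.264 under good conditions

# How many times more efficient than the cited H.264 figure the codec would need to be.
shortfall = h264_bits_per_pixel / budget

print(f"available budget: {budget:.4f} bit/pixel")
print(f"required improvement over H.264: {shortfall:.2f}x")
```

With these inputs, the available budget falls more than an order of magnitude below the cited H.264 operating point, which illustrates why such sensors are impractical to stream over typical links without a different transmission approach.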
Therefore, there is a need for improved methodologies and systems to alleviate bandwidth limitations of video transmission and to generate high-resolution videos based on conventional camera hardware. There is a further need to utilize these improvements to enable modern real-time communication systems and desirable telepresence experiences.