Current technologies for combining videos usually suffer from various problems or disadvantages. One such technology is known as “Virtual Studio Technology”. In Virtual Studio Technology, a real-world television studio allows a first video of real-world environments and/or real-world objects to be combined, seamlessly and optionally in real time, with a second, computer-generated, video of a virtual scene including virtual objects, thereby generating a third, combined, video. The first video is captured by real-world video cameras, whereas the second video can be regarded as being captured (or rendered) by “virtual cameras”. A virtual camera can be defined by a virtual position (e.g. with respect to a certain object or point in the virtual scene), a virtual orientation (e.g. with respect to a certain object or point(s) in the virtual scene) and a virtual viewing angle (the angle of view that the virtual camera encompasses/sees). The virtual position and virtual orientation define the Point-of-View (POV) the virtual camera has of the scene, and the virtual viewing angle defines the frames (being, for example, the still images which compose the video) that the virtual camera sees or acquires.
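The three virtual-camera parameters described above (position, orientation, viewing angle) can be sketched in code; the class and field names here are illustrative assumptions, not part of any particular system:

```python
import math
from dataclasses import dataclass

# Illustrative sketch only: a virtual camera defined by a position,
# an orientation (simplified here to yaw/pitch angles) and a viewing
# angle (field of view), as described above.
@dataclass
class VirtualCamera:
    position: tuple   # (x, y, z) relative to a point in the virtual scene
    yaw: float        # orientation around the vertical axis, in degrees
    pitch: float      # orientation up/down, in degrees
    fov: float        # viewing angle the camera encompasses, in degrees

    def frame_width_at(self, distance: float) -> float:
        # Width of the frame the camera "sees" at a given distance,
        # derived from the viewing angle alone.
        return 2.0 * distance * math.tan(math.radians(self.fov) / 2.0)

cam = VirtualCamera(position=(0.0, 1.5, -3.0), yaw=0.0, pitch=0.0, fov=60.0)
print(round(cam.frame_width_at(10.0), 2))  # → 11.55
```

The position and orientation together fix the POV; the viewing angle alone determines how wide a frame is acquired at any given distance.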
It is to be noted, however, that in Virtual Studio Technology the real-world camera can move only within a fixed space (determined, for example, by the real-world studio's height, width and length, and/or by the movements the real-world camera can perform), while the scene captured by the virtual camera is rendered in real time from the same POV as the real-world camera. Therefore, the virtual camera has to adopt, at any given time, the real-world camera's settings (zoom, pan, angle, traveling, etc.). In other words, in Virtual Studio Technology the real-world cameras in the real-world studio are the “master” cameras, and the virtual cameras rendering the virtual scene are “slave” cameras that are required to follow the real-world cameras' movements. Therefore, any content generated by Virtual Studio Technology (i.e. the video generated by compositing (1) the video captured by the real-world camera in the real-world studio and (2) the Three-Dimensional (3D) Computer Generated (CG) video virtually rendered by the virtual camera) will always be physically constrained to what the real-world cameras can physically do in the limited and confined real-world studio environment. For example, if the real-world studio were only ten square meters in area, both the real-world camera's movement and the virtual camera that “follows” it would be constrained to that limited scale.
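The “master/slave” relationship described above can be sketched as follows; the function and field names are hypothetical, and the point is simply that the virtual camera copies the real camera's tracked settings every frame:

```python
# Illustrative sketch of the "master/slave" relationship: each frame, the
# virtual (slave) camera simply adopts the tracked settings of the
# real-world (master) camera.
def sync_virtual_to_real(real_cam: dict) -> dict:
    # The virtual camera mirrors position, orientation and zoom; it can
    # never go anywhere the physical camera could not go in the studio.
    return {
        "position": real_cam["position"],
        "orientation": real_cam["orientation"],
        "zoom": real_cam["zoom"],
    }

real = {"position": (2.0, 1.7, 0.5), "orientation": (0.0, 15.0, 0.0), "zoom": 1.8}
virtual = sync_virtual_to_real(real)
print(virtual == real)  # → True: the virtual POV is fully constrained by the real one
```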
In addition, recording sessions with Virtual Studio Technology usually require a relatively large real-world studio filled with lights, large areas of perfectly evenly lit green-screen material, and expensive equipment enabling dramatic camera movements across large distances and spaces. Such cinematic devices include electrical dollies, cranes, dolly-and-crane combinations, jib camera arms, camera booms, etc. Additionally, it requires tracking each real-world camera in the real-world studio and mapping its position and orientation (and optionally also its variable-focal-length lens position (zoom)) onto the corresponding virtual camera, so that the point of view of the virtual cameras viewing the virtual CG scene and/or objects inside the virtual scene is synchronized with the point of view of the real-world cameras capturing the real-world scene and/or real-world objects inside the real-world scene. Once synchronization between the points of view of each real-world camera and its corresponding virtual camera is established, visual mixing of the result can be achieved, e.g. utilizing known green-screen chroma-keying techniques.
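The green-screen chroma-keying mentioned above can be sketched in simplified per-pixel form; the threshold and pixel values are illustrative assumptions, not a description of any production keyer:

```python
# Illustrative sketch of green-screen chroma keying: once the real and
# virtual points of view are synchronized, each real-frame pixel that is
# "green enough" is replaced by the corresponding CG-frame pixel.
def chroma_key(real_frame, cg_frame, threshold=1.5):
    out = []
    for real_row, cg_row in zip(real_frame, cg_frame):
        row = []
        for (r, g, b), cg_pixel in zip(real_row, cg_row):
            # A pixel is keyed out when green strongly dominates red and blue.
            if g > threshold * max(r, b, 1):
                row.append(cg_pixel)       # background: take the CG pixel
            else:
                row.append((r, g, b))      # foreground: keep the real pixel
        out.append(row)
    return out

real = [[(20, 200, 30), (180, 60, 50)]]  # one backdrop pixel, one foreground pixel
cg = [[(0, 0, 255), (0, 0, 255)]]        # virtual scene (blue)
print(chroma_key(real, cg))  # → [[(0, 0, 255), (180, 60, 50)]]
```

Production keyers operate in other color spaces and handle soft edges and spill, but the principle, replacing the evenly lit green surface with CG content, is the same.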
However, due to the required cinematic camera-moving devices, motorized camera controls, and expensive camera-tracking systems, conventional Virtual Studio Technologies are usually expensive and involve very intense preparations, calibrations (for achieving identical spatio-optical renderings of space as seen by all real-world cameras in the real-world studio and the corresponding virtual cameras rendering the virtual scene), and set-up times before a working configuration is ready for an actual shooting/recording session.
Furthermore, in order to successfully perform electronic/video replacement of the chroma-key green surface with CG visuals, a special kind of lighting may be required. To ensure an equal level of green color across the entire green surface placed in the background of the real-world studio, the lighting conditions need to ensure that no segments of the green surface are too dark and none are too bright. Large flood-lights carrying special light-diffusers are therefore positioned and distributed evenly across the ceiling of the studio to provide near-perfect light conditions, so that the green color remains even across the chroma-key green surface, which usually covers large portions of the real-world studio.
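A minimal sketch of this evenness requirement, assuming hypothetical sampled green-channel values and acceptance thresholds:

```python
# Illustrative sketch of the lighting requirement described above: the
# green level sampled across the backdrop must stay within a narrow band,
# with no segment too dark and none too bright.
def backdrop_is_even(green_levels, low=120, high=220):
    # green_levels: green-channel values sampled across the green surface
    return all(low <= g <= high for g in green_levels)

print(backdrop_is_even([150, 160, 155, 148]))  # → True (evenly lit)
print(backdrop_is_even([150, 90, 155, 240]))   # → False (dark and bright spots)
```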
As a result of all these complex processes required to assure full functionality of Virtual Studio Technology, use of conventional real-time Virtual Studio Technology is currently relatively limited in scope and has proven to be expensive. Due to the price of studio recordings per day, and due to the fact that the set-up process is relatively tedious and both technologically and logistically complex, real-time Virtual Studio Technology has found only limited use in commercial television applications, such as niche uses for election days, sports events, and other television events with relatively high commercial value: events and broadcasts that justify the use of expensive real-time Virtual Studio Technology equipment and processes.
With the exception of rare and expensive television primetime uses, due to the prohibitive costs of a large studio, large chroma-key green/blue backdrops, the calibration process, the setup process, large lighting requirements, sophisticated devices for camera transport, complex systems for camera tracking, and highly skilled personnel, in both production and creative terms, real-time Virtual Studio Technologies have to this point failed to become a mainstream production solution for any kind of video content.
US Patent Application No. 2012/0314077 (Clavenna II et al.), published on Dec. 13, 2012, discloses a network device that receives, from a first video camera system, position information for a first video camera at a first site and sends, to a second video camera system, position instructions for a second video camera at a second site. The position instructions are configured to locate the second video camera within the second site to correspond to a relative position of the first camera in the first site. The network device receives, from the first video camera system, a first video feed including images of the first site and receives, from the second video camera system, a second video feed including images of a subject of the second site. The network device combines the first video feed and the second video feed to generate a synchronized combined video feed that overlays the images of the subject of the second video feed in images of the first site.
US Patent Application No. 2011/0128300 (Gay et al.), published on Jun. 2, 2011, discloses a system and method for integrating a virtual rendering system and a video capture system using flexible camera control to provide an augmented reality. There is provided a method comprising receiving input data from a plurality of clients for modifying a virtual environment presented using the virtual rendering system, obtaining, from the virtual rendering system, a virtual camera configuration of a virtual camera in the virtual environment, programming the video capture system using the virtual camera configuration to correspondingly control a robotic camera in a real environment, capturing a video capture feed using the robotic camera, obtaining a virtually rendered feed using the virtual camera showing the modifying of the virtual environment, rendering the composite render by processing the feeds, and outputting the composite render to the display.
US Patent Application No. 2015/0304531 (Rodrigues Garcia et al.), published on Oct. 22, 2015, discloses a method comprising: capturing, by at least a camera, an image of a physical object against a background; extracting a silhouette of said physical object from the captured image and mapping it over a three-dimensional geometry; incorporating said virtual object as one more element in the virtual scene; and orienting said virtual object with regard to the virtual camera. Embodiments of the method further comprise obtaining and using intrinsic and/or extrinsic parameters of said physical camera and said captured image to calculate said physical object position; projecting back said captured image over the three-dimensional geometry using said intrinsic and/or extrinsic parameters; and placing the virtual object in the virtual scene and selecting an axis of rotation to orient the virtual object with regard to the virtual camera based on said calculated position of the physical object.
US Patent Application No. 2015/195509 (Herman), published on Jul. 9, 2015, discloses a method for incorporating two-dimensional images, such as those captured by a video camera which is moving and whose optics, particularly zoom and focus, are controlled by a human or by automatic means, into a virtual three-dimensional coordinate system. In one embodiment the method acquires calibration data over the functional range of the studio camera optics, and then in operation dynamically performs the appropriate transformations needed to map the video stream to the virtual coordinate system, even as the acquiring studio camera moves, zooms, and changes focus.
US Patent Application No. 2014/267777 (Francois et al.), published on Sep. 18, 2014, discloses a method for shooting a performance making use of unmanned aerial vehicles, such as drones, to provide the physical markers that are needed to give a physical actor indications on the positioning of virtual elements to be inserted later in the scene, and with which s/he needs to interact.
China Patent Application No. CN 104618642 (Nining), published on May 13, 2015, discloses a terminal photographing control method. The terminal comprises at least one main camera for photographing. The method comprises the steps of presetting a movable virtual camera; building a mapping relation between the main camera and the virtual camera; and calculating and processing an image acquired by the main camera, by the terminal, according to the mapping relation during photographing through the main camera. The invention further provides a photographing terminal for implementing the method above. According to the method and the photographing terminal, the touch function of the photographing terminal can be utilized to adjust the photographing position, and thus the user experience can be improved.
Japan Patent Application No. JP 2003219271 (Yamauchi Yuiko et al.), published on Jul. 31, 2003, discloses: PROBLEM TO BE SOLVED: To provide a multipoint virtual studio synthesizing system for synthesizing videos from a plurality of cameras in real time, without causing an uncomfortable feeling, based on videos and depth information obtained by using the plurality of cameras. SOLUTION: This multipoint virtual studio synthesizing system is provided with robot cameras arranged at a plurality of places for acquiring image data of objects and depth information of the image data, means for controlling camera parameters including robot camera panning, tilting, zooming and focus in real time, means for generating a key signal of each of the objects for synthesizing images on the basis of the depth information obtained in each of the robot cameras, an electronic plotting means for generating electronic videos and depth information of the electronic videos to be used in image synthesis, and a means for extracting image parts to be synthesized on the basis of the key signals of the objects and synthesizing the images.