Field of the Invention
The present invention relates to a method for processing digital motion images, and to related computer program products and devices.
Description of the Related Art
The production of high quality entertainment material is now a complex process. Up until the 1990s, movie-making was essentially all done ‘in camera’. By this we mean that everything (actors, set, props, lighting) was arranged exactly as it was to appear in the final production. With the advent of computer, or digital, processing of images, a major change happened. Digital processing was introduced for two main reasons. Firstly, it could be used to create final images that would be very difficult to assemble physically and film. One example is the portrayal of 20-foot-high monsters. Whilst it is of course possible to build physical model monsters, doing so is inconvenient, time-consuming and expensive. The second use of digital effects was to correct imperfections in shooting, or undesired aspects of the shoot. Such a correction may be to alter a character's eye color from its natural brown to a chosen shade of blue.
There are currently many commercial products available to digitally alter material that has been shot. Altering the grade (the color balance of material) is often carried out using the Baselight System, from Filmlight Ltd, London UK, or other similar systems. Compositing effects are often achieved using the ‘Nuke’ system, from The Foundry Visionmongers Ltd, London, UK.
The current application concerns effects applied to individual frames of a moving image, of which rotoscoping is the primary example. Rotoscoping (often abbreviated as “roto”) is a technique in which an operator effectively ‘draws’ on a frame of the video or movie. It is often used as a tool for visual effects in live-action movies. By tracing an object, a silhouette (called a matte) is created. The matte can be used to extract the object from a scene for use on a different background, or to apply visual effects to the object. Rotoscoping can be used to allow a special visual effect (such as a glow) to be guided by the matte or rotoscoped line. One classic use of traditional rotoscoping was in the original three Star Wars films, where it was used to create the glowing light sabre effect by creating a matte based on sticks held by the actors. To achieve this, editors traced a line over the prop in each frame, then enlarged each line and added the glow.
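Using a matte to place an extracted object over a different background amounts to a per-pixel weighted blend. The sketch below is a minimal illustration under stated assumptions, not the method of any product named here: images are assumed to be nested lists of RGB tuples with values in [0, 1], the matte a same-sized grid of weights in [0, 1], and the function name `composite_with_matte` is hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]


def composite_with_matte(foreground: Image, background: Image,
                         matte: List[List[float]]) -> Image:
    """Blend two same-sized images pixel by pixel using a matte.

    Hypothetical helper for illustration. A matte value of 1 keeps the
    foreground (the rotoscoped object), 0 keeps the background, and
    values in between mix the two linearly.
    """
    if not (len(foreground) == len(background) == len(matte)):
        raise ValueError("images and matte must have the same height")
    result: Image = []
    for fg_row, bg_row, m_row in zip(foreground, background, matte):
        if not (len(fg_row) == len(bg_row) == len(m_row)):
            raise ValueError("images and matte must have the same width")
        out_row = []
        for fg, bg, a in zip(fg_row, bg_row, m_row):
            # Per channel: out = a * fg + (1 - a) * bg
            out_row.append(tuple(a * f + (1.0 - a) * b for f, b in zip(fg, bg)))
        result.append(out_row)
    return result
```

A soft (fractional) matte edge, as in the third pixel of a 0.5-weighted matte, gives an even mix of object and background, which helps to hide the traced boundary.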
Rotoscoping used to be carried out manually by physical drawing, but it can now be carried out electronically on the digital files. Rotoscoping in the digital domain is often aided by motion tracking software. This assists an operator by calculating where an object will be in subsequent frames, based on its previous movement characteristics. This means that it is not necessary to locate an object of interest manually in every frame of a moving image. While blue-screen and green-screen techniques have made it easier to layer subjects into scenes, rotoscoping still plays a large role in the production of visual effects imagery.
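The idea of predicting an object's position from its previous movement can be illustrated with the simplest possible motion model. This is a sketch only, assuming constant velocity between consecutive frames; commercial trackers use richer models and refine the prediction by matching image content. The function name `predict_next_position` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def predict_next_position(history: List[Point]) -> Point:
    """Predict where a tracked point will be in the next frame.

    Hypothetical helper assuming a constant-velocity model: the
    displacement between the last two frames is assumed to repeat.
    """
    if not history:
        raise ValueError("need at least one observed position")
    if len(history) == 1:
        return history[0]  # no motion information yet
    (x_prev, y_prev), (x_last, y_last) = history[-2], history[-1]
    # Velocity in pixels per frame, from the last two observations.
    vx, vy = x_last - x_prev, y_last - y_prev
    return (x_last + vx, y_last + vy)
```

For a point seen at (10, 20) and then (12, 23), the prediction for the next frame is (14, 26). In practice such a prediction would typically serve as the centre of a search region rather than as the final answer.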
One product available that combines motion tracking and rotoscoping is the Mocha® product, produced by Imagineer Systems Ltd, of Guildford, Surrey, UK. This product makes use of planar tracking to track an object in a sequence of frames of a motion image. Planar tracking is described in the book ‘Compositing Visual Effects’ by Steve Wright, published by Focal Press in 2008, pages 153-157. Planar trackers are a very useful tool to assist in rotoscoping. They rely on the assumption that the objects to be tracked are ‘planar’ (i.e. that each can be treated as a flat, 2D plane). Page 156 of the above book teaches how this is done. Using a planar tracker, the motion of a given object over time can be defined by a homography, which is a matrix transformation relating the assumed planar form of the object in a first frame of the image, at a first time, to the assumed planar form of the object in a second frame, at a second time.
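Once a homography has been found, applying it to the points of a roto shape is a small calculation in homogeneous coordinates. The sketch below shows only that step, under the assumption that the 3x3 matrix is already known (estimating it, typically from four or more point correspondences, is not shown). The function name `apply_homography` and the example matrices are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_homography(h: Sequence[Sequence[float]],
                     points: List[Point]) -> List[Point]:
    """Map 2D points from a first frame to a second frame with a 3x3 homography.

    Each point (x, y) is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and then divided by the resulting third component w.
    """
    mapped: List[Point] = []
    for x, y in points:
        xh = h[0][0] * x + h[0][1] * y + h[0][2]
        yh = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if w == 0:
            raise ValueError("point maps to infinity under this homography")
        mapped.append((xh / w, yh / w))
    return mapped
```

A pure translation is the special case whose last row is (0, 0, 1); a non-trivial last row is what lets a homography express the perspective changes of a plane viewed at different times.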
A popular, and growing, trend in movies and other similar image products is the so-called ‘3D’ movie (and 3D TV), which uses a stereo image system. Stereo image systems generally work by having two image streams, one for each eye, and displaying them alternately, with spectacles that allow each eye to view only its own image stream. This creates the illusion of depth by fooling the eye and creating the illusion of stereopsis (binocular vision). One of the most famous movies using this type of stereoscopic image is ‘Avatar’, directed by James Cameron. A problem arises with processing techniques such as rotoscoping: in stereo movies there are two image streams, and hence twice as many source frames, which could mean twice as much material to process. This would add substantially to the cost of ‘effects heavy’ movies, which are currently the most popular type of 3D movie.
A simple solution to this is to apply an offset to the object in one image stream relative to the other. One of the two image streams is designated as the ‘hero’ eye image stream; all processing is applied to the hero eye, and the result is mapped to the other eye. However, this does not work well in practice, since a simple offset does not allow for changes in perspective or depth. By way of example, consider a video image in which it is desired to alter the color of an actor's lips. In stereo there are two image streams. If the operator accurately draws around the lips of the actor in a given frame of (say) the right eye view, it might be expected that the lip shape will be the same in the left eye view. However, this is not the case, since the two cameras responsible for the two image streams have different perspectives. There will be an X-axis (horizontal) displacement, which could perhaps be corrected by ‘dragging and dropping’ the shape. However, there will also be a perspective change in the lip shape between the cameras, and depth makes a further contribution to the difference in shape: the lips are almost certain to be at different depths relative to the two cameras. Thus, although a known planar tracking system could be used for one eye's view, the resulting tracking would not provide the required information about the other eye's view, even if an offset were applied. The whole tracking process might need to be repeated for the second eye, undesirably increasing the computing resources and manpower required.
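Why a single offset cannot serve every point of a shape can be seen from textbook stereo geometry. For an idealized rectified pair of parallel pinhole cameras with focal length f (in pixels) and baseline B, a point at depth Z is displaced horizontally by d = f·B/Z, so points at different depths shift by different amounts. The sketch below assumes exactly that idealized rig and ignores the perspective effects also mentioned above; the function names and the parameter values in the usage note are hypothetical.

```python
from typing import List, Tuple


def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Horizontal disparity in pixels for a rectified, parallel stereo pair."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def offset_error(points: List[Tuple[float, float, float]],
                 focal_px: float, baseline_m: float) -> List[float]:
    """Per-point horizontal error (pixels) left by one constant offset.

    points: (x, y, depth_m) for each point of a shape in the hero eye.
    The constant offset is taken from the first point, as if the operator
    had dragged the whole shape until that point lined up.
    """
    ref = disparity_px(focal_px, baseline_m, points[0][2])
    return [disparity_px(focal_px, baseline_m, z) - ref for _, _, z in points]
```

With an assumed focal length of 2000 pixels and a 65 mm baseline, a point at 2.0 m shifts by about 65 pixels and one at 2.5 m by about 52 pixels, so an offset that aligns the nearer point leaves the farther one roughly 13 pixels out.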
A prior art technique for automating the processing of stereo images uses algorithms to determine what is known as a ‘disparity map’. The term disparity map refers to the apparent pixel difference or motion between a pair of stereo images. Papers published since at least 1977 have taught how to produce disparity maps. One early publication is ‘A Theory of Human Stereo Vision’ by D. Marr and T. Poggio, November 1977, published by the Massachusetts Institute of Technology Artificial Intelligence Laboratory as AI Memo 451.
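To make the notion of a disparity map concrete, the sketch below computes one row of a dense disparity map. It does not reproduce the Marr and Poggio method; it swaps in sum-of-absolute-differences block matching, one of the simplest local techniques, and assumes rectified images (so matches lie on the same row) with the convention x_right = x_left - d. The function name and window size are hypothetical.

```python
from typing import List, Sequence


def scanline_disparity(left_row: Sequence[float], right_row: Sequence[float],
                       max_disp: int, half_window: int = 1) -> List[int]:
    """Estimate a disparity for every pixel of one rectified scanline.

    For each pixel x in the left row, candidate disparities d = 0..max_disp
    are scored by the mean absolute difference between the window around
    left_row[x] and the window around right_row[x - d]. Windows are clipped
    at the row edges; ties go to the smallest disparity.
    """
    if len(left_row) != len(right_row):
        raise ValueError("rows must have the same length")
    n = len(left_row)
    disparities: List[int] = []
    for x in range(n):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            if x - d < 0:
                break  # candidate would fall off the left edge of the right row
            cost, count = 0.0, 0
            for k in range(-half_window, half_window + 1):
                xl, xr = x + k, x - d + k
                if 0 <= xl < n and 0 <= xr < n:
                    cost += abs(left_row[xl] - right_row[xr])
                    count += 1
            cost /= count  # count >= 1 because k = 0 is always in range
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities
```

Even this toy version produces one value per pixel, and in flat, textureless regions several disparities score equally well, so the answer there is decided by an arbitrary tie-breaking rule rather than by the image.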
With reference to the example of changing the color of an actor's lips, it is possible to build a disparity map using one of the known published techniques, then to use a conventional tracking process to identify points representing the lips (or other object of interest) in the images for a first eye and, finally, to determine the disparity between the eyes on a point-by-point basis via the disparity map.
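The final, point-by-point step can be sketched as a lookup into the disparity map for each tracked point. This assumes rectified images, the convention x_right = x_left - d, integer pixel positions and nearest-pixel lookup (sub-pixel positions would need interpolation); the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def transfer_points(points: List[Pixel],
                    disparity_map: Sequence[Sequence[int]]) -> List[Pixel]:
    """Map points tracked in the hero (left) eye into the other (right) eye.

    disparity_map is indexed [y][x] and holds horizontal disparities in
    pixels for the left image, with x_right = x_left - d.
    """
    transferred: List[Pixel] = []
    for x, y in points:
        d = disparity_map[y][x]  # one lookup per tracked point
        transferred.append((x - d, y))
    return transferred
```

Note that only a handful of entries are ever read, yet the full map, one value per image pixel, has to be computed and stored first.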
There are several problems with the use of disparity maps. Firstly, there are many algorithms for determining a disparity map, and the creator of each typically claims that it is better than previous algorithms. Secondly, some proposed methods are iterative: the process has to be run many times, with slight improvements made to the answer in each iteration. This results in a high burden in terms of computer processing and memory. Thirdly, many disparity map algorithms are non-deterministic, and hence for a given physical layout of objects and cameras there are many different answers that ‘fit’ the algorithm.
Most significantly, the use of a disparity map does not reduce the processing burden when dealing with stereo images. The resolution of the disparity map is the same as the resolution of the image. Hence, although a disparity map can allow the processing of stereo images to be automated, it does so in a way that is not technically efficient, since the data processing burden is not reduced. It should also be remembered that, as the resolution at which imagery must be processed keeps rising, disparity maps will grow correspondingly larger. At 4K resolution the disparity map of one frame may occupy 50 Mbytes or more. The amount of data involved in using this technique for a whole movie, or significant parts of a movie, is therefore enormous. Thus, there are still significant problems to address in this field.
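The storage argument can be checked with simple arithmetic. The exact per-frame figure depends on how the map is stored, so the values below rest on stated assumptions: a DCI 4K frame of 4096 × 2160 pixels, uncompressed 32-bit floats, and either one (horizontal) or two (horizontal and vertical) disparity channels. The helper name is hypothetical.

```python
def disparity_map_bytes(width: int, height: int,
                        channels: int = 1, bytes_per_value: int = 4) -> int:
    """Uncompressed size in bytes of one dense disparity map.

    Assumes one value per pixel per channel; 4 bytes corresponds to a
    32-bit float. Resolution, channel count and precision are all
    illustrative assumptions, not properties of any particular system.
    """
    return width * height * channels * bytes_per_value


# Frames in an assumed two-hour film at 24 frames per second.
FRAMES_TWO_HOURS_24FPS = 2 * 60 * 60 * 24
```

Under these assumptions one frame needs 35,389,440 bytes (about 35 MB) with a single channel and 70,778,880 bytes (about 71 MB) with two. A two-hour film at 24 frames per second has 172,800 frames, so even the single-channel case comes to roughly 6.1 TB of disparity data for one stereo pair of streams.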