1. Field of the Invention
The present invention relates generally to systems and processes for image editing and, more particularly, to a system and process for replacing unwanted geometry with new geometry utilized in a 3-dimensional (“3-D”) tracking process.
2. Description of Related Art
Media productions have benefited in recent years from technical advances in animation and in computer generated images. Increasingly, producers and directors of media productions are creating scenes comprising a combination of real and computer generated images that appear to be interacting with each other and co-existing within the same real or virtual space. These new techniques can create realistic special effects such as computer generated dinosaurs or mice interacting with real people, or the destruction of recognizable cities by computer generated asteroids.
These new techniques are used in media productions such as motion pictures, television shows, television commercials, videos, multimedia CD-ROMs, web productions for the Internet/intranet, and the like. The process involves at least three general phases:
1) scene recording or production;
2) camera and environment 3-D solving; and
3) compositing and other 2-D image manipulation phases.
The first phase creates and captures the actual media images (i.e., live action footage, animation, computer graphics) used in the finished piece. Live action footage may be recorded, for example, in media formats such as film, videotape, and audiotape, or in the form of live media such as a broadcast video feed. The media information is captured through devices like cameras and microphones from the physical world of actual human actors, physical models and sets. Computer generated images (CGI), such as computer graphics, computer animation, and synthesized music and sounds, may be created by using computers and related electronic devices to synthetically model, generate and manipulate images and sounds, typically under the guidance and control of a human operator.
The next phase of 3-D solving re-creates the scene recorded in the first phase within the computer as a virtual 3-D environment and associated camera motion(s) and attributes. The identical recreation of the original scene within the computer allows computer graphic images to be generated (rendered) as if they were real objects shot by the original production camera. These rendered images can include computer graphic characters such as dinosaurs, a computer graphics mouse, computer graphics building, or whatever is required based upon the scene. Additionally, the rendering process can create additional elements and aspects of the basic computer graphics image such as shadows, reflections, and any other conceivable artifact that would have been present if the computer graphics character were actually a real object photographed in the original real, production environment.
The third phase uses compositing techniques which combine, integrate, and assemble these real (production) and computer generated (rendered) images, which may have been produced out of sequence and through various methods, into a coherent finished product, using operations such as editing, compositing and mixing. This process should result in real and computer generated images that appear to co-exist and interact as if they were captured or created at the same time, in the same space, from the same viewpoint. In post-production, the images are combined (or “composited”) to generate believable results by adjusting the visual characteristics of the rendered object as well as its associated rendered artifacts (such as the shadows) to match the original production scene. These adjustments may include manipulating colors, brightness, gamma, size, the appearance of film grain to match the original image if it were shot on film, and many other visual attributes as appropriate for that particular scene and its medium. The compositing phase may also fix some or all errors, objectionable image aspects, and/or other visual problems introduced during independent and often very separate production steps.
Recording the original production scene sometimes requires special shooting conditions such as bluescreen photography, and/or the scene may require special or unusual sets, equipment, and foreign objects which may be present in the scene's original recording. However, some or all of the equipment, objects, and/or conditions are not intended to appear in the final product. Nevertheless, they may be necessary in order to produce the scene. As examples, they may be necessary for specialized or particular lighting requirements, safety concerns, proper execution of the final special effects, and other similar considerations.
One of the difficulties of combining and integrating computer graphics images into live action scenes occurs during the second step of 3-D solving. This phase attempts to solve and match all of the necessary three dimensional characteristics (such as positioning and movement) of the live action scenes. Even though the physical set of a live/recorded production is inherently 3-D, the recorded result is a 2-dimensional (“2-D”) image from the camera's perspective. It is therefore very difficult to recreate and match the 3-D positioning and movement of the CGI to the recorded live action scene. Human visual acuity is sufficiently precise to detect anomalies in the relative scale, positioning, and dimensional relationship of the CGI to the live action scene. These relative characteristics must be accurately matched to obtain realism and present the viewer with a seamless view of the composite image.
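By way of illustration only, the loss of depth information described above can be sketched with an idealized pinhole camera model. The function name `project`, the unit focal length, and the sample coordinates are assumptions chosen for this sketch and are not taken from any particular product.

```python
def project(point, focal_length=1.0):
    """Project a 3-D camera-space point (x, y, z) onto the 2-D image plane.

    Assumes an idealized pinhole camera at the origin looking along +z.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two different 3-D points: one near the camera, one twice as far away
# and twice as far off-axis.
near_point = (1.0, 0.5, 2.0)
far_point = (2.0, 1.0, 4.0)

# Both land on the same 2-D image coordinates, so a single recorded frame
# cannot distinguish them.
print(project(near_point))  # (0.5, 0.25)
print(project(far_point))   # (0.5, 0.25)
```

Because both points map to the same 2-D location, depth must be recovered from additional information, such as how tracked features move across several frames.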
Thus, it is advantageous to have a 3-D model of the live action scene to assist in integrating the CGI into the live action scene. One method for generating such a 3-D model is commonly referred to as 3-D tracking (also referred to in the present disclosure as “3-D matchmove” or “3-D solve”).
To assist in the 3-D tracking process, several tracking markers or objects may be placed within the scene that is to be recorded even though they are considered foreign objects relative to the expected final image. These markers assist in determining the 3-D coordinates of the camera motion and other camera-related parameters, as well as creating a 3-D recreation of the objects in the scene and their 3-D spatial relationship to each other. Tracking markers can be sticker dots, tennis balls, painted lines, or other markers or objects that will be discernible in the recorded image of the scene. The tracking markers are usually placed on features within the scene and are specifically designed to stand out in the recorded image and assist in the 2-D tracking and 3-D solving process. In general, the more tracking markers there are within the scene, the more accurate the 3-D solve will be.
The scene is then recorded. The recorded scene is then scanned or similarly imported into a computer graphics program used for 2-D point or 2-D feature tracking (such as Combustion or Composer) or into a 3-D graphics scene recreation and solving application which includes 2-D point or 2-D feature tracking (such as 3D Equalizer or Matchmover). The tracking markers within the scene may then be tracked in 2-D screen space. A 3-D graphics application (such as 3D Equalizer or Matchmover) may then use tracking algorithms known in the art to mathematically convert (i.e., solve) the 2-D tracking information of the recorded scene into 3-D coordinates of the scene (a “3-D map”).
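As a simplified illustration of converting 2-D tracking information into 3-D coordinates, the following sketch triangulates one tracked marker from two frames. It assumes, purely for the purposes of the example, an idealized pinhole camera that translates sideways by a known distance between the two frames without rotating. Commercial solving applications must also estimate the unknown camera motion and are considerably more involved; the function name `triangulate` and its parameters are hypothetical.

```python
def triangulate(uv_first, uv_second, baseline, focal_length=1.0):
    """Recover a 3-D point from its 2-D positions in two frames.

    Assumes the camera moved by `baseline` along its x axis between the
    frames, with no rotation, and an idealized pinhole projection.
    The result is expressed in the first camera's coordinate frame.
    """
    u1, v1 = uv_first
    u2, _ = uv_second
    disparity = u1 - u2
    if disparity == 0:
        raise ValueError("no parallax between frames; depth is undetermined")
    z = focal_length * baseline / disparity
    return (u1 * z / focal_length, v1 * z / focal_length, z)


# 2-D track of one marker: its image position in frame 1 and in frame 2,
# after the camera has moved 2.0 units to the right.
marker_frame1 = (0.25, 0.125)
marker_frame2 = (-0.25, 0.125)

print(triangulate(marker_frame1, marker_frame2, baseline=2.0))
```

The sketch shows why markers that shift measurably between frames are useful: the shift (parallax) is what determines depth, and a marker with no parallax yields no depth estimate.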
Alternatively, human computer users can use their knowledge to manually try to solve the necessary recreation of the camera and the environment. Whether the information and 3-D map is gathered through a special application and solving algorithm or through manual means, this 3-D map of the original production scene in the computer's virtual space is used to assist in rendering the desired computer graphics images.
Once the 2-D screen space motions have been converted, or solved, into a 3-D map, the 3-D map may be exported from the 3-D graphics solving application (or from the manual steps) to a 3-D animation software package (such as Maya, 3-D Studio Max, or Softimage). The 3-D map may be used within the animation software package to create and assist in rendering the computer graphics images which then appear as if they were present during the original production scene. By applying the final phase of post-production editing, compositing and mixing, these computer graphic images may be seamlessly integrated into the live action scene, for example, by placing a computer generated mouse on a table along with its associated attributes such as shadows or reflections. Thus, the tracking markers assist in creating and integrating the desired realistic special effects.
However, once the scene is recorded, the presence of the tracking markers in the scene becomes a problem along with any and all other foreign objects as described above. Even if the scene is shot in a normal environment such as a room or outdoor location and there are no other objectionable objects such as lighting equipment, the tracking markers are a problem which must be corrected by removing them from the original image. This is because the tracking markers tend to stand out in the scene as odd and seemingly random objects that shift from one shot to the next. Therefore, to preserve the sought after realism, the tracking markers must be replaced in the scene in the post-production process by altering the image in some way.
There is one process which can eliminate the need for tracking markers in the original production photography and still potentially give the same benefits. This process uses specialized camera equipment which is driven by high precision motors which are controlled by a computer and special camera crew and computer operators. This technology is generally referred to as motion control, and it allows the camera to go through the exact same series of motions more than one time.
This allows a scene to be shot once normally with the actors present but without the tracking markers. The camera can then be reset and run through the same scene again (oftentimes referred to as a second pass or reference pass) without the actors but with the tracking markers present. Since the two scenes have exactly the same camera move, the 3-D solving software can use the second reference pass to perform the 3-D camera and environment solving. The resulting 3-D map can then be applied to the first scene with the actors. This process gives all the advantages of the markers or other specialized equipment, and also allows the scene with the actors to be used without any objectionable objects being present, thus eliminating the need to replace such objects.
Unfortunately, there are a number of problems and practical considerations which make the motion control solution undesirable, impractical, too expensive, or even impossible. Some of the factors which make the motion control pass solution unacceptable include: noise from the motors, which can cause problems for sound recording; the need for much more expensive and complicated camera equipment and a specialized crew to run the equipment; speed limits of the motors moving the camera, which can present problems compared to normally operated camera motions; the setup time and the time to shoot the extra passes, which greatly impact the production schedule and further raise costs and concerns; and the fact that the equipment itself is not foolproof. Furthermore, if the passes are not absolutely identical, the second reference pass with the markers is useless and the special effects may therefore become enormously more complex without their presence to assist in the 3-D solve, thereby defeating any potential saving or other consideration gained through the use of motion control.
Because of these problems and the additional expense and complexity of motion control, the solution is considered too expensive and potentially faulty compared to the use of markers and other equipment which can be replaced through standard post-production special effects methods.
One such standard method for removing the tracking markers is to use 2-D tracking algorithms in combination with a digital painting or compositing application (such as Matador or Photoshop) to digitally replace the portion of the image with the tracking markers (the “target”) with a replacement portion from a “source” which appears to be a part of the scene. For example, if a tracking marker dot is placed on a table within a scene to assist in a 3-D solve of the scene, the tracking marker dot needs to be replaced with a suitable and acceptable replacement portion which may be copied from a source that appears to be a part of the original table. The source replacement portion may then be pasted over the target tracking marker. This may result in a scene in which the tracking marker dot has been replaced with new image pixel data and the table appears normal.
Both the target and source may comprise a number of pixels within a given frame. The source of the replacement pixels can come from a portion of the same image, an image earlier or later in the sequence, from a separate scene, from special reference pictures, or any other suitable source. The most efficient and desirable process uses pixels from within the image itself. In order to replace the number of pixels that make up the target, a sufficient number of pixels within the frame must be chosen as the source. The source pixels must then be copied and pasted over the target pixels in order to replace them in the scene.
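The copy and paste operation described above can be sketched as follows. This is a minimal illustration only, assuming a single-channel frame stored as a list of rows of pixel values and rectangular source and target regions of equal size; the function name `replace_patch` and the sample pixel values are hypothetical.

```python
def replace_patch(frame, source_top_left, target_top_left, size):
    """Return a copy of `frame` with the target region overwritten by the
    source region. Regions are rectangles given as (row, column) of the
    top-left corner plus a shared (height, width)."""
    height, width = size
    src_row, src_col = source_top_left
    tgt_row, tgt_col = target_top_left

    # Copy the source pixels first so the result is well defined even if
    # the two regions overlap.
    patch = [row[src_col:src_col + width]
             for row in frame[src_row:src_row + height]]

    result = [list(row) for row in frame]  # leave the original frame intact
    for offset in range(height):
        result[tgt_row + offset][tgt_col:tgt_col + width] = patch[offset]
    return result


# A 4 x 6 "table top" of uniform value 7 with a 2 x 2 marker of value 0.
frame = [[7] * 6 for _ in range(4)]
for r in (1, 2):
    for c in (3, 4):
        frame[r][c] = 0

# Use the adjacent 2 x 2 region at column 0 as the source.
cleaned = replace_patch(frame, source_top_left=(1, 0),
                        target_top_left=(1, 3), size=(2, 2))
for row in cleaned:
    print(row)
```

In this idealized case the source and target have identical appearance, so the marker vanishes completely; in recorded footage the source must be chosen so that color, shading, and texture match closely enough to be unnoticeable.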
For example, in the table example discussed above, a portion of the table adjacent to the target may be chosen as a source. This is because the adjacent portion usually has similar characteristics (i.e., color, shading, textural appearance) and thus appears to be a part of the original image. Moreover, it is sometimes advantageous to determine more than one replacement portion to be associated with a particular tracking marker dot. This is because one may be a better replacement portion than the other depending on such factors as lighting, shadows, and distinctness (for ease of tracking) from frame-to-frame. Thus, a particular tracking marker dot may be replaced by either one or several replacement portions associated with that tracking marker dot.
If the camera move (i.e., the motion of the camera as the scene is recorded) is relatively simple, a tracking marker replacement process may also be relatively simple, because the relationship between the tracking markers and the replacement portions, as well as the size of each, may stay relatively constant from frame-to-frame. Thus, in the case of a simple camera move, the replacement process may be aided to some degree by a 2-D tracking program known in the art which may automatically track the tracking markers from frame-to-frame. Once the relationship between the tracking markers and the replacement portions is determined from one frame, this information is provided as an input to the 2-D tracking program. The program may then use this information to automate the replacement process in subsequent frames of the scene requiring marker replacement.
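As a rough sketch of the automation described above, and assuming that the offset between a marker and its replacement portion really does stay constant from frame to frame, the source location for every frame can be derived from the 2-D marker track and a single offset measured in one frame. The names `plan_replacements`, `marker_track`, and `source_offset` are hypothetical.

```python
def plan_replacements(marker_track, source_offset):
    """For each frame, pair the source location with the target location.

    `marker_track` is a list of (row, column) marker positions, one per
    frame, as produced by a 2-D tracker. `source_offset` is the fixed
    (row, column) displacement from the marker to its replacement
    portion, measured once in a single reference frame.
    """
    d_row, d_col = source_offset
    return [((row + d_row, col + d_col), (row, col))
            for row, col in marker_track]


# Marker positions in three consecutive frames of a simple camera move.
marker_track = [(10, 20), (10, 22), (11, 24)]

# The replacement portion sits 5 pixels to the left of the marker.
for source, target in plan_replacements(marker_track, source_offset=(0, -5)):
    print("copy from", source, "to", target)
```

The sketch makes the underlying assumption explicit: the approach is only as good as the constancy of the offset, which is why it is limited to relatively simple camera moves.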
However, even when using a 2-D tracking program there may still be a considerable number of manual steps involved in order to overlay (i.e., replace) the original tracking marker dots in a satisfactory manner. This may result from slight frame-to-frame variations in the relationship between the tracking markers and the replacement portions. These variations may result in portions of the tracking markers not being fully overlaid by the replacement portions.
Furthermore, it is usually important that the characteristics of the replacement portions used to overlay a particular tracking marker be consistent from frame-to-frame. If they are not consistent, negative effects such as flicker, inconsistent color change, or wobble may occur in that portion of the scene where the tracking marker was overlaid. Such effects may be discernible by a viewer and are usually unacceptable. Commonly, it is difficult to perform 2-D tracking on the source because the source may not stand out in the scene in the same manner as the tracking marker, whose characteristics are specially designed to make tracking easy and accurate. For example, a tennis ball used as a tracking marker on a large grassy field is easy to track precisely because of its distinct differences in geometry and color. However, trying to track any specific patch of grass alone is extremely difficult since one area will look virtually the same as another as a scene progresses. The high precision tracking necessary for realistic computer graphics effects may become extraordinarily difficult or nearly impossible in this or similar types of situations. This is because accurate tracking requires discernible differences to distinguish one area from another, and either human or computer-assisted tracking will have great difficulty in tracking one specific grassy area because of the lack of trackable differences in the grassy field. Thus, it is often difficult to accurately and efficiently track the source and maintain this consistency when employing a 2-D tracking program because, unlike in the case of the tracking marker, there is usually no consistent frame-to-frame tracking point which the 2-D tracking program can utilize to track the replacement portions.
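The difference between tracking a distinct marker and tracking a featureless source can be illustrated with a simple template match. The sketch below is an assumption-laden simplification: it works on a single row of pixel values, scores candidates by sum of squared differences, and is not the algorithm of any particular tracking program. The names `best_matches`, `row_of_pixels`, and the sample values are hypothetical.

```python
def best_matches(row_of_pixels, template):
    """Return every position at which `template` matches best, scored by
    sum of squared differences (lower is better)."""
    scores = []
    for start in range(len(row_of_pixels) - len(template) + 1):
        window = row_of_pixels[start:start + len(template)]
        scores.append(sum((a - b) ** 2 for a, b in zip(window, template)))
    lowest = min(scores)
    return [position for position, score in enumerate(scores)
            if score == lowest]


# Uniform "grass" of value 5 with a high-contrast two-pixel marker (9, 1).
row_of_pixels = [5, 5, 5, 9, 1, 5, 5, 5]

print(best_matches(row_of_pixels, [9, 1]))  # marker: one unambiguous match
print(best_matches(row_of_pixels, [5, 5]))  # grass: several equal matches
```

The marker template yields a single best position, whereas the grass template matches equally well at several positions, so a tracker has no basis for following one specific patch from frame to frame.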
The replacement process may be further complicated in several ways. For example, if the tracking markers are placed on a moving object within the scene, such as a tumbling automobile, the replacement process becomes much more complex. This is because the larger motions and changes in perspective due to the tumbling motion result in a constantly changing relationship between the sources and the targets from frame-to-frame.
Furthermore, the process is more complicated when the camera move is not a simple one. More complex camera moves in relation to the recorded object may result in changes in the 2-D spatial relationship of the sources to the targets from frame-to-frame, as well as possibly changing the size of the sources and targets. Also, additional complexity may be introduced if the object is moving at the same time that the camera is moving. Many camera moves introduce the same problem as described above for a tumbling object, i.e., the 2-D spatial relationship between the target and the source, as well as the size of both, changes from frame-to-frame. This makes consistent copying of the source pixels over the target pixels very difficult in an automated 2-D tracking process.
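The frame-to-frame change in the 2-D relationship between a target and its source can be illustrated numerically. The sketch below assumes an idealized pinhole camera that moves straight toward a marker and a nearby source point which are fixed in 3-D space; the function names, the unit focal length, and the coordinates are assumptions made for this example only.

```python
def project(point, camera_z, focal_length=1.0):
    """Project a world-space point for a pinhole camera located at
    (0, 0, camera_z) and looking along +z."""
    x, y, z = point
    depth = z - camera_z
    if depth <= 0:
        raise ValueError("point must be in front of the camera")
    return (focal_length * x / depth, focal_length * y / depth)


def screen_offset(marker, source, camera_z):
    """2-D displacement from the marker to the source as seen on screen."""
    marker_u, marker_v = project(marker, camera_z)
    source_u, source_v = project(source, camera_z)
    return (source_u - marker_u, source_v - marker_v)


# The marker and its source are one unit apart on the same surface.
marker = (0.0, 0.0, 10.0)
source = (1.0, 0.0, 10.0)

# The 3-D separation never changes, but the 2-D separation grows as the
# camera moves toward the surface.
for camera_z in (0.0, 5.0, 8.0):
    print(camera_z, screen_offset(marker, source, camera_z))
```

Although the marker and the source never move relative to each other in 3-D, their on-screen separation increases as the camera approaches, so a single fixed 2-D offset measured in one frame is not valid for the others.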
Therefore, when it is difficult to automatically perform a consistent replacement using the 2-D tracking software, manual copy and paste procedures may be necessary in order to overlay the original tracking marker dots in a satisfactory manner. Software tools within a compositing application may be used to manually copy and paste replacement portions containing suitable imagery over the tracking marker dots on a frame-by-frame basis. Because “painting” is the process of manually adjusting or replacing individual pixels within individual frames, this process involves manual work on each of the affected frames of the scene at the level of individual pixels.
Therefore, with the current state of the art, a filmmaker must balance the desirability of placing larger numbers of tracking markers within a scene in order to achieve a more accurate 3-D model against the difficulties involved in removing the tracking markers once the scene is recorded. The filmmaker often decides to limit the number of tracking markers in order to simplify the post-production process. As a result, the CGI artist must work with a less accurate 3-D model.
Although the previous example focused on tracking markers, as mentioned earlier, special effects scenes oftentimes contain many foreign objects, such as stage rigging, safety equipment, incomplete buildings, and other portions of the scene which are undesirable, and which must be replaced through some technique. Previously these techniques generally followed the standard painting and replacing techniques described above. The process described above to replace tracking markers is also applicable to many of the replacement needs for other types of geometry and objects undesirable in the final scene.
Thus, there is an industry demand for an accelerated and simplified post-production process for removing unwanted geometry and objects from recorded scenes. In that regard, there is an industry need for a system and process for automatically and accurately replacing unwanted geometry and objects with replacement portions in scenes where the 2-D spatial relationship between the unwanted geometry and objects and the replacement portions, as well as the size of each, do not remain constant from frame-to-frame.