Because most individuals now carry a smart phone, people take more photographs and videos than ever before. A photograph is a static image that captures a scene at one particular instant. A video, on the other hand, is a collection of multiple images that dynamically capture a scene over some elapsed time period. Each video includes numerous individual frames that are played in sequence, with each individual frame being analogous to a single photograph. A video therefore includes significantly more visual data than a single photograph.
Over the last decade or so, many different image editing capabilities have been developed to manipulate photographs using automated tools, such as those provided by applications executing on a desktop computer or smart phone. Example photographic manipulations include: image in-painting, completion, or hole-filling; image perspective correction or rectification; image beautification; image segmentation; and so forth. The results of some of these types of manipulations depend on combining visual data from two different photographs. For instance, one photograph that has a blurry portion of some scene can be improved by replacing the blurry portion with an in-focus portion extracted from another photograph of the same scene. A specific example of photographic manipulation is called time slice photography, in which multiple photographs of the same place are taken at different times. The photographs from different times are then combined into a single image by taking a slice from each photograph and merging the slices into a new image. More generally, automated image editing tools can determine matching boundaries between portions of two different photographs and can stitch the portions of the two different photographs together in a manner that appears fairly seamless to the human eye.
Thus, conventional image editing tools enable the automated manipulation of two photographs in which the manipulation involves combining different portions of the photographs into one resulting image. Unfortunately, such automated tools are lacking for video manipulation because combining visual data from two different videos is a more difficult task. Videos, with their sequences of numerous frames, have more visual data and offer a more dynamic and complex view of our three-dimensional (3D) world as compared to the static images of photographs. Consequently, the automated tools and techniques used to manipulate photographs cannot be extended to videos in a straightforward manner. As a result, the image manipulations that are currently applied to videos are primarily manual. Furthermore, conventional video manipulations are time-consuming, expensive, and essentially limited to non-moving cameras.