Multicamera arrays, lightfield/plenoptic cameras, and cameras using stereo imaging have recently become viable commercial products. Currently, the three-dimensional (3D) user experiences provided by such devices are typically based on photography effects such as image refocusing, image segmentation, image depth layer effects, image view interpolation, or the like. However, 3D video effects are typically limited. One hindrance to providing 3D video effects is the difficulty of tracking regions of interest across video frames.
In typical 3D videography usage scenarios, the only information provided for tracking a region or object of interest is the user's selection of that region or object. For example, the user may click on an initial object of interest (e.g., an object, person, or the like) in a first frame of the video, and the task of object-of-interest tracking is to estimate the position of the object of interest across the video frames. Such tracking of an a priori unknown object may be characterized as model-free object tracking, which may not require any domain-specific knowledge or training. Typically, model-free object tracking approaches assume that the object being tracked does not change in appearance across video frames. In some instances, model-free object tracking techniques may fail because the appearance of the object of interest changes across frames, for example due to changes in the object itself, changes in camera pose, or changes in global or local illumination. Furthermore, since the object of interest is not known beforehand, offline machine learning techniques cannot be employed to account for the variability in its appearance caused by such changes. Online learning techniques may be applied in the context of model-free object tracking to adapt an object model to changes in the appearance of the object. However, updating the object model may introduce errors, and such techniques may not provide reliable object tracking.
Current techniques may be inadequate for reliably tracking objects or regions of interest across video frames. It is with respect to these and other considerations that the present improvements have been needed. Such improvements may become critical as the desire to provide 3D video effects in a variety of contexts becomes more widespread.