Users frequently need to track or identify objects in a video. For example, law enforcement or other sensitive videos often must be released to the public. This is problematic, however, when the video includes objects that reveal personal information, such as a person's face, a license plate, a house address, and so forth. Accordingly, before the video is released to the public, editors redact (e.g., block, blur, or pixelate) sensitive objects from the video.
Existing methods for redacting objects from video generally utilize an iterative approach to identify the boundaries of the object in each frame of the video. There are, however, many problems with this method of tracking objects in video. For example, identifying object boundaries in each frame typically requires that the entirety of each frame undergo image analysis. This type of image analysis is generally computationally expensive and time intensive.
Additionally, existing methods of tracking objects track the object as it moves in time through the video. Tracking an object through a video can easily confuse a tracking system when the camera angle of the video changes or when the object within the video moves. For example, if the camera capturing the video is moved from a head-on shot of the object to a side angle, the shape of the object in the video becomes skewed. Existing methods for tracking and redacting objects in videos may easily lose track of the object when the object's shape changes from one video frame to another. Other conditions that cause problems with conventional tracking methods include significant object and camera motion; other moving objects; changes in the object's appearance due to lighting changes, motion blur, and deformation; and periods of lost track due to occlusion of the object by other objects or due to the object momentarily being out of the video frame.
Similarly, existing methods tend to experience “drift” when attempting to track an object through a video. For example, if a user indicates a person's face as the object to be redacted from a video, existing methods are easily confused when additional faces are shown in close proximity to the face that is meant to be redacted. In particular, existing methods can cause the boundaries of the desired face to drift to another face, which the system then tracks instead of the desired face. Accordingly, over time, existing methods often lose the object they are meant to be tracking. This is especially true for longer videos.
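The drift behavior described above can be illustrated with a toy sketch. The tracker below is a hypothetical nearest-position follower written only for illustration; it does not represent any particular existing system. The function name `track_nearest`, the labels `"target"` and `"other"`, and all coordinates are assumptions. The labels are used only to report which face was followed; the tracker itself sees only positions.

```python
import math

def track_nearest(frames, start):
    """Follow whichever detection lies closest to the last tracked position.

    frames: list of dicts mapping a label to an (x, y) center per frame.
    start:  (x, y) position the user initially marked for redaction.
    Returns the label followed in each frame (for evaluation only).
    """
    position = start
    followed = []
    for detections in frames:
        # Pick the detection nearest to where the tracked object was last seen.
        label, position = min(
            detections.items(),
            key=lambda item: math.dist(item[1], position),
        )
        followed.append(label)
    return followed

# Hypothetical scene: the target face moves right 10 units per frame,
# while a second face stays at (50, 5), close to the target's path.
frames = [{"target": (10 * t, 0), "other": (50, 5)} for t in range(10)]

followed = track_nearest(frames, start=(0, 0))
print(followed)
```

In this scene the follower stays on the target while the second face is farther away than one step of the target's motion. Once the target passes next to the second face, the stationary face is closer to the last tracked position than the target's new position is, so the follower switches to it and never recovers. Longer videos give more opportunities for this kind of switch.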