Modern computing systems such as mobile devices enable video editing on an unprecedented scale. Mobile devices can record video and edit or segment the video into clips. For example, a mobile device can record video of multiple scenes, panning from scene to scene. The recorded video can be segmented and further edited at a later time. A computing device may be used to segment a video in order to re-arrange different scenes in the video or to remove scenes from the final video.
Performing video segmentation manually is cumbersome, especially with large videos. Manual segmentation involves a workflow that often requires watching the entire video and adding a marker at every frame that starts a new segment.
But existing solutions for automatic video segmentation, typically based on histogram analysis, cannot reliably and accurately segment video. These solutions create a histogram for each frame of video and analyze how the histogram changes between frames. Histograms can be based on the tonal distribution or color distribution, e.g., the number of pixels with a given tone or color. Because histogram-based analysis does not analyze the contents of the video, it can over- or under-segment the video, which lowers the quality of the final edited video and requires manual intervention.
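As an illustration only, the histogram-based approach described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular existing solution: frames are assumed to be non-empty grayscale images given as rows of integer pixel values from 0 to 255, and the names `frame_histogram` and `detect_cuts`, the bin count, the distance measure, and the threshold are all hypothetical choices made for the example.

```python
def frame_histogram(frame, bins=16):
    """Return the normalized tonal histogram of one grayscale frame.

    Assumption: `frame` is a non-empty list of rows of ints in 0..255.
    """
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            # Map the 0..255 tone onto one of `bins` equal-width buckets.
            counts[value * bins // 256] += 1
            total += 1
    return [count / total for count in counts]


def detect_cuts(frames, threshold=0.5, bins=16):
    """Return indices of frames whose histogram differs sharply from the
    histogram of the preceding frame (hypothetical threshold of 0.5)."""
    if len(frames) == 0:
        return []
    cuts = []
    previous = frame_histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = frame_histogram(frames[index], bins)
        # Half the L1 distance between normalized histograms lies in [0, 1].
        distance = 0.5 * sum(abs(c - p) for c, p in zip(current, previous))
        if distance > threshold:
            cuts.append(index)
        previous = current
    return cuts
```

With two uniformly dark frames followed by two uniformly bright frames, this sketch reports a single boundary at the third frame. A frame whose left half is black and right half is white, followed by its mirror image, produces identical histograms and therefore no boundary, even though the content has changed. This shows in miniature why analysis that ignores the contents of the video can under-segment.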
For example, histogram-based video segmentation may miss subtle distinctions that should be categorized as different scenes. Automatic video segmentation may not be able to detect some scene changes, thereby forcing the user to manually segment the video. Additionally, automatic scene detection may over-aggressively segment a video, creating duplicate segments for one scene. For example, histogram-based algorithms may be inadvertently triggered by an object or a face in a scene and erroneously detect multiple scenes when only one scene exists.
Accordingly, existing solutions fail to effectively segment video content for reasons such as (but not limited to) those described above.