With the widespread prevalence of mobile devices, mobile video capture has become an inseparable part of everyday life. Many users find it hard to hold a mobile camera steady, and consequently the captured videos are usually shaky. Thus, there is a need for robust real-time video stabilizers on mobile devices.
Conventional video stabilizers can be categorized as either hardware-assisted software-based approaches or purely software-based approaches.
Hardware-assisted software-based approaches rely on knowledge about the mobile device's camera (camera priors). For example, information about the camera-dependent inter-frame delay, the intrinsic camera matrix, and the calibrated inertial sensors may be needed. However, due to gyroscopic drift and sensor noise, camera translations computed from the mobile device's inertial sensors are prone to error, while the assumption of pure camera rotation is unrealistic for videos such as those of non-planar scenes. In addition, the requirement of dedicated calibration is impractical for some users.
Without knowledge or assumptions of camera priors, purely software-based approaches post-process a video in three main steps: (1) global motion estimation (GME), (2) camera path optimization, and (3) frame synthesis. In GME, the parametric camera motion between consecutive frames is estimated based on visual appearance. Camera path optimization is responsible for removing unwanted vibration in camera motion while preserving intentional camera movement; an optimal intended smooth camera trajectory is estimated and high-frequency fluctuations are removed. In frame synthesis, a stabilized video is synthesized by warping the original frames based on the estimated smooth trajectory. Earlier work applied low-pass filters to remove high-frequency motion. Recently, an L1-norm optimization has been used to generate a camera path that follows cinematography rules.
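The three steps above can be illustrated with a minimal sketch. It assumes a one-dimensional, translation-only camera model and takes the inter-frame motions as given rather than estimating them from pixels. The function names, the box-filter radius, and the synthetic motion values are illustrative assumptions and are not taken from any particular stabilizer. The low-pass filter stands in for the earlier filtering work mentioned above, not for the L1-norm optimization.

```python
def accumulate_path(motions):
    """Integrate inter-frame translations (the GME output) into a camera path."""
    path = [0.0]
    for dx in motions:
        path.append(path[-1] + dx)
    return path


def moving_average(path, radius):
    """Low-pass filter the path with a centered box window.

    The window looks at future samples, so the path must already be known
    before filtering; this is post-processing, not streaming.
    """
    smoothed = []
    for i in range(len(path)):
        lo = max(0, i - radius)
        hi = min(len(path), i + radius + 1)
        smoothed.append(sum(path[lo:hi]) / (hi - lo))
    return smoothed


def warp_offsets(path, smoothed):
    """Per-frame shift that moves each original frame onto the smooth path."""
    return [s - p for p, s in zip(path, smoothed)]


def roughness(path):
    """Sum of absolute second differences: a simple measure of shakiness."""
    return sum(abs(path[i + 1] - 2 * path[i] + path[i - 1])
               for i in range(1, len(path) - 1))


# Synthetic GME output: a steady 2 px/frame pan plus alternating jitter.
motions = [2.0 + (1.5 if i % 2 == 0 else -1.5) for i in range(40)]

path = accumulate_path(motions)              # input to step (2)
smoothed = moving_average(path, radius=4)    # step (2): path optimization
offsets = warp_offsets(path, smoothed)       # input to step (3): frame warping
```

In a real stabilizer the offsets would be full parametric warps (for example, affine or homography matrices) applied to the image frames; the scalar shift here only shows how the steps connect.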
There are applications, such as video conferencing and video surveillance, in which it is preferable for the video sequence to be stabilized during capture instead of post-processing it after capture. If the video stabilizer is supposed to show the processed video on-the-fly, then the camera path optimization has to be done in a streaming manner. That is, the optimizer scans each input video frame only once, which may be referred to as “one-pass” processing.
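As a rough illustration of one-pass processing, the sketch below smooths a stream of camera positions with an exponential moving average, which keeps only a single value of state and visits each frame exactly once. The exponential filter, the class name `OnePassSmoother`, and the parameter `alpha` are assumptions chosen for brevity; they are not a method described in this document.

```python
class OnePassSmoother:
    """Causal smoother: uses only the current sample and one stored state."""

    def __init__(self, alpha=0.2):
        # alpha in (0, 1]; smaller values smooth more strongly (assumed knob).
        self.alpha = alpha
        self.state = None

    def update(self, position):
        """Consume one camera position and return the smoothed position."""
        if self.state is None:
            self.state = position
        else:
            self.state += self.alpha * (position - self.state)
        return self.state


def stabilize_stream(positions, alpha=0.2):
    """Yield (original, smoothed) pairs while scanning the input only once."""
    smoother = OnePassSmoother(alpha)
    for p in positions:
        yield p, smoother.update(p)


# Each frame's result is available as soon as that frame arrives.
for original, smoothed in stabilize_stream([0.0, 3.5, 4.0, 7.5, 8.0]):
    print(f"{original:5.2f} -> {smoothed:5.2f}")
```

Because the generator never revisits earlier frames or looks ahead, it could in principle drive a live preview, at the cost of having no global view of the trajectory.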
There are a number of difficulties associated with camera path optimization in video stabilization, and with one-pass optimization in particular. First, the output of GME is often noisy due to factors such as occlusion or a lack of feature points in the input video. Such noise can affect the estimation of intentional camera motion, and thus impact the stabilization performance. Second, a one-pass camera path optimizer only has access to a local window of video frames at a time, and it can only scan each frame once. Thus, compared to a multi-pass version, a one-pass optimizer does not have global information about the entire camera motion trajectory and therefore has to rely on limited information about local motion to estimate intentional camera motion. Third, one-pass optimization is often required for real-time applications running on mobile hardware platforms, where complexity and memory constraints prevent the use of effective but complicated algorithms in video stabilization.
Conventional software-based approaches generally do not stabilize videos satisfactorily in real time. First, except for motion filtering methods, conventional camera path planning approaches need to have the whole camera trajectory estimated and therefore rely on two-pass processing. Second, in many cases, robust feature tracks cannot be obtained due to rapid camera motion, occlusions, etc. High-quality feature matching methods, such as those relying on SIFT/SURF (scale invariant feature transform/speeded up robust features) matching, are not realistic for mobile devices because of the devices' limited memory and computational power. For the same reason, methods that rely on extra motion editing (e.g., inpainting) or expensive optimization are not suitable for real-time processing of videos, particularly high definition videos. Third, conventional real-time motion filtering methods utilize scene-dependent parameter tuning. For example, aggressive filtering provides a more stabilized camera path but larger out-of-bound areas, while mild filtering provides less stabilization but a larger output. Many users do not have the knowledge for, or interest in, such parameter tuning, and would prefer automatic settings that produce the highest quality of stabilization.
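The filtering trade-off described above can be made concrete with a small numerical sketch. It assumes a one-dimensional camera path and a simple exponential motion filter; the helper names, the two `alpha` settings, and the synthetic path are illustrative assumptions only. The residual jitter measures how shaky the filtered path remains, and the crop margin is the largest shift between the original and filtered paths, which approximates how much of the frame would fall out of bounds.

```python
def exponential_filter(path, alpha):
    """Causal motion filter; a smaller alpha filters more aggressively."""
    smoothed = [path[0]]
    for p in path[1:]:
        smoothed.append(smoothed[-1] + alpha * (p - smoothed[-1]))
    return smoothed


def residual_jitter(path):
    """Sum of absolute second differences of the path."""
    return sum(abs(path[i + 1] - 2 * path[i] + path[i - 1])
               for i in range(1, len(path) - 1))


def crop_margin(path, smoothed):
    """Largest shift a frame needs, i.e. the border lost to out-of-bound areas."""
    return max(abs(s - p) for p, s in zip(path, smoothed))


def evaluate(path, alpha):
    """Return (residual jitter, crop margin) for one filter setting."""
    smoothed = exponential_filter(path, alpha)
    return residual_jitter(smoothed), crop_margin(path, smoothed)


# Synthetic camera path: a 2 px/frame pan with alternating +/-1.5 px shake.
path = [2.0 * i + (1.5 if i % 2 == 0 else -1.5) for i in range(60)]

mild_jitter, mild_margin = evaluate(path, alpha=0.5)
aggressive_jitter, aggressive_margin = evaluate(path, alpha=0.1)

print(f"mild:       jitter={mild_jitter:.1f}, margin={mild_margin:.1f} px")
print(f"aggressive: jitter={aggressive_jitter:.1f}, margin={aggressive_margin:.1f} px")
```

On this synthetic path the aggressive setting leaves less jitter but needs a larger margin, because the filtered path lags further behind the intentional pan. Which setting is preferable depends on the scene, which is why a fixed, manually tuned parameter is a burden for users.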