The recent resurgence of augmented reality and virtual reality (AR/VR) has posed challenging computer vision problems with drastically higher requirements. For example, recent state-of-the-art low-latency rendering techniques assume that very high-frequency (>30 kHz) tracking is available. On the other hand, most commercial AR/VR systems provide only rotational tracking, which limits the full immersive experience, while other systems provide full six degrees of freedom (6-DoF) tracking (i.e., rotation and translation tracking) at frequencies of ~1 kHz. This full 6-DoF tracking comes at a cost: external lighthouses or infrared cameras are needed, which restricts the portability of the device compared to entirely on-device (“inside-out”) tracking solutions. It is well understood that high tracking rates are critical for immersive experiences in AR/VR systems, yet there currently exists no practical, portable system that comes close to next-generation tracking performance.
Modern imaging devices have much potential for inside-out camera-based device tracking. For example, the ubiquitous presence of rolling shutter (RS) cameras in almost every cell phone and low-cost camera gives developers the opportunity to leverage RS motion capture for a number of applications, including AR/VR. In rolling shutter capture, each row (or column) is captured at a slightly different time, a rapid process that cheaply approximates a global image exposure. However, researchers have typically sought to remove the effects of this non-simultaneous exposure of the image rows, as it results in noticeable artifacts if the camera moves during the exposure. In many computer vision applications, other imaging properties are considered negative as well: most prominently, radial lens distortion is typically corrected for after image formation. Camera-based tracking approaches have historically adopted the same attitude towards rolling shutter and radial lens distortion, treating these image features as camera artifacts that have to be overcome and/or corrected for in order to apply typical camera models (e.g., the pinhole camera model) and scene reconstruction methods.
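The row-wise timing of rolling shutter capture can be illustrated with a minimal sketch. It assumes, purely for illustration, that rows are read out at a constant rate over a fixed readout time; the function names and the example sensor parameters are hypothetical and are not taken from any particular camera or system.

```python
def row_capture_times(frame_start_s, num_rows, readout_time_s):
    """Return the capture time of each row, assuming a constant line delay."""
    line_delay_s = readout_time_s / num_rows  # time between consecutive rows
    return [frame_start_s + r * line_delay_s for r in range(num_rows)]


def row_sampling_rate_hz(num_rows, readout_time_s):
    """Rate at which new rows (temporal samples) arrive during readout."""
    return num_rows / readout_time_s


if __name__ == "__main__":
    # Hypothetical sensor: 1000 rows read out over 10 ms.
    times = row_capture_times(0.0, 1000, 0.010)
    print(f"row 0 at {times[0] * 1e3:.3f} ms, row 999 at {times[-1] * 1e3:.3f} ms")
    print(f"row sampling rate: {row_sampling_rate_hz(1000, 0.010) / 1e3:.1f} kHz")
```

Under these assumptions, each row is a separate temporal sample of the scene, so even a camera with a modest frame rate delivers row samples at a rate in the kHz range.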
Further, many commercial AR/VR systems rely on camera-based tracking but, for cost reasons, use standard cameras with frame rates of a few tens of frames per second (e.g., a 60 hertz (Hz) external infrared camera). To achieve the higher tracking rates required, they resort to gyroscope data to increase the tracking rate. However, positional tracking remains difficult to achieve, as inertial measurement units (IMUs) drift. Even high-frame-rate global shutter cameras are not the solution: the exposure time cannot exceed the frame period, so at kHz frame rates the maximum possible exposure time (equal to the inverse of the frame rate) becomes so short that capturing sufficient light is impractical, especially indoors, where AR/VR systems are frequently used. In addition, most of these systems are head-worn and require small-form-factor cameras, which limits the amount of light captured even further.
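The exposure-time argument above can be checked with a short calculation. This is a minimal sketch that assumes an idealized global shutter camera whose exposure may fill the entire frame period and whose collected light scales linearly with exposure time; the function names are illustrative only.

```python
def max_exposure_s(frame_rate_hz):
    """Upper bound on exposure time: the frame period, 1 / frame rate."""
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return 1.0 / frame_rate_hz


def relative_light(frame_rate_hz, reference_rate_hz=60.0):
    """Light collected at the maximum exposure, relative to a reference rate.

    Assumes collected light is proportional to exposure time.
    """
    return max_exposure_s(frame_rate_hz) / max_exposure_s(reference_rate_hz)


if __name__ == "__main__":
    for rate_hz in (60.0, 1000.0, 30000.0):
        print(
            f"{rate_hz:>8.0f} Hz: max exposure {max_exposure_s(rate_hz) * 1e3:.4f} ms, "
            f"light relative to 60 Hz: {relative_light(rate_hz):.4f}"
        )
```

Under these assumptions, moving from a 60 Hz camera to a 1 kHz camera reduces the available exposure time from about 16.7 ms to 1 ms, i.e., to roughly 6% of the light per frame.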