In applications ranging from aerial vehicles and ground robots to emergency-personnel navigation and augmented reality for handheld devices, accurate 3D pose estimates are crucial. The current focus is on vision-aided inertial navigation methods, which provide these estimates by fusing measurements from a camera and an inertial measurement unit (IMU). In recent years, several algorithms of this kind have been proposed, each tailored to a different application. For instance, if features with known coordinates are available, map-based localization algorithms can be used to provide absolute-pose estimates. In an unknown environment, simultaneous localization and mapping (SLAM) methods can be used to jointly estimate the 3D motion and the positions of visual landmarks. Finally, if estimates of the vehicle's motion are needed but no map building is required, visual-inertial odometry methods can be employed.
For the estimation algorithms to perform well in any of these cases, both the spatial and the temporal relationship between the IMU and camera data streams must be accurately modeled. The first of these problems, often termed extrinsic sensor calibration, has been addressed previously. The transformation (i.e., rotation and translation) between the camera and IMU frames can be estimated either offline, using a known calibration pattern, or online, along with the IMU trajectory. However, the problem of temporal calibration between the data streams of the camera and the IMU has largely been left unexplored.
To enable the processing of the sensor measurements in an estimator, a timestamp is typically obtained for each camera image and IMU sample. This timestamp is taken either from the sensor itself or from the operating system of the computer receiving the data. These timestamps, however, are typically inaccurate. Specifically, due to the time needed for data transfer, sensor latency, and operating-system overhead, a delay, which is different for each sensor, exists between the actual sampling of a measurement and the generation of its timestamp. Additionally, if different clocks are used for time-stamping (e.g., on different sensors), these clocks may suffer from clock skew. As a result, an unknown time offset typically exists between the timestamps of the camera and the IMU. If this time offset is not estimated and accounted for, it will introduce unmodeled errors in the estimation process and reduce its accuracy.
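The mechanism described above can be illustrated with a minimal sketch. The latency and skew values, and the names `stamp` and `time_offset`, are hypothetical and chosen only for illustration; they are not taken from any particular sensor or system. The sketch models each timestamp as the true sampling time, scaled by a clock-skew factor, plus a sensor-specific delay.

```python
# Illustrative model only: all numeric values below are assumed, not measured.

def stamp(true_time, latency, skew=0.0):
    """Timestamp assigned to a measurement that was sampled at true_time.

    latency: sensor-specific delay in seconds (transfer, sensor, OS overhead).
    skew:    fractional clock-rate error of the clock doing the time-stamping.
    """
    return (1.0 + skew) * true_time + latency


CAM_LATENCY = 0.050  # hypothetical camera delay, seconds
IMU_LATENCY = 0.005  # hypothetical IMU delay, seconds


def time_offset(true_time, cam_skew=0.0, imu_skew=0.0):
    """Difference between camera and IMU timestamps for the same true instant."""
    cam_stamp = stamp(true_time, CAM_LATENCY, cam_skew)
    imu_stamp = stamp(true_time, IMU_LATENCY, imu_skew)
    return cam_stamp - imu_stamp


# Without skew, the offset is constant and equals the difference in latencies.
offset_start = time_offset(0.0)
offset_later = time_offset(100.0)

# With skew on one clock, the offset drifts as time passes.
offset_drifted = time_offset(100.0, cam_skew=1e-4)
```

Under these assumptions the offset is a constant 45 ms when both clocks run at the same rate, and it grows by 10 ms over 100 s when the camera clock runs fast by one part in ten thousand. This is why an offset that is calibrated once and then held fixed can become stale.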
With the exception of the work described in previous literature on offline vision-aided inertial navigation, the prior art has not addressed the problem of estimating time offsets online. Currently, algorithm and system developers determine this offset using hardware-specific knowledge, develop offline methods for estimating the time offset on a case-by-case basis, or assume that the time offset is small enough to be ignored. However, these solutions are not general enough, and when the time offset varies over time (e.g., due to clock skew), they can lead to eventual failure of the estimator.
Therefore, there is a need for a 3D motion estimation and online temporal calibration system for a camera and an inertial measurement unit that overcomes the limitations of the prior art.