With the prevalence of digital video recording apparatuses, techniques have emerged for deriving information contained in a video more effectively by recording environmental information obtained from various sensors in synchronization with the video.
For example, the position and azimuth angle of a digital video recording apparatus (for example, a video camera) are recorded together with the video, and the recorded position and azimuth angle are checked against geographic information at retrieval, whereby information on an object in the video can be extracted. This enables more detailed construction records: a construction photograph taken to supervise a construction site can be replaced with a video image annotated with the recording position and azimuth angle. In addition, in a movable body such as a car, a driving recorder for recording driving states sometimes records the position, azimuth angle, and the like together with the video so that they can be used to investigate the cause of an accident. Furthermore, for a portable terminal such as a cellular phone, the combination of video with a position and/or an azimuth angle is useful when the person carrying the terminal records a history of behavior with video or uses the terminal as a pedestrian navigation system.
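As a rough illustration of how a recorded position and azimuth angle might be checked against geographic information at retrieval, the sketch below tests which landmarks in a geographic database lie within the camera's field of view. The landmark database, coordinates, field-of-view value, and function names are hypothetical assumptions for illustration, not taken from any cited reference.

```python
import math

# Hypothetical landmark database: name -> (latitude, longitude) in degrees.
LANDMARKS = {
    "bridge": (35.6812, 139.7671),
    "tower":  (35.6586, 139.7454),
}

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def objects_in_view(cam_lat, cam_lon, azimuth_deg, fov_deg=60.0):
    """Return landmarks whose bearing from the camera position falls
    within the camera's horizontal field of view around its azimuth."""
    hits = []
    for name, (lat, lon) in LANDMARKS.items():
        # Smallest signed angular difference, handling the 0/360 wrap.
        diff = (bearing_deg(cam_lat, cam_lon, lat, lon)
                - azimuth_deg + 180.0) % 360.0 - 180.0
        if abs(diff) <= fov_deg / 2.0:
            hits.append(name)
    return hits
```

A retrieval system could attach the returned names to the frame as candidate object labels; matching against full map geometry rather than point landmarks would follow the same principle.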
A digital video recording apparatus for realizing the aforementioned systems can be implemented by combining a global positioning system (GPS) receiver and an electronic compass with a digital video camera. Conventional techniques for imparting metadata such as a position or an azimuth angle to video are disclosed, for example, in JP-A-H10-42282 and JP-A-2000-331019.
JP-A-H10-42282 discloses a technique for imparting a viewpoint and an azimuth angle, obtained by using a gyro sensor, a GPS sensor, or the like, as metadata at the time of video recording, in a system that presents an image generated by superimposing geographic information onto the captured image.
Moreover, JP-A-2000-331019 discloses a technique for automatically imparting an index to a landscape image. More specifically, an input control processor reads video information data of the pertinent place from a video information database. A reference model calculation processor generates a reference model from geographic information, using camera parameters allowed some flexibility relative to those used when shooting the previous frame. An image matching processor extracts outline information from the image of the frame being processed and compares it with the reference model, thereby selecting the most appropriate model and then calculating the camera parameters. An indexing processor calculates an object area in the frame by projecting the geographic information into the image coordinate system using the camera parameters, and then imparts, to the object, attribute data constituting the object. A texture extraction processor obtains texture information on the object as geographic information by extracting the image of the calculated object area. By automatically indexing objects in each frame using geographic information, for time-series landscape images that are accumulated or acquired in real time, this technique makes it possible to present what is in an image or to retrieve a desired image.
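The projection step described above, mapping geographic information into the image coordinate system using camera parameters, can be sketched with a simple pinhole camera model. This is a minimal illustration under assumed conventions (local east/north/up coordinates, yaw-only rotation, no tilt or lens distortion) and is not the actual formulation of the cited reference.

```python
import math

def project_to_image(point_world, cam_pos, yaw_deg, focal_px, cx, cy):
    """Project a world point (east, north, up, in metres relative to a
    local origin) into pixel coordinates for a pinhole camera at cam_pos,
    rotated by yaw_deg about the vertical axis (yaw measured from north).
    Returns None for points behind the camera."""
    yaw = math.radians(yaw_deg)
    e = point_world[0] - cam_pos[0]
    n = point_world[1] - cam_pos[1]
    u = point_world[2] - cam_pos[2]
    # Rotate into the camera frame: z forward along the azimuth, x to the right.
    z = n * math.cos(yaw) + e * math.sin(yaw)   # depth along the view axis
    x = e * math.cos(yaw) - n * math.sin(yaw)   # offset right of the view axis
    y = -u                                      # image y grows downward
    if z <= 0:
        return None
    # Perspective division and shift to the principal point (cx, cy).
    return (cx + focal_px * x / z, cy + focal_px * y / z)
```

An indexing processor of the kind described could project each vertex of a map object this way and take the resulting polygon as the object area in the frame.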
Various sensors, however, may indicate incorrect values depending on the operating environment at the time of recording. For example, the 2-axis geomagnetic electronic compass widely used at present as an azimuth sensor is influenced by the vertical component of the earth's magnetic field, and may therefore indicate incorrect azimuth angle information when the camera moves vertically. Moreover, a GPS receiver used to measure position information sometimes produces errors in the measured position due to multipath effects or the like. If such incorrect sensor information is used when presenting the video, information about the video cannot be presented to the user correctly, which causes confusion. This problem is not considered in the first conventional technique.
Moreover, the second conventional technique described above has the problems that it forgoes the use of sensor information and that it depends excessively on information obtained from the video. Sensors exist precisely to measure their respective targets, and their accuracy is generally higher than that of information derived from the video. In other words, imparting video metadata by means of sensors is fundamentally effective, but errors in the video metadata seriously degrade the performance of services based on that metadata. In the real world it is difficult for sensors to always record correct values, so a technique is needed to detect the incorrectly recorded parts and correct the errors.