Understanding human activities from video is a fundamental problem in computer vision. With the advent of mobile video cameras that may be carried or worn by a user or placed on transportation devices, there is significant interest in the development of devices with egocentric video capture capabilities. Examples of such cameras include those that are integrated into smartphones, those incorporated into wearable devices such as glasses and goggles, body cameras such as those used for law enforcement applications, and cameras that may be mounted on headgear, bicycles, cars, trucks, or other movable objects.
As the user of a wearable or mountable device moves, the user may want to keep track of the location of a particular object as the object moves into and out of the camera's field of view. However, processing each frame of the video to identify the object can be computationally intensive. Additional issues can occur when the user moves such that the object is no longer within the camera's field of view.
This document describes methods and devices that are directed to solving at least some of the issues described above.