Wearable devices are being introduced by various companies and are becoming both more popular and more capable. One example of a wearable device is a head-mounted video device such as Google Glass®.
A critical capability for wearable devices, such as the head-mounted video device, is detecting a region of interest in video or imagery of a scene in real time as a given activity is proceeding. As users move from traditional environmental cameras to mobile and wearable cameras, it becomes important to consider not only the accuracy of a detection method, but also its power and computing resource usage, since wearable devices may have very limited processing and computing resources. For example, wearable devices are much smaller than traditional laptop computers and desktop computers and do not have room to accommodate high-powered processors and a large amount of memory.
Some current methods for detecting a region of interest rely on anticipated shapes of a hand gesture. For example, a method may check whether the image contains any shape that matches a shape in a predefined library. However, if the shape of the region of interest does not coincide with a shape in the predefined library, the region of interest may not be detected. Moreover, such methods are computationally expensive due to the cost of sliding-window-based template matching and are therefore not suitable for wearable computing, where power consumption is of critical concern. Furthermore, some scenarios require the selection of regions of interest that extend beyond the field of view of the device. In these cases, no predefined library of static shapes will support selection of the region of interest. Methods for dynamic hand-gesture-based region of interest localization are desirable because they are not limited to specific shapes or enclosures, and they support the localization of regions of interest that extend beyond the field of view of the camera.
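By way of illustration only, the cost of the sliding-window-based template matching discussed above may be sketched as follows. The sketch is a generic brute-force matcher and is not taken from any particular prior method. The function names `match_template` and `detect_with_library`, the sum-of-absolute-differences score, and the `max_score` threshold are assumptions introduced for this example.

```python
# Illustrative sketch only: brute-force sliding-window template matching
# against a small library of shapes. All names here are hypothetical.

def match_template(image, template):
    """Return (best_score, (row, col), comparisons) for one template.

    The score is the sum of absolute differences (lower is better).
    If the template is larger than the image, no window fits and
    (None, None, 0) is returned.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_score, best_pos, comparisons = None, None, 0
    for r in range(ih - th + 1):        # every vertical window offset
        for c in range(iw - tw + 1):    # every horizontal window offset
            score = 0
            for y in range(th):
                for x in range(tw):
                    score += abs(image[r + y][c + x] - template[y][x])
                    comparisons += 1    # one pixel comparison
            if best_score is None or score < best_score:
                best_score, best_pos = score, (r, c)
    return best_score, best_pos, comparisons


def detect_with_library(image, library, max_score=0):
    """Try every template in the library.

    Returns (name, position, total_comparisons), where name and position
    are None if no template scores at or below max_score.
    """
    total = 0
    name_found, pos_found = None, None
    for name, template in library.items():
        score, pos, n = match_template(image, template)
        total += n
        if name_found is None and score is not None and score <= max_score:
            name_found, pos_found = name, pos
    return name_found, pos_found, total
```

In this sketch the number of pixel comparisons grows with the product of the image size, the template size, and the number of templates in the library, and a shape that is absent from the library is not detected regardless of how much work is performed.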