An enormous amount of visual data, in the form of videos and images, may be collected at Fulfillment Centers (FCs), warehouses and other corporate locations. The volume of visual data makes the task of finding relevant information overwhelming for users.
In existing visual content analysis systems, the triggering and registration of events generally relies on manually defined rules and on low-level descriptors that are used to detect and recognize events or entities that follow or violate those rules. Retrieval of video data is based on query by example: similarity is measured between stored exemplars (video or image descriptors) and low-level descriptors that are automatically extracted from the video or image data. However, generic low-level descriptors are often insufficient to discriminate content robustly and reliably at a conceptual level.
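The query-by-example retrieval described above can be illustrated with a minimal sketch. It is not taken from any particular system: the descriptor values, the exemplar names, the choice of cosine similarity as the measure, and the function names `cosine_similarity` and `query_by_example` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero descriptor carries no direction to compare
    return dot / (norm_a * norm_b)


def query_by_example(query_descriptor, stored_exemplars, top_k=2):
    """Rank stored exemplars by similarity to the query descriptor."""
    scored = [
        (name, cosine_similarity(query_descriptor, descriptor))
        for name, descriptor in stored_exemplars.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stored exemplars: each maps a label to a low-level
# descriptor (for example, a coarse 3-bin color histogram).
STORED = {
    "forklift_clip": [0.7, 0.2, 0.1],
    "conveyor_clip": [0.1, 0.8, 0.1],
    "dock_door_image": [0.2, 0.2, 0.6],
}

# Descriptor assumed to have been extracted from the query example.
results = query_by_example([0.6, 0.3, 0.1], STORED)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The ranking in such a scheme reflects only how close the low-level descriptors are. Two items with similar descriptors may still differ in what they depict, which is the limitation noted above.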