The instant disclosure relates to a video frame detector system for video content identification of multiple objects entering and leaving the video frame.
In surveillance video applications, video content identification is the most important task.
Owing to advances in convolutional neural networks, visual search, object detection, object localization, and content tagging/indexing over video now achieve overwhelmingly superior accuracy, but they have become extremely computationally heavy tasks that rely largely on dedicated Graphics Processing Units (GPUs). GPU servers are an expensive investment and should be used as efficiently as possible.
Due to advancements in camera technology, surveillance videos are often recorded at 30 or even 60 frames per second (fps) at a resolution of 1080p and above, resulting in many redundant frames in the videos processed by the GPU. Furthermore, due to form-factor and power-consumption constraints, running video content identification on an embedded device's CPU (with or without a GPU) is often extremely inefficient in terms of speed and latency, if not impossible. By skipping redundant frames in which no object-of-interest is entering or leaving the video (i.e., no new object-of-interest is present in the scene), the workload of the CPU and/or GPUs can be drastically reduced while maintaining high identification accuracy.
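The skipping criterion above (process a frame only when an object-of-interest enters or leaves the scene) can be illustrated with a minimal Python sketch. It is not the disclosed detector: it assumes a hypothetical, inexpensive front end that already reports, for each frame, the set of object identities present, and the function name `frames_to_process` is likewise illustrative.

```python
from typing import Hashable, Iterable, List, Set


def frames_to_process(per_frame_objects: Iterable[Set[Hashable]]) -> List[int]:
    """Return indices of frames in which an object-of-interest entered or left.

    `per_frame_objects` is a hypothetical input: one set of object identities
    per frame, assumed to come from some cheap front end. Every other frame
    is treated as redundant and would be skipped by the expensive identifier.
    """
    selected: List[int] = []
    previous: Set[Hashable] = set()
    for index, current in enumerate(per_frame_objects):
        entered = current - previous  # objects new to the scene
        left = previous - current     # objects that exited the scene
        if entered or left:
            selected.append(index)
        previous = current
    return selected


if __name__ == "__main__":
    frames = [set(), {"car"}, {"car"}, {"car", "person"}, {"person"}, {"person"}]
    print(frames_to_process(frames))  # [1, 3, 4]
```

In this toy sequence only three of six frames reach the identifier. At 60 fps, one hour of video is 60 × 3600 = 216,000 frames, so in a scene where objects rarely enter or leave, a criterion of this kind would forward only a small fraction of them.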
Existing approaches either measure object change by calculating the frame difference, or simply apply a fixed frame-skipping interval to skip the same object appearing in multiple frames. Frame-difference detection is based on scene and object motion, which is a camera-independent, global measurement, and it is not capable of capturing the motion pattern of each individual object. It is desirable to skip frames in which an object enters the scene but stays stationary for a very long period, or in which an object moves across the entire scene after the system has already detected its appearance. Motion detection would fail in the latter case because it cannot reason about what causes the motion, not to mention cases where the motion is triggered by multiple objects simultaneously. Multi-object tracking may be a viable solution; however, because of the complexity of maintaining a buffer for associating hypotheses, its computational overhead is enormous and can be overkill for the frame-skipping task. Fixed-interval frame skipping, in turn, can be too brutal to maintain reasonable object detection accuracy and is far from stable, as it does not rely on any scene-specific information.
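The two existing approaches described above, and the failure case of frame differencing, can be reproduced with a small Python sketch. It is a simplified illustration under stated assumptions, not any particular prior-art implementation: frames are tiny grayscale grids of integer intensities, the global measure is a mean absolute pixel difference, and the threshold, the choice to always keep frame 0 as a reference, and all function names are hypothetical.

```python
from typing import List, Sequence

# Assumed representation: a grayscale frame as rows of pixel intensities.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Global frame difference: mean absolute pixel change over the whole frame."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def frame_difference_keep(frames: List[Frame], threshold: float) -> List[int]:
    """Keep frame 0 (assumed reference) and every frame whose difference
    from its predecessor exceeds the (hypothetical) threshold."""
    kept = [0] if frames else []
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            kept.append(i)
    return kept


def fixed_interval_keep(num_frames: int, interval: int) -> List[int]:
    """Keep every `interval`-th frame, ignoring scene content entirely."""
    return list(range(0, num_frames, interval))


if __name__ == "__main__":
    # One object enters at frame 1 and then stays still.
    stationary = [[[0, 0, 0, 0, 0]], [[255, 0, 0, 0, 0]],
                  [[255, 0, 0, 0, 0]], [[255, 0, 0, 0, 0]]]
    # One object enters at frame 1 and then moves across the scene.
    moving = [[[0, 0, 0, 0, 0]], [[255, 0, 0, 0, 0]], [[0, 255, 0, 0, 0]],
              [[0, 0, 255, 0, 0]], [[0, 0, 0, 255, 0]]]
    print(frame_difference_keep(stationary, 10.0))  # [0, 1]
    print(frame_difference_keep(moving, 10.0))      # [0, 1, 2, 3, 4]
    print(fixed_interval_keep(5, 2))                # [0, 2, 4]
```

Frame differencing skips the stationary object correctly but keeps every frame of the moving one, even though only frame 1 introduces a new object, because a global difference cannot tell which object caused the change. The fixed interval keeps frames 0, 2, and 4 regardless of content, so an object visible only in frame 1 would be missed entirely.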
Therefore, systems and methods of a video frame detector for video content identification of multiple objects entering and leaving the video frame are disclosed.