Prevalent security and surveillance systems typically enable the entities implementing them to monitor and capture video images of an area of interest. Over the course of some time period, such as a day, a week, or a month, such security and surveillance systems may capture significant amounts of video data, typically too much for any individual human, or multiple humans, to meaningfully review for the purposes of detecting events of interest, including security events. Often such review is merely reactive to a security event or other event that has occurred in the past. While in some instances this type of review may be useful in resolving or addressing less time-sensitive security events or the like, the time lost reviewing the video image data can adversely impact the ability of the entity implementing the security system to obtain a desired result for time-sensitive security events.
However, even real-time (or near real-time) monitoring and review of video images streaming from several surveillance video cameras of the security and surveillance systems can make human detection of events of interest extremely difficult. In most circumstances, defined spaces that are under surveillance via the security and surveillance systems incorporate multiple video cameras and thus provide more video feeds to a security control center or the like than there are security personnel to monitor and review the video feeds. Thus, in a real-time monitoring and surveilling situation, many events of interest, including security events, are missed, thereby compromising the security and/or safety of the defined space(s) and/or the subjects (e.g., persons, protected products, etc.) within the defined space.
Additionally, some modern video analysis techniques may implement computer vision technology that enables automatic detection of objects in video data by a machine rather than relying on a human. In these implementations, the video analysis technique may include a specific detector that may be implemented for identifying a category of object (e.g., instance level detection) within video data. In more advanced implementations, for a single computer vision task, such as object detection, pose estimation, or scene segmentation, a general model may be implemented for accomplishing that discrete computer vision task. While such implementations may function to enable automated detections within video data, the discrete detection and analysis method fails to provide comprehensible and actionable detections.
Thus, there is a need in the computer vision and security fields to create a new and useful image data analysis and event detection system for intelligently detecting events of interest and providing a comprehensive interpretation of the detected events. The embodiments of the present application provide such new and useful systems and methods.