1. Field
This disclosure relates to a system for performing video content analysis (VCA) using depth information.
2. Background
In a video content analysis (VCA) system, video streams are analyzed to identify and classify objects, and to determine physical and temporal attributes of the objects. As a result, a log of analytics data may be stored. The analytics data may be used to determine events that occur in the real world, to aid in searching for objects or detected events, and for other purposes. An example of a VCA system is described in U.S. Pat. No. 7,932,923, issued to Lipton et al. on Apr. 26, 2011 (the '923 patent), the contents of which are incorporated herein by reference in their entirety.
For example, in a video surveillance system at a facility that includes an automated teller machine (ATM), objects such as people can be detected and tracked. Information about those people can then be collected, such as the amount of time an individual spends at a particular location in the facility (e.g., at the ATM).
Some existing systems use RGB (red green blue), CMYK (cyan magenta yellow key), YCbCr, or other sensors that capture images in two dimensions, and analyze those images to perform object and event detection. Other existing systems use depth sensors to generate three-dimensional data or depth maps, which are then analyzed using different software to perform object and event detection. In some ways, systems that use depth sensors are more accurate than two-dimensional systems. For example, depth sensor systems may obtain more accurate three-dimensional information and may handle occlusions better. However, depth data and images produced by depth sensor systems are generally lower in resolution than RGB data, and may therefore include fewer details than RGB images. In addition, depth sensors are a relatively new technology for video analysis and are still prone to errors in determining three-dimensional coordinates. Further, certain information obtained from depth sensors often remains incomplete, such as depth information for objects exhibiting specularities, or depth information for featureless surfaces when depth is extracted from stereo images.
Certain systems may combine depth and RGB data in order to analyze complex three-dimensional scenes. For example, as described in U.S. Pat. No. 7,831,087, depth data and optional non-depth data are used to generate a plan-view image, which can then be analyzed by classifying objects within it. However, systems such as this, which perform complex analysis on depth data and optional additional data in order to detect objects or events, still suffer from the problems described above relating to the drawbacks of depth sensor systems. For example, some of the depth data may be missing or inaccurate, so that the analysis is performed on faulty data. In addition, compared to more traditional two-dimensional image analysis systems, performing analysis on three-dimensional data generally requires more complex algorithms and may require a complete redesign of the hardware and/or software that performs the analysis.
The embodiments described here address some of these problems of existing systems, and provide a simplified way to use depth data to assist in image analysis and video content analysis. As a result, a less complex and more accurate system and method for detecting and tracking objects is achieved.