Imaging devices such as digital cameras are frequently used in a number of security or monitoring applications in facilities such as distribution centers, parking garages or mass transit stations. For example, arrays or networks of cameras may be posted near security gates or terminals of an airport, at entryways or near focal points of a sports arena, or within or above receiving stations, storage areas or distribution stations of a fulfillment center or other distribution facility. Because imaging devices have decreased in cost and increased in quality in recent times, large numbers of such devices may be deployed in such facilities, enabling the capture, analysis or storage of still or moving images, or other information or data, regarding interactions, actions or activities occurring at, near or within such facilities.
Presently, many detection systems provided in such facilities include a plurality of individual imaging devices, e.g., monocular view digital cameras, for the purpose of detecting, recognizing and classifying interactions, actions or activities occurring within their respective fields of view. The efficacy of such systems may be limited, however, by occlusions, obstructions or other clutter within the fields of view of the respective imaging devices. For example, where a warehouse or similar facility includes an array of digital cameras mounted above or around a number of shelves, bays or racks that are frequented by any number of personnel or autonomous mobile robots, the internal infrastructure of the facility may prevent the various cameras from capturing a complete view of each of the interactions, actions or activities between such personnel or robots and such shelves, bays or racks. This may result in high numbers of false positive detections, or low numbers of accurate detections, of the interactions, actions or activities.
Attempts to address the problems created by occlusions or clutter within fields of view of such cameras have achieved varying degrees of success. For example, some detection systems have incorporated stereo cameras having two or more lenses and sensing components, with parallel or converging camera axes, thereby enabling such systems to capture interactions, actions or activities from multiple perspectives and make determinations as to ranging or other attributes of such interactions, actions or activities from such perspectives. Stereo camera systems require frequent calibration, however, and may be limited in the same manner with regard to occlusions or clutter within the fields of view of the respective lenses and/or sensors. Similarly, some other detection systems have included range cameras, e.g., depth sensors which project infrared or other invisible light onto surfaces and detect the infrared or invisible light reflected from such surfaces, in order to obtain depth data regarding objects within their fields of view and to utilize such depth data when classifying interactions, actions or activities occurring therein. However, using depth data to detect interactions, actions or activities is complicated and error-prone, as such determinations require one or more depth models to be generated based on the depth data, and the depth models must then be analyzed in order to recognize all or portions of the objects or humans within such fields of view and to recognize and classify any interactions between them.