Traditionally, image/object detection has relied on hand-engineered features to detect salient points, textures, shapes and the like in an image, so that a machine can understand the scene and the context of an object captured by a camera. With the advent of large datasets and increased compute capabilities, it has become tractable to use machine learning to train a “model” that automatically learns to understand images from large datasets of labelled/annotated images. This general approach has advanced the state of the art, allowing a machine to detect anomalies in x-rays, navigate a road in a self-driving car, and support many other novel applications.
Although computer vision has advanced at the algorithmic level through novel Convolutional Neural Network (CNN) architectures, most general-purpose algorithms depend on expensive hardware (typically GPUs) to run large-scale operations. This has led several large technology companies, such as Google, Microsoft and Amazon, to provide computer vision as a service through an Application Programming Interface (API): an image is transmitted over the internet, processed in a datacenter with sophisticated GPU hardware, and the recognition result is returned over the internet.
In the field of surveillance using fixed cameras, the use of cloud-hosted APIs is difficult for several reasons:
Given the large number of megapixels in modern cameras and a high frame rate of at least 30 frames per second, a very large amount of data needs to be transmitted to the cloud. This results in large bandwidth requirements, which may incur a high cost or may even be impossible to meet with existing deployed surveillance camera infrastructure.
The general-purpose hardware deployed in data centers consumes a large amount of electricity. For example, a single Titan V GPU by NVIDIA can draw approximately 250 watts of power.
Present high-performance computing hardware such as GPUs also produces a large amount of heat, which can result in sub-optimal performance in hot climates with insufficient cooling.
The privacy and data protection laws in certain jurisdictions may prohibit the transmission of personal data to a third party such as Amazon, Google or Microsoft for processing.
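To give a sense of scale for the bandwidth concern raised in the first reason above, the following is a minimal sketch that estimates the raw, uncompressed data rate of a single camera stream. The resolution (1920x1080, roughly 2 megapixels) and colour depth (24 bits per pixel) are illustrative assumptions that do not come from the text above; only the 30 frames per second figure does. The function name is hypothetical.

```python
def raw_video_bitrate_bps(width_px, height_px, fps, bits_per_pixel=24):
    """Return the uncompressed bitrate, in bits per second, of one video stream."""
    return width_px * height_px * bits_per_pixel * fps


# Assumed example values: 1920x1080 frame, 24-bit colour, 30 frames per second.
bitrate = raw_video_bitrate_bps(1920, 1080, 30)
print(f"{bitrate / 1e9:.2f} Gbit/s per camera")  # prints "1.49 Gbit/s per camera"
```

In practice cameras usually compress video before sending it, so real figures are considerably lower. The sketch only illustrates how resolution and frame rate multiply the raw data volume, and how the total grows linearly with the number of cameras.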
There is a need for a low-cost, less computationally intensive solution for image/video analysis, comprising hardware and software, that addresses security, bandwidth and time constraints.