Object recognition is a technology in the field of computer vision for finding and identifying objects in an image or video sequence. Typically, an object recognition model is a machine learning model related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Convolutional Neural Networks (CNNs) are the predominant class of algorithms used in object recognition.
Standard CNNs consist of a series of layers that perform mathematical computations on an image. Recognizing and classifying objects into fine-grained categories requires a deep CNN with many layers. Each layer requires millions of floating-point operations, and also requires memory access by the corresponding Central Processing Unit (CPU). A disadvantage of existing CNNs is that they fully process every camera frame, which wastes computation on frames that contain no relevant objects. As a result, powerful, accurate object recognizers become slow, and require specialized hardware such as Graphics Processing Units (GPUs) to perform object recognition.
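As a rough illustration of the per-layer cost mentioned above, the following minimal sketch estimates the floating-point operations of a single two-dimensional convolution layer. The function name `conv2d_flops`, the 224x224 three-channel input, the 64 filters, and the 3x3 kernel are illustrative assumptions and are not taken from any particular model; bias terms and activation functions are ignored.

```python
def conv2d_flops(in_h, in_w, in_ch, out_ch, kernel, stride=1, padding=0):
    """Approximate floating-point operations for one 2D convolution layer.

    Hypothetical helper for illustration only; ignores bias and activation.
    """
    # Spatial size of the output feature map.
    out_h = (in_h + 2 * padding - kernel) // stride + 1
    out_w = (in_w + 2 * padding - kernel) // stride + 1
    # Each output value needs in_ch * kernel * kernel multiply-accumulates.
    macs = out_h * out_w * out_ch * in_ch * kernel * kernel
    # Count one multiply and one add per multiply-accumulate.
    return 2 * macs


# Assumed example: first layer of a CNN applied to one 224x224 RGB frame.
flops_per_frame = conv2d_flops(224, 224, 3, 64, kernel=3, padding=1)
print(f"{flops_per_frame / 1e6:.1f} MFLOPs per frame")
print(f"{flops_per_frame * 30 / 1e9:.1f} GFLOP/s at an assumed 30 frames per second")
```

Under these assumptions a single layer already costs on the order of a hundred million operations per frame, and that cost is incurred for every frame regardless of whether the frame contains a relevant object.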
Further, for live video data, running a real-time object recognition model is computationally expensive and usually requires powerful hardware such as a GPU. In an example, typical moving systems, such as cars or unmanned aerial vehicles (UAVs), must perform object recognition in real time and without network (cloud computing) resources. These platforms typically have limited processor capacity, particularly UAVs, which are highly constrained by weight and power availability. In a further example, in a typical tactical video security system, real-time video information has to be made available to end users on their mobile devices with a latency of less than one second. An isolated imaging device, such as a drone system that does not have a robust network connection, or a security camera that is not connected to a high-speed internet connection, may be referred to as an edge device. The major problem that edge devices have, as opposed to cloud video analysis systems, is a lack of processing power to run complex models such as neural networks.
In view of the above, there is a need for an object recognition system that is less computationally complex and has increased speed and accuracy. The object recognition system should allow for smooth object recognition output on less powerful hardware, such as edge devices and small computers that lack GPUs, so as to save computational resources and electricity costs, and thereby achieve longer operating time, especially on battery-operated portable devices.