Object recognition is a computer vision technology for finding and identifying objects in an image or video sequence. Typically, an object recognition model is a machine learning model that detects instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. However, existing object recognition models are susceptible to optical illusions, which result in false positives. They are also prone to false negatives, failing to identify an object that blends in with its background.
In order to reduce false positives and false negatives, an existing object recognition model uses a machine learning algorithm that calculates depth map information from a stereo camera input to identify pedestrians in an image. The stereo model is trained with depth map information, so it learns to place less confidence in objects that are in the background, leading to fewer false positives from background objects. However, this approach is restricted to detecting objects of one class only, namely pedestrians. Another significant disadvantage of this approach is that it requires input data from a stereo camera. This greatly limits the amount of training material for the algorithm, as stereo data is not readily available for a wide range of objects.
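By way of illustration only, the relationship between stereo disparity and depth, and the effect of placing less confidence in distant objects, can be sketched as follows. The depth computation uses the standard pinhole stereo relation Z = f * B / d. The function names, the linear weighting rule, and the 30 m cutoff are hypothetical and are not taken from the existing model, which learns its weighting from training data rather than applying a fixed rule.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard pinhole stereo relation: depth Z = f * B / d.

    disparity_px: horizontal pixel shift of a point between the two views
    focal_px:     focal length expressed in pixels
    baseline_m:   distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        # No measurable disparity: treat the point as infinitely far away.
        return float("inf")
    return focal_px * baseline_m / disparity_px


def depth_weighted_confidence(score, depth_m, max_depth_m=30.0):
    """Hypothetical fixed rule (an assumption for illustration):
    scale a detection score down linearly as depth increases,
    reaching zero at max_depth_m and beyond.
    """
    weight = max(0.0, 1.0 - depth_m / max_depth_m)
    return score * weight


# Example: a detection with raw score 0.9 at a disparity of 35 px,
# seen by a rig with a 700 px focal length and a 0.5 m baseline.
depth = depth_from_disparity(35.0, 700.0, 0.5)
adjusted = depth_weighted_confidence(0.9, depth)
```

In this sketch, a detection located farther from the camera receives a lower adjusted score than an identical detection located nearer to the camera, which mirrors the reduced confidence in background objects described above.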
Further, for live video data, running a real-time object recognition model is computationally expensive and usually requires powerful hardware. In an example, typical moving systems, such as cars or unmanned aerial vehicles (UAVs), must perform object recognition in real time and without network (cloud computing) resources. These platforms typically have limited processor capacity, particularly UAVs, which are highly constrained by weight and power availability. In a further example, in a typical tactical video security system, real-time video information has to be made available to the end users on their mobile devices with a latency of less than one second. An isolated imaging device, such as a drone system that does not have a robust network connection or a security camera that is not connected to a high-speed internet connection, may be referred to as an edge device. The major problem that edge devices have, as opposed to cloud video analysis systems, is a lack of processing power to run complex models (neural networks).
In view of the above, there is a need for an object recognition system that is less computationally complex and that substantially reduces both false positives and false negatives. The object recognition system should allow for smooth object recognition output on less powerful hardware, such as edge devices and small computers that lack graphics processing units (GPUs), so as to save computational resources and electricity costs, and therefore achieve longer operating time, especially on battery-operated portable devices.