Nowadays, vehicles are equipped with object detectors to detect objects, for example, pedestrians on the road, and provide warnings or brake assistance to the driver. An object detector employs a video camera to capture videos/images in front of and around the vehicle, and applies image processing techniques to the captured videos/images for identifying obstacles and pedestrians in front of and around the vehicle.
Existing object detectors employ histogram of oriented gradients (HOG) based object detection techniques, in which an image is divided into a plurality of blocks, and the magnitude of the pixel gradients in each orientation within each block is accumulated to form a HOG of the image. Machine learning methods apply identification and classification techniques to the HOG so as to effectively identify pedestrians and predefined objects in the image. The HOG of the image is computed based on the gradient orientation and gradient magnitude of each image pixel. For an image pixel, the corresponding gradient orientation θ is expressed as

θ = tan⁻¹(y/x)

where y = gradient in the vertical direction, and x = gradient in the horizontal direction.
The computation of the gradient orientation θ on a fixed-point processor is traditionally implemented using two look-up tables, one for the division and another for the arctangent (tan⁻¹). Using the two look-up tables to compute the gradient orientation of each image pixel is very time consuming and requires very high computation as well as high memory bandwidth, which makes it very difficult to achieve real-time performance on an embedded system. Memory bandwidth is an important resource in any system, especially in a multi-core system, and plays a crucial role in optimizing system performance. It is therefore very crucial to keep the memory bandwidth consumption as small as possible so that other processors can also work effectively in parallel.
Other object detection techniques include computing an integral image for reducing the number of computations. However, although computation of the integral image reduces the number of computations, it increases the intermediate data element size, which again requires high memory bandwidth.
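A brief sketch of the integral-image technique clarifies both sides of this trade-off: any rectangular sum becomes four look-ups, but each intermediate entry must widen from 8 bits to 32 bits to hold the running sums. The function names here are illustrative assumptions.

```c
#include <stdint.h>

/* Illustrative sketch: integral image of an 8-bit w x h image.
 * ii[r][c] holds the sum of all pixels above and to the left of (r, c)
 * inclusive, so the element type widens from uint8_t to uint32_t --
 * the larger intermediate data element size noted above. */
static void integral_image(const uint8_t *src, uint32_t *ii, int w, int h)
{
    for (int r = 0; r < h; r++) {
        uint32_t row_sum = 0;
        for (int c = 0; c < w; c++) {
            row_sum += src[r * w + c];
            ii[r * w + c] = row_sum + (r ? ii[(r - 1) * w + c] : 0);
        }
    }
}

/* Sum over the rectangle rows [r0, r1] x cols [c0, c1], inclusive,
 * in four table look-ups regardless of rectangle size. */
static uint32_t box_sum(const uint32_t *ii, int w,
                        int r0, int c0, int r1, int c1)
{
    uint32_t a = (r0 && c0) ? ii[(r0 - 1) * w + (c0 - 1)] : 0;
    uint32_t b = r0 ? ii[(r0 - 1) * w + c1] : 0;
    uint32_t c = c0 ? ii[r1 * w + (c0 - 1)] : 0;
    return ii[r1 * w + c1] + a - b - c;
}
```

The 4x growth in element size means the intermediate buffer, and every read of it, moves four times as many bytes as the source image, which is the memory-bandwidth penalty described above.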