Techniques for detecting humans in an image or video typically employ a traditional method of background modeling. Human detection is significant in various fields, particularly security and surveillance, where automatic human body detection is a key enabler, as well as in robotics and in Intelligent Transport Systems (autonomous vehicles and automatic driver-assistance systems).
For object detection in an image, the image signals are processed using training set based classification, feature extraction to obtain feature vectors, principal component analysis, pattern recognition, and wavelet transformation employing convolution. For human detection in particular, the extracted features associated with the image are processed using Histogram of Oriented Gradients (HOG) for human body detection and the Haar wavelet transformation technique for face detection.
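The orientation-histogram idea behind HOG can be illustrated with a minimal sketch for a single grayscale cell. This is an illustration under stated assumptions, not the method of any particular prior-art system: the function name `cell_orientation_histogram`, the 9-bin unsigned layout, and the central-difference gradient are choices made here for clarity, and the block normalization and bin interpolation used by full HOG descriptors are omitted.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a list of rows of grayscale intensities (hypothetical input
    format). Gradients use central differences on interior pixels only;
    border pixels are skipped.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat region contributes nothing
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A vertical edge (dark left, bright right) votes only in the 0-degree bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
hist_vertical = cell_orientation_histogram(vertical_edge)
```

In a full descriptor, histograms of this kind from many cells would be concatenated into the feature vector that is passed to the classifier.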
In the human detection techniques disclosed hitherto, contour features obtained by extracting the edges of objects are employed as important features. In general, human detection and image processing in the prior art take place through supervised learning on a training data set of small edge regions, and through hierarchical supervised learning thereof.
However, the probability of detecting a human is significantly poor under various unfavorable conditions, including distorted image signals, mix-up of the background with the object, and human postures that are unavailable in the training data set. A Haar-like wavelet transformation offers a formidable face detection technique in differential convolution analysis; however, it suffers from a higher threshold value associated with the extracted features. Another impediment to precise human and face detection is the wide variability of data objects in the images.
The quality of an object detection system depends on feature extraction; amongst others, a Haar-like feature provides enhanced features for object detection. In the field of car detection, a combination of Haar-like features and HOG is one way to encode an input image to obtain a vector of visual descriptors. Haar-like features and the concept of a Region of Interest (ROI) are observed to significantly increase the probability of object detection. However, time and space considerations for the detection of certain advanced descriptors, and their use along with HOG and Haar, are not substantially disclosed in the prior art.
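A Haar-like feature is commonly computed as the difference between pixel sums over adjacent rectangles, and an integral image (summed-area table) is one well-known way to keep the time cost of each rectangle sum constant. The following is a minimal sketch assuming a two-rectangle vertical-edge feature; the function names and the list-of-rows image format are hypothetical and introduced only for illustration.

```python
def integral_image(img):
    """Summed-area table with one extra zero row and column."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_vertical_edge(ii, top, left, height, width):
    """Right-half sum minus left-half sum; `width` is assumed to be even."""
    half = width // 2
    right = rect_sum(ii, top, left + half, height, half)
    left_part = rect_sum(ii, top, left, height, half)
    return right - left_part


# Dark left half, bright right half: a strong response for this feature.
image = [[0, 0, 10, 10] for _ in range(4)]
table = integral_image(image)
response = haar_vertical_edge(table, top=0, left=0, height=4, width=4)
```

The table costs one pass over the image and one extra array of roughly the image size, after which every rectangle sum costs four lookups regardless of rectangle size.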
A plurality of techniques and algorithms are taught in the prior art for detection of humans using Support Vector Machine (SVM) based HOG features, and for human face detection using SVM based Haar features and background modeling (BG). However, these algorithms work individually and produce good results only on a limited data set, and mostly for color images.
Some of the lacunae that exist in the prior art are that a single Background Modeling (BG) algorithm does not work for all types of backgrounds, nor with changing backgrounds. Moreover, if the color of the person's dress is similar to the background, the probability of detection reduces considerably. In the case of infrared (IR) images, color information is not present, so the BG works on gray level images (single channel information) and its performance is not at an appreciable level.
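The single-channel limitation can be illustrated with a minimal sketch of one simple form of background modeling: a running-average background with a fixed difference threshold. The function names, the learning rate `alpha`, and the threshold of 25 gray levels are assumptions made here for illustration and are not taken from any specific prior-art algorithm.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new gray-level frame into the running-average background."""
    return [
        [(1.0 - alpha) * b + alpha * p for b, p in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25):
    """Mark pixels whose gray level differs from the background by more than threshold."""
    return [
        [abs(p - b) > threshold for b, p in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


# Background gray level is 100 everywhere.
background = [[100, 100, 100]]
# Middle pixel: high-contrast clothing (200). Right pixel: clothing close to
# the background gray level (110), which stays under the threshold.
frame = [[100, 200, 110]]
mask = foreground_mask(background, frame)
```

With only one channel available, the pixel at gray level 110 is indistinguishable from background noise under this threshold, which mirrors the reduced detection probability described above for clothing similar to the background.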
In training set based systems, the SVM based HOG feature classifier requires training with thousands of positive images and tens of thousands of negative images to achieve a good result. Such a classifier can never have 100% precision, as it is not possible to know all possible human postures, and it will have some false positives. The major problem lies in the training data set: the postures of people can be of various types, and when tested with new images from a new environment the classifier will always have some errors in detecting people.
The SVM based Haar feature classifier for face detection works well for color images and whenever the face area in the image is substantially large. In the case of IR images, gray scale images, and people sitting far from the sensor (faces covering 50×50 pixels or less), it leads to a lot of errors in the detection of the face.