In the prior art, picture searching is generally performed, from a technical standpoint, by processing grayscale images of a target to extract local descriptors of the images (e.g., SIFT, SURF), and then calculating the distance or similarity between features of target images using a method such as Bag of Visual Words (BOW), Hamming Embedding, or Locality-Sensitive Hashing. These methods ignore equally important global features such as color, shape, texture, and target type, with the result that the color, shape, texture, and target type of the search results may differ greatly from those of the queried target.
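The loss of color information can be illustrated with a minimal sketch. It assumes a simple channel-average grayscale conversion (real pipelines typically use a weighted luma formula), and the function `to_gray` and the two pixel patches are hypothetical names and values chosen only for illustration. It shows two patches of clearly different color becoming indistinguishable once converted to grayscale, so any descriptor computed from the grayscale values alone cannot tell them apart.

```python
def to_gray(pixel):
    """Channel-average grayscale; a simplifying assumption, not a standard luma formula."""
    r, g, b = pixel
    return (r + g + b) // 3

# Hypothetical RGB patches: one purely red, one purely blue.
red_patch = [(150, 0, 0), (90, 0, 0), (30, 0, 0)]
blue_patch = [(0, 0, 150), (0, 0, 90), (0, 0, 30)]

gray_red = [to_gray(p) for p in red_patch]
gray_blue = [to_gray(p) for p in blue_patch]

# The color patches differ, but their grayscale versions are identical.
print(red_patch != blue_patch)  # True
print(gray_red == gray_blue)    # True
```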
From the point of view of searching, comparing targets in a monitored scene one by one without distinguishing between target types is poorly targeted and wastes resources. Specifically, because neither the images to be searched nor the images in an image database are classified according to target type, searching pictures in the prior art requires comparing the target contained in an image to be searched with the target contained in every image in the image database. Different types of targets in a monitored scene are in fact quite different from one another, yet the prior art does not exploit this difference when searching pictures. Consequently, the one-by-one comparison of the prior art lacks pertinence, and the efficiency of searching for a target in an image is low.
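The cost of one-by-one comparison can be illustrated with a minimal sketch. The records in `DATABASE`, the type labels, the two-dimensional feature vectors, and the functions `exhaustive_search`, `build_type_index`, and `typed_search` are all hypothetical and serve only to count comparisons; they do not represent any particular prior-art system. The sketch contrasts an untyped search, which compares the query against every record, with a search restricted to records of the same target type.

```python
from collections import defaultdict

# Hypothetical records: (image_id, target_type, feature_vector).
DATABASE = [
    ("img1", "person", (0.1, 0.9)),
    ("img2", "vehicle", (0.8, 0.2)),
    ("img3", "person", (0.2, 0.8)),
    ("img4", "vehicle", (0.7, 0.3)),
    ("img5", "bicycle", (0.5, 0.5)),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(query_feature, candidates):
    """Return (best_image_id, number_of_comparisons) over (image_id, feature) pairs."""
    best_id, best_dist, comparisons = None, None, 0
    for image_id, feature in candidates:
        comparisons += 1
        dist = squared_distance(query_feature, feature)
        if best_dist is None or dist < best_dist:
            best_id, best_dist = image_id, dist
    return best_id, comparisons

def exhaustive_search(query_feature, database):
    """One-by-one comparison against every record, ignoring target type."""
    return nearest(query_feature, [(i, f) for i, _t, f in database])

def build_type_index(database):
    """Group records by target type so a query only visits its own type."""
    index = defaultdict(list)
    for image_id, target_type, feature in database:
        index[target_type].append((image_id, feature))
    return index

def typed_search(query_type, query_feature, index):
    """Compare only against records whose target type matches the query."""
    return nearest(query_feature, index.get(query_type, []))

query = (0.12, 0.88)
print(exhaustive_search(query, DATABASE))                         # ('img1', 5)
print(typed_search("person", query, build_type_index(DATABASE)))  # ('img1', 2)
```

In this toy data both searches return the same image, but the untyped search performs five comparisons while the type-restricted search performs two; the gap widens as the database grows and as the number of target types increases.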