At present, the extraction of image information is mainly concentrated on three granularity levels: coarse-grained image classification and annotation, which takes the entire image as a unit; object detection, which obtains a physical object (such as a pedestrian, a human face or a car) in an image and requires specially trained detectors; and fine-grained image segmentation and analysis at the pixel scale, which describes the details of an image subject pixel by pixel. However, each of these three image processing ways has problems. Image classification and annotation may give an inaccurate and incomplete analysis of an image with a plurality of subjects, so that the analysis results obtained are unstable. Object detection requires an image with a plurality of subjects of different classes to be traversed more than once, which results in a large amount of calculation. Image segmentation and analysis takes a long time and is applicable to particular scenarios, such as the segmentation of human faces, clothes, skin and luggage.
With the development of Internet technology, the above image processing ways, each based on a single granularity, can no longer satisfy the demands of increasingly diverse web pictures and social pictures. To keep pace with this development, an image processing procedure in the related art includes: detecting a subject area where a subject is located in an image, and performing a subsequent analysis of the subject based on the subject area, for example, classifying or recognizing the subject. In this whole procedure, obtaining the subject areas of the image is the key to obtaining a precise analysis result of the image. There are two common ways of detecting the subject areas of an image. The first way is subject detection based on a salient area: the image is processed with a saliency energy function to generate an energy distribution curve of the image, and a more salient area is obtained according to the energy distribution curve and taken as the subject area of the image. The second way is subject coordinate regression based on deep learning: the image to be processed is obtained, and a trained deep neural network predicts the coordinates of the four points of a bounding rectangle of the subject, so as to locate the subject area.
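For illustration only, the first detection way may be sketched as follows. The related art does not specify the energy function, so this sketch assumes a simple gradient-magnitude energy. The function names (`saliency_energy`, `salient_span`, `detect_subject_area`) and the `ratio` threshold are hypothetical and are not taken from any particular implementation. The energy is summed along each axis to form two energy distribution curves, and the span in which each curve stays above a fraction of its peak is taken as the subject area.

```python
import numpy as np


def saliency_energy(gray: np.ndarray) -> np.ndarray:
    """Assumed energy function: per-pixel gradient magnitude."""
    gy, gx = np.gradient(gray.astype(np.float64))
    return np.hypot(gx, gy)


def salient_span(curve: np.ndarray, ratio: float) -> tuple[int, int]:
    """First and last index where the curve reaches ratio * its peak."""
    idx = np.flatnonzero(curve >= ratio * curve.max())
    return int(idx[0]), int(idx[-1])


def detect_subject_area(gray: np.ndarray, ratio: float = 0.5) -> tuple[int, int, int, int]:
    """Return one rectangle (x0, y0, x1, y1) around the most salient area."""
    energy = saliency_energy(gray)
    col_curve = energy.sum(axis=0)  # energy distribution along x
    row_curve = energy.sum(axis=1)  # energy distribution along y
    x0, x1 = salient_span(col_curve, ratio)
    y0, y1 = salient_span(row_curve, ratio)
    return x0, y0, x1, y1


if __name__ == "__main__":
    # Synthetic test image: a bright square on a dark background.
    img = np.zeros((20, 20))
    img[5:10, 8:14] = 1.0
    print(detect_subject_area(img))
```

In this sketch the result depends entirely on the hand-chosen energy function and threshold, and a single rectangle is produced for the whole image.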
However, both of the above ways have problems. In the first way, the energy function used is not general, and the detection result of the subject areas lacks accuracy and validity. In the second way, not only is the calculation complicated and the timeliness poor, but also only one subject area is returned, which makes it difficult to process an image with a plurality of subjects.