With the continuous development of video playback technology, the demand for image processing on video screens is also increasing. Currently, in many application scenarios, it is necessary to fit a main target object from the video screen and then perform subsequent processing according to the fitted target object. For example, some self-media outlets need to produce an illustrated synopsis based on the content of a video. In this case, the main characters need to be fitted from the video screen, and a synopsis of the video is then produced from the fitted main characters and the text added later. As another example, when barrage information is displayed on a video playback screen, the barrage information may occlude the main objects in the video screen. To prevent this, the main objects are first fitted from the video screen, and the barrage is then processed so that it does not block the fitted main objects.
The inventor has noticed that the existing technology has at least the following problems. At present, a target object in a video frame is usually fitted by means of a binary mask image. Specifically, a binary mask image consistent with the video frame may be generated, in which the region where the target object is located has a pixel value different from that of the other regions. Subsequent processing is then performed on the binary mask image. However, the data amount of a binary mask image is generally large, so fitting the target object according to the binary mask image increases the amount of data to be processed subsequently, resulting in lower processing efficiency.
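The data-amount concern can be illustrated with a minimal sketch. It assumes, for illustration only, that the target object occupies a rectangular region and that the mask stores one value per pixel; the function names `make_binary_mask` and `mask_value_count`, the frame size, and the box coordinates are all hypothetical and do not come from the existing technology described above.

```python
def make_binary_mask(width, height, target_box):
    """Build a height x width binary mask: 1 inside target_box, 0 elsewhere.

    target_box is (left, top, right, bottom) in pixel coordinates,
    with right and bottom exclusive. A rectangular target is an
    assumption made purely to keep the illustration short.
    """
    left, top, right, bottom = target_box
    return [
        [1 if left <= x < right and top <= y < bottom else 0
         for x in range(width)]
        for y in range(height)
    ]


def mask_value_count(mask):
    """Number of values stored in the mask (one per pixel of the frame)."""
    return sum(len(row) for row in mask)


# Hypothetical 8 x 6 frame whose target object covers a 3 x 3 region.
mask = make_binary_mask(8, 6, (2, 1, 5, 4))
for row in mask:
    print("".join(str(v) for v in row))

# The mask holds one value for every pixel of the frame,
# regardless of how small the target region is.
print(mask_value_count(mask))
```

In this sketch the mask stores 48 values although only 9 of them mark the target object. By the same reasoning, a mask for a 1920 x 1080 frame would hold 2,073,600 values per frame, which is the kind of data amount that later processing steps would have to handle.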