Users commonly capture images with computing devices equipped with cameras. Image applications hosted on these or other computing devices provide many image editing operations. Cropping is one such operation: it removes or extends a portion of an image to change its aspect ratio, improve framing, or accentuate particular subject matter.
When an image is cropped, its composition changes. Composition refers to the way the elements of the subject matter are arranged within the image, an arrangement that can reflect the aesthetic quality of the subject matter as it appears in the image. Composition guidance refers to the process of identifying image crops that have good compositions.
For example, consider an image of a tennis player serving a ball on a tennis court. The image can be cropped to focus on the player alone. However, if the crop is so tight that it removes the player's racket or face, the resulting image has an unacceptable composition.
Some existing systems support automatic cropping, in which the image application, rather than the user, generates or recommends image crops. However, the accuracy, efficiency, and speed of the processing that performs these crop operations can be improved. Such systems generally adopt one of two approaches: rule-based or learning-based.
Rule-based approaches encode rules in score functions that evaluate the composition of an image crop. These rules must be designed manually. However, designing rules for all types of images is very challenging, and rule-based approaches often fail on images to which their rules do not apply. For example, many rule-based approaches rely on human face detection and therefore do not work robustly on images of animals.
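To make the rule-based approach concrete, the following is a minimal sketch. It assumes a single hand-designed rule: the rule of thirds applied to a detected face center. The `Crop` type and the `thirds_score` function are hypothetical and are not taken from any particular system. The sketch shows how a rule becomes a score function. It also shows why such a function cannot score a crop at all when the face detector finds no face.

```python
from dataclasses import dataclass
from math import hypot, sqrt
from typing import Optional, Tuple


@dataclass
class Crop:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


# Rule-of-thirds intersections in crop-normalized coordinates.
_THIRDS = [(a, b) for a in (1 / 3, 2 / 3) for b in (1 / 3, 2 / 3)]
# Farthest any point in the unit square can be from its nearest intersection
# (attained at a corner); used to map distances into [0, 1].
_MAX_DIST = sqrt(2) / 3


def thirds_score(crop: Crop,
                 face_center: Optional[Tuple[float, float]]) -> Optional[float]:
    """Hypothetical hand-designed rule: reward crops that place a detected
    face near a rule-of-thirds intersection.

    Returns None when the face detector found nothing, because the rule
    simply does not apply (e.g., to a photo of an animal).
    """
    if face_center is None:
        return None
    fx, fy = face_center
    # A crop that cuts the face out gets the lowest score.
    if not (crop.x <= fx <= crop.x + crop.w and crop.y <= fy <= crop.y + crop.h):
        return 0.0
    # Face position relative to the crop, in [0, 1] x [0, 1].
    u = (fx - crop.x) / crop.w
    v = (fy - crop.y) / crop.h
    d = min(hypot(u - a, v - b) for a, b in _THIRDS)
    return 1.0 - d / _MAX_DIST
```

For example, `thirds_score(Crop(0, 0, 300, 300), (100, 100))` returns `1.0`, because the face sits exactly on a thirds intersection. Passing `None` as the face center returns `None`, so the rule has no opinion about the crop. Every image type the rule should handle needs another rule like this one, designed by hand.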
Learning-based approaches try to automatically learn composition rules or score functions from a training set. These methods rely on neural networks and thereby avoid the difficulty of manually designing composition rules. However, they face other challenges. For example, the complexity of the neural networks can prevent high-quality crops from being suggested or generated in real time, prevent the generation of a diverse set of image crops with good compositions, or both. Specifically, existing learning-based approaches target generating only a single crop for a test image, whereas in a real-world image application the user may need multiple crop suggestions of different aspect ratios and scales. Further, the quality of the image crops depends on the training. Existing learning-based approaches rely on supervised training, in which annotations provided by users are used to train a neural network to learn and recognize specific features. Hence, the amount of available annotations and the range of learned features can significantly limit the quality and diversity of suggested or generated crops.
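To illustrate the gap between producing a single crop and producing multiple suggestions, the following sketch shows one generic way an application could return one suggestion per aspect ratio. It enumerates sliding-window candidates at several aspect ratios and scales, then keeps the best-scoring window for each ratio. Several parts are hypothetical:

- the function names;
- the default ratios, scales, and stride;
- the `score` callable, which stands in for whatever rule-based or learned scorer is used.

This is not the method of any specific system.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def _windows(img_w: int, img_h: int, ar: float,
             scales: Iterable[float], stride: int) -> Iterator[Box]:
    """Yield sliding-window crops with aspect ratio ar (= width / height)."""
    # Largest crop with this aspect ratio that fits inside the image.
    base_w = min(img_w, img_h * ar)
    base_h = base_w / ar
    for s in scales:
        # Round (rather than truncate) and clamp so floating-point error
        # cannot push a window outside the image.
        w = min(img_w, max(1, round(base_w * s)))
        h = min(img_h, max(1, round(base_h * s)))
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


def suggest_crops(img_w: int, img_h: int,
                  score: Callable[[Box], float],
                  aspect_ratios: Iterable[float] = (1.0, 4 / 3, 16 / 9),
                  scales: Iterable[float] = (0.6, 0.8, 1.0),
                  stride: int = 32) -> List[Box]:
    """Return the best-scoring crop for each requested aspect ratio."""
    scales = tuple(scales)  # reused once per aspect ratio
    suggestions: List[Box] = []
    for ar in aspect_ratios:
        candidates = list(_windows(img_w, img_h, ar, scales, stride))
        if candidates:
            suggestions.append(max(candidates, key=score))
    return suggestions
```

As a usage example, `suggest_crops(640, 480, score=lambda b: b[2] * b[3])` uses a trivial area score for demonstration. It returns the largest window for each of the three default aspect ratios: `(0, 0, 480, 480)`, `(0, 0, 640, 480)`, and `(0, 0, 640, 360)`. The number of candidates grows with the number of ratios, scales, and positions, and every candidate must be scored. A scorer that is expensive per crop, such as a large neural network, therefore makes real-time multi-crop suggestion difficult.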