Image processing applications allow a user to create clipping masks in the shape of text. Such a clipping mask can be used to clip an image and create a photo text, which incorporates portions of the image in the shape of the text. However, creating clipping masks to produce such photo texts is a complicated manual process. Further, placing a clipping mask in a centered position on an image often results in a visually displeasing photo text, and creating a clipping set that positions a clipping mask relative to an image is likewise a complicated manual process. Even a user who is familiar with creating both clipping masks and clipping sets may lack the artistic knowledge or experience to determine which portions of an image will be the most aesthetically pleasing. Additionally, a user cannot visually evaluate the aesthetic qualities of a particular photo text until it has already been created. Thus, determining an optimal size, shape, and location for a clipping mask presents a number of challenges.