Cameras are becoming increasingly accessible and commonplace, allowing users to capture media content depicting aspects of everyday life. After capturing media content, such as videos, digital images, and so forth, users have numerous options for displaying this content. However, many of these options require a significant amount of a user's time in order to create aesthetically pleasing presentations of the user's media content for display. For example, it may take several iterations to select a digital image that fits within an atypically-shaped frame without obscuring some feature in the image that the user feels is important.
In another example, with current techniques, a user must perform several manual steps to create a media collage containing video. First, the user must select from among items of media content that are accessible to the user, and must select from among available collage templates in which to display the media collage. Both the items of media content and the collage templates can number in the dozens, hundreds, or even thousands. Once the user has selected an item of media content and a collage template, the user then fits the selected item of media content into a cell of the collage template. The user then previews the media collage at this stage to determine whether the video content clips outside of its cell, whether the video has been placed in the desired cell of the collage template, whether the video contains black (or other unwanted, uniform-color) frames, whether a pause frame of the video contains a desired image, and whether the video has a sufficient amount of interesting content, to name some examples. For each item of media content added to the media collage, the fitting to the cell and the previewing must be repeated, which is both time-consuming and frustrating for users.