Today, end users generate enormous amounts of content, including pictures and videos, and relay that content to various media networks (e.g., content storage networks, cloud computing infrastructure, social networks, and the like) via various user devices (e.g., desktops, palmtops, e-readers, handhelds, and like devices). In most cases, these pictures and videos convey no information beyond their visual, and sometimes aural, details. It is often said that a picture is worth a thousand words; however, in most cases those thousand words are not known without an explanation by the end user who took the associated picture or video. While attempts have been made to augment such content with additional information, content augmentation is currently a highly manual process with little or no improvement over the way that content is produced in print media.