Today, people often utilize computing devices or systems for a wide variety of purposes. For example, users can use their computing devices (or systems) to interact with one another, create content, share information, and access information. In some instances, a user of a computing device can utilize a camera or other image sensor of the computing device to capture or record media content, such as video content. Sometimes users have to provide information manually, such as by inputting descriptions and tags (e.g., identifier tags, location tags, hashtags, etc.), in order to describe the video content.
In some instances, media content can be analyzed by computing devices or systems in an attempt to identify items, subjects, or other objects that are represented or included in the media content. In one example, images can be analyzed to detect one or more faces in each of the images. In another example, an image can be analyzed to identify any products within the image that are available for purchase via an online storefront. However, conventional approaches for recognizing objects within media content can often be inefficient, inaccurate, and limited in capability. Due to these and other reasons, conventional approaches can create challenges for, or reduce, the overall user experience associated with media content interaction.