When a user chooses to upload media content from a portable device over a network, e.g., to a website or another user's device, the user oftentimes manually performs facial and object association operations and selects which users should receive or be allowed to view the media content. For example, the user may annotate, categorize, or otherwise organize images and videos through an online media sharing server to share media with other users, who may be notified through the service that media is available for viewing if the user tags them. Oftentimes, however, users do not have the time or the energy to manually perform these operations. Through automation of facial and object recognition, the time the user spends categorizing, annotating, tagging, etc. may be minimized.
However, recognition techniques have been difficult to apply to media, such as videos and images, on mobile devices. Some of the difficulties relate to the computational complexity of measuring the differences between the video objects. Faces and objects in these video objects are often affected by factors such as differences in brightness, positioning, and expression. These difficulties are compounded by large corpora of reference images and the large number of comparisons required for accurate recognition results.