As communication and interactive technologies increasingly rely on information-rich types of media to enhance their popularity and/or capabilities, there is an increasing need to process such information. Processing may be configured to, for example, capture, analyze, retrieve, and/or distribute the massive amount of information contained within the media used by these technologies, to help users sift through the content and find the information that will be of most interest to them. However, due to the massive amount of media and of information within media (e.g., a single day's worth of television programming may contain thousands of hours of content addressing thousands of topics, narrative themes, etc.), attempting to capture, analyze, and/or distribute that information may be extremely difficult. Therefore, the processing of certain types of information-rich media files is often performed using manual judgments and determinations. For example, a textual description of actors, characters or other entities appearing in an episode of “Friends” may be manually generated. That description can then be provided to users so that they can learn which actors, characters or other entities appear in the episode.
This and other approaches, however, have drawbacks. For example, the description may be lacking; a user may want to know which actors, characters, products or other entities appear in a particular scene, or which actors, characters or other entities are speaking in that scene. Thus, there remains an ever-present need to provide more useful information and tools to users, for example, to provide for the capture, analysis and distribution of information related to media with greater functionality, accuracy and speed.