The exemplary embodiments relate to apparatus and techniques for enrichment of Audio/Video (AV) recordings.
Online AV recordings provide a useful format for viewing and/or listening to content originally presented at seminars, meetings, or other speaking venues. These may be used by attendees of the original presentation, for example to refresh their memory of the meeting content, or may provide new content for interested parties who could not attend the original event. However, a viewer or listener may have difficulty navigating such recordings to find interesting content, since the recordings are inherently unstructured. Consequently, a user typically has no option other than to listen to or view the entire recording from beginning to end, and it can be difficult to find interesting content within an AV recording that is not played in its entirety. Thus, while AV sharing media such as YouTube, podcasts, and Internet webcasts have become available in ever-increasing numbers and topical varieties, the unstructured character of such AV recordings has made the raw recording generally undesirable absent post-presentation processing to provide metadata that indicates the nature of the content.
Such post-processing typically involves an editor annotating the recording by identifying different portions corresponding to separate topics or subtopics, and providing corresponding summary content. Professional studios offer such services, but the editing process is labor-intensive and expensive. Moreover, such editing is typically time-consuming and error-prone, and is subject to biases of the editor that may lead to inaccuracies in the supplemental metadata. Such inaccuracies are more prevalent where the editor is not proficient in the topic of the seminar or meeting. In this regard, semantic segmentation of a recording and identification of key messages differ from the fairly common technique of detecting scene changes within video recordings.
Although scene change identification may be a fairly straightforward editing technique, a scene change does not necessarily imply a change of topic, and could instead be a switch to a different camera angle on the speaker. Moreover, a topic change does not necessarily imply a scene change: the audio and/or video may record the same view of a single speaker when the speaker changes to a new subject. Likewise, a different speaker may continue discussing the same topic that a previous speaker was addressing. Thus, there remains a need for improved techniques and systems to help users navigate more effectively through AV content as the availability of online webcasts and seminars continues to increase.