The present invention relates to data processing by digital computer, and more particularly to visual similarity.
Media content is being created and archived at a rapid pace. Media content generally refers to a time-synchronized ensemble of audio content and/or visual content (text, images, graphics, video, and so forth) that is captured from a presentation, lecture, speech, debate, television broadcast, board meeting, video, and so forth.
It is difficult to automatically identify a digital video clip within a video (i.e., a digital video file) using only the image representations within the file. Even if videos could be compared to one another using their pixel representations, a file's aboutness would not be known because the file lacks granular metadata. Here, “aboutness” is one of several terms used to express certain attributes of a file, such as its content, subject, or topic. Aboutness (or a synonymous term) is important for knowledge organization and information retrieval.