Digital video recorders (DVRs) provide a convenient and flexible way of storing and retrieving the video and audio information available to the modern user of media content. Today, the majority of video content comes from cable or satellite providers, or is archived on various media. However, the rapid development of broadband networks has increased the percentage of content coming from the Internet, peer-to-peer sharing, etc. These trends blur the traditional concept of channels, and we therefore refer to the possible sources of video as the “ocean of content”.
Storing and retrieving important content from this ocean is becoming a problem. Given the large choice of content, the user has difficulty choosing what he wants to see. Assuming that a typical DVR may store hours of video, a typical modern user, who has limited time, is unable to see even a small fraction of the data he would like to see. Modern DVRs have some basic capabilities facilitating the preview and retrieval of recorded content, but they are too limited and generic to be convenient.
Viewers of video typically desire the ability to see certain portions of a program that are significant to them (i.e., desired content). It should be understood that for a single item of content, multiple different video digests may be created, depending upon the definition of desired and undesired content. Since such definitions are subjective, a video digest is ideally custom tailored for every user.
Theoretically, desired and undesired content may be given a semantic description. For example, one may wish to adjust a DVR processor to analyze the video and automatically store or play back scenes of car crashes, while automatically excluding (neither storing nor playing back) advertisements. Here, “car crashes” and “advertisements” are high-level, human-recognizable semantic labels for the aforementioned classes of video content. Automatically matching a high-level, human-understandable semantic description to lower-level video content that may be automatically handled by one or more processors belongs to the general category of pattern recognition problems, usually referred to as video search.
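By way of illustration only, the following minimal Python sketch shows how such semantic labels, once assigned to scenes, might be used to select desired content and exclude undesired content. It assumes that the labels have already been produced by some recognition step, which is the difficult part and is not shown. The names `Scene` and `make_digest`, the label strings, and the time values are hypothetical and are not taken from any particular DVR or prior art system.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Scene:
    """A contiguous portion of video with hypothetical semantic labels."""
    start_s: float
    end_s: float
    labels: FrozenSet[str]


def make_digest(scenes: List[Scene],
                desired: FrozenSet[str],
                undesired: FrozenSet[str]) -> List[Scene]:
    """Keep scenes that carry a desired label and no undesired label."""
    return [
        s for s in scenes
        if (s.labels & desired) and not (s.labels & undesired)
    ]


# Hypothetical, hand-labeled example data (assumption, not real output).
scenes = [
    Scene(0.0, 30.0, frozenset({"advertisement"})),
    Scene(30.0, 95.0, frozenset({"car crash"})),
    Scene(95.0, 140.0, frozenset({"dialogue"})),
]

digest = make_digest(
    scenes,
    desired=frozenset({"car crash"}),
    undesired=frozenset({"advertisement"}),
)
```

Because the desired and undesired label sets are passed in as parameters, different users could obtain different digests from the same item of content simply by supplying different sets.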
Unfortunately, prior art methods did not teach unique and reliable ways to relate a high-level semantic description to the actual video content in a way that would be particularly useful for typical video viewers. Video viewers usually prefer a zero-effort experience, in which no explicit interaction with the system is needed. By contrast, using prior art methods, a substantial amount of user effort was required in order to link a high-level user semantic description of wanted or unwanted video to particular video patterns that might be handled and recognized by automated video recognition equipment.