The present invention relates to summarization of video content including baseball.
The amount of video content, some of which includes sporting events, is expanding at an ever increasing rate. Simultaneously, the time available for viewers to consume or otherwise view all of the desirable video content is decreasing. With the increased amount of video content coupled with the decreasing time available to view it, it becomes increasingly problematic for viewers to view all of the potentially desirable content in its entirety. Accordingly, viewers are increasingly selective regarding the video content that they choose to view. To accommodate viewer demands, techniques have been developed to provide a summarization of the video that is representative in some manner of the entire video. Video summarization likewise facilitates additional features including browsing, filtering, indexing, retrieval, etc. The typical purpose for creating a video summarization is to obtain a compact representation of the original video for subsequent viewing.
There are two major approaches to video summarization. The first approach for video summarization is key frame detection. Key frame detection includes mechanisms that process low level characteristics of the video, such as its color distribution, to determine those particular isolated frames that are most representative of particular portions of the video. For example, a key frame summarization of a video may contain only a few isolated key frames which potentially highlight the most important events in the video. Thus some limited information about the video can be inferred from the selection of key frames. Key frame techniques are especially suitable for indexing video content but are not especially suitable for summarizing sporting content.
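By way of illustration only, key frame selection from a low level characteristic such as color distribution may be sketched as follows. The sketch assumes each frame is supplied as a non-empty sequence of 8-bit intensity values, and it selects the frame whose histogram lies nearest the mean histogram of the segment. The function names, the bin count, and the L1 distance are hypothetical choices made for this example and are not taken from any particular prior system.

```python
from typing import List, Sequence


def color_histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized histogram of the 8-bit intensity values of one frame.

    Assumption: `frame` is non-empty and holds integers in 0..255.
    """
    counts = [0] * bins
    for value in frame:
        # Map 0..255 onto bin indices 0..bins-1.
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_key_frame(frames: Sequence[Sequence[int]], bins: int = 8) -> int:
    """Index of the frame whose histogram is nearest the segment's mean histogram."""
    hists = [color_histogram(f, bins) for f in frames]
    mean = [sum(h[i] for h in hists) / len(hists) for i in range(bins)]
    return min(range(len(frames)), key=lambda i: l1_distance(hists[i], mean))
```

Such a selection yields one isolated frame per segment, which is consistent with the observation above that only limited information about the video can be inferred from the key frames.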
The second approach for video summarization is directed at detecting events that are important for the particular video content. Such techniques normally include a definition and model of anticipated events of particular importance for a particular type of content. The video summarization may consist of many video segments, each of which is a continuous portion in the original video, allowing some detailed information from the video to be viewed by the user in a time effective manner. Such techniques are especially suitable for the efficient consumption of the content of a video by browsing only its summary. Such approaches facilitate what is sometimes referred to as “semantic summaries”.
Kawashima et al., in a paper entitled “Indexing of Baseball Telecast for Content-based Video Retrieval”, disclose a technique for indexing a baseball telecast for content-based video retrieval. The system initially detects domain-specific scenes in a baseball video based on image similarity. Each of these scenes, referred to as a basic scene, is a shot that includes a single pitch. After extracting these scenes, the system spots the exact location of the pitching and batting action using continuous dynamic programming matching for fixed areas in the image. If the batter swings the bat, the system determines the end point of the play from the camera view after batting to recognize the batting result. The system also recognizes the caption to verify and confirm the recognition result. The stored summarization version of the telecast, together with the indexes, forms a video database. Kawashima et al. incorporate the rules of baseball in order to attempt to extract events from the video, such as the batter's pose, using continuous dynamic programming to spot the pitching/batting scene within a basic scene, in which the system searches for the minimal warp function by comparing the input video sequence with pitching/batting model sequences. The system also attempts to detect and interpret text on the scoreboard, etc. After processing, the resulting video has the same length as the original while being indexed to permit the user to select those portions which are desirable for subsequent viewing. This technique is computationally expensive, performs inconsistently across different baseball games and especially across different broadcast companies, and is generally prone to error. In particular, the model sequences are generally unable to characterize variations among pitching scenes, even within the same game.
Thus, after matching the video sequence with a given model, the matching scores may vary to such a large extent that detecting all potential pitching scenes requires accepting many false positives. Further, a fixed pitching scene model fails to account for variations across different games and/or different channels. It is therefore difficult to set an optimal threshold for distinguishing pitching scenes from non-pitching scenes; with a fixed threshold, the system omits many pitching scenes and simultaneously includes many false positive pitching scenes. Also, the system fails to detect other types of activity in baseball that are of interest, such as stealing a base.
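The difficulty with a fixed threshold may be illustrated with a minimal sketch. It assumes that the matching score is a distance, so that a lower score indicates a closer match to the model sequence, and that a scene is accepted as a pitching scene when its score does not exceed the threshold. The function name and the scores shown are hypothetical values invented for this example; they are not measurements from Kawashima et al. or from any actual telecast.

```python
from typing import Sequence, Tuple


def evaluate_threshold(
    scores: Sequence[float], is_pitching: Sequence[bool], threshold: float
) -> Tuple[int, int]:
    """Return (missed pitching scenes, false positive pitching scenes).

    Assumption: a scene is accepted as pitching when score <= threshold.
    """
    missed = sum(1 for s, p in zip(scores, is_pitching) if p and s > threshold)
    false_pos = sum(1 for s, p in zip(scores, is_pitching) if not p and s <= threshold)
    return missed, false_pos


# Hypothetical matching scores: the first four scenes are pitching scenes,
# the last four are not, and the two groups overlap between 0.5 and 0.9.
scores = [0.2, 0.4, 0.7, 0.9, 0.5, 0.6, 1.1, 1.3]
is_pitching = [True, True, True, True, False, False, False, False]

print(evaluate_threshold(scores, is_pitching, 0.55))  # (2, 1)
print(evaluate_threshold(scores, is_pitching, 0.90))  # (0, 2)
```

With these illustrative scores, raising the threshold far enough to capture every pitching scene admits false positives, while lowering it to exclude false positives omits pitching scenes. No single fixed value achieves both.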
What is desired, therefore, is a video summarization technique suitable for video content that includes baseball.