The present invention primarily relates to summarization of video content including soccer.
The amount of video content is expanding at an ever increasing rate, some of which includes sporting events. Simultaneously, the available time for viewers to consume or otherwise view all of the desirable video content is decreasing. With the increased amount of video content coupled with the decreasing time available to view the video content, it becomes increasingly problematic for viewers to view all of the potentially desirable content in its entirety. Accordingly, viewers are increasingly selective regarding the video content that they select to view. To accommodate viewer demands, techniques have been developed to provide a summarization of the video representative in some manner of the entire video. Video summarization likewise facilitates additional features including browsing, filtering, indexing, retrieval, etc. The typical purpose for creating a video summarization is to obtain a compact representation of the original video for subsequent viewing.
There are two major approaches to video summarization. The first approach for video summarization is key frame detection. Key frame detection includes mechanisms that process low level characteristics of the video, such as its color distribution, to determine those particular isolated frames that are most representative of particular portions of the video. For example, a key frame summarization of a video may contain only a few isolated key frames which potentially highlight the most important events in the video. Thus some limited information about the video can be inferred from the selection of key frames. Key frame techniques are especially suitable for indexing video content but are not especially suitable for summarizing sporting content.
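By way of illustration only, the following is a minimal sketch of key frame selection based on color distribution. It assumes each frame is available as a non-empty flat sequence of 8-bit intensity values, and it picks the frame whose histogram is closest to the mean histogram of the segment. The names `color_histogram` and `select_key_frame`, the bin count, and the L1 distance are assumptions made for this example and are not taken from any particular prior technique.

```python
from typing import List, Sequence

# Hypothetical frame representation: a flat, non-empty sequence of 8-bit values.
Frame = Sequence[int]


def color_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return a normalized histogram of 8-bit values in equal-width bins."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def select_key_frame(frames: Sequence[Frame], bins: int = 8) -> int:
    """Return the index of the frame closest (L1 distance) to the mean histogram."""
    hists = [color_histogram(f, bins) for f in frames]
    mean = [sum(h[b] for h in hists) / len(hists) for b in range(bins)]
    distances = [sum(abs(h[b] - mean[b]) for b in range(bins)) for h in hists]
    return distances.index(min(distances))
```

Such a selection yields a single representative frame per segment, which illustrates why only limited information about the events in the video can be inferred from key frames.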
The second approach for video summarization is directed at detecting events that are important for the particular video content. Such techniques normally include a definition and model of anticipated events of particular importance for a particular type of content. The video summarization may consist of many video segments, each of which is a continuous portion in the original video, allowing some detailed information from the video to be viewed by the user in a time effective manner. Such techniques are especially suitable for the efficient consumption of the content of a video by browsing only its summary. Such approaches facilitate what is sometimes referred to as “semantic summaries.”
There are several proposed techniques for the summarization of a soccer game based on fully parsing the structure of the whole soccer game. For example, Gong et al. propose to classify the frames in a soccer game into various play categories, such as a shot at left goal, top-left corner kick, play in right penalty area, in midfield, etc., based on a model with four components: the soccer field, the ball, the players, and the motion vectors. See, Y. Gong, L. T. Sin, C. H. Chuan, H.-J. Zhang, M. Sakauchi, “Automatic parsing of TV soccer programs,” IEEE conference on Multimedia systems and computing, pp. 167-174, 1995.
Yow et al. propose to construct a panoramic view of selected events by recognizing the soccer field, tracking the ball, and recognizing the camera movement. See, D. Yow, B.-L. Yeo, M. Yeung, B. Liu, “Analysis and presentation of soccer highlights from digital video,” Second Asian conference on computer vision, 1995.
Choi et al. propose to construct a panoramic view of certain events by recognizing the soccer field by color information, tracking players by template matching and Kalman filtering, and processing occlusion by color histogram back-projection. The panoramic view is based on a model for the soccer field and all frames in the video are transformed into the views in the model. See, S. Choi, Y. Seo, H. Kim, K.-S. Hong, “Where are the ball and players? Soccer game analysis with color-based tracking and image mosaic,” International conference on image analysis and Pro, Florence, Italy, pp. 196-203, September, 1997.
Leonardi et al. propose to use both audio and visual features of a multimedia document, with a Hidden Markov Model (HMM) as a bottom-up semantic framework and a finite-state machine as a top-down semantic framework. Broadcast soccer is one example of their general method. The video features are “lack of motion,” “camera operations,” and “presence of shot cuts.” The audio features are Mel-Cepstrum coefficients and zero crossing rates and are categorized into music, silence, speech, and music. See, R. Leonardi, P. Migliorati, “Semantic indexing of multimedia documents,” IEEE Multimedia, pp. 44-51, April-June, 2002.
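For illustration, a zero crossing rate of the kind named among the audio features may be computed as the fraction of adjacent sample pairs whose signs differ. The following is a minimal sketch under that common definition; the function name `zero_crossing_rate` and the treatment of a zero-valued sample as non-negative are assumptions of this example and do not describe the implementation of Leonardi et al.

```python
from typing import Sequence


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Return the fraction of adjacent sample pairs that change sign.

    A sample equal to zero is treated as non-negative (an assumption of
    this sketch). Fewer than two samples yields a rate of 0.0.
    """
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)
```

A high rate is typical of noise-like audio and a low rate of tonal or voiced audio, which is why the feature is useful for coarse audio categorization.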
Xie et al. propose using a Hidden Markov Model (HMM) as the framework to segment a soccer video into “play/break.” The features used by Xie et al. are visual dominant-color ratio and motion intensity. See, L. Xie, S.-F. Chang, “Structure analysis of soccer video with hidden Markov models,” ICASSP 2002. Xu et al. propose a similar framework without using hidden Markov models. See, P. Xu, S.-F. Chang, “Algorithms and system for high-level structure analysis and event detection in soccer video,” ADVENT-Technical report #111, Columbia University, June 2001.
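For illustration, a dominant-color ratio may be understood as the fraction of pixels in a frame whose color falls within a range associated with the playing field. The following minimal sketch assumes each frame is a non-empty sequence of hue values in degrees and that hues from 60 to 180 degrees approximate field green. It labels frames with a fixed threshold rather than a Hidden Markov Model, so it is a simplification and not the method of either Xie et al. or Xu et al. The names, hue range, and threshold are all assumptions of this example.

```python
from typing import List, Sequence

# Hypothetical frame representation: a non-empty sequence of hue values in degrees.
HueFrame = Sequence[float]


def dominant_color_ratio(hues: HueFrame, lo: float = 60.0, hi: float = 180.0) -> float:
    """Return the fraction of pixels whose hue lies in [lo, hi] (assumed field color)."""
    in_range = sum(1 for h in hues if lo <= h <= hi)
    return in_range / len(hues)


def label_frames(frames: Sequence[HueFrame], threshold: float = 0.5) -> List[str]:
    """Label each frame "play" if the field color dominates, else "break"."""
    return [
        "play" if dominant_color_ratio(f) >= threshold else "break"
        for f in frames
    ]
```

A frame-by-frame threshold of this kind produces noisy labels, which is one motivation for the temporal modeling that an HMM provides.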
Xie proposes a framework that is based entirely on audio features. The approach segments the whole sound track into commentary and crowd noise and then picks out the excited or unusual parts of the crowd noise. See, L. Xie, “Segmentation and event detection in soccer audio,” EE 6820 Project, May 2001.
Babaguchi et al. propose a technique to link live and replay scenes in American football broadcast video. The replay scenes are detected by (1) video caption detection or (2) the sandwiching of digital video effects (DVEs) on either side of the replay segment. Babaguchi et al. note that it is impossible to design a general detector applicable to any DVE, because there are a considerable number of DVE patterns that appear in everyday broadcasts. The DVE taught by Babaguchi et al. is a gradual shot change operation related to spatial aspects. The linking of live and replay scenes is performed based upon the dominant color of the key frame and the ratio of the number of vertical lines to that of horizontal lines on the field, which is representative of the camera angle. The linking technique taught by Babaguchi et al., namely using this ratio, is suitable for its intended application, American football. However, this technique is not applicable to soccer.
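For illustration, the ratio of vertical to horizontal field lines may be sketched as follows, assuming line segments have already been detected and are given by their endpoints in image coordinates. A segment steeper than 45 degrees is counted as vertical and any other segment as horizontal. The function name, the segment representation, and the 45 degree split are assumptions of this example and do not describe the implementation of Babaguchi et al.

```python
import math
from typing import Sequence, Tuple

# Hypothetical segment representation: (x1, y1, x2, y2) in image coordinates.
Segment = Tuple[float, float, float, float]


def vertical_to_horizontal_ratio(segments: Sequence[Segment]) -> float:
    """Return (number of vertical segments) / (number of horizontal segments).

    Returns math.inf when no horizontal segment is present.
    """
    vertical = 0
    horizontal = 0
    for x1, y1, x2, y2 in segments:
        # Angle of the segment relative to the image x-axis, in [0, 90] degrees.
        angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))
        if angle > 45.0:
            vertical += 1
        else:
            horizontal += 1
    if horizontal == 0:
        return math.inf
    return vertical / horizontal
```

Such a ratio depends on a field marked with many regularly spaced parallel lines, as in American football, which is consistent with the observation that the technique does not carry over to soccer.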
What is desired, therefore, is a video summarization technique suitable for video content that includes soccer.