Watching broadcast sports events has become increasingly popular, as reflected by the growing number of sports channels. However, the vast amount of available content makes it impossible for a user to watch all of it.
One existing solution is to provide the user with a summary of the event that shows its main highlights. Existing summarization systems typically aim to choose the best segments of a video sequence that fit within a pre-defined time interval. For example, if the user asks for a 5-minute summary, the system determines which segments are the best ones that together fit within those 5 minutes.
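The time-budgeted selection described above can be illustrated with a minimal sketch. It assumes each candidate segment already carries a duration in whole seconds and an importance score, and it treats the choice as a 0/1 knapsack problem. The names `Segment` and `select_segments`, the scores, and the knapsack formulation itself are assumptions made for illustration; they are not taken from any particular summarization system.

```python
from typing import List, Tuple

# Hypothetical representation: (label, duration in seconds, importance score)
Segment = Tuple[str, int, float]


def select_segments(segments: List[Segment], budget_s: int) -> List[Segment]:
    """Pick the subset with the highest total score whose total duration fits the budget."""
    # best[t] holds (total score, chosen indices) achievable within t seconds
    best = [(0.0, []) for _ in range(budget_s + 1)]
    for i, (_, duration, score) in enumerate(segments):
        # Walk the budget downwards so each segment is used at most once
        for t in range(budget_s, duration - 1, -1):
            candidate = best[t - duration][0] + score
            if candidate > best[t][0]:
                best[t] = (candidate, best[t - duration][1] + [i])
    return [segments[i] for i in best[budget_s][1]]


# Illustrative data only: four candidate segments and a 5-minute budget
candidates = [("a", 120, 0.9), ("b", 200, 0.8), ("c", 100, 0.7), ("d", 90, 0.4)]
summary = select_segments(candidates, budget_s=300)
```

With this illustrative data, the two highest-scoring segments that jointly fit in 300 seconds are selected, even though another pair would fill the budget exactly with a lower total score.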
Tennis is a very widely watched sport. Even though there are usually no more than three or four tournaments broadcast at the same time, the number of matches (especially during the initial rounds of the competitions) is high enough to prevent users from watching all of them. Moreover, a tennis match is structured as an alternating sequence of rallies and breaks, and the breaks are quite often filled with commercials. As a result, it is desirable for the user to be able to watch the highlights rather than the complete match, in particular those rallies that are interesting, spectacular or important for the end result.
US 2007/0292112 discloses a method of searching a highlight in a film of a tennis game. A plurality of long-field view shots are detected in the film and the audio energy of the long-field view shots is used to determine desired long-field view shots belonging to the highlights. For example, the audio energy is used to identify applause during the long-field view shots to determine the highlights.
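As a rough illustration of an audio-energy criterion of this general kind, the following sketch computes the mean squared amplitude of the audio inside each shot and keeps the shots above a threshold. It is not the method of US 2007/0292112: the shot boundaries are assumed to be already detected, and the function names, the energy definition and the threshold are hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (start, end) of a long-field view shot, in seconds
Shot = Tuple[float, float]


def mean_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of a block of audio samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def loud_shots(
    audio: Sequence[float],
    sample_rate: int,
    shots: List[Shot],
    threshold: float,
) -> List[Tuple[float, float, float]]:
    """Return (start, end, energy) for shots whose energy exceeds the threshold, loudest first."""
    scored = []
    for start, end in shots:
        # Convert the shot boundaries from seconds to sample indices
        block = audio[int(start * sample_rate):int(end * sample_rate)]
        energy = mean_energy(block)
        if energy > threshold:
            scored.append((start, end, energy))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored


# Illustrative data only: 6 seconds of synthetic audio at 4 samples per second
audio = [0.1] * 8 + [0.5] * 8 + [0.2] * 8
shots = [(0, 2), (2, 4), (4, 6)]
highlights = loud_shots(audio, sample_rate=4, shots=shots, threshold=0.03)
```

In this synthetic example the quiet first shot is discarded and the remaining two shots are returned in order of decreasing energy.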
However, the method of US 2007/0292112 does not make it possible to determine which highlights are the most important (for example, the most interesting). Further, audio energy is not a particularly accurate indicator of applause, as it is likely to include unwanted sound such as the commentator's voice-over or sounds made by the players, such as screams and ball hits.