The extracts required to compile a sequence of “highlights” clips are typically created from within a longer video sequence using standard digital video editing tools that define the start and end points of each video clip either directly or by reference to a key event within the clip. Each clip is typically saved as an independent file which can subsequently be provided to viewers. Typically a viewer requesting a “highlights” sequence is provided with an address, or set of addresses, from which to retrieve the clips that make up the required sequence.
It is known, for example from International Patent Specification WO2004/025508, to provide an automated procedure to identify ‘highlights’ within a video sequence in order to provide the viewer with navigation aids or to simplify and reduce costs in video production and editing. The prior art typically identifies video clips automatically through analysis of changes in video or audio levels within a video sequence. More specifically, the boundaries of the ‘highlight’ video segment are typically identified using various super-histograms, frame signatures, cut detection methods, closed caption information, audio information, and so on, by analysing the visual, audio and transcript portions of the video signal. For example, highlights may be identified based on the volume of audience cheers recorded in response to the performance of the athletes (U.S. Pat. No. 7,831,112), or by analysis of the area of the image depicting the score to detect changes (U.S. Pat. No. 7,983,442).
It is also known, for example from European Patent Application EP 1421780, to provide for a viewer to manually bookmark a ‘highlight’ for later access. Whilst human input can be more reliable than analysis of the content, individual users will respond in different ways, and in particular will have different reaction times. These range from an almost instant response, if the user issues a voice or gesture command or has a finger poised over the appropriate control in expectation of an event, to several seconds if the user is less familiar with the system and/or needs to look for the control unit or, if the unit has multiple functions (such as a tablet computer), the relevant programme required to set up the bookmark. It is therefore difficult for a video retrieval processor to determine accurately which part of the content is appropriate to mark for a video clip in response to an individual bookmarking the content. Further variation can occur dependent on whether the event bookmarked is a “set piece”, which users are expecting, or something unexpected, for which there will be greater variation in response times.
The present invention provides a video service platform for generating video clips from a sequence of video data elements for delivery and playback on demand, comprising:
    a user input unit for receiving a plurality of individual time stamp data inputs, each generated by a respective user, identifying a part of the video data to be used to generate a clip,
    an aggregation system for calculating an aggregated time stamp value derived from the plurality of the individual time stamp data,
    an event marker unit for associating an event marker flag with an element of the video data sequence in accordance with the aggregated time stamp data,
    and an output unit for generating a video clip from a plurality of video data elements defined by relation to the event marker flag.
The invention also provides a method for generating video clips from a sequence of video data elements for delivery and playback on demand, wherein individual time stamp data inputs generated by each of a plurality of users identifying a part of the video data to be used to generate clips are aggregated to calculate an aggregated time stamp value, and an event marker flag is associated with an element of the video data sequence in accordance with the aggregated time stamp value.
The process for determining the aggregate time stamp value may be selected according to metadata associated with the individual time stamp values, and/or the distribution of time stamp data inputs. The distribution of time stamp data inputs may also be used to control the duration of part of a video clip before and/or after the event marker for that clip, for example by selecting a total duration time, or selecting the proportion of the clip that appears before the event marker.
The invention enables viewers to identify a number of key events within a video sequence (such as goals in a football match) using viewer-defined ‘temporal bookmarks’. These bookmarks are stored as time-codes along with associated metadata which identifies the time-code as belonging to a certain type of event. In the preferred embodiment, instead of marking the beginning and end of a highlight clip, a first value marks a key event and a second value is used to define a ratio to identify the relative duration of the clip to be provided before and after the marked point. Thus the user does not need to identify the beginning of the build-up to the event itself. The actual duration can be adjusted whilst preserving this ratio.
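By way of illustration only, the relationship between the marked point, the ratio and the adjustable duration may be sketched as follows. The names `ClipSpec`, `pre_fraction` and `clip_bounds` are hypothetical and do not appear in the specification, and the ratio is assumed here to be expressed as the fraction of the clip lying before the event marker.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClipSpec:
    """Hypothetical record: one event marker plus the before/after ratio."""
    event_time: float    # seconds from the start of the video sequence
    pre_fraction: float  # assumed: fraction of the clip before the marker (0..1)


def clip_bounds(spec: ClipSpec, duration: float,
                sequence_length: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) in seconds for a clip of the requested duration.

    The duration can be changed freely; the split around the event marker
    always follows spec.pre_fraction. Bounds are clamped to the sequence,
    so a clip near either end may come out shorter than requested.
    """
    if not 0.0 <= spec.pre_fraction <= 1.0:
        raise ValueError("pre_fraction must lie between 0 and 1")
    if duration <= 0:
        raise ValueError("duration must be positive")
    start = spec.event_time - spec.pre_fraction * duration
    end = spec.event_time + (1.0 - spec.pre_fraction) * duration
    start = max(0.0, start)
    if sequence_length is not None:
        end = min(end, sequence_length)
    return start, end


goal = ClipSpec(event_time=600.0, pre_fraction=0.75)
short_clip = clip_bounds(goal, duration=20.0)  # 15 s before, 5 s after
long_clip = clip_bounds(goal, duration=40.0)   # 30 s before, 10 s after
```

In this sketch a single stored pair of values yields clips of any length, which is the property relied upon when the duration is adjusted whilst the ratio is preserved.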
A clip can be identified and generated easily and in real-time on a live video stream by a non-expert user through a single interaction at the moment of the key event. However, users may vary in the time they take to respond to the key event, whether through unfamiliarity with the control device, variations in the time taken to realise the significance of the event, external distractions, or other causes. This can reduce the effectiveness of the system, as different users will receive different clips showing more or less of the events of interest. In particular, the clips would have to be long enough to ensure that the key moment (goal, catch, putt, overtaking manoeuvre, according to the sport in question) is caught, with the result that any clip delivered is likely to include more of the events leading up to, and/or following, the key moment than is desirable. This in turn means that fewer clips can be shown to a user in a given time.
A further disadvantage is that a very large number of event markers are stored. As well as causing a storage problem for the service provider, it makes retrieval difficult, especially if the clips are made available to viewers who did not see or bookmark the original broadcast, or parts of it, and wish to use the bookmarks to create a highlights sequence. Many of the bookmarks will relate to the same event and, without detailed analysis, it would be difficult for a user to identify which of the many bookmarks relating to an event will provide the most accurate view. Even for the users who created the bookmarks, their reaction times may vary from one mark to another which will result in the start and end points of some clips being earlier or later than is desirable.
The present invention overcomes this by analysing the temporal distribution of event markers generated by a plurality of users during a video transmission (which may be streamed live or may be watched as an “on-demand” time-shifted recording), identifying event markers in close temporal proximity to each other, generating an aggregated time stamp, and generating an aggregate event marker having the value of the aggregated time stamp. The aggregated value may be a simple median value, or some earlier or later point in the distribution of the time stamps, such as the 25th percentile. Metadata provided by the users in conjunction with the event markers may be used both to initially identify event markers relating to the same event, and to determine which of a plurality of aggregation processes is to be used for each such group. For example, it would be expected that the users' bookmarking actions for a “set piece” such as a penalty shot would suffer less scatter than those for an unscheduled event such as a collision or a goal from free play.
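A minimal sketch of such an aggregation is given below, purely as an illustration. It assumes that each bookmark is a pair of a time stamp in seconds and an event-type label taken from the user metadata, that markers of the same type separated by no more than a hypothetical `max_gap` belong to the same event, and that the percentile for each event type is looked up in a hypothetical table. None of these names or values is taken from the specification.

```python
from collections import defaultdict

# Assumed mapping from event type to the percentile used for aggregation:
# the median for expected "set pieces", an earlier point for unexpected events.
PERCENTILE_BY_EVENT_TYPE = {"set_piece": 50.0, "unexpected": 25.0}


def group_by_proximity(timestamps, max_gap):
    """Split time stamps into groups whose neighbours are at most max_gap apart."""
    groups = []
    for t in sorted(timestamps):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def percentile(sorted_values, p):
    """Percentile (0..100) of a sorted, non-empty list, linearly interpolated."""
    if not sorted_values:
        raise ValueError("no values to aggregate")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (rank - lo)


def aggregate_markers(bookmarks, max_gap=5.0):
    """Return one (aggregated_time, event_type, count) tuple per detected event."""
    by_type = defaultdict(list)
    for timestamp, event_type in bookmarks:
        by_type[event_type].append(timestamp)
    aggregated = []
    for event_type, stamps in by_type.items():
        p = PERCENTILE_BY_EVENT_TYPE.get(event_type, 50.0)  # default: median
        for group in group_by_proximity(stamps, max_gap):
            aggregated.append((percentile(group, p), event_type, len(group)))
    return sorted(aggregated)


bookmarks = [
    (100.0, "set_piece"), (101.0, "set_piece"), (102.0, "set_piece"),
    (300.0, "unexpected"), (302.0, "unexpected"), (304.0, "unexpected"),
    (306.0, "unexpected"), (308.0, "unexpected"),
]
markers = aggregate_markers(bookmarks)
```

Each group of user bookmarks is thus reduced to a single aggregate event marker, so that only one marker per event needs to be stored and offered to later viewers.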
A common approach to delivering video over HTTP (Hypertext Transfer Protocol) involves the dissection of large video files into multiple smaller files (called chunks), with each chunk corresponding to a segment of video perhaps a few seconds in length, and each with its own URL (Uniform Resource Locator). In such systems a server platform provides a manifest file to the client, which specifies the URL of each chunk of the clip requested, so that the client can request, compile and then play back the video. In this embodiment the way the manifest file is created and interpreted by the client device is adapted so as to prioritise delivery of content based on its narrative importance, by downloading the chunks relating to the key events (such as goals) first, followed by further ‘video chunks’ in the sequence preceding and following each individual event. Different events may also be given different priorities, so that for example events such as goals are delivered first, with other events such as saved goals and fouls delivered later. Chunks relating to the lower priority events may start to be delivered before delivery of all the chunks of the higher priority ones is complete, by prioritising the defining “event marker” chunk of a lower priority event ahead of chunks occurring some distance before and after the event markers of the high priority events. In any event, after delivery, the chunks are reassembled into their correct chronological order for viewing.
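One possible way of ordering the chunks for delivery is sketched below for illustration. The scoring rule, in which a chunk's score is its distance in chunks from the nearest event marker plus a hypothetical `priority_penalty` for each priority level, is an assumption made for this example and is not a rule stated in the specification. Chunks are identified by their index in the original sequence.

```python
def delivery_order(events, radius, priority_penalty=2):
    """Return chunk indices in the order in which they would be requested.

    events: list of (marker_chunk_index, priority), where priority 0 is the
            highest (for example a goal) and larger values are lower.
    radius: number of chunks to include before and after each event marker.
    """
    best_score = {}
    for marker, priority in events:
        for idx in range(max(0, marker - radius), marker + radius + 1):
            # Lower score means earlier delivery: the marker chunk itself
            # scores lowest, and lower priority events are pushed back.
            score = abs(idx - marker) + priority_penalty * priority
            if idx not in best_score or score < best_score[idx]:
                best_score[idx] = score
    return sorted(best_score, key=lambda idx: (best_score[idx], idx))


def playback_order(received_chunks):
    """Reassemble delivered chunks into chronological order for viewing."""
    return sorted(received_chunks)


# A goal at chunk 10 (priority 0) and a foul at chunk 30 (priority 1).
order = delivery_order([(10, 0), (30, 1)], radius=3)
```

With these example values the marker chunk of the lower priority event (chunk 30) is requested before the outermost chunks of the higher priority event (chunks 7 and 13), and `playback_order` restores the chronological sequence after delivery.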
This arrangement enables ‘highlight clips’ to be provided in a flexible manner suited to the bandwidth available. This would be beneficial to viewers using low-capacity networks or those with unreliable coverage, for example as may be experienced by a mobile user. This arrangement would allow the number of individual highlight clips made available to the viewer to be optimised in the presence of restrictions in bandwidth availability or connection reliability. The prioritisation of video chunks would also enable video clips to be dynamically compiled ‘on the fly’ matched to specific play-back length restrictions. This facility may be useful when providing ‘late-comer’ catch-up facilities which would enable the viewer to be provided with a synopsis of key events which took place prior to joining the live stream.
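As a hedged illustration of compiling a sequence to a play-back length restriction, the following sketch takes chunk indices already arranged in delivery priority order and keeps as many as fit the permitted length. The function name `fit_to_playback_limit` and the assumption that all chunks have the same duration are made for this example only.

```python
def fit_to_playback_limit(prioritised_chunks, chunk_seconds, max_seconds):
    """Select the highest priority chunks that fit within max_seconds.

    prioritised_chunks: chunk indices, most important first.
    chunk_seconds: duration of each chunk (assumed equal for all chunks).
    Returns the selected chunk indices in chronological order for playback.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    budget = max(0, int(max_seconds // chunk_seconds))  # whole chunks only
    return sorted(prioritised_chunks[:budget])


prioritised = [10, 9, 11, 8, 12, 30, 7, 13, 29, 31]
synopsis = fit_to_playback_limit(prioritised, chunk_seconds=4.0, max_seconds=25.0)
```

The same selection could serve a viewer whose connection has delivered only the first few chunks: whatever has arrived is, by construction, the most important material available.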