1. Field
The present invention relates to the editing and sharing of time sequenced multi-media materials (audio/video in particular) from internet sources.
2. Description of the Related Art
Certain mechanisms for editing of video materials have been long established. The overall goal of video editing is to present the viewer with sequential excerpts from separate video sources as a single composite video. Therefore, an essential skill of a video editor's craft is the selection of edit points from the source material. The audience experience of a switch from one audio/video source to another is by its nature visually disruptive. This disruption can be minimized by the careful selection of edit points. For example, a cut in a basketball game video would likely be least disruptive if it occurred between the time the ball passes through the net of a scoring basket and the time when the opposing team retrieves the ball for the next play. The window of time for this cut could be as little as 0.3 seconds. Another example could be an excerpt from a speech. A cut in such a speech would likely be least disruptive if it happened at the end of an applause moment in the speech but before the speaker resumed speaking. This instance may have a larger window for an acceptable edit, perhaps 2 seconds. But cutting at a cough or other short pause in the speech would require a much shorter window. Thus, providing editors with a tool that can work within a short window of editing opportunity gives them greater editing options. Conventional desktop non-linear editing tools provide frame accurate editing capability, with editing windows of less than 0.04 seconds. However, there are no known tools for providing such accuracy for multiple streaming audio/visual sources.
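The relationship between the edit windows and the frame accurate figure quoted above can be made concrete with a little arithmetic. The following is a minimal sketch only; the function names are hypothetical, and the frame rate of 25 frames per second is an assumption chosen because one frame at that rate lasts 0.04 seconds.

```typescript
// Hypothetical helpers for illustration. The frame rate passed in is an
// assumption; the text above does not name a specific rate.

// Duration of one frame, in seconds, at the given frame rate.
function frameDurationSeconds(framesPerSecond: number): number {
  return 1 / framesPerSecond;
}

// Number of whole frames that fit inside an edit window of the given length.
function wholeFramesInWindow(windowSeconds: number, framesPerSecond: number): number {
  return Math.floor(windowSeconds * framesPerSecond);
}
```

Under that assumption, the 0.3 second basketball window holds 7 whole frames and the 2 second applause window holds 50, which illustrates how few frame positions a short window offers the editor.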
In the early days of video editing, physical cutting of video source media was required. In later years, video switching technology was developed that employed synchronization between the outgoing and incoming media sources to allow an electronic switch between the sources to occur at a virtual cut point. This allowed for the recording of a third video which included this transition. This process, which is repeated to create a single video with multiple cuts, is called “linear editing”. For example, U.S. Pat. No. 4,538,188, issued Aug. 27, 1985, describes a video composition method and apparatus for dynamically composing sequences of visual source material or edited output. However, with the introduction of high speed digital computers, such “linear editing” techniques have been substantially replaced by “non-linear editing” methods. Non-linear editing allows for the presentation of randomly accessed individual frames of video from an arbitrary number of sources. For example, U.S. Pat. No. 6,489,969, issued Dec. 3, 2002, describes a media composition system with media consolidation employing digital techniques to receive, digitize, store, and edit video and source material. One major benefit of non-linear editing is that the source material does not need to be physically copied to present the viewer with the resulting composite video. This gives the editor a great deal of interactivity and reduces storage requirements for test composite videos. Nearly all modern video editing utilizes non-linear editing tools, but such tools generally require frame accurate control of locally stored source material. Furthermore, the ultimate objective of most non-linear editing systems is to create a single linear composite video (often called the “final edit master”) which is stored, copied, and distributed independently of the original source material and the list of edits that created it. This makes it difficult, if not impossible, for viewers to see the original context of the edits, or for other editors to create alternative composites.
The power and precise control of non-linear editing systems also bring added user interface complexity. Most such systems provide a visual abstraction of the video as a sequence of image frames, which allows the user to pick a precise frame on which to make the edit. Some editing systems also abstract the audio as waveforms, and abstract various transitions as user editable graphs. Casual users of editing systems are often bewildered by these abstractions, so there is a constant need to simplify editing systems to reduce barriers to entry. Furthermore, users of browser based applications are especially sensitive to user interface complexity.
Streaming of internet video has also been long established. The fundamental idea is that video data is downloaded from a server to non-persistent storage on a viewer's client computer. While downloading, the client computer can start playing the video asynchronously. This is possible because the client computer buffers a small amount of video ahead of the currently playing video. This buffer is typically large enough to accommodate fluctuations in download rate. Ideally, the rate of download of the video should be larger than the rate of video consumption by the viewer. Otherwise, the video playback will need to be stalled to allow sufficient buffering (i.e., a buffering fault). Furthermore, to reduce download bandwidth requirements, and hence buffering faults, the video data is usually highly compressed. Compression techniques can take advantage of temporal (interframe) coherence of data. That is to say, significant compression can occur when sequential frames of video comprise identical or similar information (i.e., sequential frames don't differ by much). Many compression techniques take full advantage of this characteristic. An unfortunate side effect for video editing is that individual video frames are no longer randomly accessible. In fact, the notion of individual video frames may not be meaningful, and is not even included in the browser standards specifications (e.g., http://www.w3.org/TR/html5/video.html#media-elements). Therefore, frame accurate “non-linear” editing of such streaming internet video is challenging, if not impossible.
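The condition for a buffering fault described above can be illustrated with a simple model. This is a sketch under stated assumptions, not a description of any particular player: the function name is hypothetical, rates are expressed as seconds of video per second of wall-clock time, playback is assumed to consume exactly one second of video per second, and the download rate is assumed to be constant.

```typescript
// Hypothetical model of the buffering condition. Assumes a constant download
// rate and normal-speed playback (1 second of video consumed per second).
function secondsUntilStall(bufferedSeconds: number, downloadRate: number): number {
  const playbackRate = 1; // seconds of video consumed per wall-clock second
  if (downloadRate >= playbackRate) {
    // Download keeps up with playback, so the buffer never drains.
    return Infinity;
  }
  // The buffer drains at (playbackRate - downloadRate) seconds per second.
  return bufferedSeconds / (playbackRate - downloadRate);
}
```

For example, with 5 seconds of video buffered and a download rate of 0.5, this model predicts a stall after 10 seconds of playback. A real connection fluctuates, which is why the buffer exists in the first place.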
Alternatively, there are tools for downloading compressed video and transcoding the video into formats that are digestible by editing systems. However, the use of such tools often violates terms of service agreements and/or copyrights of the source video provider. Furthermore, the resulting composite video may suffer from generation loss associated with multiple compressions, decompressions, and transcodings.
A related strategy for editing of internet based video performs the edit composition on a server and transmits the resulting video stream to a client browser. See, e.g., U.S. Patent Application Publication No. 2002/0116716, filed Feb. 22, 2001, and U.S. Patent Application Publication No. 2010/0260468, filed Apr. 6, 2010. However, this strategy requires significant server side computational resources to download and process source videos, and then transmit the composite video. Additionally, since the final video is computed in real time, this strategy neutralizes the benefits of internet “edge caching” for static video assets. Edge caching is a load balancing and performance management technique that utilizes dedicated server resources on a network. These servers, based on their awareness of network protocols, essentially siphon off the network traffic of one application from the others and process that data specially to improve the performance that end users of a target application receive.
In contrast to the above related art, the objective of the present invention is to provide a system that can present internet users with composite videos directly from original streaming internet source videos.
In addition to allowing the playing of video, most internet video sources also provide programmable control of their video through a client browser Application Programming Interface (API). Typically, these are exposed through JavaScript bindings. One example is the YouTube Player API (http://code.google.com/apis/youtube/js_api_reference.html). Another example is the HTML5 MediaElement (http://www.w3.org/TR/html5/video.html#media-elements). Such APIs allow 3rd parties to embed videos within their own sites, and to control the operation and properties of the embedded video within their sites. While these APIs are most often exposed as bindings which extend the JavaScript language, they may also be exposed in another language such as Adobe ActionScript or Java. Although such APIs are not necessarily designed for video editing, they often have the minimal functionality required for this application. For the purposes of video editing, these minimal requirements are the ability to start and stop a video, the ability to query the start/stop state, the ability to query the current video time, the ability to seek the video to a specified time, and the ability to control the volume of the corresponding audio stream. In practice, these functions all vary in both precision and accuracy. For example, the time required to start a video playing from the time an API command is issued could be 0.5 seconds or more, depending on how much video is buffered and what other processes are competing for the client computer's resources. Such API functions have been used in limited circumstances to control video editing of internet sources. One example is the Kaltura Video Sequencer (http://www.kaltura.org/html5-video-sequencer). Additionally, such API functions have been used to control the excerpting of videos. For example, see http://www.splicd.com/. This site uses the YouTube API to allow the user to show only a defined excerpt of a single video.
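The minimal requirements listed above can be summarized as a small interface. The sketch below is illustrative only: `MinimalPlayer`, `FakePlayer`, `cueExcerpt`, and `stopAtOutPoint` are hypothetical names that do not belong to any real player API, and the in-memory `FakePlayer` merely stands in for a real player so that the excerpting logic can be exercised without a browser. A real binding would map each method onto whatever the particular player exposes (for the HTML5 media element, for instance, `play()`, `pause()`, `paused`, `currentTime`, and `volume`).

```typescript
// Hypothetical interface capturing the minimal functions named in the text.
interface MinimalPlayer {
  play(): void;
  pause(): void;
  isPlaying(): boolean;           // query the start/stop state
  getCurrentTime(): number;       // current video time, in seconds
  seekTo(seconds: number): void;  // seek to a specified time
  setVolume(level: number): void; // 0 (silent) to 1 (full volume)
}

// In-memory stand-in for a real player, for illustration and testing only.
class FakePlayer implements MinimalPlayer {
  private playing = false;
  private time = 0;
  private volume = 1;
  play(): void { this.playing = true; }
  pause(): void { this.playing = false; }
  isPlaying(): boolean { return this.playing; }
  getCurrentTime(): number { return this.time; }
  seekTo(seconds: number): void { this.time = Math.max(0, seconds); }
  setVolume(level: number): void { this.volume = Math.min(1, Math.max(0, level)); }
  getVolume(): number { return this.volume; }
  // Simulates wall-clock time passing; time advances only while playing.
  advance(seconds: number): void { if (this.playing) this.time += seconds; }
}

// Pauses the player and positions it at the excerpt's in point.
function cueExcerpt(player: MinimalPlayer, inPoint: number): void {
  player.pause();
  player.seekTo(inPoint);
}

// Intended to be called repeatedly (e.g., from a timer). Pauses the player
// once the out point has been reached and reports whether it did so.
function stopAtOutPoint(player: MinimalPlayer, outPoint: number): boolean {
  if (player.isPlaying() && player.getCurrentTime() >= outPoint) {
    player.pause();
    return true;
  }
  return false;
}
```

Because the out point is detected by polling, the player will generally stop slightly after the requested time, and a real player adds start and seek latency on top of that. This is one concrete form of the precision and accuracy limits noted above.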
It is common practice for internet applications to collect information with regard to user interaction. For example, various Google and Facebook applications take advantage of a transparent feedback loop to improve their users' experience. An additional object of this invention is to include such a feedback loop in the monitoring of the invention's edit transitions, to aid in fine tuning its capabilities.
Most internet video sources and some 3rd party internet sites also allow their users to define “playlists”. Such playlists are typically sequences of contiguous videos either from their own site or from a variety of internet sources. Players for playlists vary in their video selection and editing capabilities. Some only allow for simple sequencing of complete video clips (YouTube playlist player), while others allow for setting of in and out points (Kaltura Video Sequencer). However, known video playlist players do not attempt to synchronize the edit points of source videos with split second accuracy.
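A playlist with in and out points, as described above, can be represented as a simple list of excerpts. The following sketch is an illustration only and does not reflect the data model of any of the named players; `PlaylistEntry`, `compositeDuration`, and `locateInPlaylist` are hypothetical names, and the URLs used with them would be placeholders.

```typescript
// Hypothetical playlist entry: one excerpt of one source video.
interface PlaylistEntry {
  sourceUrl: string; // link to the source video
  inPoint: number;   // excerpt start within the source, in seconds
  outPoint: number;  // excerpt end within the source, in seconds
}

// Total running time of the composite, in seconds.
function compositeDuration(entries: PlaylistEntry[]): number {
  return entries.reduce((total, e) => total + (e.outPoint - e.inPoint), 0);
}

// Maps a time in the composite to the entry that should be playing and the
// corresponding time within that entry's source video. Returns null when the
// time falls outside the composite.
function locateInPlaylist(
  entries: PlaylistEntry[],
  compositeTime: number
): { index: number; sourceTime: number } | null {
  if (compositeTime < 0) return null;
  let elapsed = 0;
  let index = 0;
  for (const entry of entries) {
    const length = entry.outPoint - entry.inPoint;
    if (compositeTime < elapsed + length) {
      return { index, sourceTime: entry.inPoint + (compositeTime - elapsed) };
    }
    elapsed += length;
    index++;
  }
  return null;
}
```

Note that only links and times are stored; no video data is copied. The composite exists purely as a list of edits against the original sources.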
Internet video providers and playlist sites typically allow for a variety of sharing options. These usually involve sharing an internet link of a video through email or a social media function, or they involve the embedding of the shared video in a social media, blog, or other web site. Such sharing is extremely popular and practical, since it doesn't require the copying of large video files—only the transfer of an internet link to the video files.
Standards compliant internet browsers also provide web site authors API control over the hiding and showing of various web site display elements. This facility is used in a variety of applications. For example, most web sites that incorporate a photo slide show component use this functionality. As will be seen below, this ability to hide and show display elements is important for this invention.
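The slide show case mentioned above can be sketched as follows. This is an assumption-laden illustration rather than a description of any particular site: `Hideable` and `showOnly` are hypothetical names, and the code relies only on each element having a `style.visibility` property, which browser elements provide.

```typescript
// Hypothetical minimal shape of a display element. Browser elements satisfy
// this shape because they expose a style object with a visibility property.
interface Hideable {
  style: { visibility: string };
}

// Shows the element at activeIndex and hides all the others, as a slide show
// component might do when advancing from one slide to the next.
function showOnly(elements: Hideable[], activeIndex: number): void {
  elements.forEach((element, i) => {
    element.style.visibility = i === activeIndex ? "visible" : "hidden";
  });
}
```

Setting visibility, rather than removing an element from the page, leaves a hidden element in place so that it can be shown again immediately.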