Today, media formats used by consumers are primarily digital, whether video, still images, or music. The decreasing cost of computing resources has given rise to an emerging market for casual media production. Casual user-generated video production is of particular relevance to this invention.
The main attraction of user-generated video productions is that they feature the user's own content. People are naturally interested in watching videos that show people they know personally, or that were shot at places and events they have attended.
Even with the rise in casual video production, however, there is still strong demand for professionally-edited video, such as that shown on television.
Music videos are one popular form of professionally-edited video content. A music video is “a filmed or videotaped rendition of a recorded song, often portraying musicians performing the song or including visual images interpreting the lyrics”, according to the American Heritage Dictionary (online edition).
Music videos are highly entertaining due to the catchy music, the popularity of the artists, and the high production quality. One key aspect of the production process is that visual elements are synchronized to the music: transitions, effects, and of course the footage itself (lip movements synchronized to the singing, dancing timed to the music's beat).
Some casual video producers emulate techniques used in professionally-edited music videos. To provide continuity and to set the mood for their productions, many casual video producers use popular music recordings as audio background, and attempt to synchronize transitions and effects in the video with strong beats in the music.
Recent years have also seen the rise of the “video mashup”, a video production that combines parts of various, often unrelated, videos into an entirely new production.
Video mashups often use a pre-existing music video as a foundation, and “intercut” additional video material into it. Creating video mashups using conventional video editing tools requires considerable talent and effort.
A number of inventions have tried to address the problem of casual video production using computer-based automatic methods.
Some inventions in the prior art focus on using a “template” to determine the structure and composition of the output production. The patent WO0039997 (Dekel Elan, Earthnoise Inc.) describes a method for automatically or semi-automatically creating “video movies” from “templates” that describe a temporal hierarchy for creating the movie. Slots in the template have associated keywords, and material to fill the slots is obtained by looking up the keywords in a video database.
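The keyword-driven slot filling described above can be illustrated with a minimal sketch. It is not taken from WO0039997: the names `Clip`, `Slot` and `fill_template` are hypothetical, the template is flattened to a simple list rather than a temporal hierarchy, and "looking up the keywords" is assumed to mean choosing the unused clip that shares the most keywords with the slot.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A database entry: a clip name plus its descriptive keywords (hypothetical)."""
    name: str
    keywords: set


@dataclass
class Slot:
    """A template slot: a label plus the keywords it asks for (hypothetical)."""
    label: str
    keywords: set


def fill_template(slots, database):
    """Assign to each slot the unused clip sharing the most keywords with it.

    Slots with no matching clip are mapped to None.
    """
    used = set()
    result = {}
    for slot in slots:
        best, best_score = None, 0
        for clip in database:
            if clip.name in used:
                continue
            score = len(slot.keywords & clip.keywords)
            if score > best_score:
                best, best_score = clip, score
        result[slot.label] = best.name if best else None
        if best:
            used.add(best.name)
    return result
```

A real system of this kind would presumably also respect the ordering and nesting of the template; the sketch shows only the lookup step.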
The patent application US2005084232A1 (Tilman et al, Magix AG) describes a method and a user interface that presents “themed templates” with annotated slots that guide the user as to what kind of material would suit each slot's purpose. For example, a birthday template might have slots for party preparation, visitors, a shot of the cake and candles, blowing out the candles, and party wrap-up. The user shoots video or pictures to fill these slots, and the invention combines the user's visual data with graphics, sound effects and other such elements specified in the template to create the output production. Systems that simplify editing using templates typically have the characteristics of the inventions discussed above.
U.S. Pat. No. 6,243,087 (Mark Davis et al, Interval Research Corp.) describes creating derived productions from existing media by means of a “functional dependency network” (FDN) that describes relationships between portions of input and output media. The input media is “parsed” to generate a content representation. An FDN is established that incorporates input media, content representations and other functions. The FDN is then executed to create the output production. The FDN can be considered equivalent to a template, although the “template” of the two inventions above is more static than an FDN.
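To make the idea of executing a dependency network concrete, the following is a minimal sketch under stated assumptions. It is not the method of U.S. Pat. No. 6,243,087: the function `execute_network`, the node layout (a name mapped to a function and its input names) and the toy "parser" are all hypothetical, chosen only to show how an output can be computed from input media and a derived content representation.

```python
def execute_network(nodes, sources):
    """Evaluate every node of a small dependency network.

    nodes:   name -> (function, [names of inputs])   (hypothetical layout)
    sources: name -> value for the raw input media
    Returns a dict mapping each node name to its computed value.
    """
    cache = dict(sources)

    def evaluate(name, visiting=()):
        if name in cache:
            return cache[name]
        if name in visiting:
            raise ValueError(f"cycle detected at node {name!r}")
        func, inputs = nodes[name]
        args = [evaluate(dep, visiting + (name,)) for dep in inputs]
        cache[name] = func(*args)
        return cache[name]

    return {name: evaluate(name) for name in nodes}


def parse_bright_frames(video):
    """Toy 'parser': content representation = indices of bright frames."""
    return [i for i, (_, brightness) in enumerate(video) if brightness > 0.5]


def select_frames(video, indices):
    """Toy output function: keep only the frames the representation points to."""
    return [video[i][0] for i in indices]
```

For example, a network with a `content` node computed by `parse_bright_frames` from the source `video`, and an `output` node computed by `select_frames` from both, yields an output that changes whenever the input media changes. That is the sense in which such a network is more dynamic than a fixed template.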
The patent GB2380599 (Kellock Peter Rowan, muvee Technologies Pte. Ltd.) describes automatically or semi-automatically creating an output media production from input media including video, pictures and music. The input media is annotated by, or analyzed to derive, a set of media descriptors which describe, and are derived from, the input media. The style of editing is controlled using style data, which is typically specified by the user. The style data and the descriptors are then used to generate a set of operations on the input data which, when carried out, result in the output production. This step incorporates techniques that can be taken as capturing a human music video editor's sensibilities, resulting in a production where the editing, effects and transitions are timed to an input music track. Since no significant constraints are placed on the input media and most of the tedious operations are automated by computer means, it presents a least-effort path for the average camcorder/camera user to create an enjoyable, stylish production. The commercial product by muvee Technologies named muvee autoProducer™ is based on the above invention.
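The combination of descriptors and style data into a set of editing operations can be sketched as follows. This is an illustrative assumption, not the algorithm of GB2380599 or of muvee autoProducer: the function `plan_edits`, the descriptor fields (`beat_times`, a per-clip `interest` score) and the style fields (`beats_per_cut`, `transition`) are hypothetical. The sketch only shows how cut points can be made to fall on beats of the music track.

```python
def plan_edits(beat_times, clips, style):
    """Build a list of edit operations whose cut points fall on music beats.

    beat_times: ascending beat positions in seconds (music descriptor)
    clips:      dicts with 'name' and 'interest' (video descriptors)
    style:      dict with 'beats_per_cut' (int >= 1) and 'transition' (str)
    All field names are assumptions made for this sketch.
    """
    if not clips:
        return []
    # Style data decides how often a cut occurs: every n-th beat.
    cut_points = beat_times[::style["beats_per_cut"]]
    # Descriptors decide which material is used first: most interesting first.
    ranked = sorted(clips, key=lambda c: c["interest"], reverse=True)
    operations = []
    for i, (start, end) in enumerate(zip(cut_points, cut_points[1:])):
        clip = ranked[i % len(ranked)]
        operations.append({
            "clip": clip["name"],
            "out_start": start,   # segment begins on a beat
            "out_end": end,       # and ends on the next chosen beat
            "transition": style["transition"],
        })
    return operations
```

Changing only the style data (for instance, cutting on every beat instead of every second beat) produces a different set of operations from the same input media, which is the separation of style from content that the paragraph above describes.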