The field of mosaic movies resembles the well known field of mosaic images.
Mosaic images are photograph-based mosaics that are well known and popular in the field of graphic design and graphic art. Whereas mosaic images were traditionally composed manually by artists, such mosaic collages are now generally composed using digital search and matching techniques with specialised computer software. With such software it is possible to create mosaics containing hundreds or even thousands of images by automatically selecting images from thousands of raw source images in a digital image library. These library images are typically digital still pictures (photographs) or digital snapshots from motion video.
U.S. Pat. No. 6,137,498 to Silvers describes a computerized method to compare regions of a digital target image with digital images from a data base. Specifically the method describes how for each tile of a target image, the best matching image in a specified data base of source images is found.
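By way of illustration only, the tile-by-tile direction of matching described above may be sketched as follows. The sketch assumes that each tile and each data base image has already been reduced to a flat list of grayscale values of equal length, and it uses a sum of squared differences as the measure of resemblance. Both the representation and the measure are assumptions made for this example and are not taken from the cited patent. The function names are hypothetical.

```python
from typing import List, Sequence

Pixels = Sequence[float]  # one tile, flattened grayscale values (assumed representation)


def squared_error(a: Pixels, b: Pixels) -> float:
    """Sum of squared per-pixel differences between two equally sized tiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_tiles_to_library(target_tiles: List[Pixels], library: List[Pixels]) -> List[int]:
    """For every target tile, return the index of the closest library image.

    The same library image may be chosen for several tiles, and some
    library images may never be chosen at all.
    """
    return [
        min(range(len(library)), key=lambda i: squared_error(tile, library[i]))
        for tile in target_tiles
    ]


# Three target tiles of four pixels each, and a library of three images.
target_tiles = [[0, 0, 10, 10], [200, 210, 220, 230], [5, 5, 5, 5]]
library = [[0, 0, 0, 0], [255, 255, 255, 255], [100, 100, 100, 100]]
choice = match_tiles_to_library(target_tiles, library)
print(choice)
```

In this small example the dark library image is selected twice and the mid-grey image is not selected at all, which is characteristic of matching that starts from the tile rather than from the library image.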
Canadian patent application 2,442,603 by Saèd describes a computerized method to compare regions of a digital target image with digital images from a data base. Specifically the method describes how for each image in a specified data base of tile images the best matching tile region is found within the target image. In a further refinement step, the tile images are modified using digital image processing techniques such as adjustment of brightness, contrast, colour and cropping to improve the resemblance between the tile and the region, thus improving the overall resemblance of the mosaic with the underlying target image.
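The refinement step described above may be illustrated, for brightness and contrast only, by the following minimal sketch. It assumes flattened grayscale tiles and matches the mean and spread of the tile image to those of the target region by a linear rescaling. This particular adjustment is an assumption chosen for the example and is not presented as the processing used in the cited application; colour adjustment and cropping are not shown. The function name is hypothetical.

```python
from statistics import mean, pstdev


def adjust_brightness_contrast(tile, region):
    """Linearly rescale `tile` so that its mean (brightness) and standard
    deviation (contrast) match those of `region`; clamp to the 0-255 range."""
    t_mean, r_mean = mean(tile), mean(region)
    t_std, r_std = pstdev(tile), pstdev(region)
    gain = r_std / t_std if t_std > 0 else 1.0  # a flat tile keeps its contrast
    return [min(255.0, max(0.0, (p - t_mean) * gain + r_mean)) for p in tile]


tile = [50, 60, 70, 80]          # darker, lower-contrast tile image
region = [100, 120, 140, 160]    # region of the target image it is placed on
adjusted = adjust_brightness_contrast(tile, region)
print(adjusted)
```

After the adjustment the tile has the same average brightness and the same spread of values as the region it covers, while its internal ordering of light and dark pixels is unchanged.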
The above two works of prior art are focused on the computer generation of still image mosaics based on still images. In a particular embodiment still images may be obtained from video, in which case all or some of the still images are frames from motion pictures.
It is also possible to generate a mosaic that resembles a motion picture (a video clip) and that is composed of a tiled arrangement of other motion pictures (a mosaic of video clips). The result is a mosaic that resembles a target movie, whereby the mosaic is composed of a regular array of tiles of video clips.
Consider the following example to illustrate an application. A television commercial for a particular product of a company contains a video clip that is a motion mosaic. The commercial commences with the motion mosaic and accompanying music with a voice-over to attract the viewer's attention. The motion mosaic consists of 15 tiles vertically by 20 tiles horizontally, and all tiles are square and of equal size. At a normal viewing distance from the screen, the mosaic is a video clip of the company's product for which the commercial is produced. As with mosaics in general, the mosaic effect relies on resemblance to an image or movie when viewed from a distance. The video clip that is a mosaic movie appears as a grainy movie about the product. The granularity is due to the underlying tiled arrangement of video clips. These video clips, in this example, would depict various uses of the product. The video clip in each tile is chosen such that it provides the necessary colour, contrast, brightness and motion based on the colour, contrast, brightness and motion in the tile area that it covers. When the commercial zooms in on the centre of the mosaic, it quickly becomes clear to the viewer that the perceived granularity of the clip is indeed a result of the mosaic effect. The viewer begins to recognize that the tiles of the mosaic are actually small square videos, depicting various uses of the product. At the ultimate zoom level, a single tile fills the entire screen and it is now obviously a video clip on its own, depicting a popular use of the product. Then the commercial zooms out, and as more surrounding video tiles become visible again, the viewer is reminded that the focused video clip is a video tile in a mosaic movie. When the screen is again filled with the 15 by 20 square video tiles, the grainy mosaic movie about the product dominates the viewer's visual impression.
This field of invention is not to be confused with a particular area of video graphics, denoted video mosaics, whereby a still image is generated by placing snapshots (video frames) from a particular video side by side on a composite image. This is particularly popular in television broadcasts of sports events (e.g. track and field competitions). As an athlete performs the critical part of their actions, for instance a jump over a high bar, that action is recorded on video. Subsequent to showing a video replay of the action, a still shot of the athlete is shown as they negotiate their body over the high bar. The athlete is then shown not only in one frozen position over the bar, but in multiple frozen positions, for instance one frozen shot as they approach the bar, one frozen shot as they bend their body over the bar, and so forth. These frozen shots may be displayed in smaller tiled frames on the screen, starting with the first shot displayed in the top left tile of the screen and ending with the last shot displayed in the bottom right tile.
This area of video graphics is thus different from the area of the present invention. In order to aid in the separation of fields in this description, the field of the present invention shall be denoted the field of motion mosaics rather than the field of video mosaics. However, prior art in both fields uses the term video mosaic.
In the publication “Video Mosaics” by Allison W. Klein et al., published in NPAR 2002 (Second International Symposium on Non-Photorealistic Rendering, pp. 21-28, June 2002), the authors present a method for creating a motion mosaic. An important and complex step in the generation of the motion mosaic consists of searching a data base of source tile videos to find the best match for representing a particular video tile region of the target video. The publication presents a method for determining the visual similarity between a particular source tile movie and a tile region movie based on a wavelet transform. That is, in order to select one source tile movie over another, the movies are not compared directly on the basis of their coloured pixels; rather, each is first transformed into a different mathematical representation, and the movies are compared based on their features within that representation. The publication further presents a dynamic programming method that finds the best matching source tile video in a library of source tile videos for a particular tile region video of the target mosaic. The publication further presents a colour correction method that improves the similarity between the mosaic movie and the target video as a final step after the matching has been completed.
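The idea of comparing movies in a transformed representation rather than pixel by pixel may be illustrated by the following toy sketch. It assumes that each movie has been reduced to a single brightness value per frame and applies a one-dimensional Haar decomposition along the time axis. This is a deliberately simplified stand-in chosen for the example; it is not the transform, feature set or distance measure of the cited publication. The function names are hypothetical.

```python
def haar_1d(signal):
    """Full Haar decomposition of a sequence whose length is a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        averages = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        details = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        coeffs = details + coeffs  # coarser details end up in front of finer ones
        current = averages
    return current + coeffs


def wavelet_distance(a, b):
    """Sum of absolute differences between Haar coefficients (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(haar_1d(a), haar_1d(b)))


region = [10, 20, 30, 40]   # tile region of the target: fades from dark to light
clip_a = [12, 18, 33, 41]   # source clip that also fades from dark to light
clip_b = [40, 30, 20, 10]   # source clip that fades from light to dark

print(haar_1d(region))
print(wavelet_distance(region, clip_a), wavelet_distance(region, clip_b))
```

Here `region` and `clip_b` have the same average brightness over time, so a comparison of averages alone could not separate them. The detail coefficients capture the direction of the transition, and `clip_a` is found to be the closer match.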
On the web site “Video Mosaics” by Steve L. Martin, Charles Fowlkes and Alexander Berg at U.C. Berkeley, dated Fall 2003, the authors present a method for creating a motion mosaic based on thousands of video clips. Each source tile video is described by an average colour, a colour histogram, edge histograms and energy histograms, and these descriptions are used to find the best matching source tile video in a library of source tile videos for a particular tile region video of the target mosaic. It is further suggested that the video tiles be able to move (to shift around) to maintain a well-matching mosaic while the tile movies are playing.
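A descriptor-based comparison of the kind described above may be sketched as follows for two of the named descriptors only, namely an average value and a coarse histogram. The sketch assumes grayscale clips given as lists of flattened frames, four histogram bins and an unweighted sum of differences as the distance. These choices are assumptions made for the example; edge and energy histograms are not shown, and the function names are hypothetical.

```python
def describe(clip, bins=4):
    """Return (average value, normalised histogram) for a grayscale clip.

    `clip` is a list of frames, each a flat list of values in 0..255.
    """
    pixels = [p for frame in clip for p in frame]
    average = sum(pixels) / len(pixels)
    histogram = [0.0] * bins
    for p in pixels:
        histogram[min(bins - 1, int(p) * bins // 256)] += 1 / len(pixels)
    return average, histogram


def descriptor_distance(d1, d2):
    """Unweighted sum of the average and histogram differences (assumed measure)."""
    (avg1, hist1), (avg2, hist2) = d1, d2
    return abs(avg1 - avg2) / 255 + sum(abs(h1 - h2) for h1, h2 in zip(hist1, hist2))


region = describe([[0, 64], [128, 255]])     # tile region video of the target
similar = describe([[10, 70], [130, 250]])   # source clip with a similar spread
dark = describe([[0, 0], [10, 20]])          # uniformly dark source clip

print(region)
print(descriptor_distance(region, similar), descriptor_distance(region, dark))
```

Because each clip is summarised once, the descriptors can be computed in advance for the whole library and only the short descriptor vectors need to be compared during matching.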
Other approaches to movie mosaics are also known, whereby the mosaic movie is generated frame by frame, and each frame is a still image mosaic based on still images. In this particular approach, an individual tile in a target movie is not approximated by a tile movie, but rather by a sequence of still tile images.
A limitation of the methods by Klein and Martin lies in the required number of source tile movies. To obtain an optically and artistically pleasing end result, a data base containing thousands or tens of thousands of individual tile movies (video clips) is required. A large data base is desirable since it ensures to some degree that video clips covering a broad visual range of colour and brightness transitions are provided. A broad visual range would entail, for instance, video clips transitioning from very dark to very light, and video clips transitioning from having a dark area against a light background to having a light area against a dark background. Video data bases containing large numbers of video clips are available commercially, and the larger the data base, the better the final outcome. It is beneficial to invent a method for composing pleasing movie mosaics from smaller data bases. Smaller private data bases consist, for instance, of private video clips, such as segments of home video. By using the prior art, such a data base may yield a less than pleasing movie mosaic due to the limitations imposed by the size of the data base and the resulting limited visual range of the video clips. For instance, the smaller data base may not contain a sufficient variety of dark clips, or of dark clips transitioning to light. As a result, a movie that is considered the best match for a particular region in the mosaic in comparison to all other clips in the data base may actually turn out not to produce a visually pleasing match. It is then merely the best available option, but still not good enough.
A second limitation lies in the matching method. The described method is tailored for large data bases. For each tile of the target movie, the described matching method finds the best tile movie in the data base. Hence, the method cannot guarantee the insertion of specific video clips, or of all video clips, of the data base. Not only is this a result of the matching method itself (the methods of the prior art find the best clip or the best segment of the best clip), it is also a side effect of the underlying desire to use large data bases. Clearly, if a data base contains thousands of video clips, as desired, a mosaic composed of hundreds of tiles cannot possibly contain all the video clips in that data base. It is hence beneficial to invent a matching method that enables the placement of selected or all video clips (in part or in whole) of a data base, resulting in a mosaic that better represents the video clips in the data base. For instance, if a mosaic is to be composed using a data base of home video clips taken at a private event (for instance a wedding), and if all participants at the event (for instance friends and family) were recorded in one or more video clips in the data base, it is beneficial if all participants are represented in the mosaic movie. In the case of a wedding, a video clip of the wedding couple cutting the wedding cake could be used as the target movie, and the video clips for the tile movies would be recorded during the festive and formal activities throughout the day.
The present invention is distinguished from the prior art in that it uses a finite-sized library of source videos and ensures that each source video is included in the mosaic representation of the master or target movie. The significance of this may be best described by way of the following example.
At a wedding one or more videographers, professional or amateur, take a number of videos of the bride and groom and all of the guests in attendance. These videos are then digitized, if necessary, and stored as source videos in a video clip library. One of the videos of the bride and groom might be selected as the target or master video, and a movie mosaic representation thereof is prepared in which each and every one of the source videos is incorporated. As an alternative, the mosaic representation might be composed of videos of just the bride's family or just the groom's family. The important distinction over the prior art is that in the present invention a source video is selected and a place in the mosaic is found for it. In contrast, the prior art selects a region in the mosaic and finds a source video from a very large library to best match the region. There is no attempt to place in the mosaic all source videos in the library.
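The difference between starting from the source video and starting from the region may be illustrated by the following minimal sketch, in which every source clip is given a distinct place in the mosaic. The sketch uses a simple greedy assignment over a caller-supplied cost function and reduces each clip and region to a single brightness value in the example. The greedy strategy, the cost and the function name are assumptions made for illustration only and are not presented as the matching method of the present invention.

```python
def place_every_clip(clips, regions, cost):
    """Assign each clip to a distinct region, taking the cheapest pairs first.

    Returns a dict mapping clip index -> region index. Every clip is placed,
    provided there are at least as many regions as clips.
    """
    if len(clips) > len(regions):
        raise ValueError("need at least as many regions as clips")
    pairs = sorted(
        (cost(clip, region), ci, ri)
        for ci, clip in enumerate(clips)
        for ri, region in enumerate(regions)
    )
    placement = {}
    used_regions = set()
    for _, ci, ri in pairs:
        if ci not in placement and ri not in used_regions:
            placement[ci] = ri
            used_regions.add(ri)
    return placement


# Hypothetical data: one average-brightness value per clip and per region.
clips = [10, 200, 120]
regions = [15, 190, 100, 30]
placement = place_every_clip(clips, regions, cost=lambda c, r: abs(c - r))
print(placement)
```

Regions left unassigned after every clip has been placed once (region 3 in this example) could then be filled in any manner, for instance by reusing clips. A globally optimal assignment would require a different algorithm than the greedy pass shown here.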
A third limitation lies in the shortcomings of the cropping method. In the prior art, the set of tile movies is generated from raw video clips by cropping each raw video clip to a square or rectangle of specified size. The purpose of cropping in the prior art is to produce video clips of a desired shape regardless of the shape of the underlying raw source video. For instance, this allows an entire data base consisting of video clips with a variety of aspect ratios (for instance one or more of the following: square, 3:4 and 9:16 rectangles) to be used for tile movies of any aspect ratio. For simplicity and to enable automation, each crop is taken about the centre of the frame to ensure that the centred subject matter of a tile (typically an object, a person, etc.) appears in the crop and is not cut out.
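The arithmetic of a centre crop to a desired aspect ratio may be sketched as follows. The sketch returns the largest centred rectangle of the requested shape in integer pixel coordinates. The function name and the convention of passing the aspect ratio as width followed by height are assumptions made for this example.

```python
def centre_crop(width, height, aspect_w, aspect_h):
    """Largest centred crop of a width x height frame with shape aspect_w:aspect_h.

    Returns (left, top, crop_width, crop_height) in pixels.
    """
    if width * aspect_h > height * aspect_w:
        # Frame is wider than the desired shape: keep full height, trim the sides.
        crop_h = height
        crop_w = height * aspect_w // aspect_h
    else:
        # Frame is taller than (or equal to) the desired shape: trim top and bottom.
        crop_w = width
        crop_h = width * aspect_h // aspect_w
    return ((width - crop_w) // 2, (height - crop_h) // 2, crop_w, crop_h)


print(centre_crop(1920, 1080, 1, 1))   # square tile from a wide frame
print(centre_crop(640, 480, 16, 9))    # wide tile from a narrower frame
```

Because the crop position is fixed by the frame centre alone, the result does not depend on the region of the target movie in which the tile will be placed.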
When the data base is of limited size, it is beneficial to produce multiple different crops of a given aspect ratio, and to let a matching method determine which crop is best suited to a source video clip for a given region in the final mosaic. Moreover, it is beneficial to adjust the crop adaptively (typically by adjusting its size and location while maintaining its shape) throughout the duration of the video clip, based on optimum resemblance with a given tile region of the target movie.
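The selection among multiple candidate crops may be illustrated by the following minimal sketch. It assumes small grayscale frames given as lists of rows, considers every crop location of a fixed square size, and keeps the location whose content differs least from the target region in terms of squared error. Varying the crop size, and smoothing the crop path over time, are not shown. The error measure and the function names are assumptions made for this example.

```python
def crop_error(frame, left, top, region):
    """Squared error between `region` and the equally sized crop of `frame`
    whose top-left corner is at (left, top)."""
    size = len(region)
    patch = [row[left:left + size] for row in frame[top:top + size]]
    return sum(
        (p - r) ** 2
        for patch_row, region_row in zip(patch, region)
        for p, r in zip(patch_row, region_row)
    )


def best_crop(frame, region):
    """Return the (left, top) crop location that best resembles `region`."""
    size = len(region)
    candidates = [
        (left, top)
        for top in range(len(frame) - size + 1)
        for left in range(len(frame[0]) - size + 1)
    ]
    return min(candidates, key=lambda lt: crop_error(frame, lt[0], lt[1], region))


def best_crop_per_frame(clip, region_clip):
    """Choose the crop location independently for each frame of the clip."""
    return [best_crop(frame, region) for frame, region in zip(clip, region_clip)]


frame_1 = [[0, 0, 9], [0, 0, 9], [5, 5, 5]]
frame_2 = [[5, 5, 5], [0, 0, 9], [0, 0, 9]]
region = [[0, 9], [0, 9]]  # 2 x 2 tile region of the target, same in both frames

print(best_crop_per_frame([frame_1, frame_2], [region, region]))
```

In this example the best crop moves down by one pixel between the two frames, following the content of the source clip. Choosing the location independently per frame, as done here, may cause visible jitter in practice, so some constraint on frame-to-frame movement would likely be needed.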