Digital cinema, or digital media creation, is the process of capturing moving pictures as digital images rather than on film. Digital capture may occur on video tape, hard disks, flash memory, or other media that can record digital data. As digital technology has improved over the years, this practice has become increasingly common; many television shows and feature films are now shot partially or entirely in digital format.
Given the prevalence of such video, it is desirable to be able to manipulate digital images after creation or production. There are various reasons for this: realistic content modification can change the context of a scene, offer product placement and advertising, and adapt the aesthetics of the content to user preferences.
Specifically with regard to digital product placement, there is a huge demand for replacing products in, or inserting new products de novo into, appropriate scenes. Since the inception of TiVo in 1997, digital video recorders (DVRs) have quickly become a staple in many households. One significant reason consumers prefer this technology is that it lets them skip the commercials that appeared in a show's original broadcast. Complementing this trend, viewers can now watch many of their favorite television shows online or, alternatively, download commercial-free episodes onto their computers or portable media players (e.g., iPods® or even cell phones) for a small charge.¹ This mode of viewing shows no signs of slowing.
¹ See, for example, Apple-iTunes, http://www.apple.com/itunes/store/tvshows.html (providing instructions on how to download TV shows onto iTunes for viewing on a computer, or for uploading onto a portable media device such as an iPod).
Such digital advances do not affect only television viewers. Because of the increased use of commercial-skipping technology, advertisers have had to find new ways, beyond the traditional thirty-second commercial, to get their messages out. Strategic product placement has been a welcome replacement. A market research firm found that the value of television product placement jumped 46.4% to $1.87 billion in 2004, and predicted (correctly) that the trend would likely continue due to the “growing use of [DVRs] and larger placement deals as marketers move from traditional advertising to alternative media.”²
² See Johannes, TV Placements Overtake Film, supra note 15 (quoting a marketing association president as saying “product placement is the biggest thing to hit the advertising industry in years,” and noting that PQ Media predicts the value of product placement will grow at a compound rate of 14.9% to reach $6.94 billion by 2009).
Although product placement has existed in some form for years, the new focus in merchandising is digital product placement or replacement. Digital product placement occurs when advertisers insert images of products into video files after those files have been created. For example, such technology has been used for years to superimpose a yellow first-down line onto football broadcasts and to insert product logos behind home plate during televised baseball games.³
³ See Wayne Friedman, Virtual Placement Gets Second Chance, ADVERTISING AGE, Feb. 14, 2005, at 67 (discussing efforts to incorporate digital product placement into television).
Within the digital video space, Internet-based video has become a rapidly growing source of published content. This content includes movies and TV programs, which are often streamed or downloaded over a network as digital data. Online videos of the type available on services such as YouTube® have also become a source of live music, comedy, education and sports programming. These videos are usually delivered digitally to a viewer's computer, an “intelligent” TV, or a mobile computing device.
As online video viewing has become very prominent on the global Internet, advertising in this medium has also gained popularity. Numerous progressive advertisers seek methods of delivering promotional content with and around transmitted Internet videos, both to supplement and to complement traditional advertising on television, radio and print media. Such advertisers constantly seek advertising that is targeted on the basis of viewers' demographic, purchase-behavior, attitudinal and associated data. Accordingly, some advertisers want to understand the context of online videos in order to improve the relevance of their advertising content. Some examples of reasons to perform detailed scene-by-scene video content analysis include:
a. To subtly place products in the background of video scenes for viewers to notice, one needs to know the detailed layout of the scene content in order to choose an appropriate location for the product. Similarly, if a brand wishes to advertise before a user-requested video is shown to the viewer (popularly known in the industry as Pre-Roll ads), or as a banner advert at the bottom of the video frame while the video is playing, it is important for the company to know whether any competing products appear in the existing video scenes, so that conflicting messages to the viewer are minimized.
b. A company running Pre-Roll ads may also wish to place a branded promotional item on a table in the appropriate scenes of videos to increase advertising impact. Alternatively, an advertiser that prefers a more passive placement may place an item as part of the background content. To avoid disrupting the context of the video scene, the system must account for the identifiable items that make up the scene and decide whether the scene is appropriate for product placement.
Currently, automated computer-vision-based identification of items in video scenes relies on:
a. Identifying dominant, distinctive features of reference items and searching for those features in a frame-by-frame analysis of digital video. In industry parlance, this method is commonly known as feature matching.
b. Placing artificial “glyph” markers or bar codes in videos for post-production video analysis and/or editing.
c. Using computer learning algorithms that compare numerous images of various instances of a similar item. These programs analyze the source images to make a statistical inference about the recurring characteristics of the items, and these consistent characteristics are then used to identify similar items in a frame-by-frame analysis of digital video.
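The feature-matching approach (the first method listed above) can be sketched in its simplest form: each distinctive feature of a reference item is compared with the features found in a frame, and a match is accepted only if it is unambiguous. This is a minimal illustration, assuming each feature has already been reduced to a fixed-length numeric descriptor (real systems typically obtain these from detectors such as SIFT or ORB). The function `match_descriptors`, the `ratio` parameter and the sample descriptors are hypothetical, and the nearest-neighbour ratio test shown here is a common heuristic, not necessarily the one used by any particular product.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def distance(a: Descriptor, b: Descriptor) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(reference: List[Descriptor],
                      frame: List[Descriptor],
                      ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Match each reference descriptor to its nearest descriptor in the frame.

    A match is kept only if the nearest frame descriptor is clearly closer than
    the second-nearest one (ratio test); ambiguous, non-distinctive features are
    discarded. Returns (reference_index, frame_index) pairs.
    """
    matches: List[Tuple[int, int]] = []
    if len(frame) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, ref in enumerate(reference):
        ranked = sorted(range(len(frame)), key=lambda j: distance(ref, frame[j]))
        best, second = ranked[0], ranked[1]
        if distance(ref, frame[best]) < ratio * distance(ref, frame[second]):
            matches.append((i, best))
    return matches


# Hypothetical descriptors: the second reference feature has two near-identical
# candidates in the frame, so it is rejected as ambiguous.
reference = [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
frame = [(0.0, 0.0, 1.1), (9.0, 9.0, 9.0), (5.1, 5.0, 5.0), (5.11, 5.0, 5.0)]
print(match_descriptors(reference, frame))  # [(0, 0)]
```

The ratio test is what makes the method depend on distinctiveness: a feature that resembles several candidates equally well produces no match at all.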
These methods cannot reliably analyze post-production video to find items that are analytically important but lack discernible features common to all forms of such items. For example, one could find a can of Coke® by using learning algorithms to process known pictures of Coke logos and then searching for Coke products in a video, but these known methods cannot identify truly featureless generic items such as tables or TV and computer screens. Feature-based analytical approaches become unreliable for such items because generic items appear in numerous shapes, sizes, orientations and colors, and with extraneous features. Additionally, in a two-dimensional frame, a generic item or region may appear contiguous with a nearby wall or floor covering of similar color. Even finding a single type of table in digital video with distinctive-feature-based identification programs is difficult, since the table may look visually similar to the carpet or the walls in the scene. Such distinctions are easily made by the human visual system, which interprets every image in the context of past experience; it is much more difficult for a computer to analyze a two-dimensional frame of mere colored dots (i.e., pixels or pel data). Moreover, even if a generic item momentarily appears to have distinctive features, those features quickly become unreliable, because the visual appearance of a surface and its surroundings changes as the camera perspective or lighting conditions shift.
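The contiguity problem can be made concrete with color-based region growing (a flood fill with a color tolerance), chosen here only as an illustrative technique; the text does not name a particular segmentation method. In this minimal sketch, assuming RGB pixels stored as nested lists and a hypothetical tolerance of 10, a tabletop whose color is close to the carpet's is absorbed into a single region, whereas the same tabletop on a contrasting carpet is isolated. The names `grow_region`, `scene` and the color constants are hypothetical.

```python
import math
from collections import deque
from typing import List, Set, Tuple

RGB = Tuple[int, int, int]
Pixel = Tuple[int, int]


def color_distance(a: RGB, b: RGB) -> float:
    """Euclidean distance between two RGB colors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def grow_region(img: List[List[RGB]], seed: Pixel, tolerance: float) -> Set[Pixel]:
    """4-connected region growing: add neighbours whose color is within
    `tolerance` of the seed pixel's color."""
    h, w = len(img), len(img[0])
    ref = img[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and color_distance(img[ny][nx], ref) <= tolerance):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


TABLE: RGB = (120, 80, 40)            # hypothetical brown tabletop
SIMILAR_CARPET: RGB = (125, 82, 45)   # carpet of nearly the same color
CONTRAST_CARPET: RGB = (40, 40, 120)  # clearly different carpet


def scene(carpet: RGB) -> List[List[RGB]]:
    """4x4 frame: a 2x2 tabletop in the top-left corner, carpet elsewhere."""
    return [[TABLE if y < 2 and x < 2 else carpet for x in range(4)] for y in range(4)]


print(len(grow_region(scene(SIMILAR_CARPET), (0, 0), 10.0)))   # 16
print(len(grow_region(scene(CONTRAST_CARPET), (0, 0), 10.0)))  # 4
```

With the similar carpet the "table" region swallows the entire frame, which is the failure described above: nothing in the pixel values alone separates the item from its surroundings.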
Generally, feature-based analysis methods work only if the item has discernible and distinctive features (in terms of color, shape, or intensity gradients) that a computer program can consistently identify across numerous scenes in videos. Likewise, learning algorithms work only if an item is structurally similar across a wide range of circumstances; for example, a human face would lead a learning algorithm to focus on the spatial consistency of the locations of shadows cast by people's eyebrows, noses, and chins.
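The dependence on intensity gradients can be illustrated with the Harris corner measure, a classic feature-detection score used here purely as an example (the text does not name a specific detector). In this minimal sketch, assuming a small grayscale image stored as nested lists with values in [0, 1], a flat surface and a straight boundary (such as the plain edge of a tabletop) produce no corner responses, while a genuine corner does. The function names, the 3x3 window and the threshold of 0.01 are hypothetical choices.

```python
from typing import List

Image = List[List[float]]


def harris_response(img: Image, y: int, x: int, k: float = 0.04) -> float:
    """Harris corner response R = det(M) - k * trace(M)^2 at (y, x).

    M is the structure tensor summed over a 3x3 window, with gradients from
    central differences; (y, x) must be at least 2 pixels from the border.
    """
    sxx = syy = sxy = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            r, c = y + dy, x + dx
            ix = (img[r][c + 1] - img[r][c - 1]) / 2.0  # horizontal gradient
            iy = (img[r + 1][c] - img[r - 1][c]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def count_corners(img: Image, threshold: float) -> int:
    """Count interior pixels whose Harris response exceeds `threshold`."""
    h, w = len(img), len(img[0])
    return sum(1 for y in range(2, h - 2) for x in range(2, w - 2)
               if harris_response(img, y, x) > threshold)


flat = [[0.5] * 8 for _ in range(8)]                                   # featureless surface
edge = [[1.0 if x >= 4 else 0.0 for x in range(8)] for _ in range(8)]  # straight boundary
corner = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(8)] for y in range(8)]

print(count_corners(flat, 0.01), count_corners(edge, 0.01), count_corners(corner, 0.01))  # 0 0 9
```

A straight edge varies in only one direction, so det(M) is zero and the response is never positive; a uniform region has no gradient at all. Both cases describe much of the visible surface of generic items such as tables or screens.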
Therefore, it is currently difficult to detect items that lack distinctive features, yet such detection is a critical requirement for many product placements. It is an object of the present invention to obviate or mitigate the above disadvantages so that non-distinctive items in digital images can be readily and accurately identified and thereafter, as desired, manipulated.