With the advent of digital file processing, it is possible to digitally insert objects into a video (also referred to herein as “embedding”). Digitally inserting objects in a video may have many benefits. For example, it may enhance the visual effects of the video, improve its realism, or allow more flexibility after the video is shot, so that fewer decisions need to be made at the filming stage about which objects to include in a scene. Consequently, digital object insertion is becoming increasingly common and is utilised by video makers for all manner of purposes.
Currently, digital object insertion typically requires a number of processing stages. These are described further below, but can broadly be broken down into:
1. the detection of cuts;
2. the fusion and grouping of similar shots;
3. the detection of insertion opportunities (referred to interchangeably throughout as insertion zones);
4. the contextual characterisation of insertion zones; and
5. the matching between insertion zones and objects for insertion.
Detection of Cuts
A programme is typically a half-hour or hour-long show, and programme material is decomposed into shots. A shot is a consecutive sequence of frames that contains no edit points; shots usually maintain a coherence which indicates that they were recorded by a single camera.
Shots are delineated by cuts, at which the camera usually stops recording or the material is edited to give this impression. Broadly speaking, there are two types of cut: “hard” cuts and “soft” cuts. A hard cut is detected when the visual similarity between consecutive frames abruptly breaks down, indicating, for example, an edit point or a change in camera angle. A soft cut corresponds to the beginning or end of a soft transition, such as a wipe or a fade, which is characterised by a significant but gradual change in the visual appearance of the video across several frames.
First, it may be necessary to analyse the source video material (for example, the programme material) and locate scenes suitable for object insertion. This is usually referred to as a pre-analysis pass, and is best done by dividing the source video into scenes, particularly scenes shot from the same camera position. Segmentation of video material into scenes may be performed automatically using shot change detection: a video analysis module may automatically detect the hard and soft cuts between different shots, which correspond to hard and soft transitions respectively.
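To make the distinction between hard and soft cuts concrete, the following is a minimal sketch rather than a production detector. It assumes each frame has already been reduced to a normalised grey-level histogram, compares consecutive frames, and applies a simplified twin-threshold scheme: a single large jump marks a hard cut, while a run of moderate changes whose overall effect is large marks a soft transition. The names `hist_distance` and `detect_cuts` and the threshold values are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical frame representation: a normalised grey-level histogram.
Histogram = Sequence[float]


def hist_distance(a: Histogram, b: Histogram) -> float:
    """Half the L1 distance between two normalised histograms; lies in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(
    frames: List[Histogram], hard_t: float = 0.5, soft_t: float = 0.1
) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (hard cut indices, soft transition spans) for frame histograms.

    A hard cut is reported before frame i when frames i-1 and i differ by at
    least hard_t. A run of moderate differences (>= soft_t) is a candidate soft
    transition; it is accepted only if the first and last frames of the run
    differ by at least hard_t overall. Thresholds are illustrative assumptions.
    """
    hard: List[int] = []
    soft: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of a candidate gradual transition

    def close_candidate(end: int) -> None:
        # Accept the pending run only if its cumulative change is large.
        if start is not None and hist_distance(frames[start], frames[end]) >= hard_t:
            soft.append((start, end))

    for i in range(1, len(frames)):
        d = hist_distance(frames[i - 1], frames[i])
        if d >= hard_t:
            close_candidate(i - 1)
            hard.append(i)  # abrupt break in similarity
            start = None
        elif d >= soft_t:
            if start is None:
                start = i - 1  # a gradual change may be starting here
        else:
            close_candidate(i - 1)  # frames are stable again
            start = None
    close_candidate(len(frames) - 1)
    return hard, soft
```

For a fade spanning frames 1 to 6 followed by an abrupt change at frame 8, `detect_cuts` reports a hard cut at index 8 and a soft transition over (1, 6). Small back-and-forth changes, such as camera shake, are rejected because the first and last frames of the run remain similar.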
Fusion and Grouping of Similar Shots
Once shots have been detected, continuity detection may be applied in a further processing step to identify similar shots within the source video. In this way, when an insertion opportunity is identified in one shot, a shot similarity algorithm can identify further shots in which the same opportunity is likely to be present.
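The grouping step can be illustrated as follows, assuming each shot is represented by a single key-frame histogram (an assumption; practical systems may use richer descriptors). Shots whose key frames lie within a hypothetical distance threshold are merged using a union-find structure, so that, for example, the alternating camera angles of a shot/reverse-shot dialogue collapse into two groups. The names `group_similar_shots` and `threshold` are illustrative only.

```python
from typing import Dict, List, Sequence


def hist_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two normalised histograms; lies in [0, 1]."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def group_similar_shots(
    keyframes: List[Sequence[float]], threshold: float = 0.2
) -> List[List[int]]:
    """Group shot indices whose key frames are within `threshold` (hypothetical)."""
    parent = list(range(len(keyframes)))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair of shots and merge the groups of similar ones.
    for i in range(len(keyframes)):
        for j in range(i + 1, len(keyframes)):
            if hist_distance(keyframes[i], keyframes[j]) <= threshold:
                parent[find(j)] = find(i)

    # Collect shot indices by root; each list is in ascending shot order.
    groups: Dict[int, List[int]] = {}
    for i in range(len(keyframes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because merging is transitive, an insertion opportunity found in any shot of a group can be proposed for every other shot in that group.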
Detection of Insertion Zones
Image regions in the source video content that are suitable for the insertion of additional material are referred to as insertion zones, and these can broadly be categorised into surfaces and objects. A surface may generally be suitable for the insertion of material: on a wall, for example, a poster might be added, and on a table an object such as a drink may be inserted. When an object is identified as an insertion zone, the insertion opportunity may relate to rebranding any brand insignia identified on the object, replacing the object with another object belonging to the same class of objects, or adding a further similar object in close proximity to it.
Insertion zone detection can be pursued and refined by tracking coherently moving pixels throughout the source video material. Image-based tracking techniques include, but are not limited to, planar tracking algorithms, which compute and model the 2D transformation of each image in the source video.
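Planar tracking commonly models the motion of a flat region, such as a wall on which a poster could be placed, as a 2D homography between frames. The sketch below assumes that four corner correspondences of the tracked region are already known and estimates the homography with the direct linear transform, fixing the bottom-right matrix entry to 1. A practical tracker would typically obtain many correspondences from feature matching and use a robust estimator instead. The helper names `solve`, `homography` and `project_point` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 homography (h33 = 1) mapping four src points onto four dst points."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project_point(h: List[List[float]], p: Point) -> Point:
    """Map a point through the homography, dividing by the projective scale."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Applying `project_point` to the corners of an insertion zone defined in a reference frame gives their positions in the current frame, which is where the inserted material would be rendered.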
Contextual Characterisation of Insertion Zones
An operator may be required to assess each identified insertion zone and provide context for the additional material that could be inserted into it. With the rapid rise in the amount of digital video content being broadcast or streamed via the internet, the fact that a human operator cannot identify the context of insertion opportunities much faster than in real time may be a problem.
Matching Between Insertion Zones and Product Categories
It is not enough merely to identify insertion opportunities through pattern recognition processes; some intelligence may also need to be applied when selecting the material to be inserted into the video content.
For an instance of object insertion not to detract from the viewing experience, it should make sense within the context of the source video content into which it is placed. If a scene takes place in a kitchen, for example, additional content placed in that scene should be relevant to the objects that the viewer would expect to see in that location. One would perhaps not expect to see a perfume bottle on a kitchen sideboard next to a kettle; a jar of coffee would be much more suitable in that context. Likewise, a bathroom scene is suitable for the placement of bathroom or hygiene-related items rather than groceries. Consequently, an operator may be required to assess the scene and select a particular object or category of objects suitable for insertion in each identified insertion zone. Again, the fact that a human operator cannot identify this context much faster than in real time may be a problem.
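One simple way to automate the kind of contextual judgement described above is sketched below under clearly stated assumptions: the objects visible in a scene are assumed to have been detected already, and each product category is given a hand-written, hypothetical profile of the objects a viewer would expect to see near it. Categories are then ranked by the fraction of their expected context objects present in the scene. The table `CONTEXT_PROFILES`, the function `rank_categories` and the `min_score` cut-off are illustrative, not part of any described system.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical context profiles: objects a viewer would expect near each category.
CONTEXT_PROFILES: Dict[str, Set[str]] = {
    "coffee": {"kettle", "mug", "worktop", "toaster"},
    "groceries": {"fridge", "worktop", "cupboard"},
    "toiletries": {"sink", "mirror", "bathtub", "towel"},
    "perfume": {"mirror", "dressing table", "wardrobe"},
}


def rank_categories(
    scene_objects: Set[str], min_score: float = 0.25
) -> List[Tuple[str, float]]:
    """Rank categories by the share of their expected context objects in the scene.

    `scene_objects` is assumed to come from an upstream object detector.
    Categories scoring below `min_score` (an illustrative cut-off) are dropped.
    """
    ranked: List[Tuple[str, float]] = []
    for category, expected in CONTEXT_PROFILES.items():
        score = len(expected & scene_objects) / len(expected)
        if score >= min_score:
            ranked.append((category, score))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda cs: (-cs[1], cs[0]))
    return ranked
```

For a scene containing a kettle, a mug and a worktop, coffee ranks first and perfume is excluded; for a scene containing a sink, a mirror and a towel, toiletries rank first and groceries are excluded.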
It may be appreciated from the above that identifying insertion zone opportunities and suitable objects for insertion is typically a time-consuming, multi-stage process that may limit the volume of video material that can be analysed.