1. Field
Embodiments of the present invention relate to the automatic recognition of scenes in video streams and, more particularly, to the use of non-fixed elements of a scene as reference points in automated video scene recognition.
The invention can be used in applications in which scenes must be recognized in video, for example to add advertising logos, sporting event statistics, or other types of virtual insertions to broadcast video. It may apply to platforms that stream video media, including but not limited to television (e.g., broadcast, cable, satellite, fiber), the Internet, and mobile devices (e.g., cellular telephones or other wireless devices).
2. Background
In fields such as broadcast television and filmmaking, automated scene recognition has become a highly developed art. In broadcast television, automated scene recognition has been used to insert virtual advertising and graphic effects into live and recorded broadcasts, as described in more detail in, for example, U.S. Pat. No. 5,264,933 entitled “Television displays having selected inserted indicia” issued to Rosser, et al. on Nov. 23, 1993, the contents of which are hereby incorporated by reference in their entirety. In filmmaking, similar techniques are used to automatically incorporate special effects into films, a process sometimes termed “matchmoving”.
These automated recognition techniques typically rely on the scene containing enough fixed, recognizable structures, such as, but not limited to, markings on a field of play, pillars, gates, seats, tables, stairs and billboards. As a result, they typically have difficulty operating on video footage that contains few or no fixed landmarks, such as, but not limited to, scenes in which the background is a crowd of people.
Current methods for scene recognition in video include searching for reference points within video scenes. Reference points may include groups of pixels or templates, lines, areas of texture, gradients, histograms, and other types of reference information. The term ‘structure’ may cover all of the above examples, as well as others known in the field, that specify reference points. One example of a reference point is a “keypoint”. Recognition methods include template matching, line searching, keypoint searching, and others. Position information for reference points may be used to calculate models for scenes, and the models may in turn be used to add virtual insertions to the video. Reference points may be determined using calibration sequences or shots that contain them. Many scene recognition methods are known in the art, and the present invention is not meant to be constrained to a particular recognition method.
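By way of illustration only, the template-matching approach mentioned above can be sketched as an exhaustive search scored by zero-mean normalized cross-correlation (NCC). This is a minimal Python sketch, assuming grayscale frames stored as nested lists of intensities; NCC is one common similarity score chosen for this example, and the names `ncc` and `match_template` are hypothetical rather than taken from any cited system.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale frame as rows of pixel intensities (assumed layout)


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    va = [p for row in a for p in row]
    vb = [p for row in b for p in row]
    ma, mb = sum(va) / len(va), sum(vb) / len(vb)
    da = [p - ma for p in va]
    db = [p - mb for p in vb]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat (constant) patch carries no structure, so it scores 0.
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def match_template(frame: Image, template: Image) -> Tuple[int, int, float]:
    """Slide the template over every position in the frame; return (row, col, score)."""
    th, tw = len(template), len(template[0])
    best = (-1, -1, -2.0)  # NCC lies in [-1, 1], so any real score beats this
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(patch, template)
            if score > best[2]:
                best = (r, c, score)
    return best
```

Because NCC is normalized, the score is insensitive to uniform changes in brightness and contrast of the patch, while a featureless region, such as a uniform stretch of background, scores zero and therefore cannot serve as a reference point.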
Currently known methods require fixed reference points because these points may be needed to calculate the homography that maps the world to the scene. The homography must be calculated precisely, leaving no room for biased results: if a reference point moves even slightly from its fixed location, it is tagged as an outlier and removed from the calculation. If too few fixed inliers remain, the homography calculation can fail.
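The sensitivity to moving points can be illustrated with a small homography estimator. The following Python sketch is an illustration under stated assumptions, not the method of any cited reference: it fits a homography with the direct linear transform (DLT), fixing h33 = 1, scores every four-point hypothesis by counting correspondences whose reprojection error is within a tolerance (an exhaustive, deterministic stand-in for RANSAC), and refits on the winning inlier set by least squares. The tolerance `tol` and the threshold `min_inliers` are hypothetical parameters; when fewer than `min_inliers` reference points stay put, the estimate fails and the function returns `None`.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> Optional[List[float]]:
    """Solve a x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> Optional[Matrix]:
    """DLT with h33 = 1: exact for 4 pairs, least squares for more."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    if len(rows) < 8:
        return None
    if len(rows) == 8:
        h = solve(rows, rhs)
    else:  # over-determined: normal equations A^T A h = A^T b
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
        atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
        h = solve(ata, atb)
    return None if h is None else [h[0:3], h[3:6], h[6:8] + [1.0]]


def reprojection_error(h: Matrix, p: Point, q: Point) -> float:
    """Distance between H applied to p and the observed point q."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        return math.inf
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return math.hypot(u - q[0], v - q[1])


def robust_homography(src: Sequence[Point], dst: Sequence[Point],
                      tol: float = 1.0, min_inliers: int = 6
                      ) -> Optional[Tuple[Matrix, List[int]]]:
    """Try every 4-point hypothesis, keep the largest inlier set, then refit."""
    n, best = len(src), []
    for subset in combinations(range(n), 4):
        h = fit_homography([src[i] for i in subset], [dst[i] for i in subset])
        if h is None:
            continue
        inliers = [i for i in range(n) if reprojection_error(h, src[i], dst[i]) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < min_inliers:
        return None  # too few fixed inliers: the estimate fails
    h = fit_homography([src[i] for i in best], [dst[i] for i in best])
    return None if h is None else (h, best)
```

For instance, given eight well-spread correspondences in which one point has been shifted by 20 pixels along each axis, the shifted point is tagged as an outlier and the other seven determine the homography; given only five correspondences, the default `min_inliers = 6` cannot be met and the call returns `None`.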
Besides being fixed, reference points must be easy to recognize, because each one must match only its corresponding point in the scene and not any other point. In addition, the matching algorithm must be suited to real-time applications, such as video applications, in which it has to keep pace with the video rate.
What is needed is a method of video scene recognition that works in scenes containing too few fixed points for known scene recognition methods to succeed.
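The requirement that each reference point match only its own counterpart is often enforced with a distinctiveness test. One such test, swapped in here purely for illustration, is the nearest-neighbour ratio test popularized by Lowe for SIFT keypoints: a match is kept only when the nearest candidate descriptor is clearly closer than the second nearest. This minimal Python sketch assumes each point is described by a fixed-length numeric descriptor; the function name `match_distinctive` and the default `ratio` of 0.8 are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence

Descriptor = Sequence[float]  # assumed fixed-length numeric description of a point


def match_distinctive(query: List[Descriptor], candidates: List[Descriptor],
                      ratio: float = 0.8) -> Dict[int, int]:
    """Map query index -> candidate index, keeping only unambiguous matches."""
    matches: Dict[int, int] = {}
    if len(candidates) < 2:
        return matches  # the ratio test needs a second-nearest neighbour
    for qi, q in enumerate(query):
        # Brute-force distances to every candidate, nearest first.
        dists = sorted((math.dist(q, c), ci) for ci, c in enumerate(candidates))
        (d1, best), (d2, _) = dists[0], dists[1]
        # Reject points that look almost as much like another candidate.
        if d1 < ratio * d2:
            matches[qi] = best
    return matches
```

The brute-force search shown here computes one distance per query and candidate pair; keeping pace with the video rate would likely require an indexed search, such as a k-d tree or an approximate nearest-neighbour method.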